FORECASTING AND TIME SERIES ANALYSIS USING THE SCA STATISTICAL SYSTEM


VOLUME 1

Box-Jenkins ARIMA Modeling
Intervention Analysis
Transfer Function Modeling
Outlier Detection and Adjustment
Exponential Smoothing
Related Univariate Methods

by
Lon-Mu Liu
Gregory B. Hudak

in collaboration with
George E. P. Box
Mervin E. Muller
George C. Tiao

This manual is published by Scientific Computing Associates Corp., 913 West Van Buren Street, Suite 3H, Chicago, Illinois, U.S.A.

Copyright Scientific Computing Associates Corp.,

PREFACE

This edition of Forecasting and Time Series Analysis Using the SCA Statistical System initiates the replacement process of the document entitled The SCA Statistical System: Reference Manual for Forecasting and Time Series Analysis (May 1986). When the replacement process is complete, the older manual will have been completely replaced in both scope and style by a two volume set. This manual is Volume I of the set. It encompasses topics related to the capabilities of the UTS Module and the Extended UTS Module of the SCA Statistical System. Hence the contents of this manual replace Chapters 1, 2, 3, 7, and 8 of the 1986 manual entirely. Chapters 4, 5, and 6 of the 1986 manual are still valid until the release of Volume II of the new set. In addition, information related to the spectral analysis capabilities of the SCA System may be found in SCA Working Paper 115.

As noted above, this manual is a complete revision of parts of the 1986 manual. Chapter 4, Linear Regression Analysis, replaces Chapter 8 of the previous manual. The chapter is a modified version of the regression chapter of the document The SCA Statistical System: Reference Manual for General Statistical Analysis. Chapters 5 through 8 are a detailed replacement of Chapter 3 of the 1986 document. Information related to the modeling and forecasting of univariate time series is divided into chapters on Box-Jenkins ARIMA Modeling and Forecasting (Chapter 5), Intervention Analysis (Chapter 6), Outlier Detection and Adjustment (Chapter 7), and Transfer Function Modeling (Chapter 8). Chapter 7 of this manual contains material not present in the 1986 edition. This new chapter includes much of the current information on the burgeoning research and activities associated with outlier detection, adjustment and estimation. Chapter 9, Forecasting Using General Exponential Smoothing, is an upgrade of Chapter 7 of the earlier manual. Examples have been added to illustrate all supported smoothing methods.

Almost all material of the above chapters is presented in a data analysis form. That is, SCA capabilities, commands, and output are presented within the context of a data analysis. Many concepts related to data analysis are reviewed and explained. Examples have been chosen to demonstrate the use of the SCA System, and to provide some insights or guidelines for an analysis. Within chapters, information regarding specific capabilities and features of the SCA System is presented from those most frequently used to those that are less commonly employed. All detailed information regarding the command structure of the SCA System is presented at the end of each chapter.

This manual is designed to be self-contained. Chapter 1 of this document provides complete information on the contents of all available SCA software products and where specific information on various SCA System capabilities can be found. Chapter 2 provides an overview of the command language of the SCA System. Chapter 3 summarizes useful plotting features for modeling time series. Five appendices provide information on the basic use of analytic statements; data generation, editing and creation; SCA macro procedures; and selected utility commands. More complete information on SCA commands can be found in The SCA Statistical System: Reference Manual for Fundamental Capabilities.

Acknowledgements

The SCA Statistical System was designed and developed by Lon-Mu Liu with the assistance of the SCA programming staff. Acknowledgements specific to capabilities described in other reference manuals are found in selected chapters. We are particularly grateful to Chung Chen of Syracuse University and George C. Tiao of The University of Chicago for their contributions related to outlier detection and adjustment, and Ruey S. Tsay of The University of Chicago for his program related to the EACF paragraph. Bovas Abraham of The University of Waterloo and Johannes Ledolter of The University of Iowa contributed greatly to the development of the GFORECAST paragraph and provided a review of our 1986 chapter on exponential smoothing. Houston H. Stokes of The University of Illinois has constantly rendered his programming expertise in the development of the SCA System. We are also grateful to Ki-Kan Chan and Alan Montgomery for their programming and testing efforts regarding SCA products on various computer platforms.

This manual was prepared by Lon-Mu Liu, The University of Illinois at Chicago, and Gregory Hudak, Scientific Computing Associates Corporation. We thank George E.P. Box, George C. Tiao, and Mervin E. Muller for their valuable comments and suggestions related to various aspects of the SCA System. We are indebted to the tireless efforts of Ching-Te Liu in the entry and editing of all chapters of this manual. This volume could not have been completed without her complete dedication to this project.

Scientific Computing Associates Corporation
February, 1992

CHAPTER 1

INTRODUCTION

The Forecasting and Modeling Package of the SCA Statistical System comprises four products. These products are:

UTS: Univariate time series analysis and forecasting using Box-Jenkins ARIMA, intervention and transfer function models. This product also includes forecasting capabilities using general exponential smoothing methods.

Extended UTS: Univariate time series analysis and forecasting with automatic outlier detection and adjustment, as well as analysis and forecasting of time series containing missing data.

MTS: Multivariate time series analysis and forecasting using vector ARMA models.

ECON/M: Econometric modeling and forecasting using simultaneous transfer function models. This module also provides the seasonal adjustment procedures X-11, X-11-ARIMA, and a model-based canonical decomposition method.

This manual describes the capabilities of the SCA UTS and Extended UTS products of the SCA System. Capabilities described in this manual (and chapters containing them) include:

Plotting data (Chapter 3): Plots of one or more variables over time, and scatter plots of two or more variables.

Linear regression analysis (Chapter 4): Multiple linear regression analysis, the effect of serial correlation, and dynamic regression.

Box-Jenkins ARIMA modeling (Chapter 5): Time series analysis and forecasting of a single series using Box-Jenkins ARIMA models. Data simulation is also discussed.

Intervention analysis (Chapter 6): Modeling and analysis of the effects of known external events on a single time series.

Outlier detection and adjustment (Chapter 7): Descriptions of outliers and methods for outlier detection and adjustment. Also included are forecasting in the presence of outliers and modeling a time series that contains missing observations.

Transfer function modeling (Chapter 8): Modeling a response variable (series) in the presence of one or more explanatory variables and a serially correlated disturbance term. Also presented are special cases of transfer function models; applications of transfer function modeling for handling the effects of trading days and moving holidays; and data simulation.

General exponential smoothing forecasting (Chapter 9): Forecasting a nonseasonal series using single and double exponential smoothing, or Holt's two parameter method. Forecasting a seasonal series using Winters' additive or multiplicative methods, seasonal indicators and harmonic functions. Relationships to Box-Jenkins ARIMA models are also discussed.

Analytic functions and matrix operations (Appendix A): Analytic functions and matrix operations that supplement the SCA System's statistical capabilities.

Data generation (Appendix B): User specified data generation, editing and other data manipulation of variables that are not necessarily time dependent.

Time series data generation (Appendix C): User specified data generation and editing of time series data.

Macro procedures (Appendix D): Creation and use of sequences of SCA statements to either perform SCA data analyses or to augment SCA capabilities.

Utility information (Appendix E): Output saving and review, management of files, internal workspace (memory), and other utility related tasks of an SCA session.

Most of the information contained in the Appendices is condensed from that described in The SCA Statistical System: Reference Manual for Fundamental Capabilities and The SCA Statistical System: Reference Manual for General Statistical Analysis. Selected information regarding the basic use of the SCA System and data entry can be found in Chapter 2. The information in Chapter 2 and the Appendices is designed to provide self-contained documentation for the use of the SCA-UTS and Extended UTS products.

Whenever possible, material in this manual is presented in a data analysis form. That is, SCA System capabilities, commands, and output are usually presented within the context of a data analysis.
Examples have been chosen both to demonstrate the use of the SCA System and to provide some broad guidelines for forecasting and time series analysis. One key reference and source of examples in this manual is the text Time Series Analysis: Forecasting and Control by Box and Jenkins (1970). This text contains many important concepts and properties of forecasting and time series analysis.

1.1 Forecasting and Time Series Analysis for Business, Industry and the Public Sector

In recent years, business, industry and the public sector have coped with the two-fold problem of providing quality goods and services while contending with limited or shrinking resources. Statistical methods can provide broad and effective means to address this problem. In particular, accurate forecasts are necessary for such diverse activities as capital budgeting, sales forecasting, market research, financial planning, and inventory planning and control. Statistical modeling and analyses are important for such activities as understanding the structure of a process, price analyses, and impact (or regulatory) analyses. The overall decision making process can benefit greatly from accurate forecasting and modeling tools.

Processes of interest are usually characterized by the response measured for one or more process attributes. In addition to such responses, we may also have recorded the values or operating conditions of possibly related (explanatory) variables. Statistical methods are often used to construct models that employ some, or all, of this information. Box (1979a) has noted that "Models ... are never true, but fortunately it is only necessary that they be useful."

One key element in statistical model building is how to deal with variation. Whenever we attempt to learn about a process, we are faced with dealing with the natural variation that is present in it. Such variation is confounded with the variation that occurs in simply determining (measuring) the values of all variables related to the model. In the case of data that are gathered according to some time order, we also must account for the time related correlation that is present in recorded values. Time series methods have proven useful for the characterization and forecasting of such time dependent processes.
1.2 Iterative Model Building and the SCA Statistical System

Box has often noted (e.g., 1974, 1976, 1979a, 1979b, and 1983) that statistical analyses or model building are most effective when an inductive-deductive approach is used. Observation and basic knowledge lead to the postulation of a theory or model. The theory or model is tried and the results are reviewed to provide insight for the modification or correction of the theory or model as necessary. The process continues until a satisfactory result is obtained. Within the model building process, this is realized as the cycle of initial model identification (or specification), model estimation, and diagnostic checking.

With the advent of high-speed computers, model building can be automated by incorporating sophisticated rules for decision making. Box (1984) notes that he and Gwilym Jenkins thought that it was particularly important not to try to make the model-building process automatic and entirely controlled by the computer, but to ensure that the human brain intervened and controlled, particularly at the identification and the diagnostic checking/model modification stages. Subsequent experience has (he contends) demonstrated the rightness of this idea.

This dynamic inductive-deductive approach to model building and analysis is greatly facilitated by the flexibility in the SCA Statistical System allowing its capabilities to be blended in any logical order for such purposes. The SCA System also provides important automated capabilities for model estimation and modification.

1.3 The SCA System

The Scientific Computing Associates Corporation (SCA) provides several self-contained modules in its statistical software system. At present, the SCA Statistical System includes the SCA-UTS module for univariate time series analysis and forecasting, the Extended UTS module for univariate time series analysis and forecasting with automatic outlier detection and adjustment, the SCA-MTS module for multivariate time series analysis and forecasting, the SCA-ECON/M module for econometric modeling and forecasting, the SCA-GSA module for general statistical analysis, and the SCA-QPI module for industrial quality and process improvement. The capabilities of other modules are discussed in other documents.

In addition to its own unique capabilities, each module of the SCA System also contains a complete set of SCA fundamental capabilities, including data input and output, analytic functions and matrix operations, data manipulation and editing, histograms and plots, macro procedures and other utility capabilities. Details regarding these capabilities are also described in The SCA Statistical System: Reference Manual for Fundamental Capabilities.

The modules described above are available as components in three statistical packages offered by SCA. These packages and their component modules are:

General Application Package: GSA
Forecasting and Modeling Package: UTS, Extended UTS, MTS, ECON/M and GSA
Quality Improvement Package: QPI and GSA

In addition to the statistical modules described above, SCA provides software for employing windows and graphics, the SCA Windows/Graphics Package.
This package provides an innovative means to integrate the computing power of mainframe computers and workstations with the user-friendly features and high-resolution graphics capabilities available on personal computers. The SCA Windows/Graphics Package provides for:

A window environment for the SCA System,
Menus to access all SCA capabilities,
Convenient on-line help for SCA capabilities, and
Two-way data transfer between mainframe computers and a PC.

A component of the SCA Windows/Graphics Package is the PC product SCAGRAF. SCAGRAF is a Microsoft Windows application product providing such statistical and graphical features as:

Single (and multiple) time series plots and scatter plots,
Box-Cox transformations,
Time series model identification tools,
Forecast and outlier plots,
Quality control charts, and
Contour plots.

Many of the figures in this document were generated using SCAGRAF.

REFERENCES

Box, G.E.P. (1974). Statistics and the Environment. Journal of the Washington Academy of Science, 64:

Box, G.E.P. (1976). Science and Statistics. Journal of the American Statistical Association, 71:

Box, G.E.P. (1979a). Some Problems of Statistics and Everyday Life. Journal of the American Statistical Association, 74: 1-4.

Box, G.E.P. (1979b). Robustness in the Strategy of Scientific Model Building. Robustness in Statistics (ed. by R.L. Launer and G.N. Wilkinson): New York: Academic Press.

Box, G.E.P. (1983). An Apology for Ecumenism in Statistics. Scientific Inference, Data Analysis, and Robustness (ed. by G.E.P. Box, Tom Leonard and Chien-Fu Wu): New York: Academic Press.

Box, G.E.P. (1984). Gwilym Jenkins, Experimental Design and Time Series. The Collected Works of George E.P. Box (ed. by George C. Tiao). Belmont, CA: Wadsworth.

Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control. San Francisco: Holden-Day. (Revised edition published 1976).


CHAPTER 2

SYSTEM BASICS

Every software system has its own vocabulary and language to put a user's words into action. This chapter provides the basics of the SCA command language and the use of the SCA System. Information concerning the entry of data to the SCA System is also provided. More complete information can be found in The SCA Statistical System: Reference Manual for Fundamental Capabilities.

2.1 Getting Started

The SCA System is a command driven system. That is, the System responds to user instructions (commands) rather than to user chosen options from a menu. When the SCA System is used through the SCA Windows/Graphics Package, a Command Builder creates necessary commands from menu selections. In this manner, the SCA System has the same command language at all computing levels.

All command lines must be followed by a carriage return. For easier reading in the remainder of this manual, we shall not explicitly display <cr> (carriage return) when presenting command lines. However, all command lines of the SCA System are preceded with the symbols --> as a means to indicate a line entered by the user. The symbols --> themselves should not be entered.

Mainframe and workstation computers

To access the SCA System on a mainframe computer, we enter

    SCA    (or sca)

If this command does not invoke the System, a local computer consultant should be contacted regarding the appropriate command. It is possible a computing center may have installed the SCA System under a different command name.

Personal computers

The SCA System is also available for use on personal computers having a DOS, OS/2 or Macintosh operating system. Within the DOS or OS/2 environment, we first enter the subdirectory in which the SCA System was installed. The PC SCA System installation guide advises that the subdirectory be named SCA for DOS operating systems and OS2-SCA for OS/2 operating systems. Thus enter CD \SCA (or CD \OS2-SCA). To invoke the SCA System in this directory, enter

    SCA

To invoke the SCA System on a Macintosh, we can simply double click the SCA icon from the folder in which it is stored. The icon should be created when the SCA System is installed.

System heading and prompt

When the SCA System is appropriately invoked, a set of short descriptive information appears. For example, the heading at an IBM/CMS mainframe site will be something like

    * * * * * COMPUTER SERIAL NUMBER ( / ) * * * * *
    THE SCA STATISTICAL SYSTEM ( RELEASE IV.3 )
    SCA PRODUCT IDENTIFICATION: GSA, UTS, MTS, ECON/M & QPI
    HOST COMPUTER OPERATING SYSTEM: IBM/CMS
    COPYRIGHT , SCIENTIFIC COMPUTING ASSOCIATES. ALL RIGHTS RESERVED
    RELEASED DATE: 3/ 1/90
    SIZE OF WORKSPACE IS SINGLE PRECISION WORDS
    DATE -- 11/30/90 TIME -- 10:10:43
    --

This set of information includes SCA release version, product names, host computer and operating system, and workspace (memory) size. The heading information is followed by a double dash, --. The double dash is a prompt issued by the SCA System. This indicates we can now enter an SCA command.

When the SCA System on a mainframe or workstation computer is invoked through the SCA Windows/Graphics Package (see the related document SCA Windows/Graphics Package User's Guide for more information), the following windows appear on the PC screen.

The heading information and subsequent SCA output are contained in the output window SCAOUTP.OTP. SCA commands are entered in the SCA command window, SCAHIST.CMD, or are generated from menu selections through the SCA Command Builder. The command history (i.e., the set of all SCA commands entered) of the SCA session is maintained in this window.

Creating a larger workspace environment

We can designate a larger workspace (memory) size for an SCA session when we invoke the SCA System. This is a useful feature when we are dealing with larger data sets or complex computations. The amount of workspace that can be designated may be restricted due to local computer installation constraints or an SCA System constraint, depending on the subscription level. The maximum workspace size for the SCA System on personal computers varies between 30K and 35K words (1K words = 1000 words), while the maximum workspace for the SCA System on mainframe and workstation computers usually does not have a specific limit.

The designation of a larger workspace varies somewhat between computers and operating systems. For most operating systems, invoking the SCA System with

    SCA n

where n is an integer, will allocate nK words of memory for the session. The instruction is different for IBM TSO and CMS operating systems where we must use either

    SCA SIZE(n)    (for an IBM TSO operating system)

or

    SCA SIZE n     (for an IBM CMS operating system)

If none of the above instructions affects the workspace size, it is necessary to check with a local computer consultant to determine what to do.

2.2 General Syntax of System Commands

Once we are in the SCA System, we have begun an SCA session. All SCA commands within a session are the same across all computer types. These commands are also called statements. Each statement is entered after the -- prompt. We can use blanks freely in a statement to space words, but blanks cannot be used within names or numbers.

Usually command lines are limited to 72 spaces and most commands can be written in one line. If we need to continue to another line, the current line must be ended with the character @. We refer to the symbol @ as the continuation character. It must be the last non-blank character of any line being continued. It cannot be used as a hyphenation character. That is, words and numbers cannot be divided with @. The SCA System processes a command whenever a line is entered that does not end with @.

Analytic statements

There are two types of statements that we can use during an SCA session, analytic or English-like. Analytic statements are used for most vector and matrix operations or manipulations. These statements have the general form

    v = e

where e is an expression involving a combination of operators and variable names (the labels used to retain data in the SCA workspace); and v is a variable name (label) that will be used to hold results. For example,

    LNY = LN(Y)

will take the natural logarithm of the data currently being held in the variable Y and store the result into the variable LNY. The statement

    TEMP = INV(A) # B

will multiply the matrix B by the inverse of the matrix A (i.e., $A^{-1}B$), then store the results into the variable TEMP.

A complete list of SCA analytic functions and matrix operators can be found in Appendix A. Some examples are also provided. A more detailed discussion regarding analytic statements can be found in The SCA Statistical System: Reference Manual for Fundamental Capabilities.
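For readers more familiar with a general-purpose language, the two analytic statements above can be mirrored in a few lines of Python. This is only an illustrative sketch and is not SCA code: the sample values of Y, A and B are hypothetical, the helper names inv2 and matmul are invented here, and the sketch assumes that LN acts on each element of Y and that # denotes matrix multiplication, as the descriptions above suggest.

```python
import math

# Hypothetical data standing in for the workspace variable Y.
Y = [1.0, math.e, math.e ** 2]

# Analogue of LNY = LN(Y): natural logarithm applied element by element.
LNY = [math.log(y) for y in Y]


def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(p, q):
    """Matrix product of two matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))]
            for i in range(len(p))]


# Hypothetical matrices standing in for the workspace variables A and B.
A = [[2.0, 0.0], [0.0, 4.0]]
B = [[2.0], [8.0]]

# Analogue of TEMP = INV(A) # B: the inverse of A multiplied by B.
TEMP = matmul(inv2(A), B)
print(LNY)
print(TEMP)
```

In both cases the right-hand side is evaluated first and the result is then stored under the label on the left, which is the v = e pattern described above.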

English-like statements

English-like statements (or paragraphs) are used to accomplish most operations in an SCA session. These statements consist of a paragraph name that can be followed by one or more modifying sentences. For example,

    PRINT VARIABLE IS GROWTH

is an English-like statement. The paragraph name is PRINT and the modifying sentence is VARIABLE IS GROWTH. Here the function of the statement is implicit in the paragraph name. Information contained in the single modifying sentence is sufficient for the execution of the command.

The first word of a paragraph must be a valid paragraph name. This name is then followed by any number of modifying sentences. Sentences have no specific order of entry. A sentence must be ended with a period if another sentence is to follow. Each line within the paragraph, except for the last line, must have the continuation character (@) as its last character.

Modifying sentences fall into two categories: required and optional. A sentence is optional if there is a default condition (or value) that can be used during the execution of the paragraph. An optional sentence is used only if we wish to change a default condition. A sentence is required if no default condition (or value) exists. If we omit any required sentence, the System will issue prompts requesting the information omitted.

For example, suppose there are two variables in the SCA workspace, TAX and INCOME, each containing 200 values. If we enter

    PLOT VARIABLES ARE TAX, INCOME

then the System will produce a scatter plot using all 200 data pairs (see Chapter 3 for more information on scatter plots). If we enter

    PLOT VARIABLES ARE TAX, INCOME. SPAN IS 1,150

then the System will produce a scatter plot using only the first 150 pairs of data. The sentences VARIABLES and SPAN must be separated by a period. If we only enter

    PLOT SPAN IS 1, 150

then the System will prompt us for the variables to be used in the plot, since VARIABLES is a required sentence.
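The effect of the SPAN sentence can be pictured as selecting a block of observations before the plot is drawn. The Python sketch below is an illustration only and is not part of the SCA System: the function name apply_span and the generated TAX and INCOME values are hypothetical, and the sketch assumes that SPAN counts observations from 1 and includes both end points, which is consistent with SPAN IS 1,150 selecting the first 150 pairs.

```python
def apply_span(values, first, last):
    """Return observations first..last, counted from 1 with both ends included."""
    if not 1 <= first <= last <= len(values):
        raise ValueError("span must lie within 1..%d" % len(values))
    # Python slices count from 0 and exclude the end point, hence first - 1.
    return values[first - 1:last]


# Hypothetical stand-ins for the 200 values held in TAX and INCOME.
TAX = [float(i) for i in range(1, 201)]
INCOME = [10.0 * i for i in range(1, 201)]

# Analogue of: PLOT VARIABLES ARE TAX, INCOME. SPAN IS 1,150
pairs = list(zip(apply_span(TAX, 1, 150), apply_span(INCOME, 1, 150)))
print(len(pairs))
```

Omitting the span corresponds to using every pair, which matches the default behavior described for the statement without a SPAN sentence.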
Most frequently used required sentence

For our convenience, the subject and verb of the "most frequently used sentence" of a paragraph can be omitted provided the sentence is the first sentence used after the paragraph name. For example, the VARIABLE sentence is the most frequently used sentence of both the PRINT and PLOT paragraphs. If we desire, we can omit the words VARIABLES ARE in these paragraphs. That is, the statement

    PRINT GROWTH

is equivalent to the statement

    PRINT VARIABLE IS GROWTH

The statement

    PLOT TAX, INCOME. SPAN IS 1,150

is processed by the SCA System in the same fashion as the statement

    PLOT VARIABLES ARE TAX, INCOME. SPAN IS 1, 150

Note that if the statement

    PLOT SPAN IS 1, 150. TAX, INCOME

is entered, then an error occurs. The System would interpret TAX as the first three letters of a sentence name and not as variable information.

Very often, the most frequently used sentence is the only sentence specified in a paragraph. The portion of the most frequently used sentence that can be omitted is highlighted in the syntax description for every paragraph of the SCA System.

2.3 An Example

To illustrate the types of commands and the use of the SCA System, we will examine some data taken from the text Statistics for Experimenters by Box, Hunter and Hunter (1978). The data, shown below, are the growth rate (in coded units) of experimental rats and the amount (in grams) of a dietary substance fed to the rats.

    Growth rate    Dietary supplement
    [table values not reproduced in this transcription]

We first want to transmit (or enter) data into the System's workspace (memory). There are many ways in which data can be entered. Complete information on the entry of data into the SCA workspace is provided in Chapter 3 of The SCA Statistical System: Reference Manual for Fundamental Capabilities. A summary of some frequently used methods for data entry is given in Section 2.10 of this chapter.

In this example we will enter both columns of data directly from the terminal. To enter the growth rate data we can enter

    -->INPUT GROWTH

Note that the use of --> in this document denotes a line we are entering (and should not be typed). We also must press the carriage return key to end our entry. We have informed the System that we will be transmitting data to it and want it retained in the System's workspace (memory) under the label GROWTH. Any valid name (see Section 2.4) can be used as a label for a variable. GROWTH has been chosen since this label is well suited to designate the data. The System responds with

    READY FOR DATA INPUT

The -- prompt is not displayed because the System is not expecting any sort of instruction, just data. We can enter the data on one line by entering:

    -->73 78 85 90 91 87 86 91 75 65

In order to tell the System that we are finished entering data for GROWTH, we now type

    -->END OF DATA

The System responds with

    GROWTH, A 10 BY 1 VARIABLE, IS STORED IN THE WORKSPACE

Now we enter the dietary supplement data and retain it in the workspace under the label DIET.

    -->INPUT DIET
    READY FOR DATA INPUT
    -->[ten dietary supplement values]
    -->END OF DATA
    DIET, A 10 BY 1 VARIABLE, IS STORED IN THE WORKSPACE

Before we continue, we can display the data that have been transmitted. We do this by entering

    -->PRINT GROWTH, DIET

    GROWTH IS A 10 BY 1 VARIABLE
    DIET IS A 10 BY 1 VARIABLE

    VARIABLE    GROWTH    DIET
    COLUMN-->        1       1
    ROW
    [ten rows of values]

To get an idea of how growth rate and dietary supplement are related, we display a scatter plot (see Chapter 3) by entering

    -->PLOT GROWTH, DIET

    [Scatter plot of GROWTH (vertical axis) against DIET (horizontal axis)]

We observe that the effect of the dietary supplement on the growth rate increases to a peak level, then falls off. As a result we may wish to use regression analysis (see Chapter 4) to estimate the model

    $Y = b_0 + b_1 X + b_2 X^2 + \text{error}$

where Y is the growth rate and X is the amount of dietary supplement. We do not have the quadratic term, $X^2$, at present, but we can create it by using an analytic statement (see Appendix A). One means to create $X^2$ is to enter

    -->DIET2 = DIET**2

The data generated by this command are retained in the workspace under the label DIET2. We are now ready for a regression analysis. We can fit the model above by entering

    -->REGRESS GROWTH, DIET, DIET2

The output generated from this command is suppressed at this time. Other options are available to us within the REGRESS paragraph, for example diagnostic checking, retaining calculated values and methods of fitting (see Chapter 4 for more information).

2.4 Names and Abbreviations

All data and models are stored in the SCA workspace (memory). We are required to provide names for all data and models that we place in the workspace. Other names used in an SCA session (i.e., paragraph and sentence names) are a part of the System's command language.

The names we specify for data or models can be of any length, although only the first eight characters are interpreted by the System. The first character of a name (label) must be a letter. The other characters may be letters, numbers or the underscore symbol, _. Blanks cannot be used as part of a name. Examples of valid names that we may specify are:

    X, XDATA, X_DATA, X1, SERIES1, SERIES_1, DATASET1, XDCDDEA, S33E45, F55XX_2, INFORMATION_FOR_SERIES_1

Examples of some invalid names are:

    1X        (the first character is not a letter)
    X DATA    (blanks are not permitted)
    X-DATA    (the special character -, hyphen, is not permitted)
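The naming rules above are simple enough to express as a short check. The following Python sketch is illustrative only and is not part of the SCA System; the function names is_valid_name and stored_label are hypothetical. It encodes just the rules stated in this section (a leading letter, then letters, digits or underscores, with only the first eight characters interpreted), and it assumes that upper-case and lower-case letters both count as letters.

```python
import re

# A letter first, then any mix of letters, digits and underscores.
# Accepting lower-case letters is an assumption made for this sketch.
_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_name(name):
    """True if name satisfies the naming rules described in this section."""
    return _NAME.fullmatch(name) is not None


def stored_label(name):
    """Return the part of a valid name that is interpreted: its first eight characters."""
    if not is_valid_name(name):
        raise ValueError("invalid name: %r" % name)
    return name[:8]


for candidate in ["X_DATA", "SERIES_1", "INFORMATION_FOR_SERIES_1",
                  "1X", "X DATA", "X-DATA"]:
    if is_valid_name(candidate):
        print(candidate, "->", stored_label(candidate))
    else:
        print(candidate, "-> invalid")
```

Because only the first eight characters are kept, two long names that agree in those characters refer to the same label, which is worth bearing in mind when choosing descriptive names.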

Abbreviation rules

All names used in an SCA session can be abbreviated. Names and labels that we specify are identified by the SCA System by their first eight (8) characters only. Hence the name INFORMATION_FOR_SERIES_1 is interpreted by the SCA System as INFORMAT. The remaining characters are not maintained in memory, but may be used for readability. Thus, the name INFORMATION_FOR_SERIES_2 is also interpreted by the System as INFORMAT. As a result, if we transmit data sequentially using these two names then all data first stored in the workspace under the label INFORMAT would be overwritten by the latter.

All sentence names are uniquely defined by their first three characters. Paragraph names are likewise defined, with a few exceptions due to name multiplicity (e.g., CORNER and CORRELATION). These names may be reduced to the first four characters. For example, the System internally interprets the statement

    -->PLOT VARIABLES ARE WITHHOLDING, INCOME. @
    -->     SPAN IS 1,

as

    -->PLO VAR ARE WITHHOLD, INCOME. SPA IS 1,

2.5 Reserved Words and Symbols

Certain words and symbols have special meaning to the SCA System. They are summarized below and should only be used in their special context. More details can be found in The SCA Statistical System: Reference Manual for Fundamental Capabilities.

(1) FOR, TO, BY and $ are used to specify an implied list of arguments.
(2) The apostrophe ( ' ) is used in the identification of character strings.
(3) @ is a continuation symbol. It can also be used within macro procedures.
(4) -- is interpreted as an in-line comment when it is specified by the user.
(5) . specifies either a decimal point or a period.
(6) IS, ARE, IN, and ON are used as verbs within SCA sentences provided they immediately follow a sentence name. Otherwise, they are interpreted as variable names.
(7) The exclamation mark ! is used to cancel a statement when it appears as the last character of a statement.

2.6 Obtaining On-Line Help

The SCA System provides interactive on-line help on the capabilities and syntax of statements of the System. To obtain help information, enter the statement

    -->HELP

More complete information is then provided. To obtain information on a specific SCA paragraph, enter

    -->HELP paragraph-name

To terminate a help session on mainframe computers, enter QUIT. To terminate the help session on a PC, press the ESC key. The System will then display the prompt -- and the user will be at that position in an SCA session where help was requested. (If the DOS or OS/2 prompt C> appears in the PC environment, enter the command QUIT.)

2.7 Responding to Prompts

Whenever a required sentence of a paragraph is either omitted or incomplete, the System will prompt for information it requires. When the System issues prompts, it only wants a direct response to its inquiries. For example, if we enter the statement

    -->PLOT

rather than the statement

    -->PLOT TAX, INCOME

then the System will issue a prompt for the variable names omitted. Although the sentence that has been omitted is VARIABLES ARE TAX, INCOME, the System does not want the entry of the text for this full sentence. In issuing a prompt, the System knows what sentence has been omitted, and it only wants the information omitted, i.e., TAX and INCOME. The response we need to provide is simply

    -->TAX, INCOME

Prompts will continue until the System has all the necessary information it requires to proceed with the specified paragraph. If we wish to terminate the prompting session, we can do so by entering the instruction QUIT. In addition to terminating the prompting session, the QUIT command will also abort the execution of the paragraph.

2.8 Panic Buttons

Occasionally, we may want to stop what is currently happening and get back to the basic command level ( --> ). The following are useful panic buttons:

(1) CTRL-C
The execution of any paragraph can be terminated by simultaneously holding down the CTRL and C keys (or the Break key for IBM MVS and IBM CMS operating systems). Output may not stop immediately, as some output may already have been sent to a print buffer. In the IBM MVS and IBM CMS environments, be careful not to press the Break key repeatedly, as three successive entries of the Break key will terminate the SCA session.

(2) QUIT
The instruction QUIT will terminate any prompting session. This will also terminate the execution of the specified command.

(3) !
The exclamation mark will cancel any statement, provided it is the last character of the statement. For example, suppose we enter the lines

-->PLOT TEX, INCOME. @
--> SPAN IS 1, 30

If we realize we have misspelled TAX as TEX before we transmit the second line, we can cancel the entire command by ending the second line with !.

2.9 Ending an SCA Session

To exit from an SCA session, enter the command

-->STOP

2.10 Entering Data

There are many ways in which data can be transmitted to the SCA System. This section presents examples of the most common ways to enter data. The SCA paragraph INPUT may be used to transmit any data to the SCA System. Other paragraphs, BINPUT and FINPUT, are also available for special types of data.

Entering data from the terminal

We will first demonstrate how to enter data directly from a terminal during an SCA session. We will use the two data sets presented in Section 2.3 of this chapter, growth rate and dietary supplement. The data sets are small enough that we may consider entering the data directly from the keyboard. Previously, all the data of one variable were entered, then all the data of the other were entered. This is called variable by variable data entry.

Alternatively, we could choose to enter both variables at the same time by entering the first pair of data, then the second, and so on. This is called case by case data entry.

Entering data of a single variable

To enter the data for growth rate in a variable by variable fashion and store the data in the SCA workspace under the label GROWTH, enter

-->INPUT GROWTH

This is equivalent to the statement

-->INPUT VARIABLE IS GROWTH

in which the complete VARIABLE sentence is specified. The System responds with

READY FOR DATA INPUT

We now can enter data using free format (that is, data are separated by one or more blanks). We can enter all data on the same line, for example

-->73 78 85 90 91 87 86 91 75 65

We can also enter one data value per line, for example

-->73
--> 78
--> 85
--> 90
--> 91
-->87
--> 86
-->91
--> 75
--> 65

or we could enter the data on multiple lines

-->73 78 85 90
-->91 87 86
-->91 75 65

As soon as we are through entering data, we enter

-->END OF DATA

(or -->END). This completes the data entry for the variable GROWTH. The System will then respond with the message

GROWTH, A 10 BY 1 VARIABLE, IS STORED IN THE WORKSPACE

Entering data for more than one variable

Instead of entering the two data sets in a variable by variable fashion, we could transmit both data sets simultaneously (i.e., in a case by case fashion) by entering

-->INPUT GROWTH, DIET

After the System prompt for data, we enter the ten cases of data using free format. Each case must be on a new line (record). That is, we enter ten lines, each holding one GROWTH value followed by the corresponding DIET value, and then the line

-->END OF DATA

The System will then respond with the message

GROWTH, A 10 BY 1 VARIABLE, IS STORED IN THE WORKSPACE
DIET, A 10 BY 1 VARIABLE, IS STORED IN THE WORKSPACE

Each case (or record, or row) is transmitted in free format, so the alignment of values within a line is arbitrary. Each line of data can be written in any convenient form.

Options related to the INPUT paragraph

When we enter data from the terminal, the only required sentence associated with the INPUT paragraph is the VARIABLES sentence. Unless informed otherwise, the SCA System assumes the data of any variable to be in free format, be a single column vector, be of single precision, and have no missing values. If we need to change any of these default conditions, then an appropriate modifying sentence must be added.

Entering a matrix of data

When we transmit a matrix of data to the SCA System, we need to indicate the number of columns (NCOL) in the matrix. The number of rows is determined from the number of rows of data entered. For example, suppose the growth rate data was actually a matrix consisting of two columns of data. The value in the first column is the growth rate in week 1 and the value in the second column is the growth rate in week 2. To enter the GROWTH data as a 10 x 2 matrix, we may enter

-->INPUT GROWTH. NCOL ARE 2.

and now enter data in a case by case fashion after the System prompt, that is, ten lines with two values each, followed by

-->END OF DATA

The default value of NCOL for each variable is 1. If NCOL is changed from 1 for any variable, then data must be transmitted in a case by case fashion as above. For example, if we enter

-->INPUT XVECTOR, YMATRIX. NCOL ARE 1, 3.

and then enter three records of data followed by END OF DATA, XVECTOR will be a 3 x 1 vector consisting of the first value of each record (the values 1, 8, and 0 in the manual's example), and YMATRIX will be the 3 x 3 matrix formed from the second through fourth values of each record. All values after the 4th column of any row are ignored by the System.
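The way case by case records are divided among variables according to NCOL can be sketched in Python. This is an illustration, not SCA code: the function name is hypothetical, and the sample records are invented except for the first column (1, 8, 0), which is the XVECTOR content stated above. The sketch assumes every record holds at least as many values as the NCOL total.

```python
def split_cases(records, ncols):
    """Split free-format case records into one list of rows per variable.

    records: list of strings, one case per line, values separated by blanks.
    ncols:   number of columns for each variable, in order (like NCOL ARE 1, 3).
    Values beyond sum(ncols) on a line are ignored.
    """
    variables = [[] for _ in ncols]
    for record in records:
        values = [float(v) for v in record.split()]
        start = 0
        for rows, width in zip(variables, ncols):
            rows.append(values[start:start + width])
            start += width
    return variables


# Hypothetical records; extra trailing values are present on purpose.
records = ["1 2 3 4 99", "8 5 6 7", "0 9 10 11 99 99"]
xvector, ymatrix = split_cases(records, [1, 3])
print(xvector)  # [[1.0], [8.0], [0.0]]
print(ymatrix)  # [[2.0, 3.0, 4.0], [5.0, 6.0, 7.0], [9.0, 10.0, 11.0]]
```

The trailing 99 values never reach either variable, mirroring the statement that values after the 4th column are ignored.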

Entering non-numeric data, the PRECISION sentence

The SCA System assumes that all data transmitted are single precision numeric data. To alter this default, we need to employ the PRECISION sentence. For example, suppose the dietary data to be transmitted consist of the type of diet the rat was fed, A, B or C (i.e., character data), as well as the above two weeks' worth of growth data. We can enter the statement

-->INPUT GROWTH, DIET. NCOLS ARE 2, 1. @
--> PRECISIONS ARE SINGLE, CHARACTER

Here two modifying sentences, NCOL and PRECISIONS, are used. NCOL specifies that the variable GROWTH has two columns of data and that DIET has one column of data. The PRECISION sentence is used to specify that DIET consists of character information. Since the default condition of the PRECISION sentence was changed for one variable (DIET), we need to specify the appropriate modifier for all variables of the sentence. Also note that since we were unable to write the INPUT statement entirely on one line, we used the continuation symbol, @.

Entering data from a file

In practice, we do not always enter data directly from a terminal. Often data exist on an external flat file. A flat file is one that can be created or edited by a text editor. Flat files generally contain only one data set, or one set of case by case data records.

When we enter data from an external file, we need to include the modifying sentence FILE in the INPUT paragraph to inform the SCA System that the data exist on a file and to provide the file's name. If the FILE sentence is omitted, the System will assume that the data will be entered directly from the keyboard. Specification of the FILE sentence does not affect other default conditions of the INPUT paragraph (e.g., free format, single precision, no missing data). The line END OF DATA is not necessary in the external file, as the System stops reading when it encounters the physical end of the file. For example, to enter the single variable GROWTH from a file, we enter

-->INPUT GROWTH. FILE IS 'file-name'

where file-name represents the appropriate name of the file containing the data. The actual name will depend on the conventions of the computer environment we are in. Note that the file name must be enclosed within a pair of single quotes. Other modifying sentences, such as FORMAT, NCOL, and PRECISION, can be included just as when data are transmitted from a keyboard. The FORMAT sentence is one that could be used if the data have been written onto the external file according to a specific format.

File name conventions

The convention used to name files varies according to the type of computer and operating system. For example, GROWTH.DAT is a valid file name on VAX VMS computers, GROWTH DATA A1 is a valid file name on IBM CMS computers, and U01234.GROWTH.DAT is a valid file name on IBM MVS computers. The file name GROWTH.DAT is also valid on IBM PCs and compatibles operating under DOS. On PC DOS computers, a drive may be added to a file name (e.g., A:GROWTH.DAT).

If we are on a VAX with a VMS operating system and our data are stored in the file GROWTH.DAT, we would enter

-->INPUT GROWTH. FILE IS 'GROWTH.DAT'

If we are on a PC with GROWTH.DAT in drive A, we would enter

-->INPUT GROWTH. FILE IS 'A:GROWTH.DAT'

Note that the file name must be enclosed within a pair of single quotes ( ' ). In the remainder of this document, we will employ data set names appropriate in a VAX VMS or PC DOS setting, unless otherwise noted.

More examples of data entry

This section provides more examples of data entry using the INPUT paragraph. In addition to the INPUT paragraph, the FINPUT and BINPUT paragraphs can be used to access data that are stored on external files containing internal documentation specific to SCA usage. Information on SCA files and related paragraphs can be found in The SCA Statistical System: Reference Manual for Fundamental Capabilities.

In the following examples, we do not provide specific data. Instead, data are only described and illustrated when necessary.

(1) Entry of character and numeric data from a terminal

Three variables will be entered from the terminal in a case by case fashion. The first variable is a list of names (last name and first name). The second and third are mathematics and English scores. We need to alter the defaults for both PRECISION and NCOL, as the first variable is character data and has two columns of data. An appropriate statement is

-->INPUT NAMES, MATH, ENGLISH. NCOLS ARE 2, 1, 1. @
--> PRECISIONS ARE CHARACTER, SINGLE, SINGLE

(2) Entry of character and numeric data from a file

The data are the same as in (1), but they are on the IBM CMS file TESTDATA DATA A1. An appropriate statement is

-->INPUT NAMES, MATH, ENGLISH. NCOL ARE 2, 1, 1. @
--> FILE IS 'TESTDATA DATA A1'. @
--> PRECISIONS ARE CHARACTER, SINGLE, SINGLE

(3) Specifying a format for data

Some sales data have been downloaded from a mainframe computer to a PC. The name of the file on the PC is SALES.DAT. The data are of one variable. There are 15 years of data, with each record having the sales totals (in thousands of dollars) for each month of the year. The data have been compressed so that a typical record on the file is a continuous run of twelve five-character fields. In that record, the sales for January were $95,300, the sales for February were $88,200, and so on. We need to include a FORMAT sentence indicating that every record has 12 sets of numbers, and that each number is in a field of 5 characters of the form xxx.x. An appropriate statement for these data is

-->INPUT SALES. FILE IS 'SALES.DAT'. @
--> FORMAT IS '12F5.1'

(4) Data having missing data code as values

We will transmit the same data as in (3), but some months had missing sales figures. In those cases the missing data code ***** appears in the five-character string for the month. For example, if the third value of the typical record is missing, then the third five-character field of that record is *****. In this case the statement given in (3) is still appropriate for data entry.

(5) Data having a numeric substitute for missing values

The data are the same as in (4), except that missing entries are recorded as -99.9. We can either use the INPUT statement of (3) above and work with the value -99.9, or we can redefine -99.9 to an internal missing data code. In the latter case, we can employ the statement

-->INPUT SALES. FILE IS 'SALES.DAT'. @
--> FORMAT IS '12F5.1'. REDEFINE

REFERENCE

Box, G.E.P., Hunter, W.G., and Hunter, J.S. (1978). Statistics for Experimenters. New York: Wiley.
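The fixed-format reading described in examples (3) to (5) of Section 2.10 can be mimicked outside the SCA System with a short Python sketch. It is an illustration only: the function name, the use of NaN as a stand-in for the internal missing data code, and the sample record are all assumptions (only the values 95.3 and 88.2, the code *****, and the substitute -99.9 come from the manual). The sketch also assumes every numeric field contains an explicit decimal point, as in the form xxx.x.

```python
import math

FIELD_WIDTH = 5         # each value occupies 5 characters (the "5" of 12F5.1)
FIELDS_PER_RECORD = 12  # twelve monthly values per record (the "12" of 12F5.1)
MISSING_TEXT = "*****"  # missing data code described in example (4)


def read_record(record, redefine=None):
    """Parse one fixed-width record into 12 floats; missing values become NaN.

    redefine: optional numeric value (e.g. -99.9) to treat as missing,
              standing in for the redefinition described in example (5).
    """
    values = []
    for i in range(FIELDS_PER_RECORD):
        field = record[i * FIELD_WIDTH:(i + 1) * FIELD_WIDTH]
        if field == MISSING_TEXT:
            values.append(math.nan)
            continue
        value = float(field)  # float() tolerates leading blanks
        if redefine is not None and value == redefine:
            value = math.nan
        values.append(value)
    return values


# Hypothetical record: January 95.3, February 88.2, a ***** code, a -99.9 code.
fields = [" 95.3", " 88.2", "*****", "-99.9"] + [" 90.0"] * 8
record = "".join(fields)

plain = read_record(record)
cleaned = read_record(record, redefine=-99.9)
print(plain[:4])
print(cleaned[:4])
```

Without the redefinition the value -99.9 is kept as an ordinary number; with it, both kinds of missing entry end up with the same marker.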

CHAPTER 3 PLOTTING DATA

Data displays in various forms are essential tools in the analysis of a data set. Often the best way to comprehend data comes from visual depictions, rather than from extensive statistical analyses. We can immediately realize the need to account for trend or the seasonal behavior of time series data through a time plot, a plot of the data over time. Relationships that may exist between variables can be discerned through scatter plots, plots of one variable against another. Moreover, we may be able to determine the basic functional form of relationships (e.g., linear, quadratic) with these plots. We may discover that it is more appropriate statistically to analyze the data in a metric other than the one in which the data are recorded. For example, a logarithmic, square root, or other type of transformation may be appropriate. Spurious observations, or typographical errors in data entry, may be quickly spotted in a data plot. For such reasons, it is important that we always view data first instead of relying on statistical summaries alone.

The SCA System provides a number of paragraphs useful in the display of data. Time plots and scatter plots are discussed in this chapter. Plots specific to experimental design and analysis or statistical control are found in the SCA reference manual Quality and Productivity Improvement Using the SCA Statistical System. Histograms, dispersion plots, and probability plots are explained in the SCA reference manual The SCA Statistical System: Reference Manual for Fundamental Capabilities.

3.1 Plotting Data Over Time

Data collected over time usually embody some time dependent characteristics. The exact nature of these characteristics is not always obvious. Some may be suspected or assumed, such as a trend or seasonal behavior, as occur often in business data. Others may be hidden. For example, an experiment may be conducted in which the cutting precision of a tool on metals of various alloy compositions is measured. It may be the case that the tool is subject to wear regardless of the metal being cut, hence it may be necessary to include time as a factor in the analysis. In general, if data are gathered or recorded in any sort of time dependent order, it is good practice to plot the data against time.

Plots of a single variable over time

A set of data from the Commodity Year Book (1986) will be used to illustrate plots over time. The data, listed in Table 3.1, comprise 72 monthly observations, from January 1980 through December 1985, of the following prices:

(1) The average wholesale price of gasoline (regular grade, leaded)
(2) The average price of crude petroleum at wells

The data are stored in the SCA workspace under the names PGAS and PCRUDE, respectively. A more complete description and analysis of these data can be found in Chapters 4 and 5.

Table 3.1 Gasoline data
Columns: Obs., Month, Gasoline Price (PGAS), Crude Oil Price (PCRUDE)
[Table values not reproduced.]

Since these data are collected on a monthly basis, we would like to indicate the end of each year of data. We will plot the PGAS data using the TSPLOT (Time Series PLOT) paragraph.

-->TSPLOT PGAS. SEASONALITY IS 12. SYMBOL IS '*'.

TIME SERIES PLOT FOR THE VARIABLE PGAS
[Character plot not reproduced: PGAS against a horizontal time axis.]

We see the data are plotted against a horizontal time axis. Marks along the axis are at multiples of 12, the value specified in the SEASONALITY sentence. The use of the SYMBOLS sentence is explained in detail in Section 3.3, but its purpose is evident.

Remark: The SEASONALITY sentence is a replacement of the sentence TIC-MARK. In the event your version of the SCA System does not recognize the SEASONALITY sentence, it is likely you have an older version of the System. In such a case, please substitute TIC-MARK for SEASONALITY.

The display provided by the TSPLOT paragraph is dependent on the output width available to the SCA System. The SCA System automatically scales the plot to fit within the space available for display, and the TSPLOT paragraph will uniquely represent any data point displayed. Consequently, if the SCA System does not have enough space available to present the complete time plot, it will truncate the data displayed. Since the last data points are often the most influential in forecasting a time series, the SCA System keeps as many of the most recent observations as it can. Any truncation of data occurs at the beginning of the series.

The display of the above plot was generated on a wide screen. The default output width assumed by the SCA System is 80 characters. This value is appropriate for virtually all output devices (terminals, printers, files). This output width can be altered by the PROFILE paragraph (see The SCA Statistical System: Reference Manual for Fundamental Capabilities). We can increase the output width to 132 characters (i.e., that of large computer paper) by entering

-->PROFILE OWIDTH IS 132

If we are limited to 80 characters of output width, the following display occurs

-->TSPLOT PGAS. SEASONALITY IS 12. SYMBOL IS '*'.

TIME SERIES PLOT FOR THE VARIABLE PGAS
[Character plot not reproduced: the earliest observations of PGAS are truncated.]

If we are confined to a limited output space yet desire a plot of the complete series, there are two things we may do. One is to plot the series vertically rather than horizontally. This may be done using the TPLOT paragraph (shown later). The second option is to split the plot into pieces using the SPAN sentence. We will do this here, by displaying the first 36 observations and then the last 36 observations. Since the range of values may be different in the two plots, we will impose a range of 450 to 700. This appears reasonable given the values of the above plot.

-->TSPLOT PGAS. SPAN IS 1, 36. SEASONALITY IS 12. @
--> SYMBOL IS '*'. RANGE IS 450, 700.

TIME SERIES PLOT FOR THE VARIABLE PGAS
[Character plot not reproduced: observations 1 through 36 of PGAS.]

-->TSPLOT PGAS. SPAN IS 37,72. SYMBOL IS '*'. @
--> SEASONALITY IS 12, 37. RANGE IS 450, 700.

TIME SERIES PLOT FOR THE VARIABLE PGAS
[Character plot not reproduced: observations 37 through 72 of PGAS.]

Plots of more than one variable over time

We have several options available if we wish to display the plots of more than one variable over time. One option is to use the TSPLOT paragraph separately for each variable. We can also specify more than one variable in the TSPLOT paragraph. For example, suppose both PGAS and PCRUDE are specified in TSPLOT. We have

-->TSPLOT PGAS, PCRUDE. SEASONALITY IS 12. SYMBOL IS '*'.

TIME SERIES PLOT FOR THE VARIABLE PGAS
[Character plot not reproduced.]

TIME SERIES PLOT FOR THE VARIABLE PCRUDE
[Character plot not reproduced.]

We obtain two separate time series plots, but the same range of values is used as the Y axis of both plots. The SCA System automatically determines a range appropriate for all variables involved.

We may wish to view the variables in the same display frame. This can be useful in determining whether the values assumed by one variable may be influenced by the values of another. Perhaps one series leads another in some way. For example, a low value for one series may indicate a low (or high) value of another series in a future time period. Similarly, a turn in one series (e.g., a decreasing set of values that changes to increasing) may indicate a subsequent turn in another series.

The MTSPLOT (Multiple Time Series PLOT) paragraph may be used to display the plots of two or more series, or variables, over time on the same frame. Data are distinguished by letters. Unless we specify our own set of symbols, the symbol A is used to represent the first variable specified, B the second, and so on. The symbol * is used if any displayed values are coincident. We can specify our own symbols by including the SYMBOLS sentence in the paragraph.

We will display the time plots of PGAS and PCRUDE in the same frame to illustrate the use of the MTSPLOT paragraph. We will use the symbol X to represent PGAS data and + for PCRUDE data. As before, we will also include the SEASONALITY sentence. We have increased the display width to assure plots of the complete data sets.

-->MTSPLOT PGAS, PCRUDE. SEASONALITY IS 12. SYMBOLS ARE 'X', '+'.

TIME SERIES PLOT FOR VARIABLES PGAS AND PCRUDE
[Character plot not reproduced: PGAS shown as X and PCRUDE as + on one frame.]

The MTSPLOT paragraph can be a useful visual tool if two variables are slightly out of sync, or if we wish to display the actual values of a series together with forecasted values (and standard errors). For more information on the latter, see Chapter 5. However, the overlap of two or more plots may present a more confusing pattern than we would like. Even less useful information may be obtained when the range of values of one variable dwarfs that of another, or when the combined ranges of all variables are extreme.

Vertical time plots

The time axis for all plots above has been horizontal. This can be convenient for the visual display of a relatively short series of data, but it can be limiting if a data set is lengthy. As an alternative, we can choose to have a vertical time axis. This will permit the time plot of a data set of any length, but the display will usually run over several pages, or screens. When a vertical time axis is used, it is advisable to route the plot to a printer or to a file.

Two paragraphs are provided for plotting data over a vertical time axis, TPLOT and MTPLOT. We can plot one or more data sets using TPLOT, and we can display multiple plots on the same time frame using MTPLOT. MTPLOT offers more clarity than MTSPLOT in its display of multiple plots since more space is available to it. Options for these paragraphs are the same as for TSPLOT and MTSPLOT.

TPLOT provides us with an additional means to display more than one series. If more than one variable is specified, then all variables will be shown in parallel to one another on the display device. For example, consider a time plot of PGAS and PCRUDE in the same TPLOT paragraph (the display has been edited for presentation purposes).

-->TPLOT PGAS, PCRUDE. SEASONALITY IS 12. SYMBOL IS 'X'.

[Character plot not reproduced: vertical time plots of PGAS and PCRUDE shown side by side, each with tic marks at 12, 24, 36, 48, 60, and 72.]

The advantage of this sort of display is that concurrent observations are aligned for variables that may be related, but the individual pattern of each series is still separate from all others. A disadvantage is that the width of the display device will diminish the resolution for each series as more series are plotted in parallel. As with TSPLOT, we can increase the display width through the PROFILE paragraph. Alternatively, we can limit the number of series that are displayed. It is recommended that no more than three or four variables be displayed at one time, depending on the width of the display device.

There is a caution that accompanies this recommendation. Since the width of any plot is a function of the number of plots being displayed, the width and resolution of the time plot of the same series will be different if it is plotted alone, with one other series, or with more series. This problem can be resolved easily. Suppose we find that the resolution associated with the parallel display of three series is what we want, but we need to plot five different series. The easiest solution to this problem is to use TPLOT with any three of the series, then use TPLOT again with the remaining two series and one of the first three plotted. By artificially padding the total number of series, we achieve the desired resolution for all plots that are displayed.

3.2 Scatter Plots

To illustrate plots of one or more variables against another, we will consider a data set analyzed in Neter, Wasserman, and Kutner (1983, Chapters 8 and 11). The data came from a study of the relation of body fat to triceps skinfold thickness and thigh circumference of 20 subjects. The data are shown in Table 3.2 and are stored in the SCA workspace under the labels BODYFAT, TRICEPS, and THIGH, respectively.

Table 3.2 Bodyfat study data
Columns: Subject, Triceps Skinfold Thickness (TRICEPS), Thigh Circumference (THIGH), Body Fat (BODYFAT)
[Table values not reproduced.]

We wish to discover the relationships, if any, that exist between BODYFAT and the variables TRICEPS and THIGH. One set of visual representations consists of the individual plots of the values of the BODYFAT variable against the associated values of the TRICEPS and THIGH variables. These scatter plots may be obtained using the PLOT paragraph as follows.

-->PLOT BODYFAT, TRICEPS

[Character plot not reproduced: BODYFAT (vertical axis) against TRICEPS (horizontal axis).]

-->PLOT BODYFAT, THIGH

[Character plot not reproduced: BODYFAT (vertical axis) against THIGH (horizontal axis).]

The PLOT paragraph provides us with a display of symbols on an L-shaped frame. The frame is composed of a vertical Y-axis for the first variable specified, BODYFAT, and a horizontal X-axis for the second variable specified, TRICEPS or THIGH. The symbol * is used to indicate a data point; that is, one of the (x,y) pairs displayed.

The SCA System automatically chooses suitable intervals for the values of the axes based on the range of values assumed by the X and Y variables and the amount of space available for the display. In the plots above, the range for the Y-axis is the same for both

plots, since the same variable is used; but the ranges for the X-axes are different. The TRICEPS axis extends to 33.50, and the THIGH axis covers a different interval. We observe what appears to be a linear relationship between BODYFAT and TRICEPS as well as between BODYFAT and THIGH.

For illustrative purposes, we can re-scale the plots so that the ranges for the axes are the same in both plots. We can see from the plots, and from Table 3.2, that the largest value of BODYFAT, the Y variable, is under 30, and the largest value of either TRICEPS or THIGH, the X variables, is less than 60. We can construct plots in which 0.0 is used as the lower end-point of both axes and 30.0 or 60.0 is used as the upper end-point of the Y or X axis, respectively. We can accomplish this by including the RANGE sentence as follows:

-->PLOT BODYFAT, TRICEPS. RANGE IS Y(0.0,30.0), X(0.0,60.0)

[Character plot not reproduced: BODYFAT against TRICEPS on the imposed ranges; some points are shown with the symbol 2.]

-->PLOT BODYFAT, THIGH. RANGE IS Y(0.0,30.0), X(0.0,60.0)

[Character plot not reproduced: BODYFAT against THIGH on the imposed ranges; some points are shown with the symbol 2.]

Now we can observe the data on the same scales for all variables involved. In the above two plots the symbol 2 appears several times. The symbol 2 indicates there are two data points so close together that they cannot be shown uniquely. The reason for this is immediate. Since we have imposed an arbitrary scale for the X-axis, the resultant data points are bunched together a little more than before. As a result, all data pairs cannot be displayed distinctly. The same inference can be made for the symbols 3, 4, ..., 9 should any appear. A through Z represent 10 through 35 data points, and # is used for 36 or more. Other tagging of points is possible (see Section 3.3.3).

In the plots above, we have plotted exactly one Y variable against one X variable in the same frame. If we wish to display other scatter plots with the PLOT paragraph, we must use separate frames. However, we can display multiple plots on the same frame through the MPLOT paragraph. To display the scatter plots of BODYFAT against TRICEPS and BODYFAT against THIGH on the same frame, we can enter the following.

-->MPLOT Y-VARIABLES ARE BODYFAT, BODYFAT. @
--> X-VARIABLES ARE THIGH, TRICEPS. @
--> SYMBOLS ARE 'T', 'R'

[Character plot not reproduced: BODYFAT against THIGH (symbol T) and against TRICEPS (symbol R) on one frame; the X-axis is labeled TRICEPS.]

Note that the values of the axes have been determined automatically by the SCA System. In addition, we have distinguished the two scatter plots by using the symbol T for the data points of the first plot (X-variable is THIGH and Y-variable is BODYFAT), and R for the second plot (TRICEPS and BODYFAT). It may appear redundant that we specified the Y-VARIABLES above as BODYFAT and BODYFAT, but it was necessary. The MPLOT paragraph does not place any limitation on the X or Y variables that can appear on the same frame. For example, we can display the scatter plots of two distinct Y variables against two distinct X variables on the same frame.

For the purpose of illustration, we will display the scatter plots of BODYFAT against TRICEPS and TRICEPS against THIGH on the same frame. Here TRICEPS is used as both an X and a Y variable. The symbols B and T will be used to distinguish the Y variable.

We will also force the ranges for the X-axis and Y-axis to be 0.0 to 60.0 and 0.0 to 40.0, respectively.

-->MPLOT Y-VARIABLES ARE TRICEPS, BODYFAT. @
--> X-VARIABLES ARE THIGH, TRICEPS. @
--> SYMBOLS ARE 'T', 'B'. RANGES ARE Y(0.0, 40.0), X(0.0, 60.0)

[Character plot not reproduced: TRICEPS against THIGH (symbol T) and BODYFAT against TRICEPS (symbol B) on one frame; the axes are labeled BODYFAT and TRICEPS.]

The SCA System will use the names of the last X and Y variables specified for the axis labels.

3.3 Altering Basic Displays

The plotting paragraphs of the SCA System are designed so that we only need to specify the names of the variables involved in order to generate a plot. While the default options taken by a paragraph are sufficient in most situations, other features are available for specific needs. This section explains and illustrates many of these features.

Symbols for plots over time

The SCA System displays a symbol to represent a data point. In the case of a time plot, a data point is the value of a series at a time index. Symbols are not connected to others in any way. The specific symbols used are either the defaults of the paragraph or those defined by the user.

TSPLOT and TPLOT paragraphs

The default set of symbols used for data in the TSPLOT paragraph is 1, 2, ..., 9, 0. This set is repeated as needed. The default symbol to designate a data point in the TPLOT paragraph is X. If we desire, we can provide an alternative set of symbols. Symbols we provide for time plots are usually for the purpose of highlighting the periodic

occurrences of data. As a result, we only provide a sequence of symbols for the number of points that comprise a period. The symbol set is then repeated over and over until the data set to be plotted is exhausted. For example, if the data in a series represent daily observations recorded on a weekly basis, then we may specify seven distinct symbols. As a consequence, when the plots are displayed all Mondays will have the same symbol, all Tuesdays will have the same symbol, and so on. Symbols are limited to 0 to 9 and A to Z, hence a maximum period of 36.

For our convenience, a default set of symbols is generated automatically in the TSPLOT paragraph that corresponds to the value specified in the SEASONALITY sentence. The default symbol set generated is the first i symbols from

1, 2, ..., 9, 0, A, B, ..., Z

where i is the value in SEASONALITY IS i. Hence the default set generated for the examples of TSPLOT presented in Section 3.1 would be

1, 2, ..., 9, 0, A, B.

This sequence of symbols would be repeated in the display. However, this default symbol set was overridden by our inclusion of the sentence SYMBOL IS '*'. The plot of PGAS over time is now shown with the System generated default set of symbols.

-->TSPLOT PGAS. SEASONALITY IS 12.

TIME SERIES PLOT FOR THE VARIABLE PGAS
[Character plot not reproduced: PGAS plotted with the repeating symbols 1, 2, ..., 9, 0, A, B.]
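The default symbol rule just described can be sketched in Python as a cross-check on which symbol a given observation should receive. This is an illustration, not SCA code. The function names and the first_position parameter are hypothetical, and the handling of a series that does not start at the beginning of a period is an assumption based on the second value of the SEASONALITY sentence.

```python
SYMBOL_POOL = "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # order given in the manual


def default_symbols(period):
    """Return the first `period` symbols of the pool (SEASONALITY IS period)."""
    if not 1 <= period <= len(SYMBOL_POOL):
        raise ValueError("period must be between 1 and 36")
    return SYMBOL_POOL[:period]


def symbol_for(index, period, first_position=1):
    """Symbol for the observation with 1-based time index `index`.

    first_position is the place within a period occupied by observation 1
    (assumed meaning of the optional second SEASONALITY value).
    """
    symbols = default_symbols(period)
    return symbols[(index - 1 + first_position - 1) % period]


print(default_symbols(12))  # 1234567890AB
print(symbol_for(12, 12))   # B
print(symbol_for(13, 12))   # 1
```

With a period of 12, observations 1, 13, 25, and so on share the symbol 1, which is what makes the same month of each year easy to pick out in the plot.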

MTSPLOT and MTPLOT paragraphs

When multiple time plots are displayed on the same frame, the symbol A is used for all data points from the first series, B for the second series, and so on, unless we specify otherwise. When a symbol set is specified, the symbols replace A, B, and so on; but they cannot be used to indicate observations of the same period (e.g., day or month) as in the TSPLOT and TPLOT paragraphs.

Tic marks, seasonality

Tic marks appear along the time axis at specific multiples. The default multiple for the TSPLOT and MTSPLOT paragraphs is 10; that is, at 10, 20, 30, .... The default multiple for the TPLOT and MTPLOT paragraphs is 5. It is also assumed that the index for the first observation of a series is 1. However, we may wish to specify a different multiple for the tic marks, as well as a beginning index value. The former is useful when plotting periodic data such as hourly (24), weekly (7), or monthly (12) observations (as we did in Section 3.1 above). The SEASONALITY sentence provides a new multiple for the tic marks.

The latter specification is useful in those cases when the data set being plotted does not begin at the start of a period. For example, if a series is of monthly observations, we may want tic marks every December. If the data actually begin in March, then we want to associate the first observation with the number 3. In such a case the initial index for the data to be plotted may be specified as a second value in the SEASONALITY sentence. For example,

SEASONALITY IS 12, 3.

indicates a periodicity of 12, but the first data point is the 3rd observation in a period (e.g., March).

If the SPAN sentence is used in conjunction with the SEASONALITY sentence, the System will determine tic marks and symbols as if the entire data set is to be plotted, but only display the plot of the specified span. This was evident in the TSPLOT of PGAS for the span 37 to 72 in Section 3.1. For example, if we had entered

-->TSPLOT PGAS. SEASONALITY IS 12. SPAN IS 39, 65.

then the plot displayed would have tic marks at 48 and 60, and the symbol for the first observation plotted would be 3.
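The interaction of SPAN and SEASONALITY in the example above can be checked with a few lines of Python. This is an illustrative sketch, not SCA code: the function name and its parameters are hypothetical, and the axis labels the System actually prints when a starting position other than 1 is given are not modeled.

```python
def tick_marks(span_start, span_end, period, first_position=1):
    """Time indices inside the span at which a tic mark would fall.

    A tic is placed wherever the within-period position equals `period`
    (for monthly data, every December). Positions are computed from the
    full series, not from the start of the span.
    """
    ticks = []
    for index in range(span_start, span_end + 1):
        position = (index - 1 + first_position - 1) % period + 1
        if position == period:
            ticks.append(index)
    return ticks


print(tick_marks(39, 65, 12))  # [48, 60]
print(tick_marks(1, 72, 12))   # [12, 24, 36, 48, 60, 72]
```

The first call reproduces the tic marks at 48 and 60 stated for SPAN IS 39, 65. The positions depend only on the full series index, which is why restricting the span does not shift them.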
Remark: The SEASONALITY senence is a replacemen of he older senence, TIC- MARK. In he even your version of he SCA Sysem does no recognize he SEASONALITY senence, i is likely you have an older version of he Sysem. In such a case, please subsiue TIC-MARK for SEASONALITY.
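The rules for default symbols, tic-marks, and the SPAN sentence can be checked with a short Python sketch. This is only an illustration of the behaviour described in the text, not SCA code: the helper names `default_symbols` and `plot_annotations` are hypothetical, the series length of 100 in the usage note is arbitrary, and tic-marks are reported as positions in the data (the labels the System prints when a starting index other than 1 is given are not modelled).

```python
# Default symbol pool described in the text: 1-9, then 0, then A-Z (36 symbols).
SYMBOL_POOL = "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def default_symbols(seasonality):
    """Return the first `seasonality` symbols of the default pool."""
    if not 1 <= seasonality <= len(SYMBOL_POOL):
        raise ValueError("seasonality must be between 1 and 36")
    return SYMBOL_POOL[:seasonality]


def plot_annotations(n_obs, seasonality=10, first_index=1, span=None):
    """Hypothetical helper: symbol per plotted observation and tic-mark positions.

    Symbols and tic-marks are determined as if the whole series were plotted;
    `span` only restricts which observations are reported.
    """
    symbols = default_symbols(seasonality)
    lo, hi = span if span is not None else (1, n_obs)
    shown = []
    tics = []
    for t in range(lo, hi + 1):
        # Index of observation t when the first observation carries `first_index`.
        index = t + first_index - 1
        position = (index - 1) % seasonality + 1  # place within the period
        shown.append((t, symbols[position - 1]))
        if index % seasonality == 0:  # end of a period, so a tic-mark falls here
            tics.append(t)
    return shown, tics
```

For the example in the text, `plot_annotations(100, seasonality=12, span=(39, 65))` gives tic-marks at observations 48 and 60 and the symbol 3 for observation 39. With `first_index=3` (monthly data starting in March), the first tic-mark falls on the tenth observation, that is, the first December.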

Symbols for scatter plots

As noted previously, the SCA System displays a symbol to represent a data point. For a scatter plot, a data point is a specific realization of a coordinate pair of values. Symbols are not connected to others in any way. The specific symbols used depend upon the paragraph, or are those defined by the user.

PLOT paragraph

When a single pair of variables is plotted in a frame, the default symbol displayed at any coordinate is *. If two or more data points are required to be displayed at the coordinate, the following symbol is used:

2, 3, ..., 9 occurrences   : 2, 3, ..., 9, respectively;
10, ..., 35 occurrences    : A, ..., Z, respectively;
36 or more occurrences     : #

In lieu of the symbol *, we can define a variable of symbolic tags that are to be used in the display for each data pair. This tagging information can be useful to keep track of occurrences that share some common trait. For example, in our plots of BODYFAT against TRICEPS and THIGHS, we may wish to distinguish individuals based on age (under 20, over 20) or race. We may also wish to tag data that are recorded according to, or otherwise follow, a periodic pattern.

The number of symbols contained in the tagging variable must be the same as the number of data points displayed. The first coordinate pair is represented by the first symbol of the tagging variable, the second pair by the second symbol, and so on. The distinct tags that are available are the symbol *, the values 2-9, and the letters A-Z. The SCA System makes the following association between the value in the tagging variable and the symbol that is displayed:

If the value of the tagging variable is     the symbol displayed is
1                                           *
2, 3, ..., 9                                2, 3, ..., 9
10, 11, ..., 35                             A, B, ..., Z

Values may be repeated within the tagging variable. This variable must be created outside of the PLOT paragraph, either by using the INPUT paragraph (see Chapter 2) or by the GENERATE or other data editing paragraphs (see Appendix B).

To illustrate the creation and use of tags, the scatter plot of BODYFAT against TRICEPS of Section 3.2 will be displayed. The symbol A will be used to represent the first 10 cases, and the symbol B will be used to represent the last 10 cases. First, we will generate a variable of tags, TAGS, using the GENERATE paragraph. The number 10 (associated with A) is assigned to the first 10 values and 11 (associated with B) is assigned to the next 10 values.

-->GENERATE TAGS. NROWS ARE 20. VALUES ARE 10 FOR 10, 11 FOR 10.

THE SINGLE PRECISION VARIABLE TAGS IS GENERATED

We now use the TAGSET sentence within the PLOT paragraph.

-->PLOT BODYFAT, TRICEPS. TAGSET IS TAGS

[Scatter plot of BODYFAT (vertical axis) against TRICEPS (horizontal axis), with each point shown as A or B according to TAGS]

The tags show that the levels of bodyfat and triceps do not seem to be affected by the order in which measurements were taken (or recorded).

MPLOT paragraph

When multiple pairs of variables are displayed on the same frame, the symbol A represents the coordinate of a value from the first pair of variables, B represents the coordinate of a value from the second pair of variables, and so on. The symbol * is used to represent any overlapped data points. No distinctions are made regarding which data points overlap. For example, the * symbol will be displayed if two coordinates of values from the first pair of variables are the same, if two coordinates of values from the second pair of variables are the same, or if the coordinate of a value from the first pair of variables is the same as the coordinate of a value from the second pair of variables. Hence we may need to employ some caution in interpreting the * symbol should it appear.

We can designate a specific symbol for each pair of variables, as we did in the MPLOT examples of Section 3.2. The SYMBOLS sentence is used for this purpose.
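The two symbol conventions of the PLOT paragraph (the symbol shown for overlapping points, and the symbol shown for a tag value) can be written down as two small lookup functions. The Python sketch below only restates the associations given in the text; the function names are hypothetical and nothing here is part of the SCA System.

```python
import string

LETTERS = string.ascii_uppercase  # 'A' .. 'Z'


def overlap_symbol(count):
    """Symbol shown when `count` untagged points share one plot position."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return "*"
    if count <= 9:
        return str(count)           # 2..9 occurrences -> '2'..'9'
    if count <= 35:
        return LETTERS[count - 10]  # 10..35 occurrences -> 'A'..'Z'
    return "#"                      # 36 or more occurrences


def tag_symbol(value):
    """Symbol shown for one value of a tagging variable."""
    if value == 1:
        return "*"
    if 2 <= value <= 9:
        return str(value)
    if 10 <= value <= 35:
        return LETTERS[value - 10]
    raise ValueError("tag values must lie between 1 and 35")


# The TAGS variable of the example: 10 repeated ten times, then 11 repeated ten times.
tags = [10] * 10 + [11] * 10
symbols = "".join(tag_symbol(v) for v in tags)
```

Here `symbols` is the string of ten A's followed by ten B's, matching the use of A for the first 10 cases and B for the last 10 cases.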

Scatter plot displays

Scatter plots are displayed with a horizontal X-axis and vertical Y-axis. The name of the variable of each axis is also displayed. In the case of multiple plots on the same frame, the names of the last X and Y variables are displayed.

Display layouts

Three types of display layouts are available. The type of layout may be changed by using the LAYOUT sentence. Available layouts (and associated keywords) are:

L-shape (L)       -- Axes form to resemble the letter L. This is the default.
Box-type (BOX)    -- L above is completed to resemble a rectangle.
Grid-type (GRID)  -- Cross hatch markings are included in a box-type layout. Markings occur at tic-marks.

Titles for plots

A title can be included with any plot. The TITLE sentence is included in the paragraph with the desired title. The title may be 72 characters or less and must be enclosed in a pair of apostrophes (' ').

To illustrate a box-type and grid-type layout, and the use of titles, the scatter plot of BODYFAT against TRICEPS will be shown in both forms.

-->PLOT BODYFAT, TRICEPS. LAYOUT IS BOX. TITLE IS @
--> 'SCATTER PLOT OF BODYFAT VS TRICEPS WITH A BOX-TYPE LAYOUT'.

[Scatter plot of BODYFAT (vertical axis) against TRICEPS (horizontal axis) in a box-type layout, titled SCATTER PLOT OF BODYFAT VS TRICEPS WITH A BOX-TYPE LAYOUT]

-->PLOT BODYFAT, TRICEPS. LAYOUT IS GRID. TITLE IS @
--> 'SCATTER PLOT OF BODYFAT VS TRICEPS WITH A GRID-TYPE LAYOUT'.

[Scatter plot of BODYFAT (vertical axis) against TRICEPS (horizontal axis) in a grid-type layout, titled SCATTER PLOT OF BODYFAT VS TRICEPS WITH A GRID-TYPE LAYOUT]

SUMMARY OF THE SCA PARAGRAPHS IN CHAPTER 3

This section provides a summary of those SCA paragraphs employed in this chapter. The syntax for each paragraph is presented in both a brief and full form. The brief display of the syntax contains the most frequently used sentences of a paragraph, while the full display presents all possible modifying sentences of a paragraph. In addition, special remarks related to a paragraph may also be presented with the description.

Each SCA paragraph begins with a paragraph name and is followed by modifying sentences. Sentences that may be used as modifiers for a paragraph are shown below and the types of arguments used in each sentence are also specified. Sentences not designated required may be omitted as default conditions (or values) exist. The most frequently used required sentence is given as the first sentence of the paragraph. The portion of this sentence that may be omitted is underlined. This portion may be omitted only if this sentence appears as the first sentence in a paragraph. Otherwise, all portions of the sentence must be used. The last character of each line except the last line must be the continuation character, @.

The paragraphs to be explained in this summary are TSPLOT, MTSPLOT, TPLOT, MTPLOT, PLOT, and MPLOT.

Legend (see Chapter 2 for further explanation)

v : variable name
i : integer
r : real value
w : keyword
c : character data (must be enclosed within single apostrophes)

TSPLOT, TPLOT Paragraphs

The TSPLOT paragraph is used to specify the horizontal time plot of one or more series in separate frames. The TPLOT paragraph is used to display the vertical time plot of one or more series in separate, parallel frames on the display device.

Syntax of the TSPLOT or TPLOT Paragraph

Brief syntax

TSPLOT VARIABLES ARE v1, v2, ---.
or
TPLOT VARIABLES ARE v1, v2, ---.

Full syntax

TSPLOT VARIABLES ARE v1, v2, ---. @
(or TPLOT)
  SEASONALITY IS i1, i2. @
  SPAN IS i1, i2. @
  TITLE IS 'c'. @
  SYMBOLS ARE 'c1', 'c2', ---. @
  RANGE IS r1, r2.

Required sentence: VARIABLE(S)

Sentences Used in the TSPLOT or TPLOT Paragraph

VARIABLES sentence
The VARIABLES sentence is used to specify the names of the series to be plotted.

SEASONALITY sentence
The SEASONALITY sentence is used to specify the multiple (i1) at which a tic-mark is printed along the time axis and the value of the index (i2) of the first observation. The default value of i1 is 10 and of i2 is 1 (or the lower limit of the SPAN sentence if this sentence is specified). Specification of a seasonality will also generate a default set of symbols (unless overwritten by the SYMBOLS sentence). See Section 3.3 for a further explanation. Note that SEASONALITY replaces the sentence TIC-MARK of older versions of the SCA System.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, from i1 to i2, for which values will be plotted. The default is that all observations in the series will be used.

TITLE sentence
The TITLE sentence is used to specify the title for the plot(s). The specified title must be enclosed in a pair of apostrophes and have no more than 72 characters. The default is that no title will be displayed.

SYMBOLS sentence
The SYMBOLS sentence is used to specify a sequence of symbols repeated in the plot. The default symbols used are the first i characters of the set 1, 2, ..., 9, 0, A, B, ..., Z where i is the distance between axis tic-marks. The value of i corresponds to the SEASONALITY specified (default is i=10). Specification of the SYMBOLS sentence overrides this default set of symbols.

RANGES sentence
The RANGES sentence is used to specify the upper and lower limits for the series to be plotted. By default, the limits are determined automatically by the SCA System.

MTSPLOT, MTPLOT Paragraphs

The MTSPLOT paragraph is used to display the time plot of more than one series on the same horizontal frame. The MTPLOT paragraph is used to display the time plot of more than one series on the same vertical time frame.

Syntax for the MTSPLOT or MTPLOT Paragraph

Brief syntax

MTSPLOT VARIABLES ARE v1, v2, ---.
or
MTPLOT VARIABLES ARE v1, v2, ---.

Full syntax

MTSPLOT VARIABLES ARE v1, v2, ---. @
(or MTPLOT)
  SEASONALITY IS i1, i2. @
  SPAN IS i1, i2. @
  TITLE IS 'c'. @
  SYMBOLS ARE 'c1', 'c2', ---. @
  RANGE IS r1, r2.

Required sentence: VARIABLES

Sentences Used in the MTSPLOT or MTPLOT Paragraph

VARIABLES sentence
The VARIABLES sentence is used to specify the names of the series to be plotted.

SEASONALITY sentence
The SEASONALITY sentence is used to specify the multiple (i1) at which a tic-mark is printed along the time axis and the value of the index (i2) of the first observation. The default value of i1 is 10 and of i2 is 1 (or the lower limit of the SPAN sentence if this sentence is specified). See Section 3.3 for a further explanation. Note that SEASONALITY replaces the sentence TIC-MARK of older versions of the SCA System.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, from i1 to i2, for which values will be plotted. The default is that all observations in the series will be used.

TITLE sentence
The TITLE sentence is used to specify the title for the plot(s). The specified title must be enclosed in a pair of apostrophes and have no more than 72 characters. The default is that no title will be displayed.

SYMBOLS sentence
The SYMBOLS sentence is used to specify the symbols for distinguishing different series. If this sentence is omitted, 'A' represents the first series, 'B' the second, etc.

RANGES sentence
The RANGES sentence is used to specify the upper and lower limits for the series to be plotted. By default, the limits are determined automatically by the SCA System.

PLOT Paragraph

The PLOT paragraph is used to construct and display the scatter plot of a single pair of variables or the plots of multiple pairs of variables on separate frames, each frame having the same X and Y scaling.

Syntax for the PLOT Paragraph

Brief syntax

PLOT VARIABLES ARE v1, v2.

Full syntax

PLOT VARIABLES ARE v1, v2. @
  X-VARIABLES ARE v1, v2, ---. @
  Y-VARIABLES ARE v1, v2, ---. @
  TITLE IS 'c'. @
  SPAN IS i1, i2. @
  TAGSETS ARE v1, v2, ---. @
  RANGES ARE X(r1,r2), Y(r3,r4). @
  LAYOUT IS w. @
  SIZE IS X(i1), Y(i2). @
  TIC-MARK IS X(i1), Y(i2). @
  GRID IS X(i1), Y(i2).

Required sentences: VARIABLES, or X-VARIABLES and Y-VARIABLES

Sentences Used in the PLOT Paragraph

VARIABLES sentence
The VARIABLES sentence is used to specify the names (labels) of the Y (vertical) variable, v1, and X (horizontal) variable, v2. Note that when this sentence is used, the X-VARIABLE and Y-VARIABLE sentences are ignored. It is invalid to specify more than one pair of variable names in this sentence.

X-VARIABLE sentence
The X-VARIABLE sentence is used to specify the names of the variables to be plotted along the horizontal axis. The number of variables specified in this sentence must be the same as that in the Y-VARIABLE sentence.

Y-VARIABLE sentence
The Y-VARIABLE sentence is used to specify the names of the variables to be plotted along the vertical axis. The number of variables specified in this sentence must be the same as that in the X-VARIABLE sentence.

TITLE sentence
The TITLE sentence is used to specify the title for the plot(s). The specified title must be enclosed in a pair of apostrophes and have no more than 72 characters. The default is that no title will be displayed.

SPAN sentence
The SPAN sentence is used to specify the span of indices, from i1 to i2, for which the values of the co-ordinates will be plotted. The default is to plot all cases.

TAGSETS sentence
The TAGSETS sentence is used to specify the name(s) of variable(s) containing the tags to be used in plotting data. The default is none. See Section for the way the values of the TAGSET variable(s) are converted to symbols. If the TAGSET sentence is used, one variable must be specified for each Y-VARIABLE specified.

RANGES sentence
The RANGES sentence is used to specify the upper and lower limits for the X and Y variable values to be plotted. By default, the limits are determined automatically by the SCA System.

LAYOUT sentence
The LAYOUT sentence is used to specify the layout type for the axes of the plot. The valid keywords are L for L-shape layout, BOX for box-type layout, and GRID for grid-type layout. The default layout is L-shape.

SIZE sentence
The SIZE sentence is used to specify the number of character units for the width of the X-axis and Y-axis. The default is 50 characters for the X-axis and 30 characters for the Y-axis.

TIC-MARK sentence
The TIC-MARK sentence is used to specify the intervals (in number of character units) for the printing of tic-marks on the X and Y axes. The default is 10 units for the X-axis and 5 units for the Y-axis.

GRID sentence
The GRID sentence is used to specify the number of tic-marks on each axis within a grid for hatch markings. This sentence can be specified only if the plot layout is GRID. The default is 1 for both X and Y.

MPLOT Paragraph

The MPLOT paragraph is used to display the scatter plot(s) of one or more pair(s) of variables on the same frame.

Syntax for the MPLOT Paragraph

Brief syntax

MPLOT X-VARIABLES ARE v1, v2, ---. @
  Y-VARIABLES ARE v1, v2, ---.

Full syntax

MPLOT X-VARIABLES ARE v1, v2, ---. @
  Y-VARIABLES ARE v1, v2, ---. @
  TITLE IS 'c'. @
  SPAN IS i1, i2. @
  RANGES ARE X(r1,r2), Y(r3,r4). @
  SYMBOLS ARE 'c1', 'c2', ---. @
  LAYOUT IS w. @
  SIZE IS X(i1), Y(i2). @
  TIC-MARK IS X(i1), Y(i2). @
  GRID IS X(i1), Y(i2).

Required sentences: X-VARIABLES and Y-VARIABLES

Sentences Used in the MPLOT Paragraph

X-VARIABLE sentence
The X-VARIABLE sentence is used to specify the names of the variables to be plotted along the horizontal axis. The number of variables specified in this sentence must be the same as that in the Y-VARIABLE sentence.

Y-VARIABLE sentence
The Y-VARIABLE sentence is used to specify the names of the variables to be plotted along the vertical axis. The number of variables specified in this sentence must be the same as that in the X-VARIABLE sentence.

TITLE sentence
The TITLE sentence is used to specify the title for the plot(s). The specified title must be enclosed in a pair of apostrophes and have no more than 72 characters. The default is that no title will be displayed.

SPAN sentence
The SPAN sentence is used to specify the span of indices, from i1 to i2, for which the values of the co-ordinates will be plotted. The default is all cases.

RANGES sentence
The RANGES sentence is used to specify the upper and lower limits for the X and Y variable values to be plotted. The default is all the values.

SYMBOLS sentence
The SYMBOLS sentence is used to specify the symbols that will represent co-ordinates of different pairs of variables. If no set of symbols is specified, A represents co-ordinates of the first pair of variables, B represents co-ordinates of the second pair, etc.

LAYOUT sentence
The LAYOUT sentence is used to specify the layout type for the axes of the plot. The valid keywords are L for L-shape layout, BOX for box-type layout, and GRID for grid-type layout. The default layout is L-shape.

SIZE sentence
The SIZE sentence is used to specify the number of character units for the width of the X-axis and Y-axis. The default is 50 characters for the X-axis and 30 characters for the Y-axis.

TIC-MARK sentence
The TIC-MARK sentence is used to specify the intervals (in number of character units) for the printing of tic-marks on the X and Y axes. The default is 10 units for the X-axis and 5 units for the Y-axis.

GRID sentence
The GRID sentence is used to specify the number of tic-marks on each axis within a grid for hatch markings. This sentence can be specified only if the plot layout is GRID. The default is 1 for both X and Y.

REFERENCES

Commodity Year Book (1986). New York: Commodity Research Bureau.

Neter, J., Wasserman, W., and Kutner, M.H. (1983). Applied Linear Regression Models. Homewood, IL: Richard D. Irwin, Inc.


CHAPTER 4

LINEAR REGRESSION ANALYSIS

Regression analysis is a statistical method used in modeling relationships that may exist between variables. In a regression analysis, we relate the response of a dependent variable to the values of potential explanatory variables. We have great flexibility in the choice of such explanatory variables. We may use variables whose values are recorded concurrently with the dependent variable, as well as variables provided from other sources (e.g., government statistics, stock prices, interest rate data, etc.). Regression models can also be used to incorporate such time entities as trends and seasonal indicators into a model, but it is more appropriate to use time series models in such cases. Once a model is established, it may be used to make inferences about the formulated relationships, or to make predictions for future responses when the explanatory variables are at designated levels.

Regression methods provide us with modeling tools that: (1) are easily understandable and presentable; (2) are flexible enough to include various types of information; and (3) produce results (e.g., estimates, forecasts) that are quantified. The latter is important as it permits us to statistically assess the validity of the model and/or its predicted values, as well as the relative importance of components of the model. As a result, regression models are popular tools for analysis and forecasting.

Traditional uses of regression have a number of drawbacks. One problem is the blind incorporation of a flood of explanatory variables in a model. The inclusion of too many variables within a model can obscure the information that may be obtained from a more meaningful subset. The explanatory variables may be highly correlated, which may cause problems in the estimation of model parameters. However, the most serious problem in the use of regression models occurs with time dependent data (i.e., data collected over time). Serial correlation in the error component of a regression model can result in a model that is ineffectual (Granger and Newbold, 1974) or, more likely, incorrect (Box and Newbold, 1971).

A brief overview of the linear regression model and the regression analysis capabilities of the SCA System is presented in this chapter. A more detailed presentation of topics related to the SCA implementation of the linear regression model (including computational methods used) may be found in Chapter 9 of The SCA Statistical System: Reference Manual for General Statistical Analysis. More information on the properties of linear models and regression analysis can be found in such texts as Draper and Smith (1981), Neter, Wasserman, and Kutner (1983), Daniel and Wood (1980), Graybill (1961), and Seber (1977).

4.1 A Brief Overview of Linear Regression Analysis

The linear regression model is part of a more general class of linear models. Properties of linear models and regression analysis have been considered by many authors including Draper and Smith (1981), Seber (1977), Neter and Wasserman (1974), Neter, Wasserman, and Kutner (1983), Searle (1971), Daniel and Wood (1980), Graybill (1961), Rao (1973) and references contained therein. This section briefly reviews the linear regression model. Information regarding various diagnostic checks for a fitted regression model is found in Section

The simplest type of relationship between variables occurs when the responses for the dependent variable appear to nearly follow a straight line when plotted against the values of a single explanatory variable. In such a relationship, the predicted value of the dependent variable, $\hat{Y}$, can be obtained from the linear equation

$$\hat{Y} = a + bX \tag{4.1}$$

where $X$ is an explanatory variable and $a$ and $b$ are estimated values. We can extend this linear relation to include more than one explanatory variable with the equation

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_m X_m \tag{4.2}$$

The general form of the linear regression model can be written as

$$Y_j = \beta_0 + \beta_1 X_{1j} + \beta_2 X_{2j} + \cdots + \beta_m X_{mj} + \varepsilon_j, \qquad j = 1, 2, \ldots, n; \tag{4.3}$$

where

$Y_j$ is the $j$th observation (trial, case) of a response, or dependent, variable;
$X_{ij}$ is the $j$th observation of the $i$th explanatory, or independent, variable (i.e., a variable whose values are known);
$\beta_0, \beta_1, \beta_2, \ldots, \beta_m$ are parameters to be estimated, and
$\varepsilon_j$ is an error term.

The error terms are assumed to be uncorrelated random variables with mean zero and unknown variance, $\sigma^2$. The estimates for parameters in the above equation, $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_m$, are chosen to minimize the sum of the squared errors, i.e.,

$$SSE = \sum_{j=1}^{n} (Y_j - \hat{Y}_j)^2$$

where

$$\hat{Y}_j = \hat{\beta}_0 + \hat{\beta}_1 X_{1j} + \hat{\beta}_2 X_{2j} + \cdots + \hat{\beta}_m X_{mj} \tag{4.4}$$

The estimates obtained in the above manner are referred to as the least squares estimates of the regression model. When we use a regression model with time dependent data, the index $t$ will be used in lieu of the index $j$. In this way, we more explicitly emphasize the presence of time, or any time dependent relationships, in the model. We may observe that equations (4.2) and (4.4) are the same (with the index $j$ omitted).

A usual assumption is that the error terms follow a normal distribution (i.e., $N(0, \sigma^2)$). In such a case, the least squares estimates for the parameters are also the maximum likelihood estimates.

Note that in this chapter we use $p$ to indicate the number of parameters to be estimated. We observe that $p = m + 1$ if a constant is included in the model (i.e., $\beta_0$ is included in the model) and $p = m$ otherwise.

4.2 A Regression Example

The specification and estimation of a linear regression model is easily accomplished using the REGRESS paragraph. To illustrate the use of regression analysis, we will analyze a set of data pertaining to beer distribution (Montgomery, 1991, page 501). In an effort to analyze the delivery system of a beer distributor, in particular, the time required to service a retail outlet, the following data and factors are studied:

(1) The delivery time (in minutes) to service an outlet,
(2) The number of cases of beer delivered to the outlet, and
(3) The maximum distance the delivery man must travel.

The data are shown in Table 4.1 and are stored in the SCA workspace under the labels DELIVERY, CASES, and DISTANCE, respectively.

Table 4.1 Beer delivery time data

Observation Number    Number of Cases (CASES)    Distance (DISTANCE)    Delivery Time in minutes (DELIVERY)
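The least squares criterion of Section 4.1 can be illustrated outside the SCA System with a small, dependency-free Python sketch. It solves the normal equations for the estimates that minimize SSE and returns the fitted values of equation (4.4). The function names are hypothetical, the computational method is not claimed to be the one used by the REGRESS paragraph, and the numbers in the usage note are made up for illustration (they are not the Table 4.1 data).

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("regressors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(y, regressors, constant=True):
    """Return (estimates, fitted values, SSE) for the linear regression model.

    `regressors` is a list of m explanatory series. With constant=True the
    first estimate is the intercept, so p = m + 1; otherwise p = m.
    """
    n = len(y)
    rows = [([1.0] if constant else []) + [float(x[j]) for x in regressors]
            for j in range(n)]
    p = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * y[j] for j, r in enumerate(rows)) for a in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bk * rk for bk, rk in zip(beta, r)) for r in rows]
    sse = sum((yj - fj) ** 2 for yj, fj in zip(y, fitted))
    return beta, fitted, sse
```

As a check, with the illustrative series `x1 = [1, 2, 3, 4, 5]`, `x2 = [2, 1, 4, 3, 6]` and a response built exactly as `3 + 2*x1 + 0.5*x2`, the call `least_squares(y, [x1, x2])` recovers the three coefficients up to rounding error and gives an SSE of essentially zero. Normal equations are used here only because they keep the sketch short; they are less stable numerically than orthogonal decompositions when regressors are highly correlated.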

We first plot DELIVERY against both CASES and DISTANCE to check if there are any obvious relationships or unusual occurrences in the data.

-->PLOT DELIVERY, CASES

[Scatter plot of DELIVERY (vertical axis) against CASES (horizontal axis)]

-->PLOT DELIVERY, DISTANCE

[Scatter plot of DELIVERY (vertical axis) against DISTANCE (horizontal axis)]

In the scatter plot between DELIVERY and CASES, we observe a strong linear relationship between the number of cases delivered and delivery time. However, there appears to be an aberration from linearity for the delivery time when 25 cases are delivered. This corresponds to observation number 5. No clear patterns are seen in the scatter plot between DELIVERY and DISTANCE.

We now will regress DELIVERY on CASES and DISTANCE. That is, we will use the REGRESS paragraph to obtain the fitted equation (omitting the "hat")

DELIVERY = b0 + b1 CASES + b2 DISTANCE

To obtain this fit, we specify the dependent and explanatory variables as

REGRESS DELIVERY, CASES, DISTANCE

The actual REGRESS command is shown below together with other modifying (or optional) sentences that will be explained later. The continuation character (@) is used to continue our commands to a second line.

-->REGRESS DELIVERY, CASES, DISTANCE. DIAGNOSTICS ARE FULL. @
--> HOLD RESIDUALS(RESID), FITTED(FIT)

We obtain the following:

REGRESSION ANALYSIS FOR THE VARIABLE DELIVERY

PREDICTOR    COEFFICIENT    STD. ERROR    T-VALUE
INTERCEPT
CASES
DISTANCE

CORRELATION MATRIX OF REGRESSION COEFFICIENTS
CASES       1.00
DISTANCE
            CASES    DISTANCE

S =          R**2 = 73.7%    R**2(ADJ) = 69.3%

ANALYSIS OF VARIANCE TABLE
SOURCE        SUM OF SQUARES    DF    MEAN SQUARE    F-RATIO
REGRESSION
RESIDUAL
ADJ. TOTAL

SOURCE        SEQUENTIAL SS     DF    MEAN SQUARE    F-RATIO
CASES
DISTANCE

DIAGNOSTIC STATISTICS:
CASE NO.    OBSERVED VALUE    RESIDUAL    STANDARDIZED RESIDUAL    STUDENTIZED DELETED RESIDUAL    COOK'S DISTANCE    LEVERAGE

"*" DENOTES AN OBSERVATION WITH A LARGE RESIDUAL

A discussion of SCA output and regression diagnostic statistics is given in Section 4.4. The fitted equation from the above regression can be obtained from the first few lines of output as

DELIVERY = CASES +.46 DISTANCE.

The estimates associated with CASES and DISTANCE are statistically significant as their absolute t-values are greater than 2.15 (the approximate 5% critical level for the sample size). The small t-value associated with the intercept term, 0.39, implies that this estimate cannot be distinguished statistically from zero. Hence we may wish to exclude this term from our model (see Section 4.2.3). However, before we employ this equation, we need to check the model's validity.

Some diagnostic checks of the model

A regression analysis is not complete without diagnostic checks of the fit. A more complete discussion of diagnostic checking is given in Section

In an effort to assess the above model's validity, we requested a display of a set of diagnostic statistics by including the DIAGNOSTICS sentence in the paragraph. By asking for a FULL display, we obtain the values of these diagnostic statistics for all cases. These statistics are meaningful provided there is no serial correlation in the data (see Section 4.3.1) and the sample size is not very large. The values of the standardized residual, studentized deleted residual and Cook's distance (see Section 4.4.2) for case number 5 mark it as a potential outlier.

The values obtained using the fitted equation have been retained under the label FIT. The residuals of the fit (i.e., DELIVERY - FIT) are stored in the variable RESID. The residuals should resemble values drawn at random from a normal distribution with mean zero. We can observe the spurious nature of this observation (case number 5) in the probability plot of the residuals and in the plots of the residual series RESID against the explanatory variables CASES and DISTANCE (see Section 4.4.2). In each case there is only one observation that leads us to question the adequacy of the fitted model, observation 5.
-->PPLOT RESID

[Normal probability plot of RESID: normal SCORE (vertical axis) against RESID (horizontal axis)]

-->PLOT RESID, CASES

[Scatter plot of RESID (vertical axis) against CASES (horizontal axis)]

-->PLOT RESID, DISTANCE

[Scatter plot of RESID (vertical axis) against DISTANCE (horizontal axis)]

Observing the effect of a spurious observation

Montgomery (1991, page 504) suggests that a data recording error could have been made at observation 5 (DELIVERY entered as 25 instead of 35). However, there was no way to verify this. To observe the effect of a possible recording error, we will recode the value to 35 and re-run the regression analysis. We can recode the value directly using an analytic assignment statement (see Appendix A).

-->DELIVERY(5) = 35
-->REGRESS DELIVERY, CASES, DISTANCE. DIAGNOSTICS ARE FULL. @
--> HOLD RESIDUALS (RESID), FITTED (FIT)

REGRESSION ANALYSIS FOR THE VARIABLE DELIVERY

PREDICTOR    COEFFICIENT    STD. ERROR    T-VALUE
INTERCEPT
CASES
DISTANCE

CORRELATION MATRIX OF REGRESSION COEFFICIENTS
CASES       1.00
DISTANCE
            CASES    DISTANCE

S =          R**2 = 96.6%    R**2(ADJ) = 96.0%

ANALYSIS OF VARIANCE TABLE
SOURCE        SUM OF SQUARES    DF    MEAN SQUARE    F-RATIO
REGRESSION
RESIDUAL
ADJ. TOTAL

SOURCE        SEQUENTIAL SS     DF    MEAN SQUARE    F-RATIO
CASES
DISTANCE

DIAGNOSTIC STATISTICS:
CASE NO.    OBSERVED VALUE    RESIDUAL    STANDARDIZED RESIDUAL    STUDENTIZED DELETED RESIDUAL    COOK'S DISTANCE    LEVERAGE

"*" DENOTES AN OBSERVATION WITH A LARGE RESIDUAL

We observe that the fitted equation is only slightly changed from

TIME = CASES +.46 DISTANCE

to

TIME = CASES +.39 DISTANCE

However, recoding the single point has an appreciable effect on variance. We see:

(1) Standard errors of coefficients for CASES and DISTANCE are 1/3 of what they were previously (resulting in a dramatic change in the t-values of the coefficients);
(2) A substantial change in the amount of the REGRESSION sum of squares in the ANOVA table (from to ); and hence a
(3) Change in $R^2$ from 73.7% to 96.6%. (Please see Section for a more complete discussion on the interpretation of $R^2$.)

The probability plot of the residuals reveals no apparent model inadequacy.

-->PPLOT RESID

[Normal probability plot of RESID: normal SCORE (vertical axis) against RESID (horizontal axis)]

Similarly, as would be expected, the plots of RESID against the explanatory variables CASES and DISTANCE now show no evidence of model inadequacy. Hence it is possible that a simple recording error has affected the results of the analysis dramatically. This indicates the need for a careful diagnostic check of a model (see Section 4.4.2).

An overview of model specification in the REGRESS paragraph

The SCA System provides a number of ways to specify information regarding a regression or a fit of a linear model. This section describes the most frequently used information.

Specifying dependent and independent variables

The basic information required for a regression analysis consists of the names of the dependent and independent variables. In the above example, DELIVERY was regressed on CASES and DISTANCE. These variables are easily specified by listing their names immediately after the REGRESS command. The first variable specified is used as the dependent variable. All other variables are used as regressors in the model. Hence

REGRESS VARIABLES ARE DELIVERY, CASES, DISTANCE.

or, as we used in abbreviated form,

REGRESS DELIVERY, CASES, DISTANCE.

is interpreted as a regression specification of DELIVERY on CASES and DISTANCE.

Including a constant term

Whenever we list the variables involved in a regression, a constant term is also included. This is the default formulation used by the SCA System. The constant term is usually important in a regression analysis as we try to determine if more information than mean level alone can be obtained from the dependent variable. If we do not want a constant term in the regression, we need to add the logical sentence NO CONSTANT after the variable specification. For example, if we do not want a constant in a regression for the beer data, we need to state

REGRESS DELIVERY, CASES, DISTANCE. NO CONSTANT.

4.3 A Regression Analysis of Financial Data

To illustrate the use of regression analysis for business or financial data, we consider some data sets related to the stock market. The data consist of the following monthly series, each from January 1976 through June 1990 inclusive:

(1) The monthly average of the Standard and Poor's 500 stock index,
(2) The monthly average of long term government security interest rates (from the Federal Reserve Bulletin), and
(3) The monthly composite index of leading indicators (from Business Conditions Digest).

The data are listed in Table 4.2 and are plotted in Figure 4.1 (the plots were created using the SCAGRAF program). The data are stored in the SCA workspace under the labels SP500, LONGTERM and LINDCTR, respectively.

Table 4.2 Stock market data

Year    Monthly Average of Standard and Poor's 500 Index (SP500)
Year    Monthly Average of Long-term Interest Rates (LONGTERM)
Year    Monthly Composite Index of Leading Indicators (LINDCTR)

Figure 4.1 Time Series Plots of Stock Market Data

We see that SP500 increases steadily until observation 142, at which time it plummets for three consecutive periods. This period corresponds to the stock market crash in October-December 1987. Since special modeling considerations are necessary to handle this period appropriately (see Chapters 6 and 7), we will restrict our regression analysis to the first 141 observations. A time series analysis for the data over the same data span is provided in Chapter 8.

We will also analyze the natural logarithms of all time series. The logarithmic transformation is frequently used to achieve a more homogeneous variance in a data set. In the case of economic data, it is also employed so that the parameters in the model can be interpreted in terms of elasticity. In this way, we can assess the percent change in the response for a 1% change in an explanatory variable. We can modify the data using the following sequence of commands:

-->LNSP500 = LN(SP500)
-->LNLONG = LN(LONGTERM)
-->LNLEAD = LN(LINDCTR)
-->SELECT LNSP500, LNLONG, LNLEAD. SPAN IS (1,141).
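The elasticity interpretation of a regression on logged variables can be made explicit with a short derivation. The symbols $Y$, $X_1$, $X_2$ and $b_0$, $b_1$, $b_2$ below are generic placeholders for a response, two explanatory variables, and fitted coefficients; they are not estimates taken from this chapter.

```latex
% Log-log model with generic fitted coefficients b_0, b_1, b_2
\ln Y = b_0 + b_1 \ln X_1 + b_2 \ln X_2

% Differentiating with respect to \ln X_1, holding X_2 fixed:
\frac{\partial \ln Y}{\partial \ln X_1} = \frac{\partial Y / Y}{\partial X_1 / X_1} = b_1

% Effect of a 1% increase in X_1 (X_1 -> 1.01 X_1), holding X_2 fixed:
\frac{Y_{\text{new}}}{Y_{\text{old}}} = 1.01^{\,b_1} \approx 1 + 0.01\, b_1
```

Under this reading, a coefficient of, say, $b_1 = -0.5$ would correspond to roughly a 0.5% decrease in the response for a 1% increase in $X_1$. The approximation in the last line is accurate only for small percentage changes.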

The plots of LNSP500, LNLONG and LNLEAD are shown in Figure 4.2. We anticipate the effect of long term interest rates on the stock index to be negative, since as the long term rate increases, investors tend to purchase bonds rather than stocks. We also expect that the stock index should reflect the current state of the leading indicators. The latter may be true based on these plots.

To explore possible relationships, a common practice is to regress the dependent variable on the explanatory variables. Hence, we will regress LNSP500 on both LNLONG and LNLEAD. That is, we will obtain the fitted equation

LNSP500 = b_0 + b_1 LNLONG + b_2 LNLEAD

-->REGRESS LNSP500, LNLONG, LNLEAD. DW. HOLD RESIDUALS(RES).

REGRESSION ANALYSIS FOR THE VARIABLE LNSP500

PREDICTOR COEFFICIENT STD. ERROR T-VALUE
INTERCEPT
LNLONG
LNLEAD

CORRELATION MATRIX OF REGRESSION COEFFICIENTS
LNLONG 1.00
LNLEAD
LNLONG LNLEAD

S = .1405 R**2 = 83.5% R**2(ADJ) = 83.2%

ANALYSIS OF VARIANCE TABLE
SOURCE SUM OF SQUARES DF MEAN SQUARE F-RATIO
REGRESSION
RESIDUAL
ADJ. TOTAL

SOURCE SEQUENTIAL SS DF MEAN SQUARE F-RATIO
LNLONG
LNLEAD

DURBIN-WATSON STATISTIC = .08

Figure 4.2 Logged Stock Market Data (January 1976 through September 1987)

The fitted equation from the above regression has the form LNSP500 = b_0 + b_1 LNLONG + b_2 LNLEAD, with the estimates displayed above. The above estimates (except that for LNLONG) are significant at about the 5% level. The R^2 value (see Section 4.4.1) is over 83% and the F-value of the regression is highly significant. If we rely on this information alone, we may conclude that we have a good fit. However, a closer inspection of the fitted model will show this is not the case.

One concern we may have regarding the fitted model is the sign of the parameter estimate associated with LNLONG. As noted previously, we expect it to be negative, and it is not in this fit. Another problem is seen in the value of the Durbin-Watson statistic (see the discussion of serial correlation below for more information on this statistic). The statistic was requested with the inclusion of the logical sentence DW in the above paragraph. Its value, 0.08, is a clear indication of strong positive first-order serial correlation in the residual series.

The residual series (i.e., the difference between the observed values and those from the fitted equation) is a crucial series for diagnostic checks of the model. The series, maintained here in the SCA workspace under the label RES, should approximate values that are drawn randomly from a normal distribution. Such a series is also known as a white noise process. White noise displays no pattern when plotted over time. However, a distinct pattern is still observable in a time plot of the residual series RES (see Figure 4.3).

Figure 4.3 Time Plot of the Residuals of the Regression of LNSP500 on LNLONG and LNLEAD

Serial correlation

The error terms of our linear model (see Section 4.1) are usually assumed to be serially uncorrelated in a regression analysis. That is, the value of the error associated with one observation should not be related to the value of the error of another observation. If we analyze data that have been recorded over time, it is often the case that this assumption is not true. This is particularly true of business data (as in this example) and of data from industrial experiments that have not been randomized. If we do not detect the presence of serial correlation and correct for it, the model estimates are inefficient and our analysis can be flawed seriously. For a discussion of the problems that can arise, see Box and Newbold (1971) and Neter, Wasserman, and Kutner (1983, Chapter 13).

We can check for serial correlation in a residual series by using the ACF paragraph (see Chapter 5). The ACF paragraph calculates a statistic measuring the correlation present between the residual at time t (i.e., e_t) and the residual that occurred l time periods prior to it (i.e., e_{t-l}). The value l is known as the lag. The ACF paragraph can be used to calculate and display a sequence of autocorrelations in the residual series. It is useful to observe the values of the autocorrelations for a sequence of lags. Autocorrelations of higher lags may provide us with meaningful information (e.g., a seasonal period).

The ACF paragraph will graphically display the calculated values together with a set of 95% confidence intervals. To obtain the autocorrelations of the above residual series RES for the first 12 lags, we can simply enter

-->ACF RES. MAXLAG IS 12.

We obtain

TIME PERIOD ANALYZED TO 141
NAME OF THE SERIES RES
EFFECTIVE NUMBER OF OBSERVATIONS
STANDARD DEVIATION OF THE SERIES
MEAN OF THE (DIFFERENCED) SERIES
STANDARD DEVIATION OF THE MEAN
T-VALUE OF MEAN (AGAINST ZERO)

AUTOCORRELATIONS
ST.E.
Q

I
IXXX+XXXXXXXXXXXXXXXXXXXX
IXXXXXX+XXXXXXXXXXXXXXX
IXXXXXXXX+XXXXXXXXXXX
IXXXXXXXXX+XXXXXXXXX
IXXXXXXXXXX+XXXXXXX
IXXXXXXXXXXX+XXXX
IXXXXXXXXXXX+XX
IXXXXXXXXXXXX
IXXXXXXXXXXX
IXXXXXXXXXX
IXXXXXXXX
IXXXXXXX +

A frequently used statistic to assess serial correlation is the Durbin-Watson (DW) statistic. The DW statistic can be used in a test for the presence of a first-order autocorrelation in the residual series. Inclusion of the sentence DW will lead to a display of the DW statistic. As noted before, the value of the Durbin-Watson statistic above is .08.

An exact test based on the DW statistic is not always possible. However, tabulated upper and lower bounds for the statistic can be used in one or two tailed tests (see Section 3.11 of Draper and Smith, 1981, or Section 13.3 of Neter, Wasserman and Kutner, 1983). The DW statistic above is significant at the 1% level. This indicates the presence of serial correlation in the residual series. This conclusion is more apparent by observing the ACF of the residual series.

It is worth noting that for large samples the DW statistic is approximately equal to 2 - 2r_1, where r_1 is the lag 1 autocorrelation of the residual series. In the above example r_1 = .95 and 2 - 2(.95) = .10; the DW value displayed is .08.

The ACF of the residuals adds important information that is missed by the Durbin-Watson statistic. In some situations, the DW statistic may imply there is no correlation present in the residuals. This can be misleading as the DW statistic is only used to check for first-order serial correlation in the residuals. Instead, the ACF provides us with a sequence of autocorrelations. This is particularly important when seasonality is present in the data. Because of the relationship between r_1 and the DW statistic, and the fact that more informative statistics can be obtained from the ACF paragraph, it is not recommended that the DW statistic be used as the only check for serial correlation.
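The relationship between the Durbin-Watson statistic and the lag 1 autocorrelation can be checked numerically. The following Python sketch is not SCA code and makes no claim about how the ACF paragraph or the DW sentence are implemented. It uses the usual textbook definitions, and the residual series is a hypothetical smooth wave chosen because it is strongly positively autocorrelated. The value 2/sqrt(n) is used as a rough 95% bound for an individual autocorrelation, which is an assumption of this sketch.

```python
import math

def autocorrelations(e, max_lag):
    """Sample autocorrelations r_1..r_max_lag, computed about the series mean."""
    n = len(e)
    mean = sum(e) / n
    dev = [x - mean for x in e]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - lag] for t in range(lag, n)) / denom
            for lag in range(1, max_lag + 1)]

def durbin_watson(e):
    """DW = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(x * x for x in e)

# Hypothetical residuals: a slow wave, centered so the mean is zero
raw = [math.sin(t / 20.0) for t in range(141)]
m = sum(raw) / len(raw)
res = [x - m for x in raw]

r = autocorrelations(res, 12)      # lags 1 through 12
dw = durbin_watson(res)
approx = 2 - 2 * r[0]              # large-sample approximation to DW
bound = 2 / math.sqrt(len(res))    # rough 95% bound for one autocorrelation
```

For this series both dw and approx are close to zero and r[0] lies far above the bound, which is the same pattern seen in the residuals of the LNSP500 regression.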

Adjustments for serial correlation

If serial correlation is present, then we need to make appropriate accommodations in our model. There are a number of options available to us within the linear regression framework. For example, if the correlation is the result of the presence of a linear, quadratic, or seasonal trend in the series, then we may be able to incorporate specific time dependent variables as explanatory variables in our model (see Section 3.3 of Cryer, 1986). Such remedies are usually not satisfactory.

A more effective adjustment for serial correlation may be to alter the model itself. For example, one such method is to model the change in a series, rather than the series itself. That is, instead of using the recorded (or transformed) values of the dependent variable (i.e., Y_1, Y_2, Y_3, ...), we use the change from one period to the next (i.e., Y_2 - Y_1, Y_3 - Y_2, Y_4 - Y_3, ...). We replace the original series with one consisting of differences, or differenced data. We also use the differenced series for each of the explanatory variables.

We can use the DIFFERENCE paragraph (see Appendix C) to create these differenced series. We then can use the REGRESS paragraph to regress the differenced values of LNSP500 on the differenced values of LNLONG and LNLEAD (the SCA output below is edited for presentation purposes).

-->DIFFERENCE LNSP500,LNLONG,LNLEAD. NEW ARE DLNSP500, DLNLONG, DLNLEAD.
-->REGRESS DLNSP500, DLNLONG, DLNLEAD. DW. HOLD RESIDUALS(RES).

REGRESSION ANALYSIS FOR THE VARIABLE DLNSP

PREDICTOR COEFFICIENT STD. ERROR T-VALUE
INTERCEPT
DLNLONG
DLNLEAD

CORRELATION MATRIX OF REGRESSION COEFFICIENTS
DLNLONG 1.00
DLNLEAD
DLNLONG DLNLEAD

S = .0294 R**2 = 20.1% R**2(ADJ) = 18.9%

ANALYSIS OF VARIANCE TABLE
SOURCE SUM OF SQUARES DF MEAN SQUARE F-RATIO
REGRESSION
RESIDUAL
ADJ. TOTAL

SOURCE SEQUENTIAL SS DF MEAN SQUARE F-RATIO
DLNLONG
DLNLEAD

DURBIN-WATSON STATISTIC = 1.70
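The differencing step used before the regression above can be sketched in a few lines of Python. This is an illustration only, not the DIFFERENCE paragraph itself, and the index values are hypothetical. The sketch also compares the differences of the logged values with the period-to-period proportional changes of the original series, a point the text returns to when the transformations are interpreted.

```python
import math

def difference(series):
    """First differences Y_2 - Y_1, Y_3 - Y_2, ...; one value shorter than the input."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]

def proportional_change(series):
    """(Y_t - Y_{t-1}) / Y_{t-1} for t = 2, 3, ..."""
    return [(series[t] - series[t - 1]) / series[t - 1]
            for t in range(1, len(series))]

# Hypothetical index values, not the SP500 data
index = [100.0, 101.0, 103.0, 102.0, 104.5]

dlog = difference([math.log(v) for v in index])   # differences of logged values
prop = proportional_change(index)                 # proportional changes of the raw series
```

For changes of a few percent the two lists agree to about three decimal places. The differenced series has no value for the first period, which is why it is one observation shorter.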

The fitted equation for this model has the form

DLNSP500 = b_0 + b_1 DLNLONG + b_2 DLNLEAD, (4.4)

with the estimates displayed above. All parameter estimates and the F-ratios for the regression are significant. Moreover, the signs of the regression coefficients have the sense we expect and the Durbin-Watson statistic does not indicate serial correlation. Other diagnostic checks of this model support its validity. One check, the time series plot of the residuals (see Figure 4.4), reveals no apparent pattern in the residual series. Note that the R^2 value for this model is only about 20%, yet the model seems to fit well. This is an indication of why we should not rely on the R^2 value as a measure of the adequacy of a model. We will examine the R^2 value for this example in more detail in the next section and in Chapter 8.

Figure 4.4 Time Plot of the Residuals of the Regression of DLNSP500 on DLNLONG and DLNLEAD

Lagged regression

In the previous section, we illustrated one effective method for dealing with serial correlation, altering the variables used in the regression model. A better change may be to include a serially correlated error term in the model. Such a change is within the framework of transfer function modeling, and is discussed in more detail in Chapter 8. Another possibility is to consider a lagged regression.

In a lagged regression, we broaden the explanatory variables of a model by including lagged values of one or more variables within the model. To illustrate this concept, consider the fitted equation used earlier for the logged data,

LNSP500 = b_0 + b_1 LNLONG + b_2 LNLEAD (4.5)

This fitted equation considers only the contemporaneous values of the variables involved (that is, observations recorded at the same time period). We can show this by explicitly including time subscripts in (4.5) to obtain

LNSP500_t = b_0 + b_1 LNLONG_t + b_2 LNLEAD_t (4.6)

It is possible that an explanatory variable may lead the dependent variable. That is, the value of the dependent variable may be related to values of the explanatory variable that occur earlier. To allow for such leading relationships, we could consider regressing the dependent variable on both contemporaneous and prior observations of a variable, in effect creating new explanatory variables by shifting existing ones in time. For example, we may consider relating LNSP500 to both the current (monthly) value of LNLONG and the value of LNLONG observed one period (month) ago. We may also do the same for LNLEAD. In such a case, the fitted equation (4.6) becomes

LNSP500_t = b_0 + b_1 LNLONG_t + b_2 LNLONG_{t-1} + b_3 LNLEAD_t + b_4 LNLEAD_{t-1} (4.7)

We can also allow for other system dynamics by using previously observed values of the dependent variable as one or more explanatory variables. For example, if we add the prior (monthly) value of LNSP500 as an explanatory variable in (4.7) we have

LNSP500_t = b_0 + b_1 LNLONG_t + b_2 LNLONG_{t-1} + b_3 LNLEAD_t + b_4 LNLEAD_{t-1} + b_5 LNSP500_{t-1} (4.8)

Since lagged regression models can display a level of system dynamics, they are sometimes referred to as dynamic regression models. We can obtain the above fit by using the LAG paragraph to create the lagged series (see Appendix C) and the REGRESS paragraph. The above model is discussed in more detail in Chapter 8.

In Section 4.3.2, we fit a regression model using differenced data for all series. The differenced series can be represented in terms of current and lagged series. Specifically, for the series used in that section we have

DLNSP500_t = LNSP500_t - LNSP500_{t-1},
DLNLONG_t = LNLONG_t - LNLONG_{t-1}, and
DLNLEAD_t = LNLEAD_t - LNLEAD_{t-1},

for t = 2, 3, ... (the value for t = 1 is undefined). If we employ the time index, t, in the fitted equation obtained in Section 4.3.2, and write b_0 for the fitted constant term, we have

DLNSP500_t = b_0 - .342 DLNLONG_t + .699 DLNLEAD_t. (4.9)

We can re-write this in terms of a lagged regression as

(LNSP500_t - LNSP500_{t-1}) = b_0 - .342 (LNLONG_t - LNLONG_{t-1}) + .699 (LNLEAD_t - LNLEAD_{t-1}) (4.10)
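The link between the differenced form (4.10) and the lagged form (4.8) can be checked numerically. The Python sketch below is an illustration, not the LAG paragraph. The data values and the constant b0 are hypothetical. The coefficient magnitudes .342 and .699 are those quoted in the text, and the signs used here (negative for the long-term rate, positive for the leading indicators) are an assumption based on the expected direction of each effect.

```python
def lag(series, k=1):
    """Shift a series k periods back; the first k values are undefined (None)."""
    return [None] * k + series[:len(series) - k]

def differenced_form(b0, beta_long, beta_lead, long_t, long_1, lead_t, lead_1, sp_1):
    """Equation (4.10) solved for LNSP500_t."""
    return sp_1 + b0 + beta_long * (long_t - long_1) + beta_lead * (lead_t - lead_1)

def lagged_form(b, long_t, long_1, lead_t, lead_1, sp_1):
    """Right side of (4.8) with coefficients b = (b0, b1, b2, b3, b4, b5)."""
    return (b[0] + b[1] * long_t + b[2] * long_1
            + b[3] * lead_t + b[4] * lead_1 + b[5] * sp_1)

# Hypothetical logged series, not the data of Table 4.2
lnlong = [2.10, 2.12, 2.08, 2.15]
lnlead = [4.60, 4.62, 4.61, 4.65]
lnsp = [4.50, 4.52, 4.55, 4.53]
b0 = 0.005                                   # hypothetical constant term

long_1, lead_1, sp_1 = lag(lnlong), lag(lnlead), lag(lnsp)

# Restrictions that turn (4.8) into (4.10): b1 = -b2, b3 = -b4, b5 = 1
b = (b0, -0.342, 0.342, 0.699, -0.699, 1.0)

pairs = [(differenced_form(b0, -0.342, 0.699, lnlong[t], long_1[t],
                           lnlead[t], lead_1[t], sp_1[t]),
          lagged_form(b, lnlong[t], long_1[t], lnlead[t], lead_1[t], sp_1[t]))
         for t in range(1, len(lnsp))]       # t = 0 has no lagged value
```

Each pair holds the same number, up to rounding error, which shows that the differenced regression is a lagged regression with restrictions placed on its coefficients.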

The equation given in (4.10) is equivalent to the lagged regression of (4.8) above with b_1 = -b_2 = -.342; b_3 = -b_4 = .699; and b_5 = 1.0. An unrestricted fit of (4.8) (shown in Chapter 8) results in approximately these estimates.

A final note concerns the R^2 value corresponding to the fitted equation (4.9) and the R^2 value for the fitted equation (4.8). The R^2 value associated with (4.9) is about 20%. However, the R^2 value for the equivalent model (4.8) is almost 100%. The difference in the R^2 value is due to variation in the dependent series (DLNSP500 versus LNSP500) and not the variation in the residual series (as the residual series for each fitted model are virtually identical to one another). Hence the R^2 value can be a very misleading statistic.

Interpretation of transformations

In Section 4.3.1, the logarithmic transformation of all data was used in the analysis, while the difference of logged values was used in the model of Section 4.3.2. As noted briefly in Section 4.3.1, the logarithmic transformation was used more for how the parameters of the model could be interpreted than for any need to achieve homogeneity in the variance of the errors (Box and Cox, 1964). Neter, Wasserman and Kutner (1983, page 137) note that such a use of the logarithmic transformation is often preferred by economists to linearize the relationship between the input variables and the output. In this way, the parameters can be interpreted as the elasticity between the variables.

The use of the differences of logged data in Section 4.3.2 also has a physical interpretation. Mathematically, the analysis of the difference of logged values is essentially the same as the analysis of the percent change of the original series (i.e., not differenced and not logged). This can be confirmed by comparing the first-order Taylor series approximation of each representation (see page 90 of Abraham and Ledolter, 1983).

4.4 Other Regression Topics

This section provides an overview of some topics related to the SCA REGRESS paragraph and to regression analysis. This material may be skipped, and selected information referenced as needed. The material presented, and the section containing it, are:

Section   Topic
4.4.1     Interpreting SCA output
4.4.2     Diagnostic checks in regression analysis
4.4.3     Statistical measures for spurious and influential observations

Information on other special regression related topics can be found in Section 9.6 of The SCA Statistical System: Reference Manual for General Statistical Analysis.

4.4.1 Interpreting SCA output

The SCA System generates and displays important information regarding a regression. This information can be used in several contexts, including inference and prediction. It is important to note that the validity of the estimates of the regression equation, and any inference or prediction made from a regression, is based on the data at hand and the validity of the model being fit. Hence it is important to carefully check any model for outliers or spurious observations and for deviations from the assumptions of the model.

To illustrate the use of SCA output for inference and prediction, we will consider the output of the initial regression of the beer data (without an adjustment of observation 5). We will reproduce the output in a more complete form.

-->REGRESS DELIVERY, CASES, DISTANCE. DIAGNOSTICS ARE FULL. FIT. HOLD RESIDUALS (RESID), FITTED(FIT).

REGRESSION ANALYSIS FOR THE VARIABLE DELIVERY

PREDICTOR COEFFICIENT STD. ERROR T-VALUE
INTERCEPT
CASES
DISTANCE

CORRELATION MATRIX OF REGRESSION COEFFICIENTS
CASES 1.00
DISTANCE
CASES DISTANCE

S = R**2 = 73.7% R**2(ADJ) = 69.3%

ANALYSIS OF VARIANCE TABLE
SOURCE SUM OF SQUARES DF MEAN SQUARE F-RATIO
REGRESSION
RESIDUAL
ADJ. TOTAL

SOURCE SEQUENTIAL SS DF MEAN SQUARE F-RATIO
CASES
DISTANCE

DIAGNOSTIC STATISTICS:
                                           STUDENTIZED
CASE  OBSERVED            STANDARDIZED     DELETED      COOK'S
NO.   VALUE     RESIDUAL  RESIDUAL         RESIDUAL     DISTANCE  LEVERAGE

"*" DENOTES AN OBSERVATION WITH A LARGE RESIDUAL

FITTED VALUES AND THEIR STANDARD ERRORS:
CASE  OBSERVED  FITTED  STD ERR OF
NO.   VALUE     VALUE   FITTED VALUE  LEVERAGE

Estimate of the variation of the error terms

Inferences or predictions drawn from this regression are based on the sample that is drawn, or the information at hand. For example, if we obtain another sample of 15 observations for the beer data, it is likely the fitted equation will change. A key is how much it may change. Hence it is important to have some measure of uncertainty (or variation). In examining the linear regression model, we see a key uncertainty is the variability of what is still unexplained after fitting the model, that is, the error term. The smaller σ^2, in relation to the unit of measurement of Y, the more precise our prediction of Y for values of X_1, X_2, ..., X_m.

An estimate of σ, the standard deviation of the error terms, is calculated from the data. This value, denoted by s, is computed according to

s = sqrt( SSE / (n - p) )

where SSE is the sum of squared errors, n is the number of observations, and p is the number of parameters estimated. SSE and (n - p) are displayed in the analysis of variance table on the line labeled RESIDUAL. We see in the initial fit of the beer data

s^2 = mean square error = SSE/12 = 9.865,

so that s = (9.865)^(1/2) = 3.141. This value is displayed just above the analysis of variance table.

Parameter inference, tests of significance

We can construct tests of significance of the parameters of our model. The test statistic that is used is

t = [(estimate) - (hypothesized value)] / (estimated standard deviation of estimate)

This statistic is then compared with a critical value of the t-distribution with (n-p) degrees of freedom. The t-value displayed by the SCA System is the value associated with a test of parameter = 0.

In the beer data example, the t-values for both of the estimates associated with CASES and DISTANCE are significant at the 1% level. Hence these estimates are statistically different from zero. However, the hypothesis that the intercept is zero cannot be rejected at the 5% level on the basis of its t-value.

We can also use displayed information for tests of other specific values. For example, to test the hypothesis that the coefficient of DISTANCE is .5 against the alternative it is not, we compute the t statistic from the displayed estimate and its standard error. The resulting value, .3004 in magnitude, is not significant at the 5% significance level, so the hypothesis cannot be rejected at this level.

Amount of variation explained

A measure of how well a regression model explains a response variable is the amount of the variability of the response variable that can be attributed to the linear model. This value, R^2, can be calculated as

R^2 = (sum of squares due to regression) / (total sum of squares, adjusted for the mean)

These quantities are all displayed by the SCA System. In the first regression of the beer example, we had

R^2 = (sum of squares due to regression) / (adjusted total sum of squares) = .7368 = 73.7%
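The quantities in this part of the output are simple functions of the sums of squares. The Python sketch below illustrates the arithmetic and is not SCA code. The error sum of squares is recovered from the mean square error of 9.865 on 12 degrees of freedom quoted in the text. The adjusted total sum of squares is inferred from R^2 = .7368 rather than read from the output, and the standard error used in the t statistic is a hypothetical value.

```python
import math

def residual_std_error(sse, n, p):
    """s = sqrt(SSE / (n - p))."""
    return math.sqrt(sse / (n - p))

def t_statistic(estimate, hypothesized, std_error):
    """(estimate - hypothesized value) / (estimated standard deviation of estimate)."""
    return (estimate - hypothesized) / std_error

def r_squared(ss_regression, ss_total_adj):
    """Share of the mean-adjusted total sum of squares explained by the regression."""
    return ss_regression / ss_total_adj

n, p = 15, 3
sse = 12 * 9.865               # SSE implied by the mean square error in the text
sst = sse / (1 - 0.7368)       # inferred adjusted total SS, not a displayed value

s = residual_std_error(sse, n, p)
r2 = r_squared(sst - sse, sst)

# Test of the DISTANCE coefficient (.456) against .5, using a made-up standard error
t = t_statistic(0.456, 0.5, 0.15)
```

The computed s rounds to the 3.141 shown in the output. The t value here depends entirely on the hypothetical standard error and is not the value from the beer data.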

The R^2 value is sometimes used as a criterion in choosing the most appropriate regression model from among subsets of possible explanatory variables. Since the R^2 value above does not account for the number of parameters present in a model, it is useful to adjust the value for the number of parameters. This value, R_a^2, is calculated as

R_a^2 = 1 - [(n - 1)/(n - p)] x [(sum of squares due to error) / (adjusted total sum of squares)]

In the beer example the ratio of the error sum of squares to the adjusted total sum of squares is 1 - .7368, so we have

R_a^2 = 1 - [(15-1)/(15-3)] (1 - .7368) = .6929 = 69.3%

This value is displayed as R**2(ADJ).

Predicted values from a regression

The fitted equation from the beer regression has the form

DELIVERY = b_0 + b_1 CASES + .456 DISTANCE

where b_0 and b_1 are the estimates displayed for INTERCEPT and CASES. To predict the value of DELIVERY for observation number 1 (CASES = 10, DISTANCE = 30), we would use the above equation and obtain, approximately,

DELIVERY = b_0 + b_1 (10) + .456(30)

By including the logical sentence FIT in the REGRESS paragraph, we obtain fitted values for all cases in our sample. We may also wish to predict the value of DELIVERY at other plausible combinations of values for CASES and DISTANCE that are not part of our sample. For example, if we wish to predict a value of DELIVERY for CASES = 20 and DISTANCE = 30, we would use the fitted equation and obtain

DELIVERY = b_0 + b_1 (20) + .456(30) = 33.53

Deviation of a fitted value

When the FIT sentence is included in the REGRESS paragraph, an estimate of the standard error of fit is provided for each fitted value. For each case we can also obtain a confidence interval for the average value of the response. This interval is calculated using the fitted value, Ŷ, the estimated standard error of fit, and a value taken from a t-table for (n - p) degrees of freedom and the size of the confidence interval desired. The end points of the interval are

Ŷ ± (estimated standard error of fit) x (tabled t-value)

For the beer data, the tabled t-value for a 95% confidence interval is 2.179. The end points of the confidence interval for the average value of DELIVERY for the specific realization CASES = 10, DISTANCE = 30 (observation 1) are

Ŷ ± (1.397)(2.179), or approximately Ŷ ± 3.04,

where Ŷ is the fitted value for observation 1. Hence, given the data, we have a 95% level of confidence that the average time of delivery for all situations in which 10 cases are delivered to a maximum distance of 30 miles lies within about 3.04 minutes of the fitted value.

Prediction interval for a single fitted value

The fitted value at a point as calculated above gives us an indication of the average value we could observe for a given realization of values of the explanatory variables. We can also construct a prediction (confidence) interval for the specific values that can occur. The interval is calculated in the same manner as above, except the estimate of standard error is larger. It can be shown this standard error is

sqrt( (estimate of standard error of fitted value)^2 + s^2 )

Using this standard error, the end points for a 95% prediction (confidence) interval for the first observation are

Ŷ ± 2.179 sqrt( (1.397)^2 + (3.141)^2 ), or approximately Ŷ ± 7.49.

We can also obtain prediction (confidence) intervals for points not in our sample. This can be done by including additional observations in all explanatory variables of the regression and giving the response variable the missing value code. For example, suppose we add a 16th observation to the beer sample with CASES = 20 and DISTANCE = 30. If we now use the REGRESS command as before including the FIT sentence, we will obtain the same results as before with the following change in the fitted information.

FITTED VALUES AND THEIR STANDARD ERRORS:
CASE  OBSERVED  FITTED  STD ERR OF
NO.   VALUE     VALUE   FITTED VALUE  LEVERAGE

16    *****     33.53   .954

We see the fitted value listed for the 16th observation is 33.53, as we calculated before. The end points of the 95% prediction interval for this fitted value are

33.53 ± 2.179 sqrt( (.954)^2 + (3.141)^2 ), or approximately 33.53 ± 7.15.

Although a prediction and prediction interval can be obtained for any set of values for the explanatory variables, it is important to realize that a prediction becomes less reliable the further removed we are from the range of values the explanatory variables assume in the regression. That is, although it may be reasonable to predict DELIVERY for CASES = 10 and DISTANCE = 30, it is unreasonable to try to extend a prediction for CASES = 100 or DISTANCE = 75 as these values are far removed from the range of values used to obtain the fitted equation.

4.4.2 Diagnostic checks of a fitted model

A careful regression analysis includes more than the specification and estimation of a regression model. A model should be checked carefully to determine if there are any model inadequacies or deviations from the assumptions of the model. The REGRESS paragraph can calculate and display several statistics that are useful in a diagnostic check of a model. In addition, the residuals from a fit, that is, the variable consisting of the values

e_j = Y_j - Ŷ_j, j = 1, 2, ..., n,

can be retained in the SCA workspace for further analysis. The analysis of residuals includes, but is not limited to, various plots of residuals and the examination of statistics of the residuals to ascertain if they are consonant with postulated assumptions of the error structure.

This section reviews useful diagnostic checks that are readily available within the SCA System. A more complete discussion of these checks can be found in Draper and Smith (1981, Chapter 3) and Neter, Wasserman and Kutner (1983, Chapter 4).

Many diagnostic checks are discussed in the section. It is worth noting that not all possible diagnostic checks are discussed here, nor is it recommended that all checks discussed here be used in every analysis. Clearly some checks are more relevant than others and often the context of a problem will dictate those checks that are worth consideration.

Diagnostic checks can usually be classified as either being a check of how well a model fits (i.e., checks for lack of fit) or a check of the assumptions of the model. Checks on model assumptions include examination for the presence of serial correlation, checks for a zero mean

and constant variance in the residuals, and checks on the assumption of normality. When possible, we will indicate the purpose of the diagnostic check discussed.

Plots of residuals

Listed below are some useful plots of residuals. These plots should be considered, when appropriate, in a regression analysis. Also included below are the names of the SCA paragraph(s) that can be used to generate the plot.

(A) Plots to detect lack of fit

Plot against explanatory variables (PLOT): The plots here can help to reveal any model inadequacy and indicate if any extra terms are needed in the model (e.g., X^2 in addition to X to account for a curvilinear relationship).

Plots against variables not used in model (PLOT): Plotting residuals against variables excluded from a model could reveal the presence of important explanatory variables that should be included in the analysis (see Neter, Wasserman and Kutner, 1983, page 120).

Time series plots: See (B) below.

(B) Plots to detect serial correlation

Time series plot (TSPLOT): Whenever observations are recorded in time order, it is important to plot data over time. This can reveal outliers, a variance that is not constant over time, or the presence of linear or quadratic trend that should have been included in the model. A plot over time is also useful in observing runs of positive or negative residual terms, thus indicating if serial correlation is present in the residual series.

Plot of the autocorrelation function (ACF): The ACF can be used to detect those lag orders at which there is significant serial correlation. The ACF is a more powerful tool than the Durbin-Watson statistic (see Section 4.3.1). The ACF can also be used to detect nonstationarity in the original series (see Chapter 5).

(C) Plots to check on mean and variance

Plot against fitted values, Ŷ (PLOT): A plot of the residuals against Ŷ can help reveal outliers (large residuals) or non-homogeneous variance (a variance that increases with the level of Ŷ).

Plot against explanatory variables: The plots here can help reveal similar anomalies as a plot against fitted values. In addition, these may be useful in determining specific explanatory variables that could be involved.

(D) Checks on normality

Probability plot of residuals (PPLOT): This is a useful visual check of the residuals. If the assumption of normality is valid, a normal or half-normal plot of residuals should yield an approximate straight line with no point too far apart from the rest.

Simple plot of residuals (HISTOGRAM or DPLOT): This is useful as a visual check of the normality assumption and to spot potential outlying or spurious observations.

Statistics of residuals or fit

The SCA System can also calculate and display useful diagnostic statistics of a regression or the residuals of a regression. Listed below is a summary of useful diagnostic statistics and how they may be obtained in the SCA System:

(a) Leverage, Cook's distance, standardized residuals, studentized deleted residuals (DIAGNOSTICS sentence): These are useful in the identification of spurious and influential observations. See the discussion of spurious and influential observations below.

(b) Checks on randomness (DW sentence, ACF and NPAR paragraphs): The Durbin-Watson statistic (DW) can be used to assess the randomness of residuals. The DW statistic and the autocorrelation function (ACF) are discussed in Section 4.3.1. The nonparametric RUNS test can also be employed to test the randomness of the residuals. (See Chapter 11 of The SCA Statistical System: Reference Manual for General Statistical Analysis for information on nonparametric tests.)

(c) Tests for normality (NPAR paragraph): The residuals of the fit can be examined by many nonparametric test statistics to check on goodness of fit. Possible tests are the Kolmogorov-Smirnov or chi-square test. (See Chapter 11 of The SCA Statistical System: Reference Manual for General Statistical Analysis for information on nonparametric tests.)

4.4.3 Statistical measures for spurious and influential observations

In Section 4.1, we saw the need to diagnostically check a model to discover a spurious observation. In this section we summarize the diagnostic statistics computed in the SCA System that may be used to help highlight spurious and influential observations. These statistics are only appropriate when the sample size is not large and when there is no serial correlation in the data. Discussions related to the identification of such observations, and remedial measures, can be found in Neter, Wasserman, and Kutner (1983, Sections 11.5 and 11.6).

The inclusion of the DIAGNOSTICS sentence in the REGRESS paragraph provides us with a number of useful statistical measures for the identification of both spurious and influential observations. The computational measures used to calculate these statistics can be

found in Section 9.6 of The SCA Statistical System: Reference Manual for General Statistical Analysis.

Leverage

An outlying or spurious observation may have little influence on the fitted regression equation. However, any point can be very influential based on its relative position to the other observations used in the fit. These observations should be studied to see if, in addition, they are outliers. One measure of the importance of a single observation is the leverage it has on a fit. A large leverage indicates the observation is distant from the center of the remaining observations. As a result the mass of other observations acts as a fulcrum for the leverage applied by the single point.

In order to establish the significance of the leverage value, we may check to see if it is greater than 2p/n, where n is the number of observations in the regression and p is the total number of parameters calculated. This rule of thumb is useful in spotting influential points (Neter, Wasserman and Kutner, 1983, page 403). In the beer example, 2p/n = 2(3/15) = .40. No case of this example has a leverage value greater than this cut-off value.

Although an observation with a high leverage is important in an analysis, an outlier does not need to have great leverage. The outlier that was found in this example did not have statistically significant leverage, but it affected fitted results greatly. In any event, we need to be aware of observations with great leverage.

Cook's distance

An overall measure of the impact of a single observation on the fit of a regression equation is given by Cook's distance. If an observation has a substantial effect on a fit and is determined to be spurious or an outlier, then a decision regarding possible remedial measures is required (see page 409 of Neter, Wasserman and Kutner, 1983, for a discussion). The value of Cook's distance should be compared with percentage points of the F(p, n-p) distribution (n and p are the same as defined above) to determine its significance.

The Cook's distance associated with observation 5 of the beer data is also not significant at the 5% level of the F(3,12) distribution. On the basis of this statistic alone, we may conclude that no remedial measures are required. However, we have seen the consequence of one such measure (that is, recoding the value of the response from 25 to 35).

Standardized residual

The residuals of the fitted equation, Y_i - Ŷ_i, are usually assumed to approximate a normal or t distribution with a zero mean. If these values are divided by their standard error, they should then be consonant with the standard normal or t distribution.

For each observation, the REGRESS paragraph can display the observed value, Y_i, the residual, and the standardized value of the residual. In the REGRESS paragraph, each residual is standardized using an estimate of the standard error based on its leverage and the value s^2. Residuals standardized in this manner are also known as studentized residuals. These values can be compared with percentage points of the standard normal or t distribution. The value of the standardized residual of observation 5 of the beer data, -3.27, is clearly significant. This indicates the observation merits further study, or some remedial measure.

Studentized deleted residual

As a refinement to the standardized (studentized) residual, we can also calculate the residual at the jth observation when the fitted regression is based on all observations except the jth observation. In this manner the individual observation cannot influence the regression. Residual values obtained are appropriately standardized and are known as deleted studentized residuals. Values are compared with the t(n-p-1) distribution, with n and p as before.

In the beer data example, the t(11) distribution is used. The 5th observation is a clear aberration (the value is significant at almost all levels) and warrants study. It should be noted that if two or more outliers are almost coincident, this measure may fail to be useful. Hence it is always important to plot the residuals.
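The 2p/n rule of thumb for leverage can be illustrated with a small Python sketch. This is not the computation performed by the DIAGNOSTICS sentence. It uses the closed-form leverage for a straight-line fit with a constant and one explanatory variable (so p = 2), and the data values are hypothetical. The last line reproduces the cut-off of .40 quoted for the beer data, where p = 3 and n = 15.

```python
def leverages_simple(x):
    """Leverage h_i = 1/n + (x_i - xbar)^2 / Sxx for a straight-line fit
    with a constant term and a single explanatory variable."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    return [1.0 / n + (v - xbar) ** 2 / sxx for v in x]

def high_leverage(h, p):
    """0-based positions whose leverage exceeds the 2p/n rule of thumb."""
    cutoff = 2.0 * p / len(h)
    return [i for i, v in enumerate(h) if v > cutoff]

# Hypothetical explanatory variable with one value far from the rest
x = [2, 3, 4, 5, 6, 7, 8, 30]
h = leverages_simple(x)
flagged = high_leverage(h, p=2)

# Cut-off for the beer data in the text: p = 3 parameters, n = 15 observations
beer_cutoff = 2.0 * 3 / 15
```

Only the last point is flagged, because it lies far from the center of the other values. The leverages always sum to p, which is why a value near 2p/n, twice the average, is treated as large.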

SUMMARY OF THE SCA PARAGRAPH IN CHAPTER 4

This section provides a summary of the SCA paragraph employed in this chapter. The syntax is presented in both a brief and full form. The brief display of the syntax contains the most frequently used sentences of the paragraph, while the full display presents all possible modifying sentences of the paragraph. In addition, special remarks related to the paragraph may also be presented with the description.

Each SCA paragraph begins with a paragraph name and is followed by modifying sentences. Sentences that may be used as modifiers for a paragraph are shown below and the types of arguments used in each sentence are also specified. Sentences not designated required may be omitted as default conditions (or values) exist. The most frequently used required sentence is given as the first sentence of the paragraph. The portion of this sentence that may be omitted is underlined. This portion may be omitted only if this sentence appears as the first sentence in a paragraph. Otherwise, all portions of the sentence must be used. The last character of each line except the last line must be the continuation character.

In this section, we provide a summary of the REGRESS paragraph.

Legend (see Chapter 2 for further explanation)

    v : variable name
    i : integer
    r : real value
    w : keyword

REGRESS Paragraph

The REGRESS paragraph is used either (1) to specify and estimate the parameters of a linear model by listing the response (dependent) and explanatory (independent) variables of the model, or (2) to modify and estimate the parameters of an existing model.

Syntax of the REGRESS Paragraph

Brief syntax

    REGRESS VARIABLES ARE v1, v2, ---.
            DIAGNOSTICS ARE w.
            DW. / NO DW.
            FIT. / NO FIT.
            HOLD RESIDUALS(v1), FITTED(v1).

    Required: List of variables (i.e., VARIABLES sentence)

Full syntax

    REGRESS VARIABLES ARE v1, v2, ---.
            NAME IS v.
            NO CONSTANT. / CONSTANT.
            DIAGNOSTICS ARE w.
            DW. / NO DW.
            FIT. / NO FIT.
            HOLD RESIDUALS(v1,v2,---), FITTED(v1,v2,---),
                 ESTIMATE(v), INVXPX(v), MSE(v), ---.
            SPAN IS i1, i2.
            WEIGHT IS v.
            INCLUDE v1, v2, ---.
            EXCLUDE v1, v2, ---.
            ANOVA IS w.
            RIDGE IS v.
            OUTPUT IS LEVEL(w), PRINT(w1, w2, ---), NOPRINT(w).

    Required: List of variables (i.e., VARIABLES sentence) or NAME sentence

Sentences Used in the REGRESS Paragraph

VARIABLES sentence
A list of variables or the VARIABLES sentence is used to list the dependent and explanatory variables of the regression model. The first variable specified is used as the dependent variable and all other specified variables are used as explanatory variables.

NAME sentence
The NAME sentence is used to specify a name for the regression model. This is an optional sentence when variables (i.e., the VARIABLES sentence) are specified. If a

name is specified, the regression model and related information will be stored under the specified model name and can be used in subsequent analyses. When an existing model is being modified, variables (i.e., the VARIABLES sentence) should not be used.

NO CONSTANT sentence
The NO CONSTANT sentence is used to exclude a constant term from an analysis. The default is CONSTANT (that is, include a constant term in the analysis).

DIAGNOSTICS sentence (see Section 4.4.3)
The DIAGNOSTICS sentence is used to specify that diagnostic statistics should be computed and displayed. Valid keywords are FULL and BRIEF. If FULL is specified then the residual, standardized residual, studentized deleted residual, Cook's distance and leverage are computed and displayed for all data points. If BRIEF is specified then the above statistics are displayed for significant values only.

DW sentence (see Section 4.3.1)
The DW sentence is used to specify that the Durbin-Watson statistic be computed for the residuals of the model. The default is NO DW, that is, no computation of the statistic.

FIT sentence (see Section 4.4.1)
The FIT sentence is used to specify the display of fitted values of the response variable, and associated statistics, for all observations. Also displayed are the standard error of the fitted value and the leverage of the observation. A fitted (predicted) value for points not in the sample can be computed by including additional value(s) in all explanatory variables and the missing value code in the response variable. The default is NO FIT, no display of fitted value information.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. The default is that no values are retained after the paragraph is executed. The values that may be retained are:

    RESIDUALS : the residuals of the fitted model.
                The number of variable names specified must be the same as the number of dependent variable columns in the model.
    FITTED    : the value for each response variable based on the estimated model. The number of variables specified must be the same as the number of response variable columns in the model.
    SEFIT     : the estimated standard error of fit for each fitted value
    ESTIMATES : the complete set of parameter estimates
    INVXPX    : the inverse of X'X (The product of INVXPX and MSE yields the estimated variance-covariance matrix of the parameter estimates.)
    MSE       : the mean square error (matrix) of the model
    LEVERAGE  : the leverage of each observation
    COOK      : the Cook's distance for each observation
    SRESID    : the standardized (studentized) residual value for each observation
    SDR       : the studentized deleted residual for each observation
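The remark that the product of INVXPX and MSE yields the variance-covariance matrix of the parameter estimates can be illustrated numerically. The sketch below is not SCA code; the Python variable names only mirror the HOLD keywords, and the data are hypothetical.

```python
import numpy as np

# Hypothetical data for a simple regression with a constant term.
rng = np.random.default_rng(0)
n = 40
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])
invxpx = np.linalg.inv(X.T @ X)                 # analogue of INVXPX
estimates = invxpx @ X.T @ y                    # analogue of ESTIMATES
residuals = y - X @ estimates                   # analogue of RESIDUALS
mse = residuals @ residuals / (n - X.shape[1])  # analogue of MSE

# Estimated variance-covariance matrix of the parameter estimates.
cov_beta = invxpx * mse
std_err = np.sqrt(np.diag(cov_beta))
print("estimates:      ", estimates)
print("standard errors:", std_err)
```

For a simple regression the slope variance obtained this way reduces to the familiar MSE divided by the corrected sum of squares of x, which gives a quick check of the computation.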

The following are infrequently used sentences of the paragraph. More information regarding their use may be found in Section 9.6 of The SCA Statistical System: Reference Manual for General Statistical Analysis.

SPAN sentence
The SPAN sentence is used to specify the span of cases, from i1 to i2, of the response variable and corresponding explanatory variables to be used in the analysis. The sentence may be employed for the piecewise fitting of a model. The default is all observations. The SPAN sentence cannot be used if a model is being re-estimated.

WEIGHT sentence
The WEIGHT sentence is used to specify a variable containing a weight for each response observation. The default is 1.0 for each observation. The WEIGHT sentence cannot be used if a model is being re-estimated.

INCLUDE sentence
The INCLUDE sentence is used to modify a previously defined model by specifying those response and explanatory variables to be included in the analysis. Note that the INCLUDE and EXCLUDE sentences are mutually exclusive in the same paragraph.

EXCLUDE sentence
The EXCLUDE sentence is used to modify a previously defined model by specifying those response or explanatory variables to be excluded from the analysis. Note that the INCLUDE and EXCLUDE sentences are mutually exclusive in the same paragraph.

ANOVA sentence
The ANOVA sentence is used to obtain different analysis of variance tables. The keyword may be PARTIAL (for partial sum of squares), SEQUENTIAL (for sequential sum of squares), BOTH, or NONE. The default is SEQUENTIAL. The partial sum of squares table shows how each explanatory variable of a regression contributes to the total sum of squares if all other factors in the model are included. The sequential sum of squares table shows the contribution to the total sum of squares of each factor in the regression model, assuming each factor is fitted in the sequential order specified in the VARIABLES sentence.
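The difference between the sequential and partial sums of squares can be made concrete with a small computation. The sketch below is an illustration under the usual definitions (reduction in residual sum of squares), not the ANOVA sentence's implementation; the helper names `rss` and `sequential_and_partial_ss` and the data are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def sequential_and_partial_ss(X, y):
    """X holds the explanatory variables only; a constant is always fitted."""
    n, k = X.shape
    ones = np.ones((n, 1))
    rss_full = rss(np.hstack([ones, X]), y)
    # Sequential: reduction in RSS as each variable enters in the given order.
    sequential = []
    previous = rss(ones, y)
    for j in range(k):
        current = rss(np.hstack([ones, X[:, : j + 1]]), y)
        sequential.append(previous - current)
        previous = current
    # Partial: increase in RSS when one variable is dropped from the full model.
    partial = []
    for j in range(k):
        reduced = np.hstack([ones, np.delete(X, j, axis=1)])
        partial.append(rss(reduced, y) - rss_full)
    return np.array(sequential), np.array(partial)

# Hypothetical data with correlated explanatory variables.
rng = np.random.default_rng(3)
n = 30
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.5, size=n)
X = np.column_stack([x1, x2])
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

seq, par = sequential_and_partial_ss(X, y)
print("sequential SS:", seq)
print("partial SS:   ", par)
```

The sequential entries add up to the regression sum of squares, and the last variable's sequential and partial entries coincide, because in both cases it is fitted after all the others.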
RIDGE sentence
The RIDGE sentence is used to specify the name of a vector of q values containing the ridge constants for a ridge regression analysis, where q is the order of the corrected X'X matrix (that is, the matrix derived using deviations from sample means as entries in the X matrix. The corrected X'X matrix does not contain elements related to the constant term as each element is subtracted by a mean correction value. Note that q = p-1 if the model has a constant term, and q = p if the model does not have a constant term). The default is 0.0 for all ridge constants, that is, no ridge constraints.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output displayed for selected statistics. Control is achieved in a two stage procedure. First, a basic LEVEL of output

(default NORMAL) is designated. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). The keywords for LEVEL and output printed are:

    BRIEF    : SUMMARY and ESTIMATES
    NORMAL   : SUMMARY, ESTIMATES, and RCORR
    DETAILED : SUMMARY, ESTIMATES, RCORR, CORR, COVAR, and AIC

where the reserved words (and keywords for PRINT, NOPRINT) on the right denote:

    SUMMARY   : the summary of all variables in the regression analysis, which includes sample mean, standard deviation, and coefficient of variation
    RCORR     : the correlation matrix for the estimates of the regression coefficients
    CORR      : the correlation matrix for all variables in the regression analysis
    COVAR     : the covariance matrix for the estimates of the regression coefficients
    ESTIMATES : the estimates of the regression coefficients
    AIC       : Akaike's Information Criterion and Schwarz' Information Criterion

(for more information, please see Section 9.6 of The SCA Statistical System: Reference Manual for General Statistical Analysis)

REFERENCES

Abraham, B. and Ledolter, J. (1983). Statistical Methods for Forecasting. New York: Wiley.

Box, G.E.P. and Cox, D.R. (1964). An Analysis of Transformations. Journal of the Royal Statistical Society, B, 26.

Box, G.E.P. and Newbold, P. (1971). Some Comments on a Paper of Coen, Gomme, and Kendall. Journal of the Royal Statistical Society, A, 134.

Cook, R.D. (1977). Detection of Influential Observations in Linear Regression. Technometrics 19.

Cryer, J.D. (1986). Time Series Analysis. Boston: Duxbury Press.

Daniel, C. and Wood, F.S. (1980). Fitting Equations to Data. 2nd edition. New York: Wiley.

Draper, N.R. and Smith, H. (1981). Applied Regression Analysis. 2nd edition. New York: Wiley.

Granger, C.W.J. and Newbold, P. (1974). Spurious Regressions in Econometrics. Journal of Econometrics 2.

Graybill, F.A. (1961). An Introduction to Linear Statistical Models, Vol. 1. New York: McGraw-Hill.

Montgomery, D.C. (1991). Design and Analysis of Experiments. 3rd edition. New York: Wiley.

Neter, J. and Wasserman, W. (1974). Applied Linear Statistical Models. Homewood, IL: Richard D. Irwin, Inc.

Neter, J., Wasserman, W., and Kutner, M.H. (1983). Applied Linear Regression Models. Homewood, IL: Richard D. Irwin, Inc.

Pankratz, A. (1991). Forecasting with Dynamic Regression Models. New York: Wiley.

Rao, C.R. (1973). Linear Statistical Inference and Its Applications. 2nd edition. New York: Wiley.

Searle, S.R. (1971). Linear Models. New York: Wiley.

Seber, G.A.F. (1977). Linear Regression Analysis. New York: Wiley.

CHAPTER 5

BOX-JENKINS ARIMA MODELING AND FORECASTING

In the previous chapter, we observed the inadequacy of regression models in the presence of serial correlation. That is, when a variable maintains a memory of its past, any model of the data must incorporate this memory. This phenomenon is likely to occur whenever data are collected in a time sequence. A set of data generated or obtained sequentially over time is known as a time series.

Modern time series analyses and applications are usually model based. There are many different types of models used for time series analysis. One popular class of models has become known as Box-Jenkins ARIMA (autoregressive-integrated moving average) models. These models are popular for many reasons including: (1) their adaptive ability to represent a wide range of processes with a parsimonious model; (2) their ability to be extended to permit modeling in the presence of external events (interventions) or multiple exogenous stochastic variables (i.e., transfer function models); and (3) a well established procedure for modeling has been developed. Some of the texts and reference sources for these models include Box and Jenkins (1970), Abraham and Ledolter (1983), Pankratz (1983), Vandaele (1983), Granger and Newbold (1987), Cryer (1986), Wei (1990), and references contained therein.

5.1 Box-Jenkins Modeling

ARIMA models employ a combination of linear operators for the representation of a time series. This type of representation has a long history, and may be traced to Yule (1921, 1927), Slutsky (1937) and Wold (1938). The landmark contribution of Box and Jenkins (1970) was to both consolidate the models and methodologies that had existed and, more importantly, provide a cohesive framework for model building. As a result, these models are often referred to as Box-Jenkins ARIMA models, or even Box-Jenkins models.

Box and Jenkins (1970) proposed an iterative procedure for modeling a time series. This iterative modeling approach encompasses three phases:

(1) Identification, in which we examine characteristics and statistics of a time series and attempt to relate them to those of specific models;

(2) Estimation, in which we estimate the parameters of the tentatively identified model(s) using the data at hand; and

(3) Diagnostic checking, in which we examine the estimated model(s), and residuals of the fitted model(s), to see if the model(s) make sense and are consonant with our assumptions.

After an appropriate model is determined, we may use it for forecasting, control or simply to better understand the structure of the time series. We will first consider two examples using non-seasonal series to better understand the Box-Jenkins modeling procedure and ARIMA models. A seasonal example is provided in Section 5.3.

The ARIMA model can be extended to incorporate deterministic impacts (interventions) on a series; to create an effective procedure to detect outliers and adjust for their effects; and to model a dependent series in the presence of exogenous explanatory variables and a serially correlated error term. These topics are discussed in Chapters 6-8, respectively.

Example: Series A of Box and Jenkins (1970)

As an illustration of the Box-Jenkins modeling procedure, we will consider a data set of Box and Jenkins (1970). The data, Series A, consist of 197 concentration readings (one every two hours) of an uncontrolled chemical process. The data are listed in Table 5.1, and are stored in the SCA workspace under the name SERIESA.

Table 5.1 Series A of Box and Jenkins (1970): Concentration readings of a chemical process (Data read across the line)

The first aspect of a time series analysis, and almost all statistical analyses, is to plot the data. Here it would be informative if we plot the data as it occurs in time, that is, a time plot. We can use the TSPLOT or TPLOT paragraph (see Chapter 3) or the time plot capability of SCAGRAF (see The SCA Graphics Package User's Guide) for this purpose. An SCAGRAF plot of SERIESA is given in Figure 5.1.

Figure 5.1 Series A of Box and Jenkins (1970): Concentration readings of a chemical process

From this plot, we note that the series seems to drift upwards slightly, then downwards, and then upwards again. Because of this drift, we may observe a different mean level for the series, depending on where we compute it. Hence we may conclude that the series does not have a fixed mean level appropriate for the entire data span. This is an indication of a nonstationary behavior in the time series.

In order to proceed with the identification stage of the analysis, we need to acquire a working knowledge of ARIMA models and notation. If you are familiar with ARIMA models and the backshift operator, you may wish to skip the next section.

The univariate ARIMA model

We wish to match the characteristics of our series with those of one or more autoregressive-integrated moving average (ARIMA) models. We have a time series, $Z_t$, $t = 1, 2, \ldots, n$ (here n is 197). An autoregressive-moving average (ARMA) model has the form

$$Z_t - \phi_1 Z_{t-1} - \phi_2 Z_{t-2} - \cdots - \phi_p Z_{t-p} = C + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q} \tag{5.1}$$

where $\{a_t\}$ is a sequence of random errors that are independently and identically distributed with a normal distribution, $N(0, \sigma_a^2)$. If we introduce the backshift operator, B, where $BZ_t = Z_{t-1}$; $B^2 Z_t = B(BZ_t) = Z_{t-2}$; and so on, we can rewrite (5.1) as

$$Z_t - \phi_1 B Z_t - \phi_2 B^2 Z_t - \cdots - \phi_p B^p Z_t = C + a_t - \theta_1 B a_t - \theta_2 B^2 a_t - \cdots - \theta_q B^q a_t \tag{5.2}$$

or

$$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p) Z_t = C + (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q) a_t \tag{5.3}$$

We can abbreviate (5.3) further by writing it as

$$\phi(B) Z_t = C + \theta(B) a_t \tag{5.4}$$

where

$$\phi(B) = (1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p), \text{ and}$$

$$\theta(B) = (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q).$$

This is known as an ARMA(p,q) model. The value p denotes the order of the autoregressive operator $\phi(B)$, and q denotes the order of the moving average operator $\theta(B)$. The model in (5.4) can also be expressed as

$$Z_t = \mu + \frac{\theta(B)}{\phi(B)} a_t, \tag{5.5}$$

where $\mu = C/(1 - \phi_1 - \phi_2 - \cdots - \phi_p)$ is the mean of the stationary time series. The mathematical properties or requirements of the above models are not discussed here. For a more detailed discussion of these properties see Box and Jenkins (1970).

Relationship to a regression model

The ARMA(p,q) model of a time series is closely related to a regression model of the series. In Chapter 4 we noted that a way to incorporate serial correlation in a model is through a lagged regression; that is, a regression of a series on its own past. We could write such a lagged regression model as (omitting the constant term for notational convenience):

$$Z_t = \pi_1 Z_{t-1} + \pi_2 Z_{t-2} + \pi_3 Z_{t-3} + \cdots + a_t, \tag{5.6}$$

or, after moving all $Z_t$ terms to the left-hand side of the equation and employing the backshift operator, we have

$$\pi(B) Z_t = a_t, \tag{5.7}$$

where

$$\pi(B) = (1 - \pi_1 B - \pi_2 B^2 - \pi_3 B^3 - \cdots).$$

Depending upon the nature of the series, we may have a large number of parameters to estimate here. The number of parameters can be greatly reduced if we can approximate $\pi(B)$ as a quotient of polynomials, say $\phi(B)/\theta(B)$ for some choice of p and q. In this manner, we may approximate (5.7) as

$$\frac{\phi(B)}{\theta(B)} Z_t = a_t \tag{5.8}$$
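To see how a few ARMA parameters stand in for a long lagged regression, the π weights implied by (5.8) can be computed by expanding the quotient φ(B)/θ(B) as a power series in B. The sketch below is illustrative only; the function name `pi_weights` is hypothetical and the parameter values are arbitrary.

```python
def pi_weights(phi, theta, n_weights):
    """Return [pi_1, ..., pi_n] with 1 - pi_1 B - pi_2 B^2 - ... = phi(B)/theta(B).

    phi   = [phi_1, ..., phi_p]     for phi(B)   = 1 - phi_1 B - ... - phi_p B^p
    theta = [theta_1, ..., theta_q] for theta(B) = 1 - theta_1 B - ... - theta_q B^q
    """
    # c holds the coefficients of the quotient c(B) = phi(B) / theta(B).
    # Matching powers of B in c(B) * theta(B) = phi(B) gives a recursion.
    c = [1.0]
    for j in range(1, n_weights + 1):
        value = -phi[j - 1] if j <= len(phi) else 0.0
        for k in range(1, min(j, len(theta)) + 1):
            value += theta[k - 1] * c[j - k]
        c.append(value)
    # pi(B) = 1 - pi_1 B - ..., so pi_j is minus the coefficient of B^j.
    return [-cj for cj in c[1:]]

# A single MA parameter implies an infinite, geometrically decaying regression.
print(pi_weights([], [0.5], 5))
# An ARMA(1,1) model: pi_j = (phi - theta) * theta**(j - 1).
print(pi_weights([0.8], [0.5], 5))
```

With one moving average parameter θ = 0.5, the implied lagged regression has weights -0.5, -0.25, -0.125, and so on, which is why the quotient form is so much more parsimonious than estimating each π directly.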

Multiplication of both sides of (5.8) by $\theta(B)$ yields the ARMA(p,q) model as shown in (5.4).

If the series is not stationary (i.e., has no fixed mean level), then the autoregressive portion of the ARMA(p,q) model must include a stationary inducing operator. For a nonseasonal series, this is most frequently accomplished through a differencing operator (or product of differencing operators) of the form (1-B). That is, instead of modeling the nonstationary series $Z_t$, we model the series

$$(1 - B) Z_t = Z_t - Z_{t-1}.$$

Physically this corresponds to modeling the change in the series rather than the series itself. Usually only a single differencing operator is required. On rare occasions in the modeling of non-seasonal series, the operator may need to be repeated, say d times. The model we then consider is an autoregressive-integrated moving average model of the form

$$\phi(B)(1 - B)^d Z_t = C + \theta(B) a_t \tag{5.9}$$

or

$$(1 - B)^d Z_t = \mu + \frac{\theta(B)}{\phi(B)} a_t$$

with $\mu = C/(1 - \phi_1 - \phi_2 - \cdots - \phi_p)$. The model of (5.9), and its equivalent representation, is also known as an ARIMA(p,d,q) model.

Model identification

In the model identification stage, we try to determine appropriate orders for p, d, and q of the ARIMA(p,d,q) model. We may not be able to determine a unique model (i.e., a unique set of values for p, d, and q), but we may be able to restrict our study to a limited number of models. It may also be the case that not all the autoregressive and moving average parameters of an ARIMA(p,d,q) model are required. For example, if p=3, it may be the case that the lag 2 parameter is zero. We can determine significance during the estimation and diagnostic checking stages.

Determining whether or not to difference the data

We have already stated that, from its plot, SERIESA may not be stationary. If this is true, we may expect to difference the series at least one time. We can confirm the stationarity or non-stationarity of SERIESA by computing the autocorrelation function (ACF) of the series. The autocorrelation function measures the correlation of the observations within a time series at various lags.
For any positive integer l, the lag l autocorrelation is the correlation between $Z_t$ and $Z_{t-l}$. The autocorrelation function, ACF, is a sequence of these autocorrelations from lag 1 through a specified lag order. If a series is nonstationary, then its

ACF will be positive and high for a number of lags, and often decreases slowly to zero. To compute and display the sample ACF of our series, we may enter

    -->ACF SERIESA. MAXLAG IS 12.

    TIME PERIOD ANALYZED TO 197
    NAME OF THE SERIES SERIESA
    EFFECTIVE NUMBER OF OBSERVATIONS
    STANDARD DEVIATION OF THE SERIES
    MEAN OF THE (DIFFERENCED) SERIES
    STANDARD DEVIATION OF THE MEAN
    T-VALUE OF MEAN (AGAINST ZERO)

    [Listing and plot of the sample autocorrelations for lags 1 through 12, with their standard errors (ST.E.) and Q values]

We obtain summary information of our data and the display of the ACF for lags 1 through 12. We limited the total number of lags to be computed by including the MAXLAG sentence in the paragraph (the default is 36 lags). The ACF information is given in two forms. It is listed, together with the standard error of each estimate, and it is also plotted. A Q-value is also presented in the list of values. We will defer discussion of this statistic until a later section.

We note that although there are no extremely large values in the ACF (i.e., values near 1), all values are positive and decrease very slowly. This behavior and the previous time plot support the need to difference the series (i.e., to incorporate a d value of at least 1). We will include the differencing operator (1-B) in the remaining modeling of this series.

Obtaining initial orders for p and q

If the differenced series is stationary we can use its sample ACF and sample partial autocorrelation function (PACF) to determine orders for p and q. We have previously discussed the meaning of the ACF. The PACF is a relative measure of the importance of adding terms in a lagged regression of a stationary time series. That is, the sample PACF can be obtained by sequentially fitting

$$Z_t = C + \phi_{11} Z_{t-1} + a_t$$

$$Z_t = C + \phi_{21} Z_{t-1} + \phi_{22} Z_{t-2} + a_t$$

$$Z_t = C + \phi_{31} Z_{t-1} + \phi_{32} Z_{t-2} + \phi_{33} Z_{t-3} + a_t$$

and retaining the estimate of the last term of each fit. Hence $\phi_{11}$ is a measure of the effect of including a first-order lagged term in a model; $\phi_{22}$ is a measure of the effect of including a second-order lagged term in the model given the model contains a first-order term; $\phi_{33}$ is a measure of the effect of adding a third-order term when first and second order lagged terms are already present; and so on. The estimate of $\phi_{ll}$ typically has a value between -1 and 1, and can be interpreted as the correlation between $Z_t$ and $Z_{t-l}$ after accounting for the effects due to $Z_{t-1}, Z_{t-2}, \ldots, Z_{t-l+1}$. Thus the set of estimates of $\phi_{11}, \phi_{22}, \ldots$ is referred to as the sample PACF of the series $Z_t$.

As we may infer from the way that values are computed, the sample PACF provides direct information on the order of the autoregressive operator (i.e., p) provided q=0. Alternatively, the ACF provides direct information on the order of the moving average operator (i.e., q) if p=0. More precisely, if a series can be represented as a pure AR or MA process, we observe the following:

              ACF                                  PACF
    MA(q)     Cuts off after lag q                 Dies out in an exponential
                                                   or sinusoidal fashion
    AR(p)     Dies out in an exponential           Cuts off after lag p
              or sinusoidal fashion

By "cut off" we mean that the sample ACF or PACF has only a few low order significant autocorrelations. Typically we judge that an autocorrelation is significant if it is greater (in absolute value) than twice its standard error.

We can compute the sample ACF and PACF for the first-order differenced SERIESA by using the ACF and PACF paragraphs separately, or by simply entering

    -->IDEN SERIESA. DFORDER IS 1. MAXLAG IS 12.

The DFORDER sentence specifies the order of differencing we desire (see the note in Section 5.4.1). As in the ACF paragraph, the MAXLAG sentence is used to restrict the number of lags to compute for the sample ACF and PACF to 12 (the default is 36). We obtain the following:

                           1
    DIFFERENCE ORDERS (1-B )
    TIME PERIOD ANALYZED TO 197
    NAME OF THE SERIES SERIESA
    EFFECTIVE NUMBER OF OBSERVATIONS
    STANDARD DEVIATION OF THE SERIES
    MEAN OF THE (DIFFERENCED) SERIES
    STANDARD DEVIATION OF THE MEAN
    T-VALUE OF MEAN (AGAINST ZERO)

    [Listing and plot of the sample autocorrelations of the differenced series for lags 1 through 12, with their standard errors (ST.E.) and Q values]

    [Listing and plot of the sample partial autocorrelations of the differenced series for lags 1 through 12, with their standard errors (ST.E.)]

We see that the ACF cuts off after the first lag and the PACF decays exponentially. These results appear to indicate that an ARMA model with p=0 and q=1 may be appropriate. Hence, we have tentatively identified SERIESA as an ARIMA(0,1,1) model.

Mixed ARIMA models

We have relatively simple and effective tools to determine the order of differencing, d, and p (or q), if we have a pure autoregressive (or pure moving average) model, after any

necessary differencing. If both p and q are not zero, then the identification of the model can be more difficult if only the sample ACF and PACF of a series are available for use. Box and Jenkins (1970) provide some information on how to determine the orders of p and q from reading the sample ACF of a stationary series. However, this approach is usually not very effective in practice.

Tsay and Tiao (1984) introduced a unified approach to the identification of both the mixed stationary and nonstationary ARMA model. They construct and display a table of values, called the extended autocorrelation function (EACF), to suggest the maximum orders of p and q for an appropriate ARMA(p,q) model. The table of values can be summarized in a condensed form by replacing those values that are within two standard errors of zero by an O (to indicate a value not different from zero), and by an X otherwise. The order of p and q can then be determined by finding a position $(p_0, q_0)$ in the table so that all values in the table are O for the (i,j) coordinates in the triangular region where $i = p_0 + k$ and $j \geq q_0 + k$, $k = 0, 1, 2, \ldots$.

To illustrate the EACF, we will construct the table for the first-order differenced SERIESA. To do this, we simply enter

    -->EACF SERIESA. DFORDER IS 1.

                           1
    DIFFERENCE ORDERS (1-B )
    TIME PERIOD ANALYZED TO 197
    NAME OF THE SERIES SERIESA
    EFFECTIVE NUMBER OF OBSERVATIONS
    STANDARD DEVIATION OF THE SERIES
    MEAN OF THE (DIFFERENCED) SERIES
    STANDARD DEVIATION OF THE MEAN
    T-VALUE OF MEAN (AGAINST ZERO)

    THE EXTENDED ACF TABLE
    [Table of extended ACF values for P = 0 through 6 and Q = 0 through 12]

    SIMPLIFIED EXTENDED ACF TABLE (5% LEVEL)
    (Q-->)   0  1  2  3  4  5  6  7  8  9 10 11 12
    (P= 0)   X  O  O  O  O  O  O  O  O  O  O  O  O
    (P= 1)   X  O  O  O  O  O  X  O  O  O  O  O  O
    (P= 2)   X  O  O  O  O  O  X  O  O  O  O  O  O
    (P= 3)   X  O  O  O  O  O  O  O  O  O  O  O  O
    (P= 4)   X  X  O  O  O  O  O  O  O  O  O  O  O
    (P= 5)   X  O  O  O  X  O  O  O  O  O  O  O  O
    (P= 6)   O  O  O  X  O  O  X  O  O  O  O  O  O

We obtain the same summary information as in the previous IDEN output, a sample EACF table with values displayed, and a simplified EACF table. We may observe that a

triangular region of O values appears to emanate from the vertex where p=0 and q=1. We have highlighted this region by hand. There are two significant values in this region. We can observe from the table of EACF values that these values barely exceed two standard errors. In general, the EACF results support our previous conclusion regarding the order of this model, i.e., an ARIMA(0,1,1) model.

We noted above that the EACF can be used for nonstationary series as well. To illustrate this, we will compute the EACF for the original series, SERIESA.

    -->EACF SERIESA.

    TIME PERIOD ANALYZED TO 197
    NAME OF THE SERIES SERIESA
    EFFECTIVE NUMBER OF OBSERVATIONS
    STANDARD DEVIATION OF THE SERIES
    MEAN OF THE (DIFFERENCED) SERIES
    STANDARD DEVIATION OF THE MEAN
    T-VALUE OF MEAN (AGAINST ZERO)

    THE EXTENDED ACF TABLE
    [Table of extended ACF values for P = 0 through 6 and Q = 0 through 12]

    SIMPLIFIED EXTENDED ACF TABLE (5% LEVEL)
    (Q-->)   0  1  2  3  4  5  6  7  8  9 10 11 12
    (P= 0)   X  X  X  X  X  X  X  X  X  O  O  O  O
    (P= 1)   X  O  O  O  O  O  O  O  O  O  O  O  O
    (P= 2)   X  X  O  O  O  O  X  O  O  O  O  O  O
    (P= 3)   X  O  O  O  O  O  X  O  O  O  O  O  O
    (P= 4)   X  O  O  O  O  O  O  O  O  O  O  O  O
    (P= 5)   X  X  O  O  O  O  O  O  O  O  O  O  O
    (P= 6)   X  O  O  O  X  O  O  O  O  O  O  O  O

The initial summary information is the same as that for the ACF of the original series. Now the triangle of insignificant values appears to emanate from p=1, q=1. This result is consistent with our ARIMA(0,1,1) model, as the differencing operator (1-B) can be viewed as the AR operator $(1 - \phi B)$ with $\phi = 1$. Hence the EACF, ACF, and PACF can be used to validate one another.

Due to sampling fluctuations, the condensed EACF table may not always provide clear cut patterns as shown above. However, it may indicate a few possible sets of candidates for p and q. We should not be concerned by this lack of uniqueness, since the purpose of the identification stage is to merely suggest a few reasonable models for us to pursue.
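The sample ACF and PACF used throughout the identification stage can be reproduced in a few lines. The sketch below is not the ACF, PACF, or IDEN paragraph: the function names are hypothetical, the series is simulated rather than the actual Series A, and 1/sqrt(n) is used as a rough stand-in for the standard error of each estimate. The PACF is computed as described earlier, by sequentially fitting lagged regressions and keeping the last coefficient of each fit.

```python
import numpy as np

def sample_acf(z, max_lag):
    """Sample autocorrelations for lags 1..max_lag."""
    z = np.asarray(z, dtype=float)
    dev = z - z.mean()
    denom = dev @ dev
    return np.array([dev[:-lag] @ dev[lag:] / denom
                     for lag in range(1, max_lag + 1)])

def sample_pacf(z, max_lag):
    """Sample partial autocorrelations via sequential lagged regressions."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    out = []
    for lag in range(1, max_lag + 1):
        # Columns: constant, Z_{t-1}, ..., Z_{t-lag}; response: Z_t.
        cols = [np.ones(n - lag)]
        cols += [z[lag - k : n - k] for k in range(1, lag + 1)]
        beta, *_ = np.linalg.lstsq(np.column_stack(cols), z[lag:], rcond=None)
        out.append(beta[-1])  # keep the estimate of the last term only
    return np.array(out)

# Simulated ARIMA(0,1,1)-type series (hypothetical; theta = 0.7).
rng = np.random.default_rng(42)
n = 2000
a = rng.normal(size=n + 1)
w = a[1:] - 0.7 * a[:-1]      # changes follow (1 - 0.7B) a_t
z = 17.0 + np.cumsum(w)       # integrate once to obtain the level series

dz = np.diff(z)               # first-order differencing, (1 - B) Z_t
acf = sample_acf(dz, 12)
pacf = sample_pacf(dz, 12)
two_se = 2.0 / np.sqrt(len(dz))   # rough two-standard-error bound
significant_acf_lags = [lag + 1 for lag, r in enumerate(acf) if abs(r) > two_se]
print("ACF :", np.round(acf, 2))
print("PACF:", np.round(pacf, 2))
print("ACF lags beyond two standard errors:", significant_acf_lags)
```

For a series of this type the ACF of the differenced data should show one clearly significant value at lag 1 and little else, while the PACF should decay gradually, which is the pattern used above to suggest an ARIMA(0,1,1) model.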

Model specification and estimation

Now that we have tentatively identified an ARIMA(0,1,1) model for our series, we need to estimate the model. This requires two steps. First, we need to specify the model using the TSMODEL paragraph. Once the model is specified, we can estimate the model using the ESTIM paragraph.

We have determined that we will specify a model having a differencing term and one moving average parameter. However, should we also include a constant term in the model? Use of a constant term here indicates we believe there may be a trend in the series. Our time plot did not indicate the presence of any definitive trend. We can also examine the summary statistics provided in the IDEN or EACF display of the differenced series. As part of the summary, we are provided with an estimate of the mean of the (differenced) series, its standard error and the associated t-value. This estimate is obtained assuming no serial correlation. We see the t-value here is .0774, which does not warrant the inclusion of a constant term.

Although we are not including a constant term here, whenever we are in doubt it is often wise to include a constant term. We can then let the data decide whether the constant is significant or not. Omitting a constant term, when one is required, will affect our analysis more adversely than including a constant term when there is no need.

Model specification

We want to then specify the following model:

$$(1 - B) Z_t = (1 - \theta B) a_t \tag{5.10}$$

We can specify this model by entering

    -->TSMODEL NAME IS MODELA. MODEL IS SERIESA((1-B)) = (1-THETA*B)NOISE.

We need to provide a model in the SCA workspace with a name (label) so that we can refer to it later. Individual names are required since we can maintain more than one model in the workspace in the same SCA session. Note that a model name must be distinct from any variable name. As a result, we cannot call the model SERIESA, as that is the name of our data. We call our model MODELA in the above model specification. We can use the TSMODEL paragraph later in our SCA session to modify this model.
However, if we use the MODEL sentence again in the TSMODEL paragraph with this name, the newer specification will completely replace the information held under the model name.

The model specified in the MODEL sentence is a virtual transcription of (5.10), with one exception. The differencing operator (1-B) is specified to the right of our series name, and not on the left as in (5.10). This convention permits the SCA System to distinguish autoregressive operators from descriptive modifiers of the series.
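To make the meaning of the specified model concrete, the sketch below computes the residuals implied by (1 - B)Z_t = (1 - θB)a_t for a given θ and picks the θ that minimizes their sum of squares. This is only a rough conditional least squares illustration using a grid search, not the algorithm of the ESTIM paragraph; the function names and the simulated series are hypothetical.

```python
import numpy as np

def ma1_residuals(w, theta):
    """Residuals a_t of w_t = (1 - theta*B) a_t, starting from a_0 = 0."""
    a = np.zeros(len(w))
    previous = 0.0
    for t, wt in enumerate(w):
        previous = wt + theta * previous   # a_t = w_t + theta * a_{t-1}
        a[t] = previous
    return a

def fit_ima11(z, grid=np.linspace(-0.95, 0.95, 191)):
    """Grid-search theta for (1 - B) z_t = (1 - theta*B) a_t."""
    w = np.diff(z)                         # apply the differencing operator
    sums = [np.sum(ma1_residuals(w, th) ** 2) for th in grid]
    theta = float(grid[int(np.argmin(sums))])
    return theta, ma1_residuals(w, theta)

# Simulated series from the same model form (hypothetical; theta = 0.7).
rng = np.random.default_rng(7)
n = 1500
shocks = rng.normal(scale=0.3, size=n + 1)
z = 17.0 + np.cumsum(shocks[1:] - 0.7 * shocks[:-1])

theta_hat, resid = fit_ima11(z)
print("estimated theta:", theta_hat)
print("residual standard error:", resid.std(ddof=1))
```

The residual recursion is simply the model solved for a_t, so the quality of any candidate θ can be judged by how small and how uncorrelated the resulting residuals are.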

The label THETA used in the specified model is arbitrary. We have chosen it here for convenience. The SCA System permits us to simultaneously maintain and modify many models. Parameter names are used to distinguish and maintain current values of parameters. After we estimate the above model, the estimate of $\theta$ will be maintained in the workspace under the label THETA. Since no variable named THETA exists currently, the SCA System will now create one and assign it the initial value .1000. We see this in the model summary that follows the above model specification.

    -->TSMODEL MODELA. MODEL IS SERIESA((1-B)) = (1-THETA*B)NOISE.

    SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- MODELA

    VARIABLE   TYPE OF    ORIGINAL       DIFFERENCING
               VARIABLE   OR CENTERED
                                              1
    SERIESA    RANDOM     ORIGINAL       (1-B )

    PARAMETER   VARIABLE   NUM./    FACTOR   ORDER   CONS-    VALUE   STD     T
    LABEL       NAME       DENOM.                    TRAINT           ERROR   VALUE
    1 THETA     SERIESA    MA       1        1       NONE     .1000

The sentence name and verb NAME IS have been omitted above since the NAME sentence is the most frequently used required sentence (see page 2.6) of the TSMODEL paragraph.

We do not always need to be elaborate in the specification of a model, as the SCA System only requires information on the order of parameters to be estimated, or differencing operators used. Either of the following can be used to describe the model of (5.10):

    -->TSMODEL MODELA. MODEL IS SERIESA(1) = (1-THETA*B)NOISE.     (5.11)

    -->TSMODEL MODELA. MODEL IS SERIESA(1) = (1)NOISE.             (5.12)

In (5.11), the differencing operator is reduced to the order of the B operator, that is, 1. If we enter (5.11), the same model summary as given above will occur. In (5.12), we also reduce the moving average operator to simply (1). This indicates only a first-order term is present in the moving average operator. If we enter (5.12) we will obtain the same summary as above, except the parameter estimate will be held internally since no label for the MA parameter is specified.

Model estimation

To estimate the above model we may simply enter

    -->ESTIM MODELA. HOLD RESIDUALS(RESIDA).
The HOLD sentence is included so that residuals are maintained in the workspace for the purpose of subsequent diagnostic checking. We obtain

 ITERATION 1, USING STANDARD ERROR =
 ITER.   OBJ.   PARAMETER ESTIMATES

 ITERATION TERMINATED DUE TO:
 RELATIVE CHANGE IN (OBJECTIVE FUNCTION)**0.5 LESS THAN .1000D-03

 TOTAL NUMBER OF ITERATIONS
 RELATIVE CHANGE IN (OBJECTIVE FUNCTION)**       D-04
 MAXIMUM RELATIVE CHANGE IN THE ESTIMATES        D-02
 THE RECIPROCAL CONDITION VALUE FOR THE CROSS PRODUCT MATRIX
 OF THE PARAMETER PARTIAL DERIVATIVES IS         D+01

 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- MODELA

 VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
            VARIABLE   OR CENTERED
                                         1
 SERIESA    RANDOM     ORIGINAL      (1-B )

 PARAMETER   VARIABLE   NUM./    FACTOR   ORDER   CONS-    VALUE   STD     T
 LABEL       NAME       DENOM.                    TRAINT           ERROR   VALUE
 1  THETA    SERIESA    MA       1        1       NONE

 TOTAL SUM OF SQUARES                 E+02
 TOTAL NUMBER OF OBSERVATIONS
 RESIDUAL SUM OF SQUARES              E+02
 R-SQUARE
 EFFECTIVE NUMBER OF OBSERVATIONS
 RESIDUAL VARIANCE ESTIMATE           E+00
 RESIDUAL STANDARD ERROR              E+00

We are provided with a summary of how our parameters change during the nonlinear estimation process, the reason the estimation procedure ended, and a summary of the estimated model. We see our estimate of THETA is .7015, and its t-value indicates that the estimate is clearly significant. The variance of the residuals, that is, the variation in the series that is still not accounted for after our modeling efforts, corresponds to a standard error of about .319. The standard error of our original series (see the ACF summary statistics) is .398. Consequently, (.319/.398)^2, or 64%, of the variation of the series is still unexplained. This is reflected in the R-square value of .360 (i.e., 1 - .64).

Estimation algorithms for MA parameters

The ARMA parameter estimates obtained above are maximum likelihood estimates, i.e., estimates that maximize a likelihood function. This function may be reasonably approximated by a conditional likelihood function as discussed in Box and Jenkins (1970). The SCA System also adopts an approximation to the likelihood function that incorporates a

more exact likelihood function as shown in Hillmer and Tiao (1979). With n observations $Z_1, \ldots, Z_n$, both approaches compute the likelihood function on the basis of the stochastic structure of n-p observations,

$$Z_t = C + \sum_{i=1}^{p} \phi_i Z_{t-i} - \sum_{j=1}^{q} \theta_j a_{t-j} + a_t, \qquad t = p+1, \ldots, n$$

where $Z_1, \ldots, Z_p$ are regarded as fixed. The two methods differ in that the conditional likelihood algorithm assumes $a_p = \cdots = a_{p-q+1} = 0$ while the exact likelihood algorithm computes estimates for those values. Hence this exact approach is exact for MA parameters only. The choice between the conditional and exact algorithms does not affect the estimates of a pure AR process. Anderson (1971) shows that such estimates have desirable properties; hence a more exact estimate is not required.

A Gauss-Marquardt nonlinear least-squares method (MACC 1965) is used to perform parameter estimation in the SCA System. The objective function to be minimized and displayed in the estimation summary is the sum of squared residuals in the conditional method, and the sum of squared residuals plus an adjustment term in the exact method. Details are shown in Hillmer and Tiao (1979). The exact algorithm is computationally more burdensome, but it can appreciably reduce the biases in estimating the moving average parameters $\theta_j$ under the conditional approach, especially when some of the roots of $\theta(B)$ are near the unit circle (e.g., seasonal ARIMA models). It is usually good practice to employ the exact algorithm whenever an MA parameter is present (in particular, in a seasonal model).

The most efficient way to employ the exact estimation method is to first estimate a model using the default conditional method. Then we can re-estimate the model using the exact method. The advantage in doing so is that the conditional method will provide a good starting point from which the exact method may begin. We can accomplish this easily in the SCA System since each model maintains a memory of the last estimate of a parameter. We will now employ the exact method, starting from the current estimate for the MA parameter. We simply enter

-->ESTIM MODELA. METHOD IS EXACT. HOLD RESIDUALS(RESIDA).

We obtain the following (the SCA output is edited for presentation purposes):

 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- MODELA

 VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
            VARIABLE   OR CENTERED
                                         1
 SERIESA    RANDOM     ORIGINAL      (1-B )

 PARAMETER   VARIABLE   NUM./    FACTOR   ORDER   CONS-    VALUE   STD     T
 LABEL       NAME       DENOM.                    TRAINT           ERROR   VALUE
 1  THETA    SERIESA    MA       1        1       NONE

 TOTAL SUM OF SQUARES                 E+02
 TOTAL NUMBER OF OBSERVATIONS
 RESIDUAL SUM OF SQUARES              E+02
 R-SQUARE
 EFFECTIVE NUMBER OF OBSERVATIONS
 RESIDUAL VARIANCE ESTIMATE           E+00
 RESIDUAL STANDARD ERROR              E+00

We note that there has been virtually no change in the results. This is due to the fact that the estimate for θ, 0.7, is not near the unit circle.

Diagnostic checks of the model

The final stage of model building is to diagnostically check the model we have estimated. In checking our model(s) we may ask:

(1) Is the model statistically consonant with our assumptions?
(2) Does the model make sense?

The latter is best answered by an individual who knows the data. Often when two or more models lead to approximately the same results (e.g., explanation of variation or forecasts), the best model may be the one that is most interpretable.

Diagnostic checks of model assumptions can be quantified statistically. The most basic assumption made in ARIMA models is that the errors $a_t$ are independently and normally distributed. Such a serially independent series is also referred to as a white noise series. If checks show this assumption is not true, then our model is not adequate and needs to be modified. If the assumption is correct, then the residuals of our model should approximate a serially independent sample and follow a normal distribution with zero mean and constant variance.

We can check our residuals in a number of ways. The most comprehensive check is a time plot of the residuals. The plot of the residuals from this fit is shown in Figure 5.2.

Figure 5.2 Residuals from an ARIMA(0,1,1) fit of SERIESA

No apparent pattern is present in the plot, but two points (at t=43 and t=64) appear to stick out from the rest. These points may be spurious observations, or outliers. Outliers are

discussed in more detail in Chapter 7. The variation of the residuals appears to be the same over time.

Another diagnostic check of the fitted model is the ACF of the residual series. If the residuals approximate white noise, then no autocorrelations should be significant. We can check this by computing the ACF of our residual series by entering

-->ACF RESIDA. MAXLAG IS 12.

 TIME PERIOD ANALYZED                      TO 197
 NAME OF THE SERIES                        RESIDA
 EFFECTIVE NUMBER OF OBSERVATIONS
 STANDARD DEVIATION OF THE SERIES
 MEAN OF THE (DIFFERENCED) SERIES
 STANDARD DEVIATION OF THE MEAN
 T-VALUE OF MEAN (AGAINST ZERO)

 AUTOCORRELATIONS
 ST.E.
 Q

 [Character plot of the residual autocorrelations for lags 1 through 12]

From the summary statistics we see the mean of the residuals is not distinguishable from zero (since the t-value is not significant). In addition, all computed ACF values are within two standard errors of zero. We also are provided with a crude global check on the residuals, a portmanteau test, the Ljung-Box Q statistic (1978). This value, provided in the ACF table in the Q row, represents a scaled sum of squares of the computed ACF values. It is scaled so that we can use a χ² distribution, with (l-p-q) degrees of freedom, to determine its significance. For l=12, the Q value 20.0 is marginally significant at the 5% level for a χ² distribution with 10 degrees of freedom.

We may also wish to check if we have overfit the series. That is, if some estimates are not statistically different from zero, we may be able to omit them from our model. Here, we have only one parameter in the model, and it is significant, as noted above.

As a final check of the model, we may also wish to test quantitatively to see if there are any spurious residuals that may have affected our fit; and if so, how to correct for them. We have already spotted two potential outliers in the residual plot. A more complete discussion

on outliers, and methods to detect and adjust for outliers, is provided in Chapter 7. The normality assumption for ARIMA models, assuming no outliers existed, typically is satisfied.

Forecasting an estimated model

Once we have determined that we have an adequate fit, we can forecast the series using the FORECAST paragraph. To forecast SERIESA using our estimated model, we can enter

-->FORECAST MODELA. NOFS ARE 12.

 NOTE: THE EXACT METHOD FOR COMPUTING RESIDUALS IS USED

 FORECASTS, BEGINNING AT
 TIME   FORECAST   STD. ERROR   ACTUAL IF KNOWN

We are provided with 12 forecasts, together with the standard error of each forecast. The sentence NOFS was included to limit the number of forecasts to 12. If the sentence is omitted, then 24 forecasts are produced.

It may appear unusual that all forecasts are the same value, yet the standard error of the forecast increases. However, a brief examination of the model used provides necessary explanations. The model we have is (approximately)

$$(1 - B)\,\mathrm{SERIESA}_t = (1 - .7B)\,a_t$$

or

$$\mathrm{SERIESA}_t = \mathrm{SERIESA}_{t-1} + a_t - .7a_{t-1}.$$

This model states that the value for SERIESA at any time period is the observed value from the period before plus a weighted amount of the errors that occur at both the existing and prior period. Hence the value for t=198 (the first value beyond our data span) would be

$$\mathrm{SERIESA}_{198} = \mathrm{SERIESA}_{197} + a_{198} - .7a_{197}.$$

We know the value of our last observation, $\mathrm{SERIESA}_{197}$; but what about the $a_t$'s? We can use the value of the residual series at t=197 (i.e., $\hat{a}_{197}$) as a surrogate for $a_{197}$, but the

best guess we can make for $a_{198}$ is its assumed mean value, 0. Therefore the forecast for $\mathrm{SERIESA}_{198}$ is

$$\widehat{\mathrm{SERIESA}}_{198} = \mathrm{SERIESA}_{197} - .7\hat{a}_{197}.$$

Our model also states that the value for t=199 (the second value beyond our data span) would be

$$\mathrm{SERIESA}_{199} = \mathrm{SERIESA}_{198} + a_{199} - .7a_{198}.$$

Now none of the values on the right-hand side of the equation are exactly known to us. The best choice of a value for $\mathrm{SERIESA}_{198}$ is the value we have just forecasted (for t=198). The best value we can use for $a_{199}$ or $a_{198}$ is the mean value, 0. Therefore the forecast for $\mathrm{SERIESA}_{199}$ is the same as that for $\mathrm{SERIESA}_{198}$. Similarly, the best forecast we can provide for each successive time period is the value made for the previous forecast. This value will always be the forecast made for $\mathrm{SERIESA}_{198}$. Hence all of the forecasts are the same for this particular model. This may not be the case for other models.

The increasing value for the standard error of the forecast is directly related to what we do not know, and are forced to use, for each time period. For t=198, $a_{198}$ is unknown and hence the standard error of the forecast is the standard error of the noise sequence (since we use the mean level 0 for $a_{198}$). This value is .3174, the residual standard error of our model. For t=199, we need to account for $a_{198}$, $a_{199}$ and the weights assigned to them. Hence the error increases. For subsequent periods we need to account for the two error terms and their associated weights (as before), as well as the error accumulating by using the same value for the forecast. Thus, the error continues to increase. The formal statistical derivations for the forecasts and standard errors from any ARIMA model are discussed below.

Calculations of forecasts and forecast standard errors

Forecasts and the standard errors of forecasts are obtained based on the values through the forecast origin, the fitted ARIMA model, and the residuals from the fitted model. Suppose observations $Z_1, Z_2, \ldots$ are available up to time t and it is desired to forecast future observations $Z_{t+l}$, $l \ge 1$.
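The informal argument given for SERIESA can be sketched numerically. The Python fragment below is an illustration only and is not SCA code: the function name `ima11_forecasts` and the sample inputs are hypothetical, and it assumes the standard result that, for the model (1-B)Z_t = (1-θB)a_t, the forecast error at lead l has variance σ²(1 + (l-1)(1-θ)²).

```python
import math

def ima11_forecasts(last_obs, last_resid, theta, sigma, nofs=12):
    """Forecasts and standard errors for (1-B)Z_t = (1 - theta*B)a_t.

    last_obs   -- last observed value of the series
    last_resid -- residual at the forecast origin (stands in for a_t)
    theta      -- estimated MA parameter
    sigma      -- residual standard error (stands in for sigma_a)
    """
    # One-step forecast: the unknown future error is replaced by its mean, 0.
    first = last_obs - theta * last_resid
    # Every later forecast repeats the first one for this model.
    forecasts = [first] * nofs
    # Assumed error variance at lead l: sigma^2 * (1 + (l-1) * (1-theta)^2).
    std_errors = [sigma * math.sqrt(1.0 + (l - 1) * (1.0 - theta) ** 2)
                  for l in range(1, nofs + 1)]
    return forecasts, std_errors

# theta and sigma are rounded from the text; the last observation and the
# last residual are made-up values used only for illustration.
fc, se = ima11_forecasts(last_obs=17.4, last_resid=0.1, theta=0.7,
                         sigma=0.3174, nofs=3)
```

With these inputs every forecast is identical while the standard errors grow with the lead time, which is the behavior described for the FORECAST output.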
Forecasts calculated in the SCA System are the minimum mean squared error (MMSE) forecasts, so that the forecast for $Z_{t+l}$ is the conditional expectation of $Z_{t+l}$ based on all information to time t. It can be shown that the MMSE forecast, $\hat{Z}_t(l)$, can be recursively computed using

$$\hat{Z}_t(l) = C + \phi_1 \hat{Z}_t(l-1) + \cdots + \phi_p \hat{Z}_t(l-p) + E(a_{t+l}) - \theta_1 E(a_{t+l-1}) - \cdots - \theta_q E(a_{t+l-q})$$

where

$$\hat{Z}_t(j) = Z_{t+j} \ \text{for } j \le 0, \qquad E(a_{t+j}) = a_{t+j} \ \text{for } j \le 0, \qquad E(a_{t+j}) = 0 \ \text{for } j > 0.$$

In practice, neither the parameter values nor the values of the error sequence are known. Hence we use the estimated parameter values and the corresponding residual sequence in their place. The residuals used in the FORECAST paragraph are those derived using the EXACT likelihood method unless we direct otherwise.

Assuming that the white noise sequence for the model has a variance $\sigma_a^2$, the error $e_t(l) = Z_{t+l} - \hat{Z}_t(l)$ is normally distributed with zero mean and variance

$$V(e_t(l)) = \sigma_a^2 \sum_{i=0}^{l-1} \psi_i^2 .$$

The $\psi$'s are coefficients of the linear polynomial $\psi(B)$, such that $\phi(B)\psi(B) = \theta(B)$. In practice, the values for the $\psi$'s are determined from the estimated parameter values, and the residual standard error is used for $\sigma_a$.

5.2 A Second Example: Sales Data

As a second illustration of ARIMA model building, we consider a series of sales data. The data, part of Series M of Box and Jenkins (1970), consist of 150 observations and are listed in Table 5.2. These data are modeled together with a series of leading indicators by Box and Jenkins (1970). We will also present this model in Chapter 8. However, here we will model the sales data alone. The data are stored in the SCA workspace under the label SALES. A time series plot of SALES (produced by SCAGRAF) is shown in Figure 5.3.

Table 5.2 Sales data of Series M of Box and Jenkins (1970)
(Data read across the line)

Figure 5.3 Sales data of Series M of Box and Jenkins (1970)

In the previous example, there was a question of whether the series was stationary or not. The plot here clearly depicts the nonstationarity of SALES. Although differencing is warranted, we will first compute the ACF of the original series to confirm it.

-->ACF SALES. MAXLAG IS 12.

 TIME PERIOD ANALYZED                      TO 150
 NAME OF THE SERIES                        SALES
 EFFECTIVE NUMBER OF OBSERVATIONS
 STANDARD DEVIATION OF THE SERIES
 MEAN OF THE (DIFFERENCED) SERIES
 STANDARD DEVIATION OF THE MEAN
 T-VALUE OF MEAN (AGAINST ZERO)

 AUTOCORRELATIONS
 ST.E.
 Q

 [Character plot of the sample autocorrelations for lags 1 through 12; all plotted values are large and positive and decay slowly]

The ACF of SALES has large values and decays very slowly. This behavior is typical of a nonstationary series and indicates that we should difference the series. We now compute the sample ACF and PACF of (1-B)SALES by entering

-->IDEN SALES. DFORDER IS 1. MAXLAG IS

                        1
 DIFFERENCE ORDERS  (1-B )
 TIME PERIOD ANALYZED                      TO 150
 NAME OF THE SERIES                        SALES
 EFFECTIVE NUMBER OF OBSERVATIONS
 STANDARD DEVIATION OF THE SERIES
 MEAN OF THE (DIFFERENCED) SERIES
 STANDARD DEVIATION OF THE MEAN
 T-VALUE OF MEAN (AGAINST ZERO)

 AUTOCORRELATIONS
 ST.E.
 Q

 [Character plot of the sample autocorrelations of the differenced series]

 PARTIAL AUTOCORRELATIONS
 ST.E.

 [Character plot of the sample partial autocorrelations of the differenced series]

Both the ACF and the PACF appear to die out. This joint pattern is typical of a mixed ARMA model. In particular, the pattern above is consistent with that of an ARMA model with p = 1 and q = 1. However, to better identify tentative orders for p and q, we will employ the sample EACF by entering

-->EACF SALES. DFORDER IS 1.

                        1
 DIFFERENCE ORDERS  (1-B )
 TIME PERIOD ANALYZED                      TO 150
 NAME OF THE SERIES                        SALES
 EFFECTIVE NUMBER OF OBSERVATIONS
 STANDARD DEVIATION OF THE SERIES
 MEAN OF THE (DIFFERENCED) SERIES
 STANDARD DEVIATION OF THE MEAN
 T-VALUE OF MEAN (AGAINST ZERO)

 THE EXTENDED ACF TABLE
 (Q-->)
 (P= 0)
 (P= 1)
 (P= 2)
 (P= 3)
 (P= 4)
 (P= 5)
 (P= 6)

 SIMPLIFIED EXTENDED ACF TABLE (5% LEVEL)
 (Q-->)   0  1  2  3  4  5  6  7  8  9 10 11 12
 (P= 0)   X  X  X  X  O  O  O  O  O  O  O  O  O
 (P= 1)   X  O  O  O  O  O  O  O  O  O  O  O  O
 (P= 2)   X  O  O  O  O  O  O  O  O  O  O  O  O
 (P= 3)   X  X  O  O  O  O  O  O  O  O  O  O  O
 (P= 4)   O  O  X  X  O  O  O  O  O  O  O  O  O
 (P= 5)   X  O  O  O  O  O  O  O  O  O  O  O  O
 (P= 6)   X  X  O  O  O  O  O  O  O  O  O  O  O

We are visually drawn to two possible triangular regions that define p and q. One emanates from the vertex where p=1 and q=1 (highlighted by hand), and another emanates from the vertex where p=0 and q=4. The latter choice for p and q is less parsimonious than the former and is not supported by the sample ACF and PACF. As a result, we will use an ARIMA(1,1,1) model for SALES. We will also include a constant term in the model as the t-value of the mean for the differenced series is well over 3 (specifically, 3.56). We can specify this model as follows

-->TSMODEL SALESM. MODEL IS (1 - PHI*B)SALES(1) = CONST + (1 - TH*B)NOISE

 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- SALESM

 VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
            VARIABLE   OR CENTERED
                                         1
 SALES      RANDOM     ORIGINAL      (1-B )

 PARAMETER   VARIABLE   NUM./    FACTOR   ORDER   CONS-    VALUE   STD     T
 LABEL       NAME       DENOM.                    TRAINT           ERROR   VALUE
 1  CONST               CNST     1        0       NONE
    TH       SALES      MA       1        1       NONE
    PHI      SALES      AR       1        1       NONE     .1000

We will now use the conditional likelihood algorithm to estimate this model. The SCA output has been edited for presentation purposes.

-->ESTIM SALESM

 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- SALESM

 VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
            VARIABLE   OR CENTERED
                                         1
 SALES      RANDOM     ORIGINAL      (1-B )

 PARAMETER   VARIABLE   NUM./    FACTOR   ORDER   CONS-    VALUE   STD     T
 LABEL       NAME       DENOM.                    TRAINT           ERROR   VALUE
 1  CONST               CNST     1        0       NONE
    TH       SALES      MA       1        1       NONE
    PHI      SALES      AR       1        1       NONE

 TOTAL SUM OF SQUARES                 E+05
 TOTAL NUMBER OF OBSERVATIONS
 RESIDUAL SUM OF SQUARES              E+03
 R-SQUARE
 EFFECTIVE NUMBER OF OBSERVATIONS
 RESIDUAL VARIANCE ESTIMATE           E+01
 RESIDUAL STANDARD ERROR              E+01

Modifying a previously specified model

We see that the estimates of the AR and of the MA parameters are both significantly different from zero (since their t-values are large). However, a t-value of 1.28 indicates the estimate of the constant is not statistically different from zero at the 5% level. As a result, we would like to re-estimate the above model, but without the constant term.

We can delete the constant term from an ARIMA model in two ways. The most direct method is to re-specify the model entirely. We need to do this whenever we wish to add or delete AR or MA parameters in the model. By using the same names for those parameters that are retained in the model, we will begin estimation using the current estimates for the parameters. We can also delete the constant term from a model by including the sentence DELETE CONSTANT in the TSMODEL paragraph. In this example we may enter

-->TSMODEL SALESM. DELETE CONSTANT.

 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- SALESM

 VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
            VARIABLE   OR CENTERED
                                         1
 SALES      RANDOM     ORIGINAL      (1-B )

 PARAMETER   VARIABLE   NUM./    FACTOR   ORDER   CONS-    VALUE   STD     T
 LABEL       NAME       DENOM.                    TRAINT           ERROR   VALUE
 1  TH       SALES      MA       1        1       NONE
    PHI      SALES      AR       1        1       NONE

To add a constant term to a model, we must completely re-specify the model using the TSMODEL paragraph.

Constraining ARMA parameters

We can use the TSMODEL paragraph to specify any constraints we wish to place on the estimation of parameters. If we include the FIXED-PARAMETER sentence in the TSMODEL paragraph, we can specify the names of parameters that we wish to remain at their currently specified values during estimation. For example, here we could fix the AR parameter to .8344 in subsequent estimations by including the sentence

FIXED-PARAMETER IS PHI.

in the TSMODEL paragraph. A parameter can be fixed to any value in this manner. This may require the use of an analytic statement (see Appendix A) to define a value and the use of the logical sentence UPDATE within the TSMODEL paragraph to clear a model's memory of the parameter value and reset it to another. For example, if we wished to maintain the value of PHI as .80 during remaining estimations, we could sequentially enter

-->PHI = 0.80
-->TSMODEL SALESM. FIXED-PARAMETER IS PHI. UPDATE.

Note that if the logical sentence UPDATE is not specified, the value for PHI will remain at its previously estimated value, which was .8344. The same is true whenever we try to modify any parameter in the model.

In addition to holding any parameters at fixed values, we can constrain one or more parameters to be equal to one another during estimation. The CONSTRAINT sentence is used for this purpose. For example, if we wish to re-estimate the above model with the AR parameter equal to the MA parameter, we can enter

-->TSMODEL SALESM. CONSTRAINT IS (PHI, TH).

All parameters whose names are specified within the same parentheses are held equal during estimation. More than one set of constraints can be specified, with commas used to separate sets of parentheses, but a parameter label can be specified only once. In addition, if we use the same label to represent two or more parameters of the model, these parameters will be automatically held equal to one another during model estimation.
Once a constraint is placed on a parameter, either fixed at a particular value or held equal to one or more parameters, the constraint remains in place during all subsequent estimations. A constraint can only be removed by re-specifying the model using the MODEL sentence of the TSMODEL paragraph.

We will now re-estimate the model for SALES without a constant term. The exact likelihood algorithm is used, and residuals are held in the SCA workspace under the label RES after estimation. Again, the SCA output is edited for presentation purposes.

-->ESTIM SALESM. METHOD IS EXACT. HOLD RESIDUALS(RES)

 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- SALESM

 VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
            VARIABLE   OR CENTERED
                                         1
 SALES      RANDOM     ORIGINAL      (1-B )

 PARAMETER   VARIABLE   NUM./    FACTOR   ORDER   CONS-    VALUE   STD     T
 LABEL       NAME       DENOM.                    TRAINT           ERROR   VALUE
 1  TH       SALES      MA       1        1       NONE
    PHI      SALES      AR       1        1       NONE

 TOTAL SUM OF SQUARES                 E+05
 TOTAL NUMBER OF OBSERVATIONS
 RESIDUAL SUM OF SQUARES              E+03
 R-SQUARE
 EFFECTIVE NUMBER OF OBSERVATIONS
 RESIDUAL VARIANCE ESTIMATE           E+01
 RESIDUAL STANDARD ERROR              E+01

The parameter estimates change only slightly. The standard error of the residuals is approximately 1.34. We can compare this value with the standard error of our original series, 21.41 (see the summary statistics for the ACF of SALES). Hence, the resultant R² value is almost 100%. The high R² value is misleading since the variation of the modeled series is compared to that of the original series. Since our series is nonstationary, variation is reduced mainly by differencing. We can observe that the standard error of the differenced series is about 1.44 (see the summary statistics for either the IDEN or EACF paragraph for the differenced series). Hence the R² attributable to differencing is about 1 - (1.44/21.41)^2 = .995. The R² for the differenced series is approximately 1 - (1.34/1.44)^2 = .13. In ARIMA modeling, R² is meaningful only if the series is stationary.

We now need to check the fitted model. The time series plot of the residuals (not shown here) reveals no apparent patterns or aberrations. We can obtain the sample ACF of the residual series by entering

-->ACF RES. MAXLAG IS 12.

 TIME PERIOD ANALYZED                      TO 150
 NAME OF THE SERIES                        RES
 EFFECTIVE NUMBER OF OBSERVATIONS
 STANDARD DEVIATION OF THE SERIES
 MEAN OF THE (DIFFERENCED) SERIES
 STANDARD DEVIATION OF THE MEAN
 T-VALUE OF MEAN (AGAINST ZERO)

 AUTOCORRELATIONS
 ST.E.
 Q

 [Character plot of the residual autocorrelations for lags 1 through 12]

The ACF appears to be clean. We can then forecast from the fitted model by entering

-->FORECAST SALESM. NOFS ARE 12.

 NOTE: THE EXACT METHOD FOR COMPUTING RESIDUALS IS USED

 FORECASTS, BEGINNING AT
 TIME   FORECAST   STD. ERROR   ACTUAL IF KNOWN

Unlike the forecasts for SERIESA, the forecasts obtained here are not all the same. The forecasts have a gradual upward trend. This is consistent with the behavior of SALES as shown in Figure 5.3 (except for the period around 84 through 96 where the sales increased greatly).

5.3 Modeling Seasonal Time Series

In the previous sections, we found we could adequately model a nonseasonal time series through the use of ARIMA models. However, we often encounter situations in which a time series exhibits some periodic or seasonal pattern. For example, data recorded monthly may exhibit similar behavior from year to year; that is, a seasonality of period 12. Data recorded quarterly may have 4 as its seasonality, and data recorded hourly may have 24 as its periodicity. In such situations, seasonal ARIMA models need to be employed to account for any seasonal pattern present in the series.

To illustrate the modeling of a seasonal time series, we will consider Series G of Box and Jenkins (1970). The data represent the totals of international airline passengers (in thousands) for the period January 1949 through December 1960, inclusive. The data are listed in Table 5.3, and are stored in the SCA workspace under the label SERIESG.

Table 5.3 Series G of Box and Jenkins (1970): Monthly totals (in thousands) of international airline passengers, January 1949 - December 1960

 Year   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec

Figure 5.4 Series G of Box and Jenkins (1970)

Model identification

A time series plot of SERIESG (using SCAGRAF) is shown in Figure 5.4. We observe both a distinct seasonality in the data and the presence of a trend. As a result of the trend, we are certain that the series does not have a fixed mean level. In addition, the variability of the data seems to increase over time. In order to stabilize this variability, a transformation of the data seems warranted. The logarithmic transformation is useful when the variability appears to be proportional to the mean. We can use an analytic statement (see Appendix A) to transform the data. We will store the transformed data under the name LNAIRPAS.

-->LNAIRPAS = LN(SERIESG)

A time series plot of LNAIRPAS is shown in Figure 5.5. The series LNAIRPAS still exhibits a trend and seasonality, but we seem to have stabilized the variability over the length of the series.
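The effect of the logarithmic transformation can be illustrated with a small sketch. The Python fragment below is not SCA code and does not use SERIESG; it builds a made-up series whose seasonal swing is a fixed fraction of its level, and the helper name `block_ranges` is hypothetical.

```python
import math

def block_ranges(series, period):
    """Range (max - min) of each consecutive block of `period` values."""
    return [max(series[i:i + period]) - min(series[i:i + period])
            for i in range(0, len(series) - period + 1, period)]

# Made-up series: the swing within each block is proportional to a level
# that doubles from one block to the next.
factors = [0.9, 1.0, 1.1, 1.0]
raw = [level * f for level in (100.0, 200.0, 400.0) for f in factors]
logged = [math.log(z) for z in raw]

raw_ranges = block_ranges(raw, period=4)      # grows with the level
log_ranges = block_ranges(logged, period=4)   # the same in every block
```

In the raw series the spread doubles along with the level, whereas after taking logarithms every block has the same spread, log(1.1/0.9). This is the sense in which the transformation stabilizes variability that is proportional to the mean.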

Figure 5.5 LNAIRPAS, the natural logarithm of SERIESG

We expect that LNAIRPAS is not stationary. This is confirmed when we compute and display the sample ACF of the series.

-->ACF LNAIRPAS

 TIME PERIOD ANALYZED                      TO 144
 NAME OF THE SERIES                        LNAIRPAS
 EFFECTIVE NUMBER OF OBSERVATIONS
 STANDARD DEVIATION OF THE SERIES
 MEAN OF THE (DIFFERENCED) SERIES
 STANDARD DEVIATION OF THE MEAN
 T-VALUE OF MEAN (AGAINST ZERO)

 AUTOCORRELATIONS

 [Character plot of the sample autocorrelations; all plotted values are positive and decay slowly]

The ACF has a slow die-out pattern that is indicative of a nonstationary series. Differencing is required. However, because the data are seasonal, we may wonder if the proper differencing operator is (1-B) or (1-B^12). We can examine the sample ACF using each of these differencing operators. The output is edited for presentation purposes.

-->ACF LNAIRPAS. DFORDER IS 1.

                        1
 DIFFERENCE ORDERS  (1-B )
 TIME PERIOD ANALYZED                      TO 144
 NAME OF THE SERIES                        LNAIRPAS
 EFFECTIVE NUMBER OF OBSERVATIONS
 STANDARD DEVIATION OF THE SERIES
 MEAN OF THE (DIFFERENCED) SERIES
 STANDARD DEVIATION OF THE MEAN
 T-VALUE OF MEAN (AGAINST ZERO)

 AUTOCORRELATIONS

 [Character plot of the sample autocorrelations of (1-B)LNAIRPAS]

-->ACF LNAIRPAS. DFORDER IS 12.

                        12
 DIFFERENCE ORDERS  (1-B  )
 TIME PERIOD ANALYZED                      TO 144
 NAME OF THE SERIES                        LNAIRPAS
 EFFECTIVE NUMBER OF OBSERVATIONS
 STANDARD DEVIATION OF THE SERIES
 MEAN OF THE (DIFFERENCED) SERIES
 STANDARD DEVIATION OF THE MEAN
 T-VALUE OF MEAN (AGAINST ZERO)

 AUTOCORRELATIONS

 [Character plot of the sample autocorrelations of (1-B^12)LNAIRPAS]

Clearly the use of (1-B) alone does not remove the effects of nonstationarity from the data, since the ACF values at lags 12, 24, 36 (and so on) exhibit the same slow die-out behavior as the ACF of the original series. Seasonal differencing is warranted. However, the seasonally differenced series alone is not stationary, as indicated by the slow decay of its ACF.
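The role of the two differencing operators can be sketched outside the SCA System. The Python fragment below is an illustration under stated assumptions: the helper names `difference` and `sample_acf` are hypothetical, the series is made up (a linear trend plus a repeating pattern), and a seasonal period of 4 is used in place of 12 to keep the example short.

```python
def difference(z, orders):
    """Apply (1 - B**d) to z for each d in orders, e.g. orders=(1, 12)."""
    for d in orders:
        z = [z[t] - z[t - d] for t in range(d, len(z))]
    return z

def sample_acf(z, maxlag):
    """Sample autocorrelations r_1, ..., r_maxlag of z."""
    n = len(z)
    mean = sum(z) / n
    dev = [x - mean for x in z]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, maxlag + 1)]

# Made-up series: linear trend plus a pattern that repeats every 4 periods.
season = [0.0, 3.0, -1.0, -2.0]
z = [0.5 * t + season[t % 4] for t in range(40)]

w1 = difference(z, (1,))     # trend removed, seasonal pattern remains
w2 = difference(z, (1, 4))   # trend and seasonal pattern both removed
r = sample_acf(w1, maxlag=8) # w2 is constant here, so its ACF is undefined
```

After (1-B) alone, the autocorrelations of this artificial series remain large at the seasonal lags 4 and 8, mirroring the behavior at lags 12, 24, 36 noted above. Applying both operators leaves nothing but zeros because the example contains no noise.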

In order to achieve stationarity here, we need to employ both a nonseasonal and a seasonal differencing operator in the multiplicative form (1-B)(1-B^12). We can specify these operators and obtain the sample ACF of the differenced series by entering

-->ACF LNAIRPAS. DFORDERS ARE 1, 12.

                        1      12
 DIFFERENCE ORDERS  (1-B ) (1-B  )
 TIME PERIOD ANALYZED                      TO 144
 NAME OF THE SERIES                        LNAIRPAS
 EFFECTIVE NUMBER OF OBSERVATIONS
 STANDARD DEVIATION OF THE SERIES
 MEAN OF THE (DIFFERENCED) SERIES
 STANDARD DEVIATION OF THE MEAN
 T-VALUE OF MEAN (AGAINST ZERO)

 AUTOCORRELATIONS

 [Character plot of the sample autocorrelations of (1-B)(1-B^12)LNAIRPAS]

The sample ACF has significant negative values at lags 1 and 12. Many texts provide guides for the pattern of the ACF for many types of seasonal models. These include Appendix 9.1 of Box and Jenkins (1970), Section 6.2 of Abraham and Ledolter (1983),

Section 4.4 of Vandaele (1983), and Section 10.2 of Cryer (1986). The above pattern is indicative of a multiplicative MA(1) and MA(12) model, that is, $(1 - \theta_1 B)(1 - \theta_{12} B^{12})$.

Frequently the sample ACF of an appropriately differenced series provides rather definitive information for the identification of a seasonal model. In some situations, however, the sample ACFs may not provide a clear-cut model for the time series. Liu (1989) provided an identification method employing a filtering technique for such situations. The EACF is not effective for the identification of seasonal time series.

Multiplicative seasonal models

Multiplicative seasonal ARIMA models are often described as $(p,d,q) \times (P,D,Q)_s$ models, where s is the seasonality, and P, D, and Q are the orders of the seasonal components. This multiplicative seasonal model can be expressed as:

$$(1 - \phi_1 B - \cdots - \phi_p B^p)(1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps})(1 - B)^d (1 - B^s)^D Z_t = C + (1 - \theta_1 B - \cdots - \theta_q B^q)(1 - \Theta_1 B^s - \cdots - \Theta_Q B^{Qs})\,a_t \qquad (5.13)$$

The values of the differencing orders, d and D, of this model are usually either 0 or 1. The values of P and Q are also usually 0 or 1. We have tentatively identified a multiplicative $(0,1,1) \times (0,1,1)_{12}$ model for the logged airline data. This particular model

$$(1 - B)(1 - B^{12}) Z_t = (1 - \theta_1 B)(1 - \theta_{12} B^{12})\,a_t \qquad (5.14)$$

has become known as the airline model and has been shown to be very useful in modeling many seasonal time series. Unfortunately this model is often misused. One common mistake in ARIMA modeling is to over-difference the original series, which automatically leads to an airline model.

Model specification and estimation

The t-value of the mean (against zero) for the multiplicatively differenced series is not significant. Thus, we have tentatively identified the model of the form in (5.14) where $Z_t$ is the natural log of SERIESG (i.e., LNAIRPAS). We can specify this model by entering

-->TSMODEL NAME IS AIRLINE. MODEL IS
--> LNAIRPAS(1,12) = (1 - THETA1*B)(1 - THETA12*B**12)NOISE.
 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- AIRLINE

 VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
            VARIABLE   OR CENTERED
                                         1      12
 LNAIRPAS   RANDOM     ORIGINAL      (1-B ) (1-B  )

 PARAMETER   VARIABLE   NUM./    FACTOR   ORDER   CONS-    VALUE   STD     T
 LABEL       NAME       DENOM.                    TRAINT           ERROR   VALUE
    THETA1   LNAIRPAS   MA       1        1       NONE
    THETA12  LNAIRPAS   MA       2        12      NONE

Note we have specified our differencing operators (1-B)(1-B^12) as (1,12). This is consistent with the specification of DFORDERS in the ACF, PACF, IDEN and EACF paragraphs. We could also specify these operators as ((1-B)(1-B**12)) if we desire.

Since the model AIRLINE consists entirely of MA parameters, it is prudent to use the exact likelihood algorithm for final estimation. We will first estimate our airline model using the conditional method by simply entering

-->ESTIM AIRLINE

 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- AIRLINE

 VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
            VARIABLE   OR CENTERED
                                         1      12
 LNAIRPAS   RANDOM     ORIGINAL      (1-B ) (1-B  )

 PARAMETER   VARIABLE   NUM./    FACTOR   ORDER   CONS-    VALUE   STD     T
 LABEL       NAME       DENOM.                    TRAINT           ERROR   VALUE
 1  THETA1   LNAIRPAS   MA       1        1       NONE
    THETA12  LNAIRPAS   MA       2        12      NONE

 TOTAL SUM OF SQUARES                 E+02
 TOTAL NUMBER OF OBSERVATIONS
 RESIDUAL SUM OF SQUARES              E+00
 R-SQUARE
 EFFECTIVE NUMBER OF OBSERVATIONS
 RESIDUAL VARIANCE ESTIMATE           E-02
 RESIDUAL STANDARD ERROR              E-01

We may observe that the MA parameter estimates, .3776 and .5728, do not indicate that either of the MA factors has roots close to the unit circle. However, we will still employ the exact estimation method and retain the residuals (in the variable RESID) after the fit by entering

-->ESTIM AIRLINE. METHOD IS EXACT. HOLD RESIDUALS(RESID).

 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- AIRLINE

 VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
            VARIABLE   OR CENTERED
                                         1      12
 LNAIRPAS   RANDOM     ORIGINAL      (1-B ) (1-B  )

 PARAMETER   VARIABLE   NUM./    FACTOR   ORDER   CONS-    VALUE   STD     T
 LABEL       NAME       DENOM.                    TRAINT           ERROR   VALUE
 1  THETA1   LNAIRPAS   MA       1        1       NONE
    THETA12  LNAIRPAS   MA       2        12      NONE

TOTAL SUM OF SQUARES
TOTAL NUMBER OF OBSERVATIONS
RESIDUAL SUM OF SQUARES
R-SQUARE
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL VARIANCE ESTIMATE
RESIDUAL STANDARD ERROR

The fitted model is, approximately,

$$(1 - B)(1 - B^{12})\,\mathrm{LNAIRPAS}_t = (1 - .40B)(1 - .56B^{12})\, a_t \qquad (5.15)$$

The parameter estimates are significant based on their t-values. The variance of the residual series (i.e., the variation still unexplained) is the residual variance estimate reported in the estimation summary. The variance after $(1-B)(1-B^{12})$ differencing is $(.0457)^2$ (see the ACF summary). Hence we have reduced variation by about 36% after differencing.

Diagnostic checks of the airline model

A time plot of the residual series (not shown here) does not reveal any gross abnormalities, although some unusual points appear to be present. These outliers are discussed in more detail in Chapter 7. We can compute and display 24 lags of the sample ACF of the residuals. We see the sample ACF of the residuals is clean. The output is edited for presentation purposes.

-->ACF RESID. MAXLAG IS 24.

TIME PERIOD ANALYZED TO 144
NAME OF THE SERIES RESID
EFFECTIVE NUMBER OF OBSERVATIONS

[Plot of the sample autocorrelations of RESID, lags 1 to 24]

Other Time Series Topics

This section provides a brief overview of topics related to time series analysis or the execution of SCA paragraphs related to ARIMA modeling. Much of the material presented in this section can be considered advanced or of occasional use. As a consequence, this section can be skipped, and selected topics can be referenced as necessary. The topics presented are:

   Use of differencing operators
   Missing data
   Simulation of an ARIMA model
   Model identification using the smallest canonical correlation (SCAN) table
   Inverse autocorrelation function
   Notational shorthands
   Plotting forecasts with confidence limits
   Pi and Psi weights of a specified model

Use of differencing operators

Sometimes we may find it necessary to use differencing operators to achieve stationarity. Differencing within the usual ARIMA(p,d,q) model is in the form

$$(1 - B)^d \qquad (5.16)$$

In fact, a wider array of stationary inducing operators is available. The SCA System extends the representation of (5.16) to that of

$$(1 - B^{d_1})(1 - B^{d_2})(1 - B^{d_3}) \cdots (1 - B^{d_k}) \qquad (5.17)$$

where $d_1, d_2, \ldots, d_k$ are referred to as differencing orders. The representation in (5.17) gives us greater flexibility in the type of differencing we want to use. However, this flexibility can lead to some quirks in the specification of d when this value is greater than 1.

For example, suppose we wish to analyze a double differenced series. Here we want to analyze $(1 - B)^2$ of a series. Suppose we specify DFORDER IS 2 in the ACF, PACF, IDEN, IACF (see Section 5.4.4) or EACF paragraph; or we include the differencing operator (2) within the MODEL sentence of the TSMODEL paragraph. The SCA System will interpret it as single differencing of order 2 and will base its computations on the differencing operator $(1 - B^2)$. In order to specify the operator $(1 - B)^2$, we need to specify DFORDER IS 1, 1 in an identification paragraph, or the differencing operator (1,1) in the MODEL sentence of the TSMODEL paragraph.

Although this may seem a bit complicated for the specification of d in a (p,d,q) model, a (p,d,q) model does not allow for the differencing operator

$$(1 - B)(1 - B^4)(1 - B^{12})$$

while it can be handled directly in the SCA System. The orders of the above differencing operators should be specified as 1, 4, 12.

We can also difference a time series outside the SCA paragraphs presented in this chapter. The DIFFERENCE paragraph (see Appendix C) can be used to generate a new time series through differencing. However, use of this paragraph is not advisable in typical time series analyses using the SCA System.

Missing data

The SCA System provides us with a degree of flexibility in the modeling of a time series that contains coded missing data. Missing data affect the usual computations employed for model identification and estimation. As a result, we are presented with three possible options when we wish to model a series containing missing data. We can

(1) Employ SCA identification and estimation paragraphs as usual and accept the default conditions taken by the paragraphs;
(2) Replace all missing data by some appropriate values before modeling the series; or
(3) Use those SCA paragraphs that make necessary computational adjustments for missing data.

Ordinarily, if missing data are present in a time series and we do not recode the data, then the ACF, PACF, IDEN, EACF and ESTIM paragraphs will proceed as follows. The first occurrence of non-missing data and the next occurrence of a missing data point are noted internally. Only data within this span are used in the calculations of the paragraphs.

If we want to use the entire span of data, then we may replace all missing data by some appropriate values.
We can do this using an SCA data editing paragraph (see Appendix B) or an analytic statement (see Appendix A). Appropriate values for missing data might consist of

(1) the average of all observations in a stationary series,
(2) the average of two adjacent observations,
(3) the average of all observations with the same periodicity for a nonstationary series that exhibits a distinct seasonal component but no trend, or
(4) the average of two adjacent observations with the same periodicity for a nonstationary series that exhibits a distinct seasonal component and trend.

The PATCH paragraph (described in Appendix C) can be used to accomplish the above.

The ACF and PACF paragraphs will also make necessary computational adjustments for missing data if we include the logical sentence MISSING in the ACF or PACF command. For example, if the series SALES contains missing data, we can compute the appropriate ACF by entering a command such as

-->ACF SALES. MISSING. MAXLAG IS 15.

A precise method to estimate the values of missing data in a time series is employed by the OESTIM paragraph. This paragraph and the method involved are discussed in more detail in Chapter 7. If we do not use the OESTIM paragraph, then we need to recode or patch the missing data before estimating the parameters of a time series model.

Simulation of an ARIMA model

The simulation of data is often beneficial for both data analyses and scientific research. Simulated data can provide us with a better understanding of various statistical methods, especially when methods are either ad hoc or difficult to understand analytically. In addition, simulated data provide a means to ascertain the sensitivity of an analysis, especially in the study of departures from distributional assumptions.

The SIMULATE paragraph can be used to generate data according to a time series model. The paragraph can also be used to generate data according to a distribution. More information on the latter can be found in Chapter 12 of The SCA Statistical System: Reference Manual for General Statistical Analysis.

We can employ the SIMULATE paragraph and the TSMODEL paragraph to simulate data that follow a univariate time series model. In this section we discuss the simulation of an ARIMA model.
The simulation of transfer function models is discussed in Chapter 8. The TSMODEL paragraph is used to specify the time series model the data should follow, and the SIMULATE paragraph generates both the noise series of the model and the series itself. To illustrate this, we will simulate the following AR(1) model

$$(1 - .75B) X_t = 5.0 + a_t,$$

where $\sigma_a^2 = 2.5$. We will store the data in the variable XDATA.

First, we will specify the AR(1) model using the TSMODEL paragraph. We will give the model the name XSIM and use XDATA as a dummy name within the MODEL sentence. We also include the logical sentence SIMULATION to indicate that this model may be used for simulation purposes.

-->TSMODEL NAME IS XSIM. MODEL IS (1 -.75*B)XDATA = NOISE.
--> SIMULATION.

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- XSIM

VARIABLE   TYPE OF    ORIGINAL       DIFFERENCING
           VARIABLE   OR CENTERED
XDATA      RANDOM     ORIGINAL       NONE

PARAMETER    VARIABLE   NUM./    FACTOR   ORDER   CONS-    VALUE   STD     T
LABEL        NAME       DENOM.                    TRAINT           ERROR   VALUE
1                       CNST     1        0       NONE
2            XDATA      AR       1        1       NONE     .7500

We now will use the SIMULATE paragraph to specify the model being used for simulation, the number of values to simulate, and the noise process. The data are stored in the variable XDATA.

-->SIMULATE MODEL IS XSIM. NOBS ARE 200. NOISE IS N(0.0, 2.5).

THE UNIVARIATE TIME SERIES XDATA IS SIMULATED USING MODEL XSIM

The sentence NOISE IS N(0.0, 2.5) specifies that the noise sequence should have a normal distribution with mean 0.0 and variance 2.5.

We can now check the simulated data. The mean and variance of an AR(1) process with $\phi = .75$, $C = 5.0$ and $\sigma_a^2 = 2.5$ are as follows:

$$\mu_x = C/(1 - \phi) = 5/(1 - .75) = 20.0$$

$$\sigma_x^2 = \sigma_a^2/(1 - \phi^2) = 2.5/(1 - (.75)^2) \approx 5.71, \qquad \sigma_x \approx 2.39$$

In addition, the ACF of the data should be $(.75)^l$, $l = 1, 2, \ldots$; and the PACF of the data should be .75 for $l = 1$ and 0 for $l = 2, 3, \ldots$. We can compute and display these statistics using the IDEN paragraph (not shown here). We find the sample statistics to be in reasonable agreement with the theoretical values. We can also estimate an AR(1) model. The results are shown below.

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- XMODEL

VARIABLE   TYPE OF    ORIGINAL       DIFFERENCING
           VARIABLE   OR CENTERED
XDATA      RANDOM     ORIGINAL       NONE

PARAMETER    VARIABLE   NUM./    FACTOR   ORDER   CONS-    VALUE   STD     T
LABEL        NAME       DENOM.                    TRAINT           ERROR   VALUE
1 CNST                  CNST     1        0       NONE
2 PHI        XDATA      AR       1        1       NONE

TOTAL SUM OF SQUARES
TOTAL NUMBER OF OBSERVATIONS
RESIDUAL SUM OF SQUARES
R-SQUARE
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL VARIANCE ESTIMATE
RESIDUAL STANDARD ERROR

The estimated values of $C$, $\phi$, and $\sigma_a^2$ are 6.55, 0.69 and 2.54, respectively. These are in reasonable accord with the true values.

Seed values

Simulated data are derived from a sequence of pseudo random numbers. These pseudo random numbers are created by a random number generator. The generator requires an initial seed value from which to generate its first value. The random number generator creates both a random number and a new seed for the next value. If no initial seed is specified in the SIMULATE paragraph, a fixed default value is used as the seed. Unless we provide a seed value, the same sequence of pseudo random numbers will be used in every model simulation.

The SEED sentence may be included in the SIMULATE paragraph to specify either an initial seed value or the name of a variable that stores the seed value. For example, the previous SIMULATE command could have been

-->SIMULATE MODEL IS XSIM. NOBS ARE 200. NOISE IS N(0.0, 2.5). SEED IS GSEED.

If the variable GSEED is undefined, the default value is used in the simulation of the normal data. After simulation, the value last created as a seed value is stored in GSEED. This seed can be used for subsequent simulations.

It is worth restating that it is important to use the SEED sentence when generating more than one data set. If the SEED sentence is not employed, then the same initial seed value (i.e., the default) will be used for each data set. If we employ the SEED sentence in the manner used above, then a new initial seed will be used for each new data set.
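The simulation and the role of the seed can be mimicked outside the SCA System. The following is a minimal Python sketch, not the SIMULATE paragraph itself: the function name `simulate_ar1`, its arguments, and the seed value 12345 are hypothetical choices made for illustration, and Python's own generator replaces the SCA random number generator. It generates the AR(1) process with $\phi = .75$, $C = 5.0$ and $\sigma_a^2 = 2.5$ and compares the sample moments with the theoretical mean of 20.0 and standard deviation of about 2.39.

```python
import random
import statistics


def simulate_ar1(phi, const, noise_var, nobs, seed):
    """Simulate X_t = const + phi * X_{t-1} + a_t, with a_t ~ N(0, noise_var)."""
    rng = random.Random(seed)        # explicit seed: same seed, same sequence
    noise_sd = noise_var ** 0.5      # gauss() expects a standard deviation
    x = const / (1.0 - phi)          # start the recursion at the process mean
    values = []
    for _ in range(nobs):
        x = const + phi * x + rng.gauss(0.0, noise_sd)
        values.append(x)
    return values


phi, const, noise_var = 0.75, 5.0, 2.5
theo_mean = const / (1.0 - phi)            # C / (1 - phi)
theo_var = noise_var / (1.0 - phi ** 2)    # sigma_a^2 / (1 - phi^2)

xdata = simulate_ar1(phi, const, noise_var, nobs=200, seed=12345)
print("theoretical mean, sd:", theo_mean, theo_var ** 0.5)
print("sample mean, sd:     ", statistics.mean(xdata), statistics.stdev(xdata))
```

Calling `simulate_ar1` twice with the same seed reproduces the same series, which is the behavior described above when no new seed is supplied; passing a different seed for each data set yields different series.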

Omitting data from the beginning of a simulated sequence

When simulating a time series, simulated data are often used in the calculation of subsequent simulated values. In such cases, the recursive relationship being used may be more valid later in the simulated sequence. Thus we may wish to create more data than the number we actually desire and remove the excess from the beginning of the sequence. This is an unobtrusive rule that can be applied in the simulation of data from any distribution or model. The OMIT sentence is used to delete a specified number of simulated values from the beginning of the sequence. Continuing with the current example, if we wish to simulate a total of 200 observations while omitting the first 50 values created, we may enter

-->SIMULATE MODEL IS XSIM. NOBS IS 250. NOISE IS N(0.0, 2.5).
--> SEED IS GSEED. OMIT 50.

Note that 250 values are simulated, as specified in the NOBS sentence. However, only the last 250 - 50 = 200 are actually stored in XDATA.

Use of a variable name

We did not use a variable name in the above SIMULATE paragraph as we had embedded the name in the MODEL sentence of the TSMODEL paragraph. If we use a variable name in the SIMULATE paragraph, then the simulated data will be stored under the name specified. For example, if we had specified

-->SIMULATE YDATA. MODEL IS XSIM. NOBS ARE 250. OMIT 50.
--> NOISE IS N(0.0, 2.5).

then the simulated data would be stored in the variable YDATA. The variable XDATA (used in the model XSIM) remains unchanged, or undefined if it has not been created previously.

Model identification using the smallest canonical correlation (SCAN) table

Earlier in this chapter we discussed the extended autocorrelation function (EACF) and its use in the determination of the maximum orders of an ARMA(p,q) model. Tsay and Tiao (1985) also provide another approach for determining the orders of a mixed ARMA(p,q) model. Like the EACF method, the approach can be used for both stationary and nonstationary series.
The approach proposed by Tsay and Tiao (1985) utilizes canonical correlation and the smallest eigenvalue of a computed matrix. A table of statistics is derived. Each statistic is a function of the smallest eigenvalue of a matrix derived from the autocovariance of a series and the sample variance of the autocorrelation of a transformation of the series. The two-way table that summarizes the results is called the smallest canonical correlation (SCAN) table.

We employ the table to determine possible values for p and q by searching for a corner of insignificant values of these statistics. That is, we try to determine values of p and q so that the computed statistic is insignificant for i ≥ p and j ≥ q. As in the case of the EACF, a simplified table is produced in which the symbol O is displayed to indicate a position where the statistic is insignificant, and the symbol X is displayed otherwise.

To illustrate the use of the SCAN table, we will construct the table for SERIESA used previously in Section 5.1. At that time we found an ARIMA(0,1,1) model to be appropriate. This means that an ARMA(1,1) model would be identified for SERIESA, and an ARMA(0,1) model would be identified for the series (1-B)SERIESA. To obtain the SCAN table for SERIESA, we can simply enter

-->SCAN SERIESA

We obtain the following:

TIME PERIOD ANALYZED TO 197
EFFECTIVE NUMBER OF OBSERVATIONS (NOBE)

THE SCAN TABLE (NORMALIZED BY 1% CHI-SQUARE CRITICAL VALUES):

SIMPLIFIED SCAN TABLE (1% LEVEL):

 Q:  0  1  2  3  4  5  6
 0:  X  X  X  X  X  X  X
 1:  X  O  O  O  O  O  O
 2:  O  O  O  O  O  O  O
 3:  O  O  O  O  O  O  O
 4:  O  O  O  O  O  O  O
 5:  O  O  O  O  O  O  O
 6:  X  O  O  O  O  O  O

A corner of O symbols is seen in the simplified SCAN table beginning at i=1 (p) and j=1 (q). Thus the model ARMA(1,1) is identified.

We can obtain the SCAN table for the first-order differenced SERIESA (i.e., (1-B)SERIESA) by entering

-->SCAN SERIESA. DFORDER IS 1.

                        1
DIFFERENCE ORDERS   (1-B )
TIME PERIOD ANALYZED TO 197
EFFECTIVE NUMBER OF OBSERVATIONS (NOBE)

THE SCAN TABLE (NORMALIZED BY 1% CHI-SQUARE CRITICAL VALUES):

SIMPLIFIED SCAN TABLE (1% LEVEL):

 Q:  0  1  2  3  4  5  6
 0:  X  O  O  O  O  O  O
 1:  X  O  O  O  O  O  O
 2:  O  O  O  O  O  O  O
 3:  O  O  O  O  O  O  O
 4:  X  O  O  O  O  O  O
 5:  X  O  O  O  O  O  O
 6:  O  O  O  O  O  O  O

Here the corner of insignificant statistics begins at i=0 (p) and j=1 (q). Hence the ARIMA(0,1,1) model identified previously is confirmed using the SCAN table.

The smallest canonical correlation approach for a single series can also be extended to a vector (multiple) time series model. Details regarding this approach may be found in Tiao and Tsay (1985).

Inverse autocorrelation function

Throughout this chapter we have employed the ACF, PACF or EACF to help identify one or more tentative models for a time series. Another tool used for tentative model identification is the sample inverse autocorrelation function (IACF). More complete information on the usage of the inverse autocorrelation function can be found in Cleveland (1972) and Chatfield (1979).

The inverse autocorrelation function is sometimes used as an alternative to the PACF for model identification. The IACF of an ARMA model is the same as the ACF of the model obtained when the AR and MA operators are interchanged. As a result the IACF has properties similar to the PACF, and its use (in terms of cut-off and die-out patterns) is the same as that of the PACF.

Notational shorthands

Within this document, time series models are usually specified using a longhand notation in the MODEL sentence of the TSMODEL paragraph. That is, the ARIMA model under consideration is virtually transcribed in the MODEL sentence with labels replacing Greek symbols. Such a specification is useful when simple models are specified, or for convenience in reviewing the computer output associated with various models or series.

When the SCA System is used more frequently, or when time series models become more complex, it is useful to have a shorthand notation available for model specification. To illustrate such notation, consider the ARIMA model

$$(1 - \phi_1 B - \phi_2 B^2)(1 - \phi_3 B^{12})(1 - B^{12}) Z_t = (1 - \theta_1 B)(1 - \theta_2 B^{12})\, a_t. \qquad (5.18)$$

If the series involved in (5.18) is stored in the SCA workspace under the label ZDATA, then a longhand transcription of (5.18) could be

(1 - PHI1*B - PHI2*B**2)(1 - PHI3*B**12)ZDATA((1 - B**12))
   = (1 - THETA1*B)(1 - THETA2*B**12)NOISE                          (5.19)

The basic information used by the SCA System from (5.19) consists of the orders of the backshift operators in each autoregressive, differencing, or moving average operator and the labels associated with all parameters. In fact, the labels are not essential unless we wish to maintain parameter estimates within variables or if constraints are used on parameters. As a result, the expression

(1, 2)(12)ZDATA(12) = (1)(12)NOISE                                  (5.20)

is equivalent to (5.19) provided all parameters are to be estimated without any constraint. Clearly, (5.20) is a terser way to specify the same basic model, but the clarity of (5.19) is sacrificed.

It may be a concern that if the shorthand notation of (5.20) is used, then specific initial parameter estimates could not be specified nor subsequently modified. However, this is not the case, as the AR and MA operators in this shorthand allow the more general form

(orders of backshift operators; parameter values or labels)

The portion "parameter values or labels" allows for either specific numeric values or labels of variables holding the initial estimates. Hence the following shorthand expression corresponds to (5.19) exactly:

(1,2; PHI1, PHI2)(12; PHI3)ZDATA(12) = (1; THETA1)(12; THETA2)NOISE.    (5.21)

The more complete shorthand expression in (5.21) may be more complicated to use than longhand notation for simple low order models. However, this notation is very useful when a model or operator contains many parameters.
For example, the above notation can be used to specify the expression

$$(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3 - \phi_4 B^4 - \phi_5 B^5) Z_t = a_t$$

as

(1 TO 5; PHI1 TO PHI5)ZDATA = NOISE.

The shorthand notation is used frequently in the specification of transfer function models (see Chapters 6 and 8).

Plotting forecasts with confidence limits

It is often valuable to plot forecast values of a time series along with the original series. In addition, plotting the confidence limits of the forecasts provides us with information on the potential variability of these forecasts. In order to plot forecasts, we need to create forecasts (using the FORECAST paragraph), possibly modify series using analytic functions or data editing capabilities (see Appendices A and B), and then plot the resultant data (using either the capabilities of SCAGRAF or those described in Chapter 3).

As an example, suppose we want to plot 12 forecast values of SALES from the model we derived in Section 5.2. In addition, suppose we want to display the 90% confidence intervals of the forecasts. The estimated model is in the SCA workspace under the label SALESM. To forecast the series and retain the forecasts and their standard errors we can enter

-->FORECAST SALESM. NOFS ARE 12.
--> HOLD FORECASTS(FCSTSALE), STD_ERR(STDSALE).

NOTE: THE EXACT METHOD FOR COMPUTING RESIDUALS IS USED

FORECASTS, BEGINNING AT

TIME   FORECAST   STD. ERROR   ACTUAL IF KNOWN

Our forecasts are now in the variable FCSTSALE and the standard errors are in STDSALE. We can save the variables SALES, FCSTSALE and STDSALE on a file and use SCAGRAF to construct a plot of the forecasts (with or without the original series). This plot is shown in Figure 5.6.

Figure 5.6 Forecast plot for SALES using ARIMA(1,1,1) model. The plot is of only the last portion of SALES, and shows the forecasts and their confidence intervals.

To accomplish the same type of plot within the SCA System, we need to perform a few simple steps. For example, the upper and lower confidence limits for a 90% confidence interval can be computed using the following two analytic statements (see Appendix A)

UPPER = FCSTSALE + 1.645*STDSALE
LOWER = FCSTSALE - 1.645*STDSALE

We can plot the forecasts and confidence intervals directly by using the MTSPLOT paragraph (see Chapter 3) and entering

-->MTSPLOT LOWER, FCSTSALE, UPPER. SYMBOLS ARE -, +, -.

The symbols -, +, and - are specified here to represent the lower confidence limit, forecasted value, and upper confidence limit, respectively. We obtain the following display:

[Time series plot for variables LOWER, FCSTSALE, and UPPER]

If we would like to plot the forecasts on the same frame as the original series, we need to append each of the above three variables to SALES. We can accomplish this through the JOIN paragraph (see Appendix B).

-->JOIN SALES, LOWER. NEW IS SALELOW.
-->JOIN SALES, UPPER. NEW IS SALEUPP.
-->JOIN SALES, FCSTSALE. NEW IS SALEFORE.

We may now employ MTSPLOT as before.

Pi and psi weights of a specified model

An ARIMA model, for example

$$\phi(B) Z_t = \theta(B) a_t,$$

may be rewritten in two other forms. One form is in terms of the present and past values of the series and the current shock (noise) to the system. In the other form, the current data value is written in terms of the present and past values of the shock. In the former, the model above may be written as

$$\pi(B) Z_t = a_t,$$

where

$$\pi(B) = 1 - \pi_1 B - \pi_2 B^2 - \cdots.$$

The coefficients of the linear polynomial $\pi(B)$ satisfy the relationship $\pi(B)\theta(B) = \phi(B)$. The coefficients of $\pi(B)$, or pi-weights, indicate the relative importance (weight) of past observations in predicting the future and how the current value of the series may be derived from past values and the current shock. The pi-weights may also be used in forecasting future values.

The model above can also be written as

$$Z_t = \psi(B) a_t,$$

where

$$\psi(B) = 1 + \psi_1 B + \psi_2 B^2 + \cdots.$$

The coefficients of the linear polynomial $\psi(B)$ are such that $\psi(B)\phi(B) = \theta(B)$. The coefficients of $\psi(B)$, or psi-weights, indicate how the current value of the series may be derived from the noise series. The psi-weights are used in the calculation of the variance of the error in forecasted values (see Section 5.1.6) and may also be used in the updating of forecasts (Box and Jenkins, 1970).
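The two defining relationships, $\pi(B)\theta(B) = \phi(B)$ and $\psi(B)\phi(B) = \theta(B)$, can be solved term by term by matching coefficients of powers of B. The following Python sketch illustrates that recursion under stated assumptions: the helper names (`poly_mul`, `series_divide`, `psi_weights`, `pi_weights`) are hypothetical, polynomials are stored as coefficient lists in ascending powers of B, and the sign convention for the pi-weights follows the definition $\pi(B) = 1 - \pi_1 B - \pi_2 B^2 - \cdots$ given above. It is not a description of how the SCA System computes the weights. The example uses the rounded airline estimates of (5.15), with any differencing operator treated as part of $\phi(B)$.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B (coefficient lists, ascending powers)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def series_divide(num, den, n_terms):
    """First n_terms coefficients of num(B)/den(B); requires den[0] == 1."""
    out = []
    for k in range(n_terms):
        acc = num[k] if k < len(num) else 0.0
        # Match the coefficient of B^k in out(B) * den(B) = num(B).
        for j in range(1, min(k, len(den) - 1) + 1):
            acc -= den[j] * out[k - j]
        out.append(acc)
    return out


def psi_weights(phi_poly, theta_poly, n_terms):
    """psi(B) = theta(B)/phi(B) = 1 + psi_1 B + psi_2 B^2 + ..."""
    return series_divide(theta_poly, phi_poly, n_terms)


def pi_weights(phi_poly, theta_poly, n_terms):
    """pi(B) = phi(B)/theta(B) = 1 - pi_1 B - pi_2 B^2 - ..."""
    ratio = series_divide(phi_poly, theta_poly, n_terms)
    return [ratio[0]] + [-c for c in ratio[1:]]


# Airline model with the rounded estimates of (5.15).
lead = [1.0] + [0.0] * 11                              # 1, then lags 1..11
phi_poly = poly_mul([1.0, -1.0], lead + [-1.0])        # (1 - B)(1 - B^12)
theta_poly = poly_mul([1.0, -0.40], lead + [-0.56])    # (1 - .40B)(1 - .56B^12)

airpsi = psi_weights(phi_poly, theta_poly, 24)
airpi = pi_weights(phi_poly, theta_poly, 24)
print(airpsi[:3], airpsi[12])
print(airpi[:3])
```

For a pure AR(1) model with $\phi = .75$, the same functions give psi-weights $(.75)^j$ and a single nonzero pi-weight $\pi_1 = .75$, which is a convenient check on the recursion.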

Both the pi and psi-weights of a specified univariate model may be obtained using the WEIGHT paragraph. In addition, the transfer function weights (impulse response weights, see Chapter 8) of a transfer function model that has been specified previously may be calculated (see Section 8.7.8).

Examples

To illustrate the WEIGHT paragraph, we will compute the pi and psi-weights for the final models fitted to the SALES data used in Section 5.2 and the airline data of Section 5.3. The models fitted to these series are in the SCA workspace under the labels SALESM and AIRLINE, respectively. To compute 24 pi and psi-weights using the model held in SALESM, we may enter

-->WEIGHT SALESM. PIWEIGHTS IN SALESPI. PSIWEIGHTS IN SALESPSI.
--> MAXIMUM IS 24.

The MAXIMUM sentence is specified to limit the number of weights to 24 (the default is 100). The values stored in SALESPI are $\pi_0, \pi_1, \pi_2, \ldots, \pi_{23}$ for the model in SALESM ($\pi_0 = 1$). Similarly, the values stored in SALESPSI are $\psi_0, \psi_1, \ldots, \psi_{23}$ for the same model ($\psi_0 = 1$). We can use the PRINT paragraph to print the values computed.

-->PRINT SALESPI. NO LABEL. FORMAT IS '5F10.4'

-->PRINT SALESPSI. NO LABEL. FORMAT IS '5F10.4'

In like manner we can compute 50 pi and psi-weights (i.e., $\pi_0$ through $\pi_{49}$ and $\psi_0$ through $\psi_{49}$) corresponding to the airline model of Section 5.3 by entering

-->WEIGHT AIRLINE. PIWEIGHTS IN AIRPI. PSIWEIGHTS IN AIRPSI.
--> MAXIMUM IS 50.

The pi-weights are computed from

$$\pi(B)(1 - \theta_1 B)(1 - \theta_{12} B^{12}) = (1 - B)(1 - B^{12}),$$

and the psi-weights are computed from

$$\psi(B)(1 - B)(1 - B^{12}) = (1 - \theta_1 B)(1 - \theta_{12} B^{12}).$$

The values are printed using

-->PRINT AIRPI. NO LABEL. FORMAT IS '5F10.4'

-->PRINT AIRPSI. NO LABEL. FORMAT IS '5F10.4'

SUMMARY OF THE SCA PARAGRAPHS IN CHAPTER 5

This section provides a summary of those SCA paragraphs employed in this chapter. The syntax for many paragraphs is presented in both a brief and full form. The brief display of the syntax contains the most frequently used sentences of a paragraph, while the full display presents all possible modifying sentences of a paragraph. In addition, special remarks related to a paragraph may also be presented with the description.

Each SCA paragraph begins with a paragraph name and is followed by modifying sentences. Sentences that may be used as modifiers for a paragraph are shown below and the types of arguments used in each sentence are also specified. Sentences not designated required may be omitted as default conditions (or values) exist. The most frequently used required sentence is given as the first sentence of the paragraph. The portion of this sentence that may be omitted is underlined. This portion may be omitted only if this sentence appears as the first sentence in a paragraph. Otherwise, all portions of the sentence must be used. The last character of each line except the last line must be the continuation character.

The paragraphs to be explained in this summary are ACF, PACF, IDEN, EACF, SCAN, IACF, TSMODEL, ESTIM, FORECAST, SIMULATE and WEIGHT.

Legend (see Chapter 2 for further explanation)

   v : variable or model name
   i : integer
   r : real value
   w : keyword

ACF Paragraph

The ACF paragraph is used to compute the sample autocorrelation function of a time series. The paragraph also displays some descriptive statistics including the sample mean, standard deviation and a t-statistic on the significance of a constant term. The sample ACF may also be computed within the IDEN paragraph.

Syntax for the ACF Paragraph

Brief syntax

ACF VARIABLE IS v. DFORDERS ARE i1, i2, ---. MAXLAG IS i.

Required sentence: VARIABLE

Full syntax

ACF VARIABLE IS v. DFORDERS ARE i1, i2, ---. MAXLAG IS i. SPAN IS i1, i2. HOLD ACF(v), SDACF(v). OUTPUT LEVEL(w), PRINT(w1, w2, ---), NOPRINT(w1, w2, ---).

Required sentence: VARIABLE

Sentences Used in the ACF Paragraph

VARIABLE sentence
The VARIABLE sentence is used to specify the name of the series to be analyzed.

DFORDERS sentence
The DFORDERS sentence is used to specify the orders of differencing to be applied on the series when differencing is the stationary inducing transformation being used. For example, the order associated with the differencing operator $(1-B)$ is 1 and that of $(1-B^{12})$ is 12. If a power of an operator is to be used (for example, $(1-B)^2$) then the differencing order must be repeated the appropriate number of times (in this example, 1, 1). The default is none.

MAXLAG sentence
The MAXLAG sentence is used to specify the maximum order of sample ACF to be computed. The default is 36.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, from i1 to i2, for which the data will be analyzed. The default is the maximum span available for the series.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. The default is that none of the values of the statistics below will be retained after the paragraph is executed. The values that may be retained are:

   ACF   : the sample ACF of the series
   SDACF : the standard deviations of the sample ACF for the series

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output displayed for selected statistics. Control is achieved in a two-stage procedure. First, a basic LEVEL of output (default NORMAL) is designated. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). The keywords for LEVEL and output printed are:

   BRIEF  : VALUE
   NORMAL : VALUE, PLOT, CI, LBQ

where the keywords on the right denote:

   VALUE : values of the sample ACF
   PLOT  : plot of the sample ACF
   CI    : plot of the 95% confidence interval for the sample ACF
   LBQ   : values of the Ljung-Box Q statistics (Ljung and Box 1978) for the sample ACF for each lag
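The quantities named in this summary (differencing by a list of orders, the sample ACF, and the Ljung-Box Q statistics) can be illustrated with a short Python sketch. It uses the standard textbook definitions and is only an assumption about the arithmetic involved, not the SCA implementation; the function names `difference`, `sample_acf` and `ljung_box` are hypothetical. In particular, `difference` applies one factor $(1 - B^k)$ per listed order, so the orders 1, 1 give $(1-B)^2$ while the single order 2 gives $(1 - B^2)$.

```python
def difference(series, orders):
    """Apply (1 - B**k) once for each k in orders, in sequence."""
    out = list(series)
    for k in orders:
        out = [out[t] - out[t - k] for t in range(k, len(out))]
    return out


def sample_acf(series, maxlag):
    """Sample autocorrelations r_1, ..., r_maxlag about the sample mean."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, maxlag + 1)]


def ljung_box(acf, n):
    """Cumulative Ljung-Box statistics Q_k = n(n+2) * sum_{j<=k} r_j^2/(n-j)."""
    q, total = [], 0.0
    for k, r in enumerate(acf, start=1):
        total += r * r / (n - k)
        q.append(n * (n + 2) * total)
    return q


quadratic = [float(t * t) for t in range(20)]
twice = difference(quadratic, [1, 1])     # (1 - B)^2 : constant series
lag_two = difference(quadratic, [2])      # (1 - B^2) : still trending

alternating = [1.0, -1.0] * 10
acf = sample_acf(alternating, maxlag=4)
q_stats = ljung_box(acf, len(alternating))
print(twice[:3], lag_two[:3])
print(acf, q_stats)
```

The quadratic series makes the distinction visible: two first differences remove the trend entirely, whereas a single difference of order 2 leaves a linear trend.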

PACF Paragraph

The PACF paragraph is used to compute the sample partial autocorrelation function of a time series. The paragraph also displays some descriptive statistics including the sample mean, standard deviation and a t-statistic on the significance of a constant term. The sample PACF may also be computed within the IDEN paragraph.

Syntax for the PACF Paragraph

Brief syntax

PACF VARIABLE IS v. DFORDERS ARE i1, i2, ---. MAXLAG IS i.

Required sentence: VARIABLE

Full syntax

PACF VARIABLE IS v. DFORDERS ARE i1, i2, ---. MAXLAG IS i. SPAN IS i1, i2. HOLD PACF(v), SDPACF(v). OUTPUT LEVEL(w), PRINT(w1, w2, ---), NOPRINT(w1, w2, ---).

Required sentence: VARIABLE

Sentences Used in the PACF Paragraph

VARIABLE sentence
The VARIABLE sentence is used to specify the name of the series to be analyzed.

DFORDERS sentence
The DFORDERS sentence is used to specify the orders of differencing to be applied on the series when differencing is the stationary inducing transformation being used. For example, the order associated with the differencing operator $(1-B)$ is 1 and that of $(1-B^{12})$ is 12. If a power of an operator is to be used (for example, $(1-B)^2$) then the differencing order must be repeated the appropriate number of times (in this example, 1, 1). The default is none.

MAXLAG sentence
The MAXLAG sentence is used to specify the maximum order of sample PACF to be computed. The default is 36.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, from i1 to i2, for which the data will be analyzed. The default is the maximum span available for the series.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. The default is that none of the values of the statistics below will be retained after the paragraph is executed. The values that may be retained are:

   PACF   : the sample PACF of the series
   SDPACF : the standard deviations of the sample PACF for the series

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output displayed for selected statistics. Control is achieved in a two-stage procedure. First, a basic LEVEL of output (default NORMAL) is designated. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). The keywords for LEVEL and associated output are:

   BRIEF  : VALUE
   NORMAL : VALUE, PLOT, CI

where the keywords on the right denote:

   VALUE : values of the sample PACF
   PLOT  : plot of the sample PACF
   CI    : plot of the 95% confidence interval for the sample PACF
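One standard way to obtain partial autocorrelations from autocorrelations is the Durbin-Levinson recursion. The Python sketch below is offered only as an illustration of that recursion; whether the PACF paragraph uses this particular algorithm is not stated in this manual, and the function name `pacf_from_acf` is hypothetical. Applied to the theoretical ACF $(.75)^l$ of the AR(1) model simulated earlier in this chapter, it returns .75 at lag 1 and zero thereafter.

```python
def pacf_from_acf(rho):
    """Durbin-Levinson recursion; rho[k-1] holds the lag-k autocorrelation."""
    pacf = []
    phi_prev = []                     # coefficients phi_{k-1,1..k-1}
    for k in range(1, len(rho) + 1):
        if k == 1:
            phi_kk = rho[0]
            phi_cur = [phi_kk]
        else:
            num = rho[k - 1] - sum(phi_prev[j] * rho[k - 2 - j]
                                   for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j] for j in range(k - 1))
            phi_kk = num / den
            # Update phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}.
            phi_cur = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                       for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
        phi_prev = phi_cur
    return pacf


ar1_acf = [0.75 ** lag for lag in range(1, 6)]
ar1_pacf = pacf_from_acf(ar1_acf)
print(ar1_pacf)
```

In practice the input would be the sample ACF of the (possibly differenced) series rather than a theoretical ACF.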

IDEN Paragraph

The IDEN paragraph can be used when performing the tentative identification of a series or in the diagnostic checking of a residual series. The paragraph is used to coordinate the computation of the sample ACF (autocorrelation function) and PACF (partial autocorrelation function) of a univariate time series. If only the sample ACF is desired, it may be computed using the ACF paragraph; similarly for the sample PACF. All three paragraphs also display some descriptive statistics including the sample mean, standard deviation and a t-statistic on the significance of a constant term.

Syntax for the IDEN Paragraph

Brief syntax

IDEN VARIABLE IS v. DFORDERS ARE i1, i2, ---. MAXLAG IS i.

Required sentence: VARIABLE

Full syntax

IDEN VARIABLE IS v. DFORDERS ARE i1, i2, ---. MAXLAG IS i. SPAN IS i1, i2. HOLD ACF(v), PACF(v), SDACF(v), SDPACF(v). OUTPUT LEVEL(w), PRINT(w1, w2, ---), NOPRINT(w1, w2, ---).

Required sentence: VARIABLE

Sentences Used in the IDEN Paragraph

VARIABLE sentence
The VARIABLE sentence is used to specify the name of the series to be analyzed.

DFORDERS sentence
The DFORDERS sentence is used to specify the orders of differencing to be applied on the series when differencing is the stationary-inducing transformation being used. For example, the order associated with the differencing operator $(1-B)$ is 1 and that of $(1-B^{12})$ is 12. If a power of an operator is to be used (for example, $(1-B)^2$) then the differencing order must be repeated the appropriate number of times (in this example, 1, 1). The default is none.

MAXLAG sentence
The MAXLAG sentence is used to specify the maximum order of sample ACF and PACF to be computed. The default is 36.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, for which the data will be analyzed. The default is the maximum span available for the series.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. The default is that none of the values of the above statistics will be retained after the paragraph is executed. The values that may be retained are:
    ACF     : the sample ACF of the series
    PACF    : the sample PACF of the series
    SDACF   : the standard deviations of the sample ACF for the series
    SDPACF  : the standard deviations of the sample PACF for the series

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output displayed for selected statistics. Control is achieved in a two-stage procedure. First, a basic LEVEL of output (default NORMAL) is designated. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). The keywords for LEVEL and associated output are:
    BRIEF   : VALUE
    NORMAL  : VALUE, PLOT, CI, LBQ
where the keywords on the right denote:
    VALUE   : values of the sample ACF or PACF
    PLOT    : plot of the sample ACF or PACF
    CI      : plot of the 95% confidence interval for the sample ACF or PACF
    LBQ     : values of the Ljung-Box Q statistics (Ljung and Box 1978) for the sample ACF for each lag
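The VALUE and LBQ quantities listed above can be illustrated outside the SCA System. The following is a minimal Python sketch, not the SCA implementation: the function names `sample_acf` and `ljung_box_q` and the toy series are hypothetical, and the Q statistic is taken in its usual textbook form, Q(m) = n(n+2) times the sum over k = 1, ..., m of r_k^2/(n-k).

```python
def sample_acf(series, maxlag):
    """Return the sample autocorrelations r_1, ..., r_maxlag of a list of numbers."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, maxlag + 1)]


def ljung_box_q(acf, n):
    """Return the cumulative Ljung-Box Q statistic for each lag 1, ..., len(acf)."""
    q_values, running = [], 0.0
    for k, r in enumerate(acf, start=1):
        running += r * r / (n - k)
        q_values.append(n * (n + 2) * running)
    return q_values


# Hypothetical toy series, used only to exercise the two functions.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
acf = sample_acf(series, maxlag=2)
q = ljung_box_q(acf, n=len(series))
print(acf, q)
```

Each Q value accumulates the autocorrelations up to its lag, which is why the display reports one value per lag.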

EACF Paragraph

The EACF paragraph is used to compute the sample extended autocorrelation function. The paragraph produces a table useful in determining the order of a mixed stationary or nonstationary ARMA process.

Syntax for the EACF Paragraph

Brief syntax
    EACF VARIABLE IS v.
         DFORDERS ARE i1, i2, ---.
Required sentence: VARIABLE

Full syntax
    EACF VARIABLE IS v.
         DFORDERS ARE i1, i2, ---.
         MAXLAG IS AR(i1), MA(i2).
         SPAN IS i1, i2.
         OUTPUT LEVEL(w), PRINT(w1, w2, ---), NOPRINT(w1, w2, ---).
Required sentence: VARIABLE

Sentences Used in the EACF Paragraph

VARIABLE sentence
The VARIABLE sentence is used to specify the name of the series to be analyzed.

DFORDERS sentence
The DFORDERS sentence is used to specify the orders of differencing to be applied on the series when differencing is the stationary-inducing transformation being used. For example, the order associated with the differencing operator $(1-B)$ is 1 and that of $(1-B^{12})$ is 12. If a power of an operator is to be used (for example, $(1-B)^2$) then the differencing order must be repeated the appropriate number of times (in this example, 1, 1). The default is none.

MAXLAG sentence
The MAXLAG sentence is used to specify the maximum autoregressive (AR) and moving average (MA) orders to be computed and displayed. The default maximum AR order is 6 and maximum MA order is 12.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, for which the data will be analyzed. The default is the maximum span available for the series.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output displayed for selected statistics. Control is achieved in a two-stage procedure. First, a basic LEVEL of output (default NORMAL) is designated. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). The keywords for LEVEL and output displayed are:
    BRIEF     : TABLE
    NORMAL    : TABLE, VALUES
    DETAILED  : TABLE, VALUES, EAR
where the keywords on the right denote:
    VALUE     : values of the table derived from the sample EACF
    TABLE     : display of the condensed summary table for the series
    EAR       : the computed extended autoregressive coefficients for the series

SCAN Paragraph

The SCAN paragraph is used to compute and display the smallest canonical correlation (SCAN) table developed by Tsay and Tiao (1985). The SCAN table is useful in determining the order of a mixed stationary or nonstationary ARMA process (see Section 5.4.4).

Syntax for the SCAN Paragraph

Brief syntax
    SCAN VARIABLE IS v.
         DFORDERS ARE i1, i2, ---.
Required sentence: VARIABLE

Full syntax
    SCAN VARIABLE IS v.
         DFORDERS ARE i1, i2, ---.
         MAXLAG IS AR(i1), MA(i2).
         SPAN IS i1, i2.
         OUTPUT LEVEL(w), PRINT(w1, w2, ---), NOPRINT(w1, w2, ---).
Required sentence: VARIABLE

Sentences Used in the SCAN Paragraph

VARIABLE sentence
The VARIABLE sentence is used to specify the name of the series to be analyzed.

DFORDERS sentence
The DFORDERS sentence is used to specify the orders of differencing to be applied on the series when differencing is the stationary-inducing transformation being used. For example, the order associated with the differencing operator $(1-B)$ is 1 and that of $(1-B^{12})$ is 12. If a power of an operator is to be used (for example, $(1-B)^2$) then the differencing order must be repeated the appropriate number of times (in this example, 1, 1). The default is none.

MAXLAG sentence
The MAXLAG sentence is used to specify the maximum autoregressive (AR) and moving average (MA) orders to be computed and displayed. The default maximum AR order is 6 and maximum MA order is 12.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, for which the data will be analyzed. The default is the maximum span available for the series.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output displayed for selected statistics. Control is achieved in a two-stage procedure. First, a basic LEVEL of output (default NORMAL) is designated. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). The keywords for LEVEL and output displayed are:
    BRIEF   : TABLE
    NORMAL  : TABLE, VALUES
where the keywords on the right denote:
    VALUES  : the values of the SCAN table

    TABLE   : display of the condensed SCAN table

IACF Paragraph

The IACF paragraph is used to compute the sample inverse autocorrelation function of a time series (see Section for more information). The paragraph also displays some descriptive statistics including the sample mean, standard deviation and a t-statistic on the significance of a constant term.

Syntax for the IACF Paragraph

Brief syntax
    IACF VARIABLE IS v.
         DFORDERS ARE i1, i2, ---.
         MAXLAG IS i.
Required sentence: VARIABLE

Full syntax
    IACF VARIABLE IS v.
         DFORDERS ARE i1, i2, ---.
         MAXLAG IS i.
         SPAN IS i1, i2.
         HOLD IACF(v), SDIACF(v).
         OUTPUT LEVEL(w), PRINT(w1, w2, ---), NOPRINT(w1, w2, ---).
Required sentence: VARIABLE

Sentences Used in the IACF Paragraph

VARIABLE sentence
The VARIABLE sentence is used to specify the name of the series to be analyzed.

DFORDERS sentence
The DFORDERS sentence is used to specify the orders of differencing to be applied on the series when differencing is the stationary-inducing transformation being used. For example, the order associated with the differencing operator $(1-B)$ is 1 and that of $(1-B^{12})$ is 12. If a power of an operator is to be used (for example, $(1-B)^2$) then the differencing order must be repeated the appropriate number of times (in this example, 1, 1). The default is none.

MAXLAG sentence
The MAXLAG sentence is used to specify the maximum order of sample IACF to be computed. The default is 36.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, from i1 to i2, for which the data will be analyzed. The default is the maximum span available for the series.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. The default is that none of the values of the above statistics will be retained after the paragraph is executed. The values that may be retained are:
    IACF    : the sample IACF of the series
    SDIACF  : the standard deviations of the sample IACF for the series

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output displayed for selected statistics. Control is achieved in a two-stage procedure. First, a basic LEVEL of output (default NORMAL) is designated. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). The keywords for LEVEL and associated output are:
    BRIEF   : VALUE
    NORMAL  : VALUE, PLOT, CI
where the keywords on the right denote:
    VALUE   : values of the sample IACF
    PLOT    : plot of the sample IACF
    CI      : plot of the 95% confidence interval for the sample IACF
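The DFORDERS convention shared by the identification paragraphs (one entry per differencing operator, with an order repeated when an operator is raised to a power) can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not SCA code; the function name `difference` and the sample series are hypothetical.

```python
def difference(series, orders):
    """Apply the operators (1 - B^d) in turn, one for each d in `orders`."""
    out = list(series)
    for d in orders:
        # Each pass shortens the series by d observations.
        out = [out[t] - out[t - d] for t in range(d, len(out))]
    return out


# Orders 1, 1 correspond to (1 - B)^2: the order is repeated for a power.
squares = [1, 4, 9, 16, 25]
second_diff = difference(squares, [1, 1])

# Orders 1, 12 correspond to (1 - B)(1 - B^12) applied to a 24-point linear trend.
trend = list(range(24))
seasonal_diff = difference(trend, [1, 12])
print(second_diff, seasonal_diff)
```

Note that each differencing pass reduces the number of usable observations, so 24 observations differenced with orders 1 and 12 leave only 11 values.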

TSMODEL Paragraph

The TSMODEL paragraph is used to specify or modify a univariate ARIMA model. The paragraph is also used for the specification or modification of an intervention or transfer function model. The syntax description for these usages is provided in Chapters 6 and 8, respectively.

For each model specified in a TSMODEL paragraph, a distinguishing label or name must also be given. A number of different models may be specified, each having a unique name, and subsequently employed at a user's discretion. Moreover, the label also enables the information contained under it to be modified.

Syntax for the TSMODEL Paragraph

Brief syntax
    TSMODEL NAME IS model-name.
            MODEL IS model.
Required sentence: NAME

Full syntax
    TSMODEL NAME IS model-name.
            MODEL IS model.
            DELETE CONSTANT.
            FIXED-PARAMETERS ARE v1, v2, ---.
            CONSTRAINTS ARE (v1,v2,---), ---, (v1,v2,---).
            VARIANCE IS v.
            SHOW./NO SHOW.
            CHECK./NO CHECK.
            ROOTS./NO ROOTS.
            SIMULATION./NO SIMULATION.
            UPDATE./NO UPDATE.
Required sentence: NAME

Sentences Used in the TSMODEL Paragraph

NAME sentence
The NAME sentence is used to specify a unique label (name) for the model specified in the paragraph. This label is used to refer to this model in other time series related paragraphs or if the model is to be modified.

MODEL sentence
The MODEL sentence is used to specify a univariate Box-Jenkins ARIMA model.

DELETE sentence
The DELETE sentence is used to delete the constant term from an existing ARIMA model. Once the constant term is deleted, it can only be re-inserted using the MODEL sentence.

FIXED-PARAMETER sentence
The FIXED-PARAMETER sentence is used to specify the parameters whose values will be held constant during model estimation, where the v's are the parameter names. See Section 5.2 for a brief discussion of this sentence. The default condition is that no parameters are fixed.

CONSTRAINT sentence
The CONSTRAINT sentence is used to specify that the parameters within each pair of parentheses will be constrained to have the same value during model estimation. See Section 5.2 for a brief discussion of this sentence. The default condition is that no parameters are constrained to be equal.

VARIANCE sentence
The VARIANCE sentence is used to specify a variable where the value of the noise variance is or will be stored. If a value for the variable is known, this value will be used as the initial variance in estimation and the final estimated value of the variance will be stored in this variable for future estimation or in forecasting. Otherwise the variance is calculated from the residual series derived from the specified model and parameter estimates. Note that the SCA System designates an internal variable for the VARIANCE sentence so that the specification of this sentence is optional.

SHOW sentence
The SHOW sentence is used to display a summary of the specified model. The default is SHOW. The summary includes series name, differencing (if any), span for data, parameter labels (if any) and current values for parameters.

CHECK sentence
The CHECK sentence is used to check whether all roots of the AR, MA, and denominator polynomials lie outside the unit circle. The default is NO CHECK.

ROOTS sentence
The ROOTS sentence is used to display all roots of the AR, MA and denominator polynomials. The default is NO ROOTS.

SIMULATION sentence
The SIMULATION sentence is used to specify that the model will be used for simulation purposes. Ordinarily this sentence is not specified. See Section or for more details. The default is NO SIMULATION.

UPDATE sentence
The UPDATE sentence is used to specify that parameter values of the model are updated using the most current information available. The default is NO UPDATE. In the default

case, parameter values are updated only after execution of the ESTIM paragraph rather than immediately.

ESTIM Paragraph

The ESTIM paragraph is used to control the estimation of the parameters of an ARIMA model.

Syntax of the ESTIM Paragraph

Brief syntax
    ESTIM MODEL IS v.
          HOLD RESIDUALS(v).
Required sentence: MODEL

Full syntax
    ESTIM MODEL IS v.
          METHOD IS w.
          STOP-CRITERIA ARE MAXIT(i), LIKELIHOOD(r1), ESTIMATE(r2).
          SPAN IS i1, i2.
          HOLD RESIDUALS(v), FITTED(v), VARIANCE(v).
          OUTPUT LEVEL(w), PRINT(w1, w2, ---), NOPRINT(w1, w2, ---).
Required sentence: MODEL

Sentences Used in the ESTIM Paragraph

MODEL sentence
The MODEL sentence is used to specify the label (name) of the model to be estimated. The label must be one specified in a previous TSMODEL paragraph.

METHOD sentence
The METHOD sentence is used to specify the likelihood function used for model estimation. The keyword may be CONDITIONAL for the conditional likelihood or EXACT for the exact likelihood function. See Section for a discussion of these two likelihood functions. The default is CONDITIONAL.

STOP sentence
The STOP sentence is used to specify the stopping criterion for nonlinear estimation. The argument, i, for the keyword MAXIT specifies the maximum number of iterations (default is i=10); the argument, r1, for the keyword LIKELIHOOD specifies the value of the relative convergence criterion on the likelihood function (default is r1=0.0001); and the argument, r2, for the keyword ESTIMATE specifies the value of the relative convergence criterion on the parameter estimates (default is r2=0.001). Estimation iterations will be terminated when the relative change in the value of the likelihood function or parameter estimates between two successive iterations is less than or equal to the convergence criterion, or if the maximum number of iterations is reached.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, from i1 to i2, for which the data will be analyzed. The default is the maximum span available for the series.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. The default is that none of the values of the above statistics will be retained after the paragraph is used. The values that may be retained are:
    RESIDUALS : the residual series
    FITTED    : the one-step-ahead forecasts (fitted values) of the series
    VARIANCE  : variance of the noise

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output displayed for selected statistics. Control is achieved in a two-stage procedure. First, a basic LEVEL of output (default NORMAL) is designated. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). The keywords for LEVEL and output displayed are:
    BRIEF     : estimates and their related statistics only
    NORMAL    : RCORR
    DETAILED  : ITERATION, CORR, and RCORR
where the keywords on the right denote:
    ITERATION : the parameter and covariance estimates for each iteration
    CORR      : the correlation matrix for the parameter estimates
    RCORR     : the reduced correlation matrix for the parameter estimates (i.e., a display in which all values have no more than two decimal places and those estimates within two standard errors of zero are displayed as dots, ".")
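The three stopping rules described for the STOP sentence can be read as a single check applied after every iteration. The sketch below is one plausible reading in Python and not the SCA algorithm: the function name `stop_reason` is hypothetical, and taking the largest relative change over all parameters is an assumption, since the manual does not say how the change in a vector of estimates is summarized.

```python
def stop_reason(iteration, old_like, new_like, old_est, new_est,
                maxit=10, like_tol=0.0001, est_tol=0.001):
    """Return the name of the criterion that is met, or None to keep iterating.

    Defaults mirror MAXIT(10), LIKELIHOOD(0.0001) and ESTIMATE(0.001).
    """
    tiny = 1e-300  # guards against division by zero
    rel_like = abs(new_like - old_like) / max(abs(old_like), tiny)
    # Assumption: summarize the estimates by their largest relative change.
    rel_est = max(abs(new - old) / max(abs(old), tiny)
                  for old, new in zip(old_est, new_est))
    if rel_like <= like_tol:
        return "LIKELIHOOD"
    if rel_est <= est_tol:
        return "ESTIMATE"
    if iteration >= maxit:
        return "MAXIT"
    return None


# Hypothetical iteration histories, used only to exercise each branch.
converged = stop_reason(3, -100.0, -100.005, [0.50], [0.52])
still_moving = stop_reason(3, -100.0, -101.0, [0.50], [0.52])
out_of_iterations = stop_reason(10, -100.0, -101.0, [0.50], [0.52])
print(converged, still_moving, out_of_iterations)
```

Whichever criterion is met first ends the estimation, so a run that stops on MAXIT has not necessarily converged.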

FORECAST Paragraph

The FORECAST paragraph is used to compute the forecast of future values of a time series based on a specified ARIMA model. The FORECAST paragraph requires the current estimate of the variance $\sigma^2$ to compute standard errors of forecasts. The variance for the estimated model is always stored internally during the execution of the ESTIM paragraph, but the internal estimate is overwritten at each subsequent execution of an ESTIM paragraph for the same model. The FORECAST paragraph has other sentences available, not described below. These are used in the forecasting of intervention and transfer function models and are described in Chapters 6 and 8, respectively.

Syntax of the FORECAST Paragraph

Brief syntax
    FORECAST MODEL IS v.
             NOFS ARE i1, i2, ---.
             ORIGINS ARE i1, i2, ---.
Required sentence: MODEL

Full syntax
    FORECAST MODEL IS v.
             NOFS ARE i1, i2, ---.
             ORIGINS ARE i1, i2, ---.
             JOIN./NO JOIN.
             METHOD IS w.
             HOLD FORECASTS(v1,v2,---), STD_ERRS(v1,v2,---).
             OUTPUT PRINT(w), NOPRINT(w).
Required sentence: MODEL

Sentences Used in the FORECAST Paragraph

MODEL sentence
The MODEL sentence is used to specify the label (name) of the model for the series to be forecasted. The label must be one specified in a previous TSMODEL paragraph.

NOFS sentence
The NOFS sentence is used to specify for each time origin the number of time periods ahead for which forecasts will be generated. The number of arguments in this sentence

must be the same as that in the ORIGINS sentence. The default is 24 forecasts for each time origin.

ORIGINS sentence
The ORIGINS sentence is used to specify the time origins for forecasts. The default is one origin, the last observation.

JOIN sentence
The JOIN sentence is used to specify that the forecasts calculated should be appended to the variable of the model relative to the specified origin. If more than one origin is specified only the last will be used. The default is NO JOIN.

METHOD sentence
The METHOD sentence is used to specify the likelihood function used for the computation of the residual series employed in forecasting. The keyword may be CONDITIONAL for the conditional likelihood, or EXACT for the exact likelihood function. See Section for a discussion of these two likelihood functions. The default is EXACT.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. The default is that none of the values of the above statistics will be retained after the paragraph is used. The values that may be retained are:
    FORECASTS : forecasts for each corresponding time origin
    STD_ERRS  : standard errors of the forecasts at the last time origin

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output displayed for various statistics. The default condition is PRINT(FORECASTS); that is, to display forecast values for each time origin. To suppress this, specify NOPRINT(FORECASTS).
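To show why the forecast standard errors depend on the noise variance, and how one NOFS entry pairs with one ORIGINS entry, the following Python sketch forecasts a first-order autoregressive model with a constant, z(t) = c + phi z(t-1) + a(t). It is an illustration under stated assumptions rather than the SCA computation: the function name `ar1_forecasts`, the parameter values and the data are all hypothetical, and the parameters are treated as known.

```python
import math


def ar1_forecasts(series, phi, const, sigma2, origin, nof):
    """Forecast `nof` periods ahead from the 1-based time index `origin`.

    Returns (forecasts, standard errors). The l-step-ahead error variance is
    sigma2 * (1 + phi^2 + ... + phi^(2(l-1))).
    """
    level = series[origin - 1]
    forecasts, std_errs, variance = [], [], 0.0
    for lead in range(1, nof + 1):
        level = const + phi * level            # recursive point forecast
        variance += sigma2 * phi ** (2 * (lead - 1))
        forecasts.append(level)
        std_errs.append(math.sqrt(variance))
    return forecasts, std_errs


# Hypothetical data and parameters; two origins, each with its own horizon.
series = [10.0, 12.0, 11.0, 13.0]
origins, nofs = [3, 4], [1, 2]
results = [ar1_forecasts(series, phi=0.5, const=6.0, sigma2=4.0,
                         origin=o, nof=n)
           for o, n in zip(origins, nofs)]
print(results)
```

The standard errors grow with the lead time and scale with the square root of the variance, which is why a current variance estimate must be available before forecasting.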

SIMULATE Paragraph

The SIMULATE paragraph is used to generate data according to a user specified univariate time series model. A univariate time series model must have been specified previously using the TSMODEL paragraph. The paragraph is also used to generate data according to a user specified distribution. More information on this can be found in Chapter 12 of The SCA Statistical System: Reference Manual for General Statistical Analysis.

Syntax for the SIMULATE Paragraph

    SIMULATE VARIABLE IS v.
             MODEL IS model-name.
             NOISE IS distribution (parameters) or VARIABLE(v).
             NOBS IS i.
             SEED IS i.
             OMIT IS i.
Required sentences: MODEL, NOISE and NOBS

Sentences Used in the SIMULATE Paragraph

VARIABLE sentence
The VARIABLE sentence is used to specify the name of the variable to store the simulation results. The sentence is not required if a univariate time series is generated. If the sentence is not specified, the variable name used in the MODEL sentence of the TSMODEL paragraph is used to store the results.

MODEL sentence
The MODEL sentence is used to specify the name (label) of the model to be simulated. The model may be an ARIMA model specified in a TSMODEL paragraph. The sentence SIMULATION must also appear in the TSMODEL paragraph.

NOISE sentence
The NOISE sentence is used to specify the noise sequence for the simulated time series model. Either the distribution for generating the noise sequence or the name of a variable containing values to be used as the sequence is specified. The following distributions can be used:
    U(r1,r2)  : uniform distribution between r1 and r2
    N(r1,r2)  : normal distribution with mean r1 and variance r2
    MN(v1,v2) : multivariate normal distribution with mean vector v1 and covariance matrix v2. Note that v1 and v2 must be names of variables defined previously.

NOBS sentence
The NOBS sentence is used to specify the number of observations to be simulated.

SEED sentence
The SEED sentence is used to specify an integer or the name of a variable for starting the random number generation. When a variable is used, the seven digit value is used as a seed if it is not defined yet, or the value of the variable is used if the variable is an existing one. After the simulation, the variable contains the seed last used. The number of digits for the seed must not be more than 8 digits. The default is

OMIT sentence
The OMIT sentence is used to specify the number of observations to be omitted at the beginning of the simulated data.

WEIGHT Paragraph

The WEIGHT paragraph is used to compute the pi and psi weights of an ARIMA time series model. It can also be used to compute the impulse response weights of a transfer function model (see Section 8.7.8).

Syntax of the WEIGHT Paragraph

    WEIGHT MODEL model-name.
           PIWEIGHTS IN v.
           PSIWEIGHTS IN v.
           MAXIMUM IS i.
           CUTOFF IS r.
Required sentence: MODEL

Sentences Used in the WEIGHT Paragraph

MODEL sentence
The MODEL sentence is used to specify the label (name) of the ARIMA model for which pi or psi weights are to be computed. The label must be the one specified in a previous TSMODEL paragraph.

PIWEIGHTS sentence
The PIWEIGHTS sentence is used to specify the name of the variable to store the pi weights associated with the ARIMA model.

PSIWEIGHTS sentence
The PSIWEIGHTS sentence is used to specify the name of the variable to store the psi weights associated with the ARIMA model.

MAXIMUM sentence
The MAXIMUM sentence is used to specify the maximum number of weights to be computed. The default is 100 for all weights to be computed.
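As a rough illustration of what pi and psi weights are, both can be obtained by expanding a ratio of polynomials in the backshift operator B up to a maximum number of terms. The Python sketch below assumes the Box-Jenkins sign conventions phi(B) = 1 - phi1 B - ..., theta(B) = 1 - theta1 B - ..., psi(B) = theta(B)/phi(B) and pi(B) = phi(B)/theta(B) = 1 - pi1 B - ...; the function name `expand_ratio` and the ARMA(1,1) values are hypothetical, and nothing here is taken from the SCA implementation.

```python
def expand_ratio(num, den, maximum):
    """Return coefficients c_0, ..., c_maximum of num(B)/den(B).

    `num` and `den` list polynomial coefficients by power of B; den[0] must be 1.
    """
    coefs = []
    for j in range(maximum + 1):
        value = num[j] if j < len(num) else 0.0
        for i in range(1, min(j, len(den) - 1) + 1):
            value -= den[i] * coefs[j - i]
        coefs.append(value)
    return coefs


# Hypothetical ARMA(1,1): (1 - 0.5B) z = (1 - 0.3B) a.
ar_poly = [1.0, -0.5]
ma_poly = [1.0, -0.3]

psi = expand_ratio(ma_poly, ar_poly, maximum=3)[1:]               # psi_1..psi_3
pi = [-c for c in expand_ratio(ar_poly, ma_poly, maximum=3)[1:]]  # pi_1..pi_3
print(psi, pi)
```

For a model with differencing, the differencing operators would first be multiplied into the autoregressive polynomial before the expansion; the psi weights then no longer die out.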

CUTOFF sentence
The CUTOFF sentence is used to specify a cutoff value to limit the number of weights that will be stored. The last weight stored represents the last value greater than or equal to (in absolute value) the cutoff value. Note that the specification of a cutoff value will cause the variables that store the weights to have different lengths. The default cutoff value is 0; that is, all weights will be stored.

REFERENCES

Abraham, B., and Ledolter, J. (1983). Statistical Methods for Forecasting. New York: Wiley.

Anderson, T.W. (1971). The Statistical Analysis of Time Series. New York: Wiley.

Box, G.E.P., and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control. San Francisco: Holden Day. (Revised edition published 1976).

Chatfield, C. (1979). Inverse Autocorrelations. Journal of the Royal Statistical Society 142:

Cleveland, W.S. (1972). The Inverse Autocorrelations of a Time Series and Their Applications. Technometrics 14:

Cryer, J.D. (1986). Time Series Analysis. Boston: Duxbury Press.

Granger, C.W.J., and Newbold, P. (1987). Forecasting Economic Time Series. New York: Academic Press.

Hillmer, S.C., and Tiao, G.C. (1979). Likelihood Function of Stationary Multiple Autoregressive Moving Average Models. Journal of the American Statistical Association 74:

Liu, L.-M. (1989). Identification of Seasonal ARIMA Models Using a Filtering Method. Communications in Statistics A 18:

Ljung, G.M., and Box, G.E.P. (1978). On a Measure of Lack of Fit in Time Series Models. Biometrika 65:

MACC (1965). GAUSHAUS -- Nonlinear Least Squares. Madison, WI: Madison Academic Computing Center, University of Wisconsin.

Pankratz, A. (1983). Forecasting with Univariate Box-Jenkins Models: Concepts and Cases. New York: Wiley.

Slutsky, E. (1937). The Summation of Random Causes as the Source of Cyclic Processes. Econometrica 5: 105 (translation of original 1927 Russian paper).

Tiao, G.C., and Tsay, R.S. (1985). A Canonical Correlation Approach to Modeling Multivariate Time Series. American Statistical Association 1985 Proceedings of the Business and Economic Statistics Section:

Tsay, R.S., and Tiao, G.C. (1984). Consistent Estimates of Autoregressive Parameters and Extended Sample Autocorrelation Function for Stationary and Non-stationary ARMA Models. Journal of the American Statistical Association 79:

Tsay, R.S., and Tiao, G.C. (1985). Use of Canonical Analysis in Time Series Model Identification. Biometrika 72:

Vandaele, W. (1983). Applied Time Series Analysis and Box-Jenkins Models. New York: Academic Press.

Wei, W.W.S. (1990). Time Series Analysis: Univariate and Multivariate Methods. Redwood City, CA: Addison-Wesley.

Wold, H. (1938). A Study in the Analysis of Stationary Time Series (2nd ed. 1954). Uppsala: Almquist and Wicksell.

Yule, G.U. (1921). On the Time-Correlation Problem with Special Reference to the Variate Difference Correlation Method. Journal of the Royal Statistical Society 84:

Yule, G.U. (1927). On a Method of Investigating Periodicities in Disturbed Series, with Special Reference to Wolfer's Sunspot Numbers. Philosophical Transactions of the Royal Society of London, Series A, 226:

CHAPTER 6

INTERVENTION ANALYSIS

Time series are often affected by various external events such as major corporate, political or economic policy initiatives or changes; technological changes; work stoppages; sales promotions; advertising; and so forth. These external events are commonly known as interventions. When such interventions are known to us, we may either wish to evaluate the effect of these external events or to incorporate the interventions into our time series model to possibly improve parameter estimates or forecasts. In this chapter, we discuss intervention analysis (or impact analysis) and how the SCA System can be employed for such analyses.

The SCA System also has capabilities for the analysis of a time series when interventions, or the timings for interventions, are unknown to us. Such an analysis is an aspect of outlier detection and adjustment, and is discussed in Chapter 7.

6.1 The Intervention Model

Traditionally, if a time series was subjected to an intervention at a particular time period, say T, its effect in changing the mean level of the series was determined by using a two-sample t-test. The mean level in the pre-intervention period was contrasted with that after the intervention occurred. Box and Tiao (1965) showed that the t-test is not appropriate in the case of serially correlated data. Moreover, an intervention may not be a step change, which is the basic assumption of the two-sample t-test.

Box and Tiao (1975) provided a procedure for analyzing a time series in the presence of known external events. This procedure has become known as intervention (or impact) analysis. In their approach, a time series is represented by two distinct components: an underlying disturbance term, and the set of interventions on the series. In the case of a single intervention, the form of the intervention model is

$$Y_t = C + \frac{\omega(B)}{\delta(B)} I_t + N_t \tag{6.1}$$

Here $I_t$ is a binary indicator vector (that is, a vector assuming the values 0 or 1) that defines the period of the intervention. The term $\omega(B)/\delta(B)$ is a characterization of the effect(s) of the intervention and will be discussed later. The term $N_t$ is called the disturbance, which can be expressed as

$$N_t = Y_t - C - \frac{\omega(B)}{\delta(B)} I_t. \tag{6.2}$$

We assume that $N_t$ may be modeled as an ARIMA process as defined in the previous chapter. In the case that there are no exogenous events, then the model for $Y_t$ reduces to the

ARIMA models discussed previously. The model given in (6.1) can be directly extended to include more than one intervention.

To illustrate equations (6.1) and (6.2), consider the SALES data of the previous chapter. There are 150 observations in this data set. Suppose that a strike occurred in the month represented by t = 120, and a new set of governmental regulations affecting sales went into effect beginning at month t = 135 and staying in effect thereafter. There are two interventions. They can be defined as follows:

$$I_{1t} = \begin{cases} 1, & t = 120 \\ 0, & \text{otherwise} \end{cases} \qquad \text{and} \qquad I_{2t} = \begin{cases} 0, & t < 135 \\ 1, & t \ge 135 \end{cases}$$

The form of the intervention model in this case is

$$Y_t = C + \frac{\omega_1(B)}{\delta_1(B)} I_{1t} + \frac{\omega_2(B)}{\delta_2(B)} I_{2t} + N_t. \tag{6.3}$$

In the absence of any interventions, as was the case in Chapter 5, an adequate model for the data was found to be

$$(1 - \phi B)(1 - B) Z_t = (1 - \theta B) a_t. \tag{6.4}$$

We may then wish to consider using an ARIMA(1,1,1) model as a model for $N_t$. The structure of the polynomials used in each intervention period is dependent on the type of intervention indicator used and the postulated effect of intervention, as will now be discussed.

6.2 Characterizations for an Intervention

Two different types of interventions were described in the example above. The strike (defined by $I_{1t}$) was in effect for one time period only. The government regulations (defined by $I_{2t}$) remained in effect once they were instituted.

An indicator variable representing an intervention that takes place for one time period only is called a pulse function. It is usually represented as $P_t^{(T)}$, where T is the time that the intervention occurs (i.e., has the value 1). In the example above, T = 120.

An indicator variable representing an intervention that remains in effect beginning from a particular time period is called a step function. This variable is usually represented as $S_t^{(T)}$, where T is the time that the intervention begins. In the example above, T = 135.

The pulse and step functions are the most common characterizations for the intervention scenarios. As noted above, the response to an intervention is characterized by the rational polynomial

$\omega(B)/\delta(B)$.

The operator in the numerator, $\omega(B)$, represents the impact(s) of the intervention and the length of time (delay) it takes the impact(s) to be reflected in the time series. For example, the effect of a strike may only be in the time period in which it occurred, while the effect of an advertising campaign may affect the current time period and have a residual effect on the next period. Hence we may use the characterization $\omega(B) = \omega_0$ to indicate a contemporaneous (same time) effect; $\omega(B) = \omega_1 B$ to describe an effect not felt until the next time period; or $\omega(B) = \omega_0 + \omega_1 B$ to describe an event that affects the measured response in both the current and next time period.

The operator in the denominator, $\delta(B)$, represents the way in which an impact dissipates. In most cases, the $\delta(B)$ of an intervention model is a low order polynomial, for example, $\delta(B) = 1 - \delta_1 B$. If an intervention has a relatively long term residual effect (or growth pattern), then the value of $\delta_1$ will be moderate to large. However, if the effect is short term, then the value of $\delta_1$ will be small. In an extreme case, the intervention may not have any residual effect. In such a case, we have $\delta_1 = 0$.

To formally summarize, the rational polynomial $\omega(B)/\delta(B)$ consists of the operators

$$\omega(B) = \omega_0 + \omega_1 B + \omega_2 B^2 + \cdots + \omega_s B^s, \quad \text{and}$$

$$\delta(B) = 1 - \delta_1 B - \delta_2 B^2 - \cdots - \delta_r B^r.$$

However, in practice $\omega(B)$ usually consists of only a few terms (often no more than 1 or 2 terms) while $\delta(B)$ usually can be represented as either $\delta(B) = 1$ or $\delta(B) = 1 - \delta_1 B$.

A useful reference is the set of responses to a step and a pulse input function for various configurations of $\omega(B)$ and $\delta(B)$. In Figure 6.1 responses are shown for $\omega B$, $\omega/(1 - \delta B)$, and $\omega B/(1 - B)$ for both a step and a pulse function. Visuals, or descriptions, of other frequently used responses can be found in Box and Tiao (1975), Vandaele (1983), Wei (1990), and Abraham and Ledolter (1983).
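The three response forms named above can also be traced numerically. The Python sketch below applies a rational operator to a pulse and to a step indicator through the recursion y(t) = omega0 I(t) + omega1 I(t-1) + ... + delta1 y(t-1) + ...; the function name `response`, the series length, the intervention time and the values omega = 2 and delta = 0.5 are hypothetical choices made only for illustration.

```python
def response(omega, delta, indicator):
    """Apply omega(B)/delta(B) to a 0/1 indicator series.

    omega = [w0, w1, ...] for w0 + w1*B + ...;
    delta = [d1, d2, ...] for 1 - d1*B - d2*B^2 - ...
    """
    out = []
    for t in range(len(indicator)):
        value = sum(w * indicator[t - i]
                    for i, w in enumerate(omega) if t - i >= 0)
        value += sum(d * out[t - j]
                     for j, d in enumerate(delta, start=1) if t - j >= 0)
        out.append(value)
    return out


n, T = 8, 3                                   # hypothetical length and time
pulse = [1 if t == T else 0 for t in range(1, n + 1)]
step = [1 if t >= T else 0 for t in range(1, n + 1)]

delayed_pulse = response([0, 2], [], pulse)       # omega*B on a pulse
decaying_pulse = response([2], [0.5], pulse)      # omega/(1 - delta*B) on a pulse
gradual_step = response([2], [0.5], step)         # omega/(1 - delta*B) on a step
ramp = response([0, 2], [1], step)                # omega*B/(1 - B) on a step

# Differencing the step indicator recovers the pulse indicator.
differenced_step = [step[0]] + [step[t] - step[t - 1] for t in range(1, n)]
print(delayed_pulse, decaying_pulse, gradual_step, ramp)
```

With delta = 0.5 the pulse response halves each period, while the step response climbs toward omega/(1 - delta) = 4; a delta of 1 turns a step into a ramp.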

Figure 6.1 Some responses to a step and a pulse function

[Figure: responses to a step function and to a pulse function, one row for each of $\omega B$, $\omega/(1 - \delta B)$, and $\omega B/(1 - B)$]

In Figure 6.1 we note that there is an exact relationship between a step and a pulse function. That is,

$$(1 - B) S_t^{(T)} = P_t^{(T)}. \tag{6.5}$$

Because of this relationship, an intervention can be described equally well by either a pulse or a step function. The form used often depends upon the one that is more convenient to use, or the form that provides the easier interpretation.

6.3 A Modeling Strategy for Intervention Analysis

There are two separate components in an intervention model: a deterministic component describing the intervention(s) and the associated response(s), and a stochastic disturbance term. The overall modeling strategy is to obtain reasonable initial representations for both components and iterate to a final model based on intermediate estimations, diagnostic checks, and model interpretations.

It may be difficult to initially identify a model for the disturbance term $N_t$ since it is directly affected by the effects of the intervention(s). One strategy is to model $N_t$ using either the observations prior to the occurrence of any intervention or the observations well after the time of occurrence of the last intervention, depending upon which portion provides

the longer set of data. Alternatively, models may be constructed for each of the two periods and compared. A composite choice for $N_t$ may then be made. During the estimation and checking process, $N_t$ may then be modified based on the changes made to the exogenous effects and on the residual series.

The exogenous intervention portion of the model cannot be identified using rigorous statistical techniques. This portion is generally postulated based on the plot of the time series or using knowledge of the data under study, and is then modified as necessary. Usually, the known characterizations of responses to pulse and step functions (as described above) are used to provide initial representations for the interventions.

Three examples are used in the remainder of this chapter to illustrate intervention analysis and the use of the SCA System in such analyses. Further analyses and discussions of these examples can be found in Chapter .

6.4 Intervention Analysis of a Production Process

As a simple example of an intervention analysis, we consider the daily production data of an automobile component. The data are listed in Table 6.1 and are plotted in Figure 6.2. The data are stored in the SCA workspace in the variable PRODUCTN.

Table 6.1 Production process data (read across)

[Table 6.1: data values not recoverable from this transcription]

Figure 6.2 Production process data

[Figure 6.2: time series plot of PRODUCTN]

Our attention is immediately drawn to a change in the mean level in the plot of PRODUCTN. In fact, the production process was changed beginning at t = 47. In Figure 6.3, two separate mean level lines are inserted, one prior to the process change and one after.
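The two mean level lines are simply the averages of the observations before and from the change point onward. A minimal Python sketch follows; the function name `mean_levels` and the short series are hypothetical and are not the PRODUCTN data. The comparison is descriptive only: as noted earlier in the chapter, a two-sample t-test on such means is not appropriate for serially correlated data.

```python
def mean_levels(series, change_time):
    """Return (mean before, mean from) the 1-based time index `change_time`."""
    before = series[:change_time - 1]
    after = series[change_time - 1:]
    return sum(before) / len(before), sum(after) / len(after)


# Hypothetical series with a level shift beginning at t = 5.
series = [10, 11, 9, 10, 14, 15, 13, 14]
pre_mean, post_mean = mean_levels(series, change_time=5)
print(pre_mean, post_mean)
```

The difference between the two averages gives a first rough idea of the size of the shift that the intervention term will later estimate.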

Figure 6.3 Production process data with mean level lines before and after a process change

Since the change in the process remained in effect from its introduction, we will use a step function to represent the period of the intervention. Specifically, we will use $S_t^{(47)}$ as the step function for this intervention. It appears that the effect of the intervention was an upward shift in the mean level. As a result, the deterministic component of our model will be

$\omega S_t^{(47)}$. (6.6)

We will restrict our attention to the first 46 observations to identify a model for the disturbance term, $N_t$. Since the number of observations is relatively small, there may be some ambiguity in the order of the model identified. The ACF of the first 46 observations reveals the following.

-->ACF PRODUCTN. SPAN IS 1, 46. MAXLAG IS 12.

TIME PERIOD ANALYZED                    1 TO 46
NAME OF THE SERIES                      PRODUCTN
EFFECTIVE NUMBER OF OBSERVATIONS
STANDARD DEVIATION OF THE SERIES
MEAN OF THE (DIFFERENCED) SERIES
STANDARD DEVIATION OF THE MEAN
T-VALUE OF MEAN (AGAINST ZERO)

AUTOCORRELATIONS
ST.E.
Q

 LAG
   1         IXXXX
   2         IXXXXXX+X
   3         IXXXX
   4         I
   5       XXI
   6    XXXXXI
   7   XXXXXXI
   8     XXXXI
   9   XXXXXXI
  10        XI
  11       XXI
  12         I
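For readers who want to see what the ACF paragraph computes, the following is a minimal Python sketch of the usual sample autocorrelation estimate. It is not the SCA implementation: the function names are hypothetical, and the 2/sqrt(n) limit is the rough white-noise approximation rather than the standard errors printed by the SCA System.

```python
import math


def sample_acf(z, maxlag):
    """Sample autocorrelations r_1..r_maxlag of the series z."""
    n = len(z)
    mean = sum(z) / n
    dev = [value - mean for value in z]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, maxlag + 1)
    ]


def approx_two_se_limit(n):
    """Rough two-standard-error limit under a white-noise assumption."""
    return 2.0 / math.sqrt(n)


# Hypothetical illustration: a short alternating series.
example = [1.0, -1.0] * 4
r = sample_acf(example, maxlag=2)
limit = approx_two_se_limit(len(example))
```

An autocorrelation whose magnitude exceeds the limit is the kind of value that extends past the "+" marker in the plot above.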

Except for the autocorrelation at lag 2, the ACF is clean. To obtain more information, we will now use the EACF for the same period.

-->EACF PRODUCTN. SPAN IS 1,46.

TIME PERIOD ANALYZED                    1 TO 46
NAME OF THE SERIES                      PRODUCTN
EFFECTIVE NUMBER OF OBSERVATIONS
STANDARD DEVIATION OF THE SERIES
MEAN OF THE (DIFFERENCED) SERIES
STANDARD DEVIATION OF THE MEAN
T-VALUE OF MEAN (AGAINST ZERO)

THE EXTENDED ACF TABLE
(Q-->)
(P= 0)
(P= 1)
(P= 2)
(P= 3)
(P= 4)
(P= 5)
(P= 6)

SIMPLIFIED EXTENDED ACF TABLE (5% LEVEL)
(Q-->)   0  1  2  3  4  5  6  7  8  9 10 11 12
(P= 0)   O  O  O  O  O  O  O  O  O  O  O  O  O
(P= 1)   X  O  O  O  O  O  O  O  O  O  O  O  O
(P= 2)   O  O  O  O  O  O  O  O  O  O  O  O  O
(P= 3)   O  O  O  O  O  O  O  O  O  O  O  O  O
(P= 4)   X  O  O  O  O  O  O  O  O  O  O  O  O
(P= 5)   X  O  O  O  O  O  O  O  O  O  O  O  O
(P= 6)   X  O  O  O  O  O  O  O  O  O  O  O  O

From the summary statistics of both the ACF and EACF, we see that a constant term (to represent the mean level) should be in the model. The simplified EACF table indicates that an ARMA(0,0) model may be appropriate for the data. We will slightly overfit this model by considering an ARMA(0,1) model. That is,

$N_t = (1 - \theta B)a_t$. (6.7)

By combining (6.6) and (6.7), we have the following initial model for the production data:

$Y_t = C + \omega S_t^{(47)} + (1 - \theta B)a_t$. (6.8)

In order to fit the model of (6.8), we need to first create the step function and then specify the model. We will use the GENERATE paragraph (see Appendix B) to create the step function. The step function will be given the variable name SHIFT. The SCA output is edited for presentation purposes.

-->GENERATE SHIFT. NROW ARE 85. VALUES ARE 0 FOR 46, 1 FOR 39.
-->TSMODEL PRODUCT. MODEL IS
--> PRODUCTN = CONST + (W0)SHIFT(BINARY) + (1-THETA*B)NOISE.

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- PRODUCT

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
PRODUCTN   RANDOM     ORIGINAL      NONE
SHIFT      BINARY     ORIGINAL      NONE

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
  LABEL      NAME    DENOM.                 TRAINT         ERROR  VALUE
1  CONST             CNST      1       0    NONE
2  W0      SHIFT     NUM.      1       0    NONE
3  THETA   PRODUCTN  MA        1       1    NONE    .1000

Note that the intervention component within the TSMODEL paragraph is specified as (W0)SHIFT(BINARY). As noted above, SHIFT is the name of the step function. It is designated as a BINARY series to distinguish it from a series that is not deterministic (see Chapter 8). The parentheses on the operator (W0) are necessary so that the SCA System can distinguish the model parameter ω from the intervention indicator $S_t^{(47)}$.

We can estimate the above model by entering (SCA output is edited)

-->ESTIM PRODUCT. HOLD RESIDUALS(RES)

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- PRODUCT

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
PRODUCTN   RANDOM     ORIGINAL      NONE
SHIFT      BINARY     ORIGINAL      NONE

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
  LABEL      NAME    DENOM.                 TRAINT         ERROR  VALUE
1  CONST             CNST      1       0    NONE
2  W0      SHIFT     NUM.      1       0    NONE
3  THETA   PRODUCTN  MA        1       1    NONE

TOTAL SUM OF SQUARES                      E+07
TOTAL NUMBER OF OBSERVATIONS
RESIDUAL SUM OF SQUARES                   E+07
R-SQUARE
EFFECTIVE NUMBER OF OBSERVATIONS          85
RESIDUAL VARIANCE ESTIMATE                E+05
RESIDUAL STANDARD ERROR                   E+03
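If the MA term in (6.8) is ignored (θ = 0), the model reduces to an ordinary regression on a binary step, and the least-squares estimates have a simple closed form: the constant is the mean before the change and ω is the difference between the two mean levels. The following Python sketch illustrates this under that simplifying assumption. It is not the SCA estimation algorithm, and the function name and data are hypothetical.

```python
def fit_level_shift(y, shift):
    """Least-squares fit of y_t = C + omega * shift_t + a_t for a 0/1 shift."""
    before = [value for value, s in zip(y, shift) if s == 0]
    after = [value for value, s in zip(y, shift) if s == 1]
    const = sum(before) / len(before)          # mean level before the change
    omega = sum(after) / len(after) - const    # change in mean level
    residuals = [value - const - omega * s for value, s in zip(y, shift)]
    return const, omega, residuals


# Hypothetical data (not the PRODUCTN series): a level change after 4 points.
y = [10, 12, 11, 13, 20, 22, 21, 23]
shift = [0, 0, 0, 0, 1, 1, 1, 1]
const, omega, residuals = fit_level_shift(y, shift)
```

The residuals play the same role as the series RES held by the ESTIM paragraph: they are what remains after the constant and the level shift are removed.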

Modifying an existing model

As we may have expected, the estimate of the MA parameter is not statistically significant and we may consider dropping it from the model. Although the model is simple and does not involve many parameters, we may not wish to re-specify the entire model simply to alter one portion of it. Here we wish to change our noise component from $(1 - \theta B)a_t$ to just $a_t$. We can do this using the CHANGE sentence of the TSMODEL paragraph. If we enter

-->TSMODEL PRODUCT. CHANGE NOISE.

we will alter the existing model held under the name PRODUCT in the manner indicated. Currently the model named PRODUCT has two components, one involving the variable SHIFT and another involving NOISE. We can change any component by simply re-stating it. For example, if the CHANGE sentence above had been specified as

CHANGE (1 - THETA*B - THETA2*B**2)NOISE

then we would have changed the component involving NOISE from an MA(1) model to an MA(2) model. More information on altering existing intervention models is provided in Section 6.7. The TSMODEL paragraph above yields the following

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- PRODUCT

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
PRODUCTN   RANDOM     ORIGINAL      NONE
SHIFT      BINARY     ORIGINAL      NONE

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
  LABEL      NAME    DENOM.                 TRAINT         ERROR  VALUE
1  CONST             CNST      1       0    NONE
2  W0      SHIFT     NUM.      1       0    NONE

We can estimate the changed model by entering (SCA output is edited)

-->ESTIM PRODUCT. HOLD RESIDUALS(RES)

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- PRODUCT

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
PRODUCTN   RANDOM     ORIGINAL      NONE
SHIFT      BINARY     ORIGINAL      NONE

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
  LABEL      NAME    DENOM.                 TRAINT         ERROR  VALUE
1  CONST             CNST      1       0    NONE
2  W0      SHIFT     NUM.      1       0    NONE

TOTAL SUM OF SQUARES                      E+07
TOTAL NUMBER OF OBSERVATIONS
RESIDUAL SUM OF SQUARES                   E+07
R-SQUARE
EFFECTIVE NUMBER OF OBSERVATIONS          85
RESIDUAL VARIANCE ESTIMATE                E+05
RESIDUAL STANDARD ERROR                   E+03

Residuals are maintained in the variable RES for diagnostic checking purposes. The ACF of RES (not shown) reveals no anomalies. In the time plot of RES (also not shown here) there are two points (at t = 24 and 26) that are apart from the rest. These will be discussed in Chapter 7. As expected, we find evidence of a significant shift in the mean level of the production data caused by the change in the production process.

6.5 Intervention Analysis of the Rate of Change in the U.S. Consumer Price Index

As a second example of intervention analysis, we consider an example from Box and Tiao (1975) concerning the rate of change in the U.S. Consumer Price Index (CPI). The data consist of 234 successive monthly values during the period July 1953 through December 1972. The data are stored in the SCA workspace under the name RATECPI and are listed and plotted in Table 6.2 and Figure 6.4, respectively.

Table 6.2 Monthly rate of change in the U.S. Consumer Price Index, July 1953 through December 1972, from Box and Tiao (1975) (Read data across. Data should be divided by 100.)

Figure 6.4 Rate of change of the U.S. Consumer Price Index (July 1953 through December 1972)

In September, October and November of 1971, a collection of federal controls termed Phase I was imposed on the U.S. economy. These controls were followed by Phase II controls that lasted for the remainder of the observation period. These control policies were designed to reduce the level of inflation. As a result, it was postulated that each phase produced a (negative) change in the level of the rate of change of the CPI.

6.5.1 Preliminary model postulation

Box and Tiao (1975) identified an ARIMA model for the period prior to September 1971 and used it as the model for the disturbance term. The model was an ARIMA(0,1,1) model; that is,

$(1 - B)N_t = (1 - \theta B)a_t$. (6.9)

In order to incorporate this ARIMA model with the intervention components, we can re-write (6.9) as

$N_t = \frac{1 - \theta B}{1 - B} a_t$. (6.10)

It was assumed that the model for the disturbance remained essentially the same during the intervention period. As a result, the following model was used

$Y_t = \omega_1 I_{1t} + \omega_2 I_{2t} + \frac{1 - \theta B}{1 - B} a_t$, (6.11)

where

$I_{1t}$ = 1 for t = September, October, November 1971, and 0 otherwise;
$I_{2t}$ = 1 for t from December 1971 onward, and 0 otherwise;

and $Y_t$ is the rate of change of the CPI (that is, RATECPI).

6.5.2 Creating indicators for the interventions

We need to create indicators representing $I_{1t}$ and $I_{2t}$. The GENERATE paragraph (see Appendix B) will be used twice to create the binary variables labeled PHASE1 and PHASE2, corresponding to $I_{1t}$ and $I_{2t}$, respectively. PHASE1 will have the value 1 for t = 219, 220 and 221, while PHASE2 is the step function $S_t^{(222)}$. We can use the following commands (the SCA responses to the commands are not shown) to generate these two indicators.

-->GENERATE PHASE1. NROW ARE 234. VALUES ARE 0 FOR 218, 1, 1, 1, 0 FOR 13.
-->GENERATE PHASE2. NROW ARE 234. VALUES ARE 0 FOR 221, 1 FOR 13.

6.5.3 Model specification with a differencing factor

The TSMODEL paragraph permits the use of denominator terms in the specification of any polynomial operator. For example, an ARMA(1,1) disturbance term can be specified as

(1 - THETA*B)/(1 - PHI*B)NOISE

since $N_t = \{(1 - \theta B)/(1 - \phi B)\}a_t$ is the same as $(1 - \phi B)N_t = (1 - \theta B)a_t$. As a result, we may consider specifying the model of (6.11) in the same manner as that used in the production process example. That is, we may consider specifying the model as

RATECPI = (W1)PHASE1(BINARY) + (W2)PHASE2(BINARY) + (1-TH*B)/(1-B)NOISE

However, in the SCA convention, a differencing term may not be specified as a denominator of an operator. The reason for this is twofold. First, by excluding differencing operators from the denominator of such expressions, the SCA System can distinguish AR operators from differencing operators. This is especially true when only orders of operators are specified. In this way the shorthand notation (see Section 5.4.5)

(1,2)/(1)NOISE

can be uniquely interpreted as the specification of an ARMA(1,2) process. More importantly, this restriction ensures that an unstable model is not specified by mistake. As a consequence of this restriction on the specification of differencing operations, we must phrase the differencing operator of (6.11) in such a fashion that it can be treated as the modifier of one or more series.
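The two indicators produced by the GENERATE commands of Section 6.5.2, and what the operator (1 − B) does to them when it is treated as a modifier of a series, can be mirrored in a short Python sketch. This is only an illustration and not SCA code: the helper `generate` is a hypothetical analogue of the "value FOR count" notation, and `difference` assumes the value before t = 1 is 0.

```python
def generate(*runs):
    """Expand (value, count) pairs into a list, e.g. (0, 218) is '0 FOR 218'."""
    series = []
    for value, count in runs:
        series.extend([value] * count)
    return series


def difference(x):
    """Apply (1 - B): x_t - x_{t-1}, assuming the value before t = 1 is 0."""
    return [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]


phase1 = generate((0, 218), (1, 3), (0, 13))   # 1 for t = 219, 220, 221
phase2 = generate((0, 221), (1, 13))           # step beginning at t = 222

# Time indices (1-based) at which the differenced indicators are nonzero.
d_phase1 = [(t + 1, v) for t, v in enumerate(difference(phase1)) if v != 0]
d_phase2 = [(t + 1, v) for t, v in enumerate(difference(phase2)) if v != 0]
```

Differencing the step PHASE2 leaves a single pulse at t = 222, which is relation (6.5); differencing PHASE1 leaves +1 at t = 219 and −1 at t = 222.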
If we treat the differencing factor (1-B) as an operator, we can multiply both sides of (6.11) by (1-B). The resultant expression is

$(1 - B)Y_t = \omega_1(1 - B)I_{1t} + \omega_2(1 - B)I_{2t} + (1 - \theta B)a_t$. (6.12)

Now the differencing operator can be specified as a modifier of $Y_t$, $I_{1t}$ and $I_{2t}$. Hence we now specify the model of (6.12) as

-->TSMODEL CPIMODEL. MODEL IS RATECPI(1) = (W1)PHASE1(BINARY,1) +
--> (W2)PHASE2(BINARY,1) + (1 - TH*B)NOISE.

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- CPIMODEL

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
RATECPI    RANDOM     ORIGINAL      (1-B)
PHASE1     BINARY     ORIGINAL      (1-B)
PHASE2     BINARY     ORIGINAL      (1-B)

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
  LABEL      NAME    DENOM.                 TRAINT         ERROR  VALUE
1  W1      PHASE1    NUM.      1       0    NONE
2  W2      PHASE2    NUM.      1       0    NONE
3  TH      RATECPI   MA        1       1    NONE    .1000

Note that we employed a shorthand notation for the specification of the differencing operator. That is, we specified RATECPI((1-B)) simply as RATECPI(1), and PHASE1(BINARY,(1-B)) as PHASE1(BINARY,1), in the MODEL sentence above.

Since the model contains an MA parameter, we will estimate the model sequentially, first employing the conditional likelihood function and then the exact likelihood function (see Section 5.2 for a discussion of these methods). Only the results for the exact estimation are shown, and all SCA output below is edited for presentation purposes.

-->ESTIM CPIMODEL
-->ESTIM CPIMODEL. METHOD IS EXACT. HOLD RESIDUAL(RESCPI).

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- CPIMODEL

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
RATECPI    RANDOM     ORIGINAL      (1-B)
PHASE1     BINARY     ORIGINAL      (1-B)
PHASE2     BINARY     ORIGINAL      (1-B)

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
  LABEL      NAME    DENOM.                 TRAINT         ERROR  VALUE
1  W1      PHASE1    NUM.      1       0    NONE
2  W2      PHASE2    NUM.      1       0    NONE
3  TH      RATECPI   MA        1       1    NONE

TOTAL SUM OF SQUARES                      E-02
TOTAL NUMBER OF OBSERVATIONS
RESIDUAL SUM OF SQUARES                   E-02
R-SQUARE
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL VARIANCE ESTIMATE                E-05
RESIDUAL STANDARD ERROR                   E-02

Based on the signs of the estimates for $\omega_1$ and $\omega_2$, both control periods appear to have reduced the level of inflation. However, neither effect is significant at the 5% level, even though the effect associated with Phase I is close to being significant. Clearly, Phase II produced no significant drop in the change of the CPI. The ACF of the residual series (not shown) is fairly clean and does not indicate any major flaw in the model. However, a check for outliers, or spurious values, in the residuals reveals a few questionable observations. This example will be continued in Chapter 7 to demonstrate the effect of these observations on the above results.

6.6 Intervention Analysis of Los Angeles Ozone Data

As a last example of intervention analysis, we consider the monthly average of the ozone (O3) level in downtown Los Angeles for the period January 1955 through December 1972. These data were used by Box and Tiao (1975) and are stored in the SCA workspace under the name OZONE. The values of OZONE are listed in Table 6.3 and are plotted in Figure 6.5.

Table 6.3 Monthly averages of ozone (in 10-3 pphm) in downtown Los Angeles (1955-1972) (read data across)

Figure 6.5 Monthly averages of ozone (in 10-3 pphm) in downtown Los Angeles (1955-1972)

As may be observed in Figure 6.5, a strong seasonality is apparent in the data. The data are not stationary, so differencing is required. A decrease in the level of ozone through the years is also visible. As noted in Box and Tiao (1975), two interventions of potential importance are:

INT1: the opening of the Golden State Freeway and the inception of a new law reducing hydrocarbons in gasoline (January 1960), and
INT2: regulations regarding engine designs (beginning in 1966).

The first intervention is expected to produce a step change in the ozone level beginning in January 1960. The second intervention is expected to gradually reduce the level of ozone as new cars are introduced in the area. The effects associated with the second intervention were further divided into two seasons, summer and winter, in order to account for atmospheric conditions that result in higher readings of ozone in the summer season.

Box and Tiao (1975) found that a multiplicative MA model is adequate for the seasonally differenced series. As a result, we will estimate a model corresponding to

$OZONE_t = \omega_1 INT1_t + \frac{\omega_2}{1 - B^{12}} INT2S_t + \frac{\omega_3}{1 - B^{12}} INT2W_t + \frac{(1 - \theta_1 B)(1 - \theta_2 B^{12})}{1 - B^{12}} a_t$ (6.13)

where INT1 is a step function with the value 1 beginning in January 1960 (t = 61), and INT2S (summer) and INT2W (winter) assume the value 1 for appropriate seasonal periods beginning June 1966 and the value 0 otherwise.

The response associated with INT1 is modeled as a level change. The response associated with both INT2S and INT2W requires further explanation. To illustrate this response, consider INT2S. The summer period is defined as the months June-October (the winter period is all other months). Hence the values of INT2S associated with January, February, ..., December beginning in 1966 are

0,0,0,0,0,1,1,1,1,1,0,0.

If we observe the response of $(\omega/(1 - B))S_t^{(T)}$ in Figure 6.1, we note a ramp response that grows in equal increments (the value of ω). The response associated with INT2S is a seasonal extension of this response. Here we have a ramp response that grows in uniform increments for each month in the period. The same interpretation is true for the response associated with INT2W.
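The seasonal ramp described above can be made concrete with a small Python sketch that builds the summer indicator from the pattern just given and applies ω/(1 − B^12) to it through the recursion y_t = y_{t-12} + ω x_t. This is an illustration only: the function names are hypothetical, the value ω = −0.25 is an arbitrary assumption and not an estimate, and the layout assumes 11 years (1955 to 1965) of zeros followed by 7 years (1966 to 1972) of the seasonal pattern.

```python
SUMMER_PATTERN = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]  # January..December


def build_int2s(years_before=11, years_after=7):
    """Summer indicator: zeros for 1955-1965, then the pattern for 1966-1972."""
    return [0] * (12 * years_before) + SUMMER_PATTERN * years_after


def seasonal_accumulate(x, omega, season=12):
    """Apply omega / (1 - B**season): y_t = y_{t-season} + omega * x_t."""
    y = []
    for t, value in enumerate(x):
        carry = y[t - season] if t >= season else 0.0
        y.append(carry + omega * value)
    return y


int2s = build_int2s()
effect = seasonal_accumulate(int2s, omega=-0.25)  # hypothetical omega

june_1966 = effect[132 + 5]        # first summer month affected
june_1967 = effect[132 + 12 + 5]   # same month one year later
january_1967 = effect[132 + 12]    # a winter month, unaffected by INT2S
```

Each summer month moves by a further ω every year, while months outside the summer period are left at zero, which is why ω here is read as a per-year change.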

There are a number of ways in which the necessary indicator variables can be introduced into the SCA workspace. In some cases, these indicators may reside with the time series on an external file and may be transmitted to the SCA workspace using the INPUT paragraph (see Chapter 2). In addition, we can use SCA commands to create the variables. For example, the following are commands or sequences of commands that can be used to create the necessary binary indicator variables here. Please see Appendix B for more information on the GENERATE and JOIN paragraphs, and Appendix A for more information on the row direct product (RDP) operator. All SCA responses to these commands are edited out for presentation purposes.

(The step function, INT1)
-->GENERATE INT1. NROW IS 216. VALUES ARE 0 FOR 60, 1 FOR 156.

(The summer indicator, INT2S)
-->GENERATE ZERO. NROW IS 132. VALUES ARE 0 FOR 132.
-->GENERATE SUMM. NROW IS 12. VALUES ARE 0,0,0,0,0,1,1,1,1,1,0,0.
-->GENERATE NSUM. NROW IS 7. VALUES ARE 1 FOR 7.
-->SUMMER = RDP(SUMM,NSUM)
-->JOIN ZERO, SUMMER. NEW IS INT2S.

(The winter indicator, INT2W)
-->GENERATE W1966. NROW IS 12. VALUES ARE 0 FOR 10, 1, 1.
-->GENERATE WINT. NROW IS 12. VALUES ARE 1,1,1,1,1,0,0,0,0,0,1,1.
-->GENERATE NWIN. NROW IS 6. VALUES ARE 1 FOR 6.
-->WINTER = RDP(WINT,NWIN)
-->JOIN ZERO, W1966, WINTER. NEW IS INT2W.

As noted in Section 6.5.3, we cannot specify model (6.13) directly since it contains a differencing operator in one or more denominators. If we multiply both sides of (6.13) by $(1 - B^{12})$, we obtain the following

$(1 - B^{12})OZONE_t = \omega_1(1 - B^{12})INT1_t + \omega_2 INT2S_t + \omega_3 INT2W_t + (1 - \theta_1 B)(1 - \theta_2 B^{12})a_t$. (6.14)

We can now use the TSMODEL paragraph to specify model (6.14).

-->TSMODEL OZONEMDL. MODEL IS OZONE(12) = (W1)INT1(BINARY,12) +
--> (W2)INT2S(BINARY) + (W3)INT2W(BINARY) + (1-TH1*B)(1-TH2*B**12)NOISE.
SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- OZONEMDL

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
OZONE      RANDOM     ORIGINAL      (1-B**12)
INT1       BINARY     ORIGINAL      (1-B**12)
INT2S      BINARY     ORIGINAL      NONE
INT2W      BINARY     ORIGINAL      NONE

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
  LABEL      NAME    DENOM.                 TRAINT         ERROR  VALUE
1  W1      INT1      NUM.      1       0    NONE
2  W2      INT2S     NUM.      1       0    NONE
3  W3      INT2W     NUM.      1       0    NONE
4  TH1     OZONE     MA        1       1    NONE
5  TH2     OZONE     MA        2      12    NONE    .1000

Since the model contains MA parameters (in particular, a seasonal MA parameter), we will estimate the model sequentially. We first employ the conditional likelihood algorithm, then re-estimate using the exact likelihood algorithm (see Section 5.2 for a discussion of these methods). Only the results for the final estimation are shown, and the output is edited for presentation purposes.

-->ESTIM OZONEMDL
-->ESTIM OZONEMDL. METHOD IS EXACT. HOLD RESIDUALS(RESOZONE)

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- OZONEMDL

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
OZONE      RANDOM     ORIGINAL      (1-B**12)
INT1       BINARY     ORIGINAL      (1-B**12)
INT2S      BINARY     ORIGINAL      NONE
INT2W      BINARY     ORIGINAL      NONE

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
  LABEL      NAME    DENOM.                 TRAINT         ERROR  VALUE
1  W1      INT1      NUM.      1       0    NONE
2  W2      INT2S     NUM.      1       0    NONE
3  W3      INT2W     NUM.      1       0    NONE
4  TH1     OZONE     MA        1       1    NONE
5  TH2     OZONE     MA        2      12    NONE

TOTAL SUM OF SQUARES                      E+03
TOTAL NUMBER OF OBSERVATIONS
RESIDUAL SUM OF SQUARES                   E+03
R-SQUARE
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL VARIANCE ESTIMATE                E+00
RESIDUAL STANDARD ERROR                   E+00

As expected, all estimates of the intervention parameters have a negative sign, indicating reductions of the ozone level. The estimate of $\omega_1$, -1.34, indicates that the joint effect of the opening of a new freeway and the change in gasoline mixtures resulted in a permanent level reduction in ozone of about 1.34 units. There is an approximate 0.24 unit per year reduction in ozone during the summer period and a 0.10 unit per year reduction in ozone during the winter period. The reduction associated with the summer period is statistically significant at the 5% level, but the reduction associated with the winter period is not. Box and Tiao (1975) conclude that the reduction during the winter period may be classified as slight. The ACF of the residuals indicates a good fit, and no gross errors are seen in the time plot. However, if we re-estimate the above model while detecting and adjusting for possible outliers, we will obtain somewhat different results. These results are presented in Chapter 7.

6.7 Other Intervention Related Topics

This section provides a brief overview of topics related to intervention analysis or the execution of SCA paragraphs related to intervention analysis. Much of the material presented in this section can be considered advanced or of occasional use. As a consequence, this section can be skipped, and selected topics can be referenced as necessary. The material presented, and the section containing it, are:

Section   Topic
6.7.1     Modifying an intervention model
6.7.2     Estimation of interventions containing a denominator polynomial
6.7.3     Forecasting from an intervention model
6.7.4     Constraints on parameters
6.7.5     Notational shorthand

6.7.1 Modifying an intervention model

An intervention model may be modified by adding or deleting interventions as well as changing the existing interventions or disturbance. This is accomplished through the inclusion of the ADD, CHANGE, or DELETE sentence in the TSMODEL paragraph. To illustrate these capabilities, we will assume that we have already specified the following modified version of the intervention model used in Section 6.5 (only a portion of the MODEL sentence is given below).

RATECPI(1) = CONST + (W1)PHASE1(BINARY,1) + (W2)PHASE2(BINARY,1) + (1 - TH*B)NOISE (6.15)

As in Section 6.5, we will use the name CPIMODEL for the above specification.

The ADD sentence

The ADD sentence is used in the TSMODEL paragraph to modify an existing intervention model by the addition of new interventions. Any intervention must be represented with a new binary variable and the complete response associated with it. For example, if the component $\omega_3(1 - B)I_{3t}$ is to be added to CPIMODEL, where $I_{3t}$ is a defined PHASE3 period, then the following command suffices

TSMODEL CPIMODEL. ADD (W3)PHASE3(BINARY,1).

It is important that the labels of parameters used in the ADD sentence, as well as the label of the binary series, be different from any labels in the existing model. More than one intervention may be added to an existing model by joining each intervention with an addition symbol (+). For example, if both the above PHASE3 component and the component $\omega_4(1 - B)I_{4t}$ are to be added to CPIMODEL, where $I_{4t}$ is a defined PHASE4 period, then the following command may be used

TSMODEL CPIMODEL. ADD (W3)PHASE3(BINARY,1) + (W4)PHASE4(BINARY,1).

The CHANGE sentence

The CHANGE sentence is used in the TSMODEL paragraph to modify operators of existing components within an intervention model. In the CPIMODEL employing (6.15), there are three components, associated with the variable names PHASE1, PHASE2 and NOISE. The change is made by a complete re-specification of the affected components. Hence the sentence has a syntax similar to that of the ADD sentence. For example, if the ARMA operator of the disturbance in (6.15) is to be changed to $\{1/(1 - \phi B)\}a_t$, then the following TSMODEL paragraph suffices

TSMODEL CPIMODEL. CHANGE 1/(1-PHI*B)NOISE.

It is important to emphasize that only operators of existing components of an intervention model are affected by the CHANGE sentence. As in the ADD sentence, if more than one component is to be changed, then each component must be separated by an addition symbol (+). The SCA System will not process a CHANGE sentence involving variables not present in the existing model; it only changes existing components. The CHANGE sentence may be used to modify a component specified in an ADD sentence when both sentences are used within the same TSMODEL paragraph. In such a situation, the SCA System first processes the ADD sentence and then the CHANGE sentence, regardless of the order in which they are written.

The DELETE sentence

The DELETE sentence is used in a TSMODEL paragraph to modify an existing intervention model by deleting specified intervention components or the constant term from the model. The former is accomplished by deleting the variable describing the intervention period. For example, if the intervention occurring during PHASE1 is to be removed from the intervention model CPIMODEL, the following command suffices

TSMODEL CPIMODEL. DELETE PHASE1.

To delete the constant term from the model CPIMODEL, we simply enter

TSMODEL CPIMODEL. DELETE CONSTANT.

We do not enter the variable name; the keyword CONSTANT is recognized as the constant term. A constant term can only be added by re-specification of a model through the MODEL sentence.

6.7.2 Estimation of interventions containing a denominator polynomial

The general representation of the response of an intervention is given by $\omega(B)/\delta(B)$. As noted in Section 6.1, the order of the $\delta(B)$ polynomial is usually no greater than 1. Hence some of the most common intervention response functions used are

$\omega$; $\omega_0 + \omega_1 B$; and $\frac{\omega}{1 - \delta B}$. (6.16)

The estimation procedure used by the SCA System is fairly robust; that is, in most cases any non-zero initial estimates of parameters will lead to convergence to a final set. However, problems can arise in the case of intervention response functions that contain a denominator polynomial (e.g., $\omega/(1 - \delta B)$). A more detailed discussion can be found in Liu and Tiao (1980). The same is true in the case of transfer function models (see Chapter 8). In these cases, it is often important that reasonable initial estimates of the parameters in the numerator polynomial (i.e., $\omega(B)$) be provided. If reasonable initial estimates are not provided, the estimation process may result in an overflow error and terminate.

A simple strategy to prevent such an overflow error is to proceed sequentially whenever a denominator polynomial is to be used. First, estimate the model without denominator terms to obtain reasonable estimates of the terms in $\omega(B)$. Next, use the CHANGE sentence to insert the denominator terms into the model. To illustrate this, suppose the response function we wanted to use to describe the effect of Phase I of Section 6.5 was $\omega_1/(1 - \delta_1 B)$ instead of $\omega_1$, the one actually used. As a result, at some point we would want a component like

(W1)/(1-D1*B)PHASE1(BINARY,1)

in the model. However, since we are initially unsure of the approximate value for $\omega_1$, it would be unwise to specify the model with this component immediately. Instead, we should use a model that includes the component (W1)PHASE1(BINARY,1), such as in the CPIMODEL specified in Section 6.5. After an initial estimation, we can change the component using a command similar to

TSMODEL CPIMODEL. CHANGE (W1)/(1-D1*B)PHASE1(BINARY,1).

In this manner, we are more certain of estimating $\omega/(1 - \delta B)$ beginning from a reasonable estimate of ω. In using this strategy, it is necessary that labels be given to the numerator parameters.

6.7.3 Forecasting from an intervention model

Forecasts calculated using the SCA System for intervention models are minimum mean squared error forecasts, as discussed in Chapter 5. The basic difference between forecasting an ARMA model and an intervention model is that the intervention model includes binary series representing intervention periods. Since binary series are deterministic and cannot be forecasted, we must provide the future values of these series. That is, the variables in the SCA workspace that contain the data for the intervention indicators may need to be appended using editing paragraphs (see Appendix B). The extra values in the binary series represent the envisioned future of the intervention. For example, if we were to forecast 12 values from the end of the data using CPIMODEL of Section 6.5, then we would need to append 12 zero values to the end of PHASE1, and 12 values to the end of PHASE2 indicating how much longer the second intervention period would last. We can initially generate longer series for PHASE1 and PHASE2 if we know that we will later forecast from the model. By default, the ESTIM paragraph will only use the commonly shared periods of PHASE1, PHASE2 and RATECPI.

6.7.4 Constraints on parameters

Constraints on parameters in an intervention model are accommodated in the same manner as in the case of ARMA parameters. Parameters may be fixed to a specific value or constrained to be equal to other parameters using the FIXED-PARAMETER or CONSTRAINT sentences in the TSMODEL paragraph. These sentences have the same meaning as those described in Section 5.2. In addition, if we use the same label names to represent two or more parameters, these parameters will be held equal to one another during model estimation.
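The requirement in the forecasting discussion above, that future values of each binary series be supplied before forecasting, amounts to extending the indicator series by the forecast horizon. The following Python sketch shows the idea for a 12-step horizon. It is not SCA code; the helper `extend` is hypothetical, and the choice of twelve 1s for PHASE2 encodes the assumption that Phase II remains in effect throughout the forecast period.

```python
def extend(series, future_values):
    """Return a copy of the series with the assumed future values appended."""
    return list(series) + list(future_values)


horizon = 12

phase1 = [0] * 218 + [1] * 3 + [0] * 13   # Phase I: t = 219, 220, 221
phase2 = [0] * 221 + [1] * 13             # Phase II: t = 222 onward

# Phase I is over, so its future values are all 0.
phase1_future = extend(phase1, [0] * horizon)

# Assumption: Phase II continues for every month of the forecast period.
phase2_future = extend(phase2, [1] * horizon)
```

If Phase II were expected to end partway through the horizon, the appended values for PHASE2 would switch from 1 to 0 at that point instead.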

6.7.5 Notational shorthand

The notational shorthand available for ARIMA model specification (see Section 5.4.5) extends to intervention model specification as well. The only appreciable difference is that the numerator of an intervention component can contain a contemporaneous (i.e., a zero-order) term. Each intervention component permits parameters to be abbreviated as

(order of the backshift term; parameter labels or values)

where the parameter labels may be omitted as before. To illustrate longhand and shorthand expressions of a model, the following specifications of a model are all equivalent (provided all parameters are estimated without constraint):

RATECPI((1-B)) = CONST + (W1)PHASE1(BINARY,(1-B)) + (W2)PHASE2(BINARY,(1-B)) + (1-TH*B)NOISE

RATECPI(1) = CONST + (W1)PHASE1(BINARY,1) + (W2)PHASE2(BINARY,1) + (1-TH*B)NOISE

RATECPI(1) = CONST + (0;W1)PHASE1(BINARY,1) + (0;W2)PHASE2(BINARY,1) + (1;TH)NOISE

RATECPI(1) = CONST + (0)PHASE1(BINARY,1) + (0)PHASE2(BINARY,1) + (1)NOISE

SUMMARY OF THE SCA PARAGRAPHS IN CHAPTER 6

This section provides a summary of those SCA paragraphs employed in this chapter. The syntax for many paragraphs is presented in both a brief and a full form. The brief display of the syntax contains the most frequently used sentences of a paragraph, while the full display presents all possible modifying sentences of a paragraph. In addition, special remarks related to a paragraph may also be presented with the description.

Each SCA paragraph begins with a paragraph name and is followed by modifying sentences. Sentences that may be used as modifiers for a paragraph are shown below, and the types of arguments used in each sentence are also specified. Sentences not designated required may be omitted, as default conditions (or values) exist. The most frequently used required sentence is given as the first sentence of the paragraph. The portion of this sentence that may be omitted is underlined. This portion may be omitted only if this sentence appears as the first sentence in a paragraph. Otherwise, all portions of the sentence must be used. The last character of each line except the last line must be the continuation character.

The paragraphs to be explained in this summary are TSMODEL, ESTIM, FORECAST, and SIMULATE.

Legend (see Chapter 2 for further explanation)

v : variable or model name
i : integer
r : real value
w : keyword

TSMODEL Paragraph

The TSMODEL paragraph is used to specify or modify an intervention model. The paragraph is also used for the specification or modification of an ARIMA or transfer function model. The syntax description for these usages is provided in Chapters 5 and 8, respectively.

For each model specified in a TSMODEL paragraph, a distinguishing label or name must also be given. A number of different models may be specified, each having a unique name, and subsequently employed at a user's discretion. Moreover, the label also enables the information contained under it to be modified.

Syntax for the TSMODEL Paragraph

Brief syntax

TSMODEL NAME IS model-name.
        MODEL IS model.

Required sentence: NAME

Full syntax

TSMODEL NAME IS model-name.
        MODEL IS model.
        ADD components of a model.
        CHANGE components of a model.
        DELETE CONSTANT.
        FIXED-PARAMETERS ARE v1, v2, ---.
        CONSTRAINTS ARE (v1,v2,---), ---, (v1,v2,---).
        VARIANCE IS v.
        SHOW./NO SHOW.
        CHECK./NO CHECK.
        ROOTS./NO ROOTS.
        SIMULATION./NO SIMULATION.
        UPDATE./NO UPDATE.

Required sentence: NAME

Sentences Used in the TSMODEL Paragraph

NAME sentence
The NAME sentence is used to specify a unique label (name) for the model specified in the paragraph. This label is used to refer to this model in other time series related paragraphs or if the model is to be modified.

MODEL sentence
The MODEL sentence is used to specify an intervention model.

ADD sentence
The ADD sentence is used to specify component terms that will be added to an existing model. More information is provided in Section 6.7.1.

186 INTERVENTION ANALYSIS 6.25 CHANGE senence The CHANGE senence is used o modify componen erms of an exising model. More informaion is provided in Secion DELETE senence The DELETE senence is used o delee inervenion componens or he consan erm from an exising inervenion model. An inervenion componen is deleed by lising he name of he binary variable represening he inervenion period. The consan erm is deleed by specifying he keyword CONSTANT. Once he consan erm is deleed, i can only be re-insered using he MODEL senence. FIXED-PARAMETER senence The FIXED-PARAMETER senence is used o specify he parameers whose values will be held consan during model esimaion, where v s are he parameer names. See Secion 5.2 for a brief discussion of his senence. The defaul condiion is ha no parameers are fixed. CONSTRAINT senence The CONSTRAINT senence is used o specify ha he parameers wihin each pair of parenheses will be consrained o have he same value during model esimaion. See Secion for a brief discussion of his senence. The defaul condiion is ha no parameers are consrained o be equal. VARIANCE senence The VARIANCE senence is used o specify a variable where he value of he noise variance is or will be sored. If a value for he variable is known, his value will be used as iniial variance in esimaion and he final esimaed value of he variance will be sored in his variable for fuure esimaion or in forecasing. Oherwise he variance is calculaed from he residual series derived from he specified model and parameer esimaes. Noe ha he SCA Sysem designaes an inernal variable for he VARIANCE senence so ha he specificaion of his senence is opional. SHOW senence The SHOW senence is used o display a summary of he specified model. The defaul is SHOW. The summary includes series name, differencing (if any), span for daa, parameer labels (if any) and curren values for parameers. 

CHECK sentence
The CHECK sentence is used to check whether all roots of the AR, MA, and denominator polynomials lie outside the unit circle. The default is NO CHECK.

ROOTS sentence
The ROOTS sentence is used to display all roots of the AR, MA and denominator polynomials. The default is NO ROOTS.

SIMULATION sentence
The SIMULATION sentence is used to specify that the model will be used for simulation purposes. Ordinarily this sentence is not specified. See Section or for more details. The default is NO SIMULATION.

UPDATE sentence
The UPDATE sentence is used to specify that parameter values of the model are updated using the most current information available. The default is NO UPDATE. In the default case, parameter values are updated only after execution of the ESTIM paragraph rather than immediately.

ESTIM Paragraph

The ESTIM paragraph is used to control the estimation of the parameters of an intervention model.

Syntax of the ESTIM Paragraph

Brief syntax

ESTIM MODEL v.
      HOLD RESIDUALS(v).

Required sentence: MODEL

Full syntax

ESTIM MODEL v.
      METHOD IS w.
      STOP-CRITERIA ARE MAXIT(i), LIKELIHOOD(r1), ESTIMATE(r2).
      SPAN IS i1, i2.
      HOLD RESIDUALS(v), FITTED(v), VARIANCE(v).
      OUTPUT LEVEL(w), PRINT(w1, w2, ---), NOPRINT(w1, w2, ---).

Required sentence: MODEL

Sentences Used in the ESTIM Paragraph

MODEL sentence
The MODEL sentence is used to specify the label (name) of the model to be estimated. The label must be one specified in a previous TSMODEL paragraph.

METHOD sentence
The METHOD sentence is used to specify the likelihood function used for model estimation. The keyword may be CONDITIONAL for the conditional likelihood or EXACT for the exact likelihood function. See Section for a discussion of these two likelihood functions. The default is CONDITIONAL.

STOP sentence
The STOP sentence is used to specify the stopping criterion for nonlinear estimation. The argument, i, for the keyword MAXIT specifies the maximum number of iterations (default is i=10); the argument, r1, for the keyword LIKELIHOOD specifies the value of the relative convergence criterion on the likelihood function (default is r1=0.0001); and the argument, r2, for the keyword ESTIMATE specifies the value of the relative convergence criterion on the parameter estimates (default is r2=0.001). Estimation iterations will be terminated when the relative change in the value of the likelihood function or parameter estimates between two successive iterations is less than or equal to the convergence criterion, or if the maximum number of iterations is reached.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, from i1 to i2, for which the data will be analyzed. The default is the maximum span available for the series.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. The default is that none of the values of the above statistics will be retained after the paragraph is used.
The values that may be retained are:

RESIDUAL    : the residual series
FITTED      : the one-step-ahead forecasts (fitted values) of the series
VARIANCE    : variance of the noise
DISTURBANCE : the disturbance series of the model

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output displayed for selected statistics. Control is achieved in a two stage procedure. First, a basic LEVEL of output (default NORMAL) is designated. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). The keywords for LEVEL and output displayed are:

BRIEF    : estimates and their related statistics only
NORMAL   : RCORR
DETAILED : ITERATION, CORR, and RCORR

where the keywords on the right denote:

ITERATION : the parameter and covariance estimates for each iteration
CORR      : the correlation matrix for the parameter estimates
RCORR     : the reduced correlation matrix for the parameter estimates (i.e., a display in which all values have no more than two decimal places and those estimates within two standard errors of zero are displayed as dots, ".").

FORECAST Paragraph

The FORECAST paragraph is used to compute the forecast of future values of a time series based on a specified intervention model. The binary variables representing intervention periods must be defined for the forecast period (see Section 6.7.3). The FORECAST paragraph requires the current estimate of the variance $\sigma^2$ to compute standard errors of forecasts. The variance for the estimated model is always stored internally during the execution of the ESTIM paragraph, but the internal estimate is overwritten at each subsequent execution of an ESTIM paragraph for the same model.

The FORECAST paragraph has other sentences available, but these are not described below. They are used in the forecasting of transfer function models and are described in Chapter 8.

Syntax of the FORECAST Paragraph

Brief syntax

FORECAST MODEL v.
         NOFS ARE i1, i2, ---.
         ORIGINS ARE i1, i2, ---.

Required sentence: MODEL

Full syntax

FORECAST MODEL v.
         NOFS ARE i1, i2, ---.
         ORIGINS ARE i1, i2, ---.
         JOIN. /NO JOIN.
         METHOD IS w.
         HOLD FORECASTS(v1,v2,---), STD_ERRS(v1,v2,---).
         OUTPUT PRINT(w), NOPRINT(w).

Required sentence: MODEL

Sentences Used in the FORECAST Paragraph

MODEL sentence
The MODEL sentence is used to specify the label (name) of the model for the series to be forecasted. The label must be one specified in a previous TSMODEL paragraph.

NOFS sentence
The NOFS sentence is used to specify for each time origin the number of time periods ahead for which forecasts will be generated. The number of arguments in this sentence must be the same as that in the ORIGINS sentence. The default is 24 forecasts for each time origin.

ORIGINS sentence
The ORIGINS sentence is used to specify the time origins for forecasts. The default is one origin, the last observation.

JOIN sentence
The JOIN sentence is used to specify that the forecasts calculated should be appended to the variable of the model relative to the specified origin. If more than one origin is specified, only the last will be used. The default is NO JOIN.

METHOD sentence
The METHOD sentence is used to specify the likelihood function used for the computation of the residual series employed in forecasting. The keyword may be CONDITIONAL for the conditional likelihood, or EXACT for the exact likelihood function. See Section for a discussion of these two likelihood functions. The default is EXACT.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. The default is that none of the values of the above statistics will be retained after the paragraph is used. The values that may be retained are:

FORECASTS : forecasts for each corresponding time origin
STD_ERRS  : standard errors of the forecasts at the last time origin

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output displayed for various statistics. The default condition is PRINT(FORECASTS); that is, to display forecast values for each time origin. To suppress this, specify NOPRINT(FORECASTS).

SIMULATE Paragraph

The SIMULATE paragraph is used to generate data according to a user specified univariate time series model. See Sections and for more information on this paragraph. A univariate time series model must have been specified previously using the TSMODEL paragraph. The paragraph is also used to generate data according to a user specified distribution. More information on this can be found in Chapter 12 of The SCA Statistical System: Reference Manual for General Statistical Analysis.

Syntax for the SIMULATE Paragraph

SIMULATE VARIABLE IS v.
         MODEL IS model-name.
         NOISE IS distribution (parameters) or VARIABLE(v).
         NOBS IS i.
         SEED IS i.

Required sentences: MODEL, NOISE and NOBS

Sentences Used in the SIMULATE Paragraph

VARIABLE sentence
The VARIABLE sentence is used to specify the name of the variable to store the simulation results. The sentence is not required if a univariate time series is generated. If the sentence is not specified, the variable name used in the MODEL sentence of the TSMODEL paragraph is used to store the results.

MODEL sentence
The MODEL sentence is used to specify the name (label) of the model to be simulated. The model may be an ARIMA model specified in a TSMODEL paragraph. The sentence SIMULATION must also appear in the TSMODEL paragraph.

NOISE sentence
The NOISE sentence is used to specify the noise sequence for the simulated time series model. Either the distribution for generating the noise sequence or the name of a variable containing values to be used as the sequence is specified. The following distributions can be used:

U(r1,r2)  : uniform distribution between r1 and r2
N(r1,r2)  : normal distribution with mean r1 and variance r2
MN(v1,v2) : multivariate normal distribution with mean vector v1 and covariance matrix v2. Note that v1 and v2 must be names of variables defined previously.

NOBS sentence
The NOBS sentence is used to specify the number of observations to be simulated.

SEED sentence
The SEED sentence is used to specify an integer or the name of a variable for starting the random number generation. When a variable is used, the seven digit value is used as a seed if it is not defined yet, or the value of the variable is used if the variable is an existing one. After the simulation, the variable contains the seed last used. The number of digits for the seed must not be more than 8 digits. The default is

REFERENCES

Abraham, B., and Ledolter, J. (1983). Statistical Methods for Forecasting. New York: Wiley.

Box, G.E.P., and Tiao, G.C. (1965). A Change in Level of a Non-Stationary Time Series. Biometrika 52:

Box, G.E.P., and Tiao, G.C. (1975). Intervention Analysis With Applications to Economic and Environmental Problems. Journal of the American Statistical Association 70:

Liu, L.-M., and Tiao, G.C. (1980). Parameter Estimation in Dynamic Models. Communications in Statistics A9:

Vandaele, W. (1983). Applied Time Series Analysis and Box-Jenkins Models. New York: Academic Press.

Wei, W.W.S. (1990). Time Series Analysis: Univariate and Multivariate Methods. Redwood City, CA: Addison-Wesley.


CHAPTER 7

OUTLIER DETECTION AND ADJUSTMENT

As noted in Chapter 6, time series are often subject to unexpected or uncontrolled events. If these events are known to us, we may be able to account for their effects through an intervention model. However, if the events are not initially known, or if the times of the events are unknown, then other approaches may be necessary for their detection and adjustment.

This chapter considers how to detect and adjust for the effects of such unknown or unexpected occurrences in a time series. These unusual observations are referred to as outliers. Depending on their nature, outliers may have moderate to substantial impact on an analysis. It is important to detect outliers for a number of reasons:

(1) Better understanding of the series under study. The detection of outliers may highlight the occurrences of those external events affecting a series, and in what manner. Uncovering such occurrences can lead to enlightenment on why a series performs as it does. In addition, we may discover spurious observations (e.g., recording errors) that may mask the proper modeling of a time series.

(2) Better modeling and estimation. Unknown external events can alter the structures of statistics typically used for model identification. Uncovering outliers can result in simplifying the structure of a model used. Moreover, even if we employ the proper model for a series, the presence of unaccounted external events may seriously affect the parameter estimates of the model.

(3) Improved intervention analyses. As noted above, parameter estimates can be affected by the presence of unknown external events. As a result, if we employ an intervention model, we need to be certain that the intervention effects are not contaminated by any outlier effects. In this manner we are also more confident that test statistics for parameter estimates will not be biased due to an inflated variance.

(4) Better forecasting performance. Depending upon the timing and nature of the event, an external event may affect the forecasting performance of a model.
By adjusting for the presence of an outlier, we may be able to improve the forecasts and the overall forecasting performance of a model. In addition, should a detected event re-occur, we may be able to better forecast how a series will respond to it.

Additional information regarding the nature and motivation for outlier detection and adjustment can be found in Fox (1972), Chang (1982), Hillmer, Bell, and Tiao (1983), Tsay (1988), Chang, Tiao and Chen (1988), Ledolter (1987 and 1989), Pankratz (1991), Chen and Liu (1990), and Liu and Chen (1991).

7.1 Outliers in a Time Series

In Chapter 5, we introduced the autoregressive moving average (ARMA) model that may be written as:

$Z_t - \phi_1 Z_{t-1} - \cdots - \phi_p Z_{t-p} = C + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}$, (7.1)

or more simply

$\phi(B) Z_t = C + \theta(B) a_t$. (7.2)

The model of equation (7.2) can be directly extended to include differencing operators to induce stationarity and to encompass seasonal terms (as multiplicative AR or MA operators, see Section 5.3). In Chapter 6, we introduced deterministic (binary) series into a time series model to represent interventions. In the latter case, equation (7.2) was used to represent the model for the underlying disturbance term.

To facilitate our understanding of outliers, in this section we will restrict our discussion to non-seasonal models. Moreover we will assume C=0 so that we may re-write (7.2) as

$Z_t = \frac{\theta(B)}{\phi(B)} a_t$. (7.3)

In the above equation, $Z_t$ represents a series that is not contaminated with outliers. We will use $Y_t$ to represent the values observed for $Z_t$ in the presence of an outlier. As we will see, our representation for an outlier will take the form of the intervention model used in Chapter 6 in which:

(a) the intervention period must be determined, and

(b) the disturbance term, $N_t$, represents the uncontaminated series $Z_t$ of equation (7.2) or (7.3).

We will now define and illustrate four types of outliers. These are additive outlier (AO), innovational outlier (IO), level shift (LS), and temporary (or transient) change (TC). To illustrate the effect of each type of outlier, we consider how an outlier affects the values of a simulated AR(1) process. For this purpose, 65 observations are simulated from the model

$Z_t = \frac{1}{1 - 0.6B} a_t$, with $\sigma_a = 1.0$.

The data are shown in Figure 7.1.

Figure 7.1 Data from a simulated AR(1) process

Additive outlier (AO)

An additive outlier (AO) is an event that affects a series for one time period only. One illustration of an AO is a recording error (e.g., the actual value 2.1 may be recorded as 21.0, 0.1, or the like). For this reason, an additive outlier is sometimes called a gross error. If we assume that an outlier occurs at time t=T, we can represent the series we observe by the model

$Y_t = Z_t + \omega_A P_t^{(T)}$ (7.4)

where $P_t^{(T)}$ is a pulse function (that is, $P_t^{(T)}$ assumes the value 1 when t = T and is 0 otherwise). The value $\omega_A$ represents the amount of deviation from the true value of $Z_T$.

To illustrate the effect of an AO on the base AR(1) model, we include an AO at time t = 30 with $\omega_A = 5$. The plot of the resultant series is shown, together with the original value at t=30, in Figure 7.2. We see that all observations are unchanged, except for the change in the value at t = 30.

Figure 7.2 Additive outlier at t = 30 in a simulated AR(1) process
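
Equation (7.4) can be tried out numerically. The following is a minimal Python sketch, assuming the base AR(1) model $(1 - 0.6B)Z_t = a_t$ with $\sigma_a = 1.0$ and an AO of size $\omega_A = 5$ at t = 30. The function names, the seed, and the start-up value $Z_0 = 0$ are illustrative assumptions and are not part of the SCA System, so the simulated values will not reproduce Figure 7.1.

```python
import random


def simulate_ar1(n, phi, sigma, seed):
    """Simulate Z_t = phi * Z_{t-1} + a_t for t = 1..n, with Z_0 = 0 (assumed start-up)."""
    rng = random.Random(seed)
    z, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, sigma)
        z.append(prev)
    return z


def pulse(n, T):
    """Pulse function P_t^(T): 1 when t = T and 0 otherwise (t runs from 1 to n)."""
    return [1.0 if t == T else 0.0 for t in range(1, n + 1)]


def add_additive_outlier(z, T, omega_a):
    """Observed series Y_t = Z_t + omega_A * P_t^(T), as in equation (7.4)."""
    p = pulse(len(z), T)
    return [z_t + omega_a * p_t for z_t, p_t in zip(z, p)]


z = simulate_ar1(n=65, phi=0.6, sigma=1.0, seed=12345)  # the seed is arbitrary
y = add_additive_outlier(z, T=30, omega_a=5.0)

# List the time indices at which the observed and uncontaminated series differ.
changed = [t for t in range(1, 66) if y[t - 1] != z[t - 1]]
print(changed)
```

Because the pulse is zero everywhere except at t = T, the contaminated and uncontaminated series can differ only at that single time index.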

Innovational outlier (IO)

Unlike an additive outlier, an innovational outlier (IO) is an event whose effect is propagated according to the ARIMA model of the process. In this manner, an IO affects all values observed after its occurrence. In practice, an IO often represents the onset of an external cause (Tsay, 1988). The model for the observed series is

$Y_t = Z_t + \frac{\theta(B)}{\phi(B)} \omega_I P_t^{(T)}$ (7.5)

The model given in (7.5) can be re-written as

$Y_t = \frac{\theta(B)}{\phi(B)} \left( a_t + \omega_I P_t^{(T)} \right)$ (7.6)

We may better understand the difference between an IO and an AO by comparing (7.6) with (7.4). We see in (7.4) that an AO alters only the observation $Z_T$, while an IO alters only the shock $a_T$. As a result, an AO only affects one observation, $Y_T$, while the effect of an IO is present in all values of $Y_t$ for $t \geq T$ according to the $\psi$-weights of the model (see Box and Jenkins (1970) for more information regarding $\psi$-weights). The terminology IO arises because of the representation given in (7.6), as the series $\{a_t\}$ is sometimes referred to as the innovation series.

To illustrate the effect of an IO on the base AR(1) model, we include an IO at time t = 30 with $\omega_I = 5$. The plot of the resultant series, along with the original points, is shown in Figure 7.3. We may observe that the values plotted from t=30 through t=38 are all noticeably above those of the original series. Moreover, a comparison of the values of the simulated AR(1) series and those with the IO effect present reveals the effect of the IO can be observed (to 3 significant digits) through t = 47.

Figure 7.3 Innovational outlier at t = 30 in a simulated AR(1) process
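
The equivalence of (7.5) and (7.6) can be checked numerically for the base AR(1) model, where the $\psi$-weights are $\psi_j = \phi^j$, so that an IO at time T adds $\omega_I \phi^{t-T}$ to $Y_t$ for $t \geq T$. The Python sketch below is illustrative only: the function names and the deterministic placeholder shocks are assumptions made for the example, and the start-up value is taken as $Z_0 = 0$.

```python
def ar1_from_shocks(shocks, phi):
    """Build Z_t = phi * Z_{t-1} + a_t from a list of shocks, starting from Z_0 = 0."""
    z, prev = [], 0.0
    for a_t in shocks:
        prev = phi * prev + a_t
        z.append(prev)
    return z


def io_effect(n, T, omega_i, phi):
    """Effect of an IO on Y_t under an AR(1) model: omega_I * phi**(t - T) for t >= T."""
    return [omega_i * phi ** (t - T) if t >= T else 0.0 for t in range(1, n + 1)]


phi, omega_i, T, n = 0.6, 5.0, 30, 65
# Deterministic placeholder shocks (an assumption for illustration; any shocks work).
shocks = [((7 * t) % 11 - 5) / 5.0 for t in range(1, n + 1)]

z = ar1_from_shocks(shocks, phi)  # uncontaminated series Z_t

# Equation (7.6): add omega_I to the shock at t = T, then pass through the model.
shocked = [a + (omega_i if t == T else 0.0) for t, a in enumerate(shocks, start=1)]
y_from_76 = ar1_from_shocks(shocked, phi)

# Equation (7.5): add the psi-weighted pulse to the uncontaminated series.
y_from_75 = [z_t + e_t for z_t, e_t in zip(z, io_effect(n, T, omega_i, phi))]

max_gap = max(abs(u - v) for u, v in zip(y_from_75, y_from_76))
print(max_gap < 1e-9)
for t in (30, 31, 32, 38):
    print(t, round(y_from_76[t - 1] - z[t - 1], 4))
```

The two constructions agree up to floating-point rounding, and the printed differences show the effect of the IO shrinking by the factor $\phi = 0.6$ at each step after t = 30.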

Level shift (LS)

A level shift (LS) is an event that affects a series at a given time, and whose effect becomes permanent. A level shift could reflect the change of a process mechanism, the change in a recording device, or a change in the definition of the variable itself. The model for the series we observe may be represented by

$Y_t = Z_t + \frac{1}{1 - B} \omega_L P_t^{(T)}$ (7.7)

Equation (7.7) is the same as

$Y_t = Z_t + \omega_L S_t^{(T)}$ (7.8)

where $S_t^{(T)}$ is a step function (i.e., $S_t^{(T)}$ assumes the value 0 before t = T and has the value 1 thereafter). We can see that the model for an AO, given by (7.4), and the model for a level shift, given by (7.8), are the same, except that an AO affects $Z_t$ only at t = T (through $P_t^{(T)}$) while an LS affects $Z_t$ permanently from t = T onwards (through $S_t^{(T)}$).

To illustrate the effect of an LS, we include an LS at time t = 30 on the base AR(1) model. As before, we use $\omega_L = 5$. Plots of the resultant series and the original series are shown in Figure 7.4. We observe that after t=30 the mean level of the resultant series is higher than before. Except for this, the two series are identical in all other ways.

Figure 7.4 Level shift at t = 30 in a simulated AR(1) process

Temporary change (TC)

An additive outlier (AO) and a level shift (LS) represent two distinct patterns in which an event affects a series. For a level shift, the level of the underlying process is affected for all future time, while an additive outlier affects the series for only one time period. It is useful to consider an event that has some initial impact on a series, and whose impact eventually disappears. A temporary (or transient) change (TC) is an event having such an initial impact

and whose effect decays exponentially according to some dampening factor, say $\delta$. We can represent the observed series as

$Y_t = Z_t + \frac{1}{1 - \delta B} \omega_C P_t^{(T)}, \quad 0 < \delta < 1$ (7.9)

We can see that (7.4) and (7.7) are the limiting cases of (7.9). In (7.4), the dampening factor $\delta$ is 0, while in (7.7) this factor is 1.

To illustrate the effect of a TC, we include a TC at time t=30 with $\omega_C = 5$ and $\delta = 0.8$. Plots of the resultant series and the original series are displayed in Figure 7.5. We may note the resultant plot looks similar to that of an IO. This is especially true in the case of an AR(1) model since the form of the decay of the impact is identical to an AR(1). Here, the TC is identical to an IO if $\delta = 0.6$. Since $\delta$ is relatively close to 1, the effect of the outlier is discernible to the eye for a number of periods (here through about t = 45).

Figure 7.5 Temporary change at t = 30 in a simulated AR(1) process

7.2 Methods for Outlier Detection and Adjustment

In this section we provide an overview of methods for detection and adjustment of one or more outliers. This section may be skipped on first reading and later referenced as necessary. A more complete discussion of the materials presented in this section may be found in Chen and Liu (1990) and Chang, Tiao and Chen (1988).

Outlier detection when ARMA parameters are known

It is natural to consider the residuals of a fitted model for use in detecting outliers in a time series, since most diagnostic checks of a model are based on residuals (see Sections and 5.1.5). However, outliers in a time series can affect both the model we may identify for the series as well as the parameter estimates of the identified model. As a result, it is unclear how useful the residuals may be for outlier detection in certain situations. To better understand how a single outlier manifests itself in the residual series, consider the filtered series

$e_t = \pi(B) Y_t$ (7.10)

where $\pi(B)$ is the polynomial operator in the $\pi$-weights of the ARIMA model (see Section 5.1.2). The weights in $\pi(B)$ may be obtained by equating coefficients in the backshift operator in an expression involving $\pi(B)$ and the polynomial operators of the model. In the case of the nonseasonal model (i.e., the ARMA model of (7.3)), these $\pi$-weights may be computed from

$\pi(B) = \frac{\phi(B)}{\theta(B)}$ (7.11)

The values of $e_t$ become the residuals of the fitted model if the $\pi$-weights are computed from the estimated parameters of the ARIMA model rather than from the known parameters of the true ARIMA model.

To illustrate the filtering concept above and how a single outlier may appear in the residual series, we consider an AO imposed on the base AR(1) model (see Section 7.1.1). The true model for our original series is

$(1 - 0.6B) Z_t = a_t$, with $\sigma_a = 1.0$.

As a result, from (7.11) we obtain

$\pi(B) = \phi(B) = (1 - 0.6B)$

It is informative to compare the filtered series we obtain by applying the above $\pi(B)$ to both the original series, $Z_t$, and the contaminated series, $Y_t$. The series obtained from $\pi(B) Z_t$ produces the true noise series used in generating the data, while $\pi(B) Y_t$ produces the residual series by applying the true value of $\phi$ (i.e., 0.6) to filter the contaminated series. These two series are plotted together in Figure 7.6. The series are identical except at t=30 and t=31.

Figure 7.6 Filtered series, $\pi(B) Y_t$, for a simulated AR(1) process with an AO; and filtered values when outlier is not present (o)
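
Since the filter in (7.10) is linear, $\pi(B) Y_t$ equals $\pi(B) Z_t$ plus $\pi(B)$ applied to the outlier effect alone. The Python sketch below computes this second term (the "footprint" an outlier leaves in the filtered series) for each of the four outlier types of Section 7.1, using $\pi(B) = 1 - 0.6B$, an outlier of size 5 at t = 30, and $\delta = 0.8$ for the TC. The function names are illustrative assumptions, the IO branch is valid only for the AR(1) case, and values before the start of the series are taken to be zero.

```python
def outlier_effect(kind, n, T, omega, phi=0.6, delta=0.8):
    """Effect added to Z_t (t = 1..n) by a single outlier at time T.

    AO: omega * pulse; LS: omega * step; TC: omega * delta**(t - T);
    IO (AR(1) case only): omega * phi**(t - T).
    """
    effect = []
    for t in range(1, n + 1):
        if t < T:
            effect.append(0.0)
        elif kind == "AO":
            effect.append(omega if t == T else 0.0)
        elif kind == "LS":
            effect.append(omega)
        elif kind == "TC":
            effect.append(omega * delta ** (t - T))
        elif kind == "IO":
            effect.append(omega * phi ** (t - T))
        else:
            raise ValueError("kind must be AO, IO, LS or TC")
    return effect


def ar1_filter(y, phi):
    """Filtered series e_t = (1 - phi*B) Y_t, taking Y_0 = 0 as an assumed start-up value."""
    return [y_t - phi * y_prev for y_prev, y_t in zip([0.0] + y[:-1], y)]


# Footprint of each outlier type in the filtered (residual) series.
footprints = {
    kind: ar1_filter(outlier_effect(kind, n=65, T=30, omega=5.0), phi=0.6)
    for kind in ("AO", "IO", "LS", "TC")
}
for kind, e in footprints.items():
    print(kind, [round(v, 3) for v in e[29:33]])  # values at t = 30, 31, 32, 33
```

For the AO, the footprint is nonzero only at t = 30 and t = 31, which is the pattern displayed in Figure 7.6. The IO footprint reduces to a single pulse, while the LS and TC footprints persist after t = 30.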

Although an AO only affects the observed series for one period, it affects the filtered (residual) series for more than one period. Specifically, the information (effect) for an AO in the series $e_t$ begins at the period in which the AO occurs, and then decays according to the $\pi$-weights of the ARIMA model. Hence we cannot detect an AO by simply looking for a single large outlier in the residual (filtered) series. Similarly, the effect of a single IO, LS or TC is not the same in both the observed and residual series.

The effect of a single outlier on residuals typically is not as clean as displayed above, since the outlier also affects the estimation results of our fitted model. We can observe the influence that outliers have on parameter estimation by fitting an AR(1) model to the four simulated series that have been considered previously. Table 7.1 lists the estimate of $\phi$, its standard error, and the estimate of $\sigma_a$ for each of the four simulated series.

Table 7.1 Estimation results for an AR(1) fit of the simulated AR(1) processes

Case               $\hat{\phi}$     S.E. of $\hat{\phi}$     $\hat{\sigma}_a$
Without outlier
AO at t = 30
IO at t = 30
LS at t = 30
TC at t = 30

Depending on the nature of the outlier present, we see different effects on the estimates of $\phi$ and $\sigma_a$. Except in the LS case, the estimates of $\phi$ are rather close to the true parameter. Due to the nature and the positioning of the LS outlier, the fitted model for $Y_t$ is approximately

$(1 - B) Y_t = a_t$

and the estimate of $\sigma_a$ is more inflated than that of the other cases. In all cases, except for that of the LS outlier, the residuals obtained have a plot similar to that of the associated $e_t$ shown earlier. Hence although the estimate of $\phi$ may be biased, the information we may expect to extract regarding outliers from these residuals is similar to that provided by $e_t$ when $\phi$ is known. To illustrate this, in Figure 7.7 we plot the residual series of both the original series and the series contaminated with an AO. We see the residual series are virtually identical to those displayed in Figure 7.6.
Hence the residuals of the contaminated series contain almost complete information for the detection and estimation of outliers.

Figure 7.7 Residual series for a simulated AR(1) process with an AO (solid line) and that when outlier is not present (dashed line)

Suppose we have a single outlier, say an AO at time T, in the series $Y_t$. We can obtain an analytic description of $e_t$ by substituting (7.4) into (7.10). Similar analytic descriptions can be derived for a single IO, LS, or TC in like manner. The precise analytic descriptions of $e_t$ for each type of outlier are provided in Section

We may be able to use the analytic representation of $e_t$ to test for the effect of an outlier. If only one outlier occurs in a time series, then a least squares estimate for the effect of the outlier at time t = T, $\hat{\omega}_i$ (i = 1,2,3,4), and the statistics that may be used for testing its significance can be easily derived (see Chang, Tiao, and Chen, 1988, and Chen and Liu, 1990). An adjusted series (i.e., one with the outlier effect removed) can also be obtained. However, some problems remain since:

(1) we do not know whether an outlier occurs, and if it occurs, the time of its occurrence;

(2) in the event there is an outlier, we do not know its type;

(3) there may be more than one outlier present in the series; and

(4) we do not know precisely what the true underlying model is, nor are we sure of the accuracy of the estimates of a correct model.

Procedures to account for (1) - (3) above have been developed during the past few years. Most of these outlier detection procedures are based on the residuals from fitted models. In this way, we can diagnostically check a fitted model for the presence of outliers. An overview of such a procedure is provided below. Recently, Chen and Liu (1990) developed an iterative procedure for the joint estimation of model parameters and outlier effects. This procedure addresses problems (1) - (4) above more thoroughly.

Detecting outliers from a fitted model

In practice, the ARMA parameters and $\sigma_a$ are unknown, but estimates for the model parameters and $\sigma_a$ can be obtained. We may then use the residuals of the fitted model (i.e., $\hat{e}_t$) to check for outliers in the series.
Chang (1982), Hillmer, Bell, and Tiao (1983), and Chang, Tiao and Chen (1988) all provide a similar procedure for detecting outliers in such a case, as we now summarize.

Since we do not know when an outlier may occur nor its type, we first proceed sequentially through time and calculate four test statistics (one for each type of outlier) for each time index. We keep the largest test statistic (in absolute value) for each outlier type and retain its time index. We then compare the largest (in absolute value) of all these statistics with some pre-specified critical value. If the critical value is not exceeded, then it is concluded there is no outlier in the series. However, if the critical value is exceeded, then we have determined that an outlier has occurred and have identified its type. The residuals are now adjusted for the presence of the detected outlier and a new estimate of $\sigma_a$ is computed. We again proceed through the adjusted residuals to see if another outlier can be detected. We iteratively detect and adjust residuals until no additional outlier can be found.

The critical value for such tests is dependent on the underlying ARIMA model and the sample size. As a result, only broad guidelines can be provided for a general choice of the critical value. In practice, the value 3.0 provides reasonable sensitivity to outliers. Lower sensitivity is provided by using larger critical values and higher sensitivity is provided by using smaller critical values. Often a value less than 3.0 is recommended for time series with a small number of observations (say fewer than 100 or so).

Although the above procedure can be used as a simple device for the detection of outliers in a time series, two potential problems exist. First, it may be argued that the iterative search for outliers may not be efficient. Second, and more importantly, the detection procedure is completely dependent on the ARIMA model that has been identified and estimated based on the contaminated series, which often has biased parameter estimates.

The OUTLIER paragraph of the SCA System employs a procedure similar to that described above to detect outliers in a fitted model. Temporary changes (TC) are not considered in the current release of the OUTLIER paragraph.
The OFILTER paragraph employs a procedure described in Section and may be used in lieu of the OUTLIER paragraph. The OFILTER paragraph can detect all four types of outlier, and is discussed in more detail in Section

Adjustment of detected outliers using intervention models

In Section 7.2.2, we outlined a procedure for the detection of outliers when the ARIMA parameters of a model are known (or have been estimated). Such a procedure can be used as a diagnostic check of a fitted model. We now address the issues for the detection and adjustment of outliers. In doing so, we need to consider:

(1) Model re-estimation, to obtain better estimates of ARIMA parameters as well as checking on the general adequacy of the underlying ARIMA model, and

(2) Incorporation of outlier effects within a model, to estimate potential outlier effects jointly with the underlying ARMA model in order to check whether the outliers detected are real.
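
The residual-based detection pass summarized in Section 7.2.2 can be sketched in Python as follows. This is an illustration under stated assumptions and not the algorithm of the OUTLIER or OFILTER paragraph: the $\pi$-weights are taken as known for an AR(1) model, each outlier effect is estimated by least squares regression of the residuals on the filtered outlier pattern, $\sigma_a$ is estimated by the root mean square of the current residuals, and $\delta = 0.7$ is used for a TC. All function names and the deterministic stand-in for the noise series are hypothetical.

```python
import math


def footprint(kind, n, T, phi, delta=0.7):
    """x_t = pi(B) applied to the unit effect of an outlier at time T (1-based), AR(1) case."""
    if kind == "IO":  # an IO enters the residuals as a pure pulse
        return [1.0 if t == T else 0.0 for t in range(1, n + 1)]
    rate = {"AO": 0.0, "TC": delta, "LS": 1.0}[kind]
    effect = [0.0] * n
    for t in range(T, n + 1):
        effect[t - 1] = 1.0 if t == T else rate * effect[t - 2]
    return [e_t - phi * e_prev for e_prev, e_t in zip([0.0] + effect[:-1], effect)]


def detect_outliers(resid, phi, critical=3.0, kinds=("AO", "IO", "LS", "TC"), max_outliers=10):
    """Repeatedly find the largest |test statistic|, record it, and adjust the residuals."""
    e = list(resid)
    n = len(e)
    found = []
    for _ in range(max_outliers):
        sigma = math.sqrt(sum(v * v for v in e) / n)  # simple (assumed) estimate of sigma_a
        if sigma == 0.0:
            break
        best = None
        for T in range(1, n + 1):
            for kind in kinds:
                x = footprint(kind, n, T, phi)
                sxx = sum(v * v for v in x)
                omega_hat = sum(a * b for a, b in zip(x, e)) / sxx  # least squares estimate
                stat = omega_hat * math.sqrt(sxx) / sigma           # estimate / standard error
                if best is None or abs(stat) > abs(best[3]):
                    best = (T, kind, omega_hat, stat)
        if abs(best[3]) <= critical:
            break  # no further outlier exceeds the critical value
        T, kind, omega_hat, _ = best
        found.append(best)
        x = footprint(kind, n, T, phi)
        e = [e_t - omega_hat * x_t for e_t, x_t in zip(e, x)]  # remove the detected effect
    return found


phi, n = 0.6, 65
noise = [0.1 * math.sin(t) for t in range(1, n + 1)]  # small deterministic stand-in for a_t
ao = footprint("AO", n, 30, phi)                      # residual footprint of a unit AO at t = 30
resid = [a + 5.0 * x for a, x in zip(noise, ao)]
found = detect_outliers(resid, phi)
for T, kind, omega_hat, stat in found:
    print(T, kind, round(omega_hat, 2), round(stat, 2))
```

The critical value of 3.0 follows the guideline given above. A smaller value makes the search more sensitive, at the cost of flagging more spurious outliers.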

Two procedures are discussed. Details regarding these procedures may be found in Chang, Tiao and Chen (1988), and Chen and Liu (1990).

A straightforward procedure for outlier detection and adjustment is to sequentially employ the detection techniques described in Section with intervention models described in Chapter 6. A method for implementing the procedure is described in Chang, Tiao, and Chen (1988). In this procedure, an ARIMA model is first identified and estimated assuming there are no outliers present. The outlier detection procedure is applied to the residuals to check if any outliers are present. If so, an adjusted model is estimated. This model includes detected outliers as intervention components. Outlier detection and adjustment continues as necessary after the intervention model is estimated. This procedure can evidently be laborious and time consuming.

The above procedure can be conducted in the SCA System using the TSMODEL, ESTIM and OUTLIER paragraphs. Special considerations involving model specification must be taken in the event an IO is detected. More details regarding employing such a procedure may be found in Pankratz (1991) and Wei (1990).

An iterative procedure for joint estimation of model parameters and outlier effects

The intervention based procedure outlined above is useful to a certain extent. However, such a procedure has some deficiencies. Among these are:

(1) outliers may result in an inappropriately identified initial model,

(2) the efficiency of the outlier detection procedure may be affected by the bias in parameter estimates due to the presence of outliers,

(3) some outliers may be masked and not identified, and

(4) some spurious outliers may be detected.

Chen and Liu (1990) propose an iterative procedure for the joint estimation of model parameters and outlier effects to address these concerns. This procedure provides the basis of the SCA OESTIM paragraph for the estimation of a time series model in the presence of possible outliers.
An outline of the steps of the procedure is presented in Section A more complete discussion of this joint estimation procedure can be found in Chen and Liu (1990).

As in the previous procedure of Chang, Tiao and Chen (1988), the procedure starts with a model having potentially biased parameter estimates. An iterative outlier detection procedure is applied to the residuals of the empirically built model. The original series is adjusted (to remove the effects of outliers) according to the types of the detected outliers and their effects. The usual maximum likelihood estimation is applied to the adjusted series. The residuals of the above estimated model are examined again. The three steps (1) outlier detection, (2) outlier adjustment, and (3) parameter estimation based on the adjusted series are iterated until no outliers are found. At this point, the accumulated information of outliers is employed to jointly estimate the outlier effects and produce a series of final adjusted observations. After

this step, the maximum likelihood estimation is applied to the final adjusted series to obtain the final estimates of the parameters. At the last step, the outlier detection procedure is applied to the residuals of the original series using the final parameter estimates of the model.

This joint estimation procedure differs from that described in the previous section in several respects. First, the outlier detection is conducted iteratively based on the adjusted residuals as well as the adjusted observations. That is, once an outlier is detected, its effect can be removed from the observed series, just as it can be removed from the residuals of the estimated model. By adjusting the observed series, the procedure avoids the need to formulate and estimate an intervention model. Secondly, the outliers are detected based on robust estimates of model parameters. Finally, in the new procedure the outlier effects are jointly estimated using multiple regression. As a result, the new procedure produces more robust estimates of model parameters, and reduces spurious outliers and masking effects in outlier detection.

Dampening factor in a temporary change

In the outlier detection procedures discussed above, the dampening factor ($\delta$) of a TC outlier is not estimated. A single value is used throughout the procedure. The default value for $\delta$ is 0.7, the value recommended by Chen and Liu (1990). The OESTIM paragraph permits the specification of a different value for $\delta$. Since the value for $\delta$ is fixed, we see that only the effects ($\omega_i$) of TC outliers are estimated.

7.3 Example: Production Process Data

To illustrate outlier detection (using the OUTLIER paragraph) and outlier detection and adjustment (using the OESTIM paragraph), we re-consider the daily production data of an automotive component. The data were used in Section 6.4 and are stored in the SCA workspace under the label PRODUCTN. A plot of the series is given in Figure 7.8.

Figure 7.8 Production process data

In Section 6.4 we noted that a process change occurred at t=47 causing a mean level change.
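The fitted equation of the final intervention model estimated for this series was of the form

$$\mathrm{PRODUCTN}_t = \hat{C} + \hat{\omega}\, S_t^{(47)}, \qquad (7.12)$$

with estimates implying a mean level of about 1795 before t=47 and about 2277 thereafter.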
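A mean level change of the kind described by (7.12) can make a series look non-stationary when the change is ignored. The following Python sketch (illustrative only, not SCA code) simulates 85 observations with a mean of 1795 and a shift of 482 at t=47, the levels implied by the text; the noise standard deviation of 100 and the helper name `sample_acf` are assumptions made for the illustration.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag computed about the overall mean."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    c0 = np.dot(d, d)
    return np.array([np.dot(d[:-k], d[k:]) / c0 for k in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
n, shift_time = 85, 47
noise = rng.normal(0.0, 100.0, size=n)          # assumed noise level
step = (np.arange(1, n + 1) >= shift_time).astype(float)

y_flat = 1795.0 + noise                         # no level change
y_shift = 1795.0 + 482.0 * step + noise         # level change at t=47

acf_flat = sample_acf(y_flat, 12)
acf_shift = sample_acf(y_shift, 12)
print("no shift  :", np.round(acf_flat[:4], 2))
print("with shift:", np.round(acf_shift[:4], 2))
```

In the simulated series with the shift, the low-lag autocorrelations are large and positive even though the noise itself is uncorrelated, which is the pattern that suggests differencing when the intervention is unknown.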

To illustrate outlier detection and adjustment in the SCA System, we will now model PRODUCTN assuming we were unaware of the intervention that occurred. We may first compute the ACF of PRODUCTN by entering

    -->ACF PRODUCTN. MAXLAG IS 12.

    TIME PERIOD ANALYZED                   TO 85
    NAME OF THE SERIES                     PRODUCTN
    EFFECTIVE NUMBER OF OBSERVATIONS
    STANDARD DEVIATION OF THE SERIES
    MEAN OF THE (DIFFERENCED) SERIES
    STANDARD DEVIATION OF THE MEAN
    T-VALUE OF MEAN (AGAINST ZERO)

    AUTOCORRELATIONS
    (Plot of the sample ACF for lags 1 through 12; every autocorrelation is positive.)

Based on the above ACF, we would conclude that PRODUCTN is not stationary. We can obtain the ACF and PACF for the first difference of PRODUCTN using the IDEN paragraph (SCA output is edited for presentation purposes).

    -->IDEN PRODUCTN. DFORDER IS 1. MAXLAG IS 12.

    DIFFERENCE ORDERS                      (1-B^1)
    TIME PERIOD ANALYZED                   TO 85
    NAME OF THE SERIES                     PRODUCTN
    EFFECTIVE NUMBER OF OBSERVATIONS
    STANDARD DEVIATION OF THE SERIES
    MEAN OF THE (DIFFERENCED) SERIES
    STANDARD DEVIATION OF THE MEAN
    T-VALUE OF MEAN (AGAINST ZERO)

    AUTOCORRELATIONS
    (Plot of the sample ACF of the differenced series for lags 1 through 12; a large negative value at lag 1, small values thereafter.)

    PARTIAL AUTOCORRELATIONS
    (Plot of the sample PACF of the differenced series for lags 1 through 12; negative values that die out.)

Since the ACF of the differenced series cuts off after the first lag and the PACF dies out, we would conclude that an ARIMA(0,1,1) model is appropriate for the series. The sample EACF of PRODUCTN (not shown here) indicates that an ARMA(1,1) model is appropriate. Here p=1 represents the differencing operator (i.e., $(1-\phi B)$ with $\phi = 1$). The EACF of the first difference of PRODUCTN (not shown) confirms the use of an ARIMA(0,1,1) model. We will now specify and fit the model

$$(1-B)Y_t = C + (1-\theta B)a_t \qquad (7.13)$$

A constant term is included in the model as a slight over-parameterization. The exact likelihood algorithm is employed in estimation since an MA parameter is present in the model (see Section 5.2). SCA output is edited for presentation purposes.

    -->TSMODEL PRODUCT1. MODEL IS PRODUCTN(1)=CONST + (1-THETA*B)NOISE.
    -->ESTIM PRODUCT1. METHOD IS EXACT. HOLD RESIDUALS(RESP).

    SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- PRODUCT1

    VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
               VARIABLE   OR CENTERED
    PRODUCTN   RANDOM     ORIGINAL      (1-B^1)

    PARAMETER   VARIABLE   NUM./    FACTOR  ORDER  CONS-    VALUE   STD     T
      LABEL       NAME     DENOM.                  TRAINT           ERROR   VALUE
    1 CONST                CNST        1      0    NONE
    2 THETA     PRODUCTN   MA          1      1    NONE

    TOTAL SUM OF SQUARES                   E+07
    TOTAL NUMBER OF OBSERVATIONS
    RESIDUAL SUM OF SQUARES                E+07
    R-SQUARE
    EFFECTIVE NUMBER OF OBSERVATIONS       84
    RESIDUAL VARIANCE ESTIMATE             E+05
    RESIDUAL STANDARD ERROR                E+03

The fitted equation for this model is

$$(1-B)\mathrm{PRODUCTN}_t = (1 - .77B)a_t \qquad (7.14)$$

with the estimate of the constant term not significantly different from zero at the 5% level. The residuals have been retained under the label RESP for diagnostic checking. The ACF of the residual series does not indicate any anomalies.

    -->ACF RESP. MAXLAG IS 12.

    TIME PERIOD ANALYZED                   TO 85
    NAME OF THE SERIES                     RESP
    EFFECTIVE NUMBER OF OBSERVATIONS
    STANDARD DEVIATION OF THE SERIES
    MEAN OF THE (DIFFERENCED) SERIES
    STANDARD DEVIATION OF THE MEAN
    T-VALUE OF MEAN (AGAINST ZERO)

    AUTOCORRELATIONS
    (Plot of the sample ACF of RESP for lags 1 through 12; all values are small.)

Based on the above fit and diagnostic check, we may conclude that an ARIMA(0,1,1) model (without a constant term) is an adequate model for PRODUCTN. However, if we also use the OUTLIER paragraph as a diagnostic check, the following outliers are revealed.

    -->OUTLIER PRODUCT1. TYPES ARE AO,IO,LS.

    INITIAL RESIDUAL STANDARD ERROR =

    TIME   ESTIMATE   T-VALUE   TYPE
                                 LS
                                 IO
                                 AO

    ADJUSTED RESIDUAL STANDARD ERROR =

A level shift (LS) outlier is detected at t=47, the time of the process change. Two other outliers are also detected. Based on this diagnostic check, we would be led to the intervention model used initially in Section 6.4 (with perhaps additional intervention components for t=24 and t=26). Hence we are directed toward the correct model.

Alternatively, we could have estimated model PRODUCT1 using the OESTIM paragraph, rather than the ESTIM paragraph. In this way the SCA System will simultaneously detect outliers and jointly estimate their effects with the MA parameter. We may enter

    -->OESTIM PRODUCT1. METHOD IS EXACT.

    THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 85

    SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- PRODUCT1

    VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
               VARIABLE   OR CENTERED
    PRODUCTN   RANDOM     ORIGINAL      (1-B^1)

    PARAMETER   VARIABLE   NUM./    FACTOR  ORDER  CONS-    VALUE   STD     T
      LABEL       NAME     DENOM.                  TRAINT           ERROR   VALUE
    1 CONST                CNST        1      0    NONE
    2 THETA     PRODUCTN   MA          1      1    NONE

    SUMMARY OF OUTLIER DETECTION AND ADJUSTMENT

    TIME   ESTIMATE   T-VALUE   TYPE
                                 TC
                                 LS

    TOTAL NUMBER OF OBSERVATIONS
    EFFECTIVE NUMBER OF OBSERVATIONS
    RESIDUAL STANDARD ERROR (WITH OUTLIER ADJUSTMENT)       E+03
    RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT)    E+03

The results of the OESTIM paragraph reveal two outliers, a TC at t=24 and an LS at t=47. Moreover, with the incorporation of these outliers, the estimate of θ is close to 1.0, effectively cancelling the differencing operator. We now will re-specify and re-fit the simpler model

$$Y_t = \mu + a_t. \qquad (7.15)$$
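The cancellation can be checked numerically. If $Y_t = \mu + a_t$, then $(1-B)Y_t = (1-B)a_t$, which is an MA(1) with θ = 1 and no constant, so the lag-1 autocorrelation of the differenced series is $-\theta/(1+\theta^2) = -0.5$. The Python sketch below is illustrative only; the mean, noise scale and sample size are assumed values, not the production data.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 2000.0, 100.0, 5000      # assumed values for the simulation
y = mu + rng.normal(0.0, sigma, size=n) # simple mean model Y_t = mu + a_t

w = np.diff(y)                          # (1-B)Y_t = a_t - a_{t-1}
d = w - w.mean()
r1 = np.dot(d[:-1], d[1:]) / np.dot(d, d)   # sample lag-1 autocorrelation

theta = 1.0
rho1_theory = -theta / (1.0 + theta**2)     # MA(1) lag-1 autocorrelation
print(round(r1, 3), rho1_theory)
```

An estimated θ near 1.0 in an ARIMA(0,1,1) fit is therefore a sign that the series was over-differenced and that a model in the undifferenced series is simpler.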

    -->TSMODEL PRODUCT2. MODEL IS PRODUCTN=CONST+NOISE.
    -->OESTIM PRODUCT2.

    THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 85

    SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- PRODUCT2

    VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
               VARIABLE   OR CENTERED
    PRODUCTN   RANDOM     ORIGINAL      NONE

    PARAMETER   VARIABLE   NUM./    FACTOR  ORDER  CONS-    VALUE   STD     T
      LABEL       NAME     DENOM.                  TRAINT           ERROR   VALUE
    1 CONST                CNST        1      0    NONE

    SUMMARY OF OUTLIER DETECTION AND ADJUSTMENT

    TIME   ESTIMATE   T-VALUE   TYPE
                                 TC
                                 LS

    TOTAL NUMBER OF OBSERVATIONS
    EFFECTIVE NUMBER OF OBSERVATIONS
    RESIDUAL STANDARD ERROR (WITH OUTLIER ADJUSTMENT)       E+03
    RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT)    E+03

The estimation results indicate that the production data follow a simple mean model. Before t=47, the production varies around the estimated mean level (the CONST estimate). After t=47 this mean level increases to about 2277 (i.e., the sum of the CONST and LS estimates). There was some slight perturbation in the process at t=24. The fitted equation of the intervention model (7.12) implies the mean level is about 1795 before t=47 and about 2277 (the sum of its two coefficients) thereafter. The intervention results are in remarkable accord with the above fit (the higher mean level in PRODUCT2 prior to t=47 is attributed to the adjustment made for the TC detected at t=24). We see that by use of the OESTIM paragraph, we both discover the intervention and produce a simpler model.

7.4 Intervention Analysis in the Presence of Outliers

In this section, we will demonstrate the use of outlier detection and adjustment in intervention analysis (see Chapter 6). The essence of intervention analysis is to isolate the effect of an intervention from other occurrences and the underlying disturbance present in the series under study. Within the framework of an intervention model, an observed series is described as the sum of various components. These include the underlying ARIMA model and all known intervention effects.

As previously noted, undetected outliers in a time series can bias the parameter estimates of a model. Hence outlier detection and adjustment are essential to the estimation of an intervention model. By incorporating outlier detection within an intervention analysis, we can be more confident that we have not missed any important events that may influence the validity of our findings. Moreover, outlier detection and adjustment may lead to changes in the parameter estimates and the significance levels of intervention effects. The latter may be the result of the improvement in the estimate of the residual standard deviation (causing a once not significant test statistic to become significant) or a change in the parameter estimate due to the adjustment of outlier effects. We illustrate the use of outlier detection and adjustment in intervention analysis by re-estimating the last two intervention examples of Chapter 6.

Example: The rate of change in the U.S. Consumer Price Index

We will first consider the use of the OESTIM paragraph for the estimation of the intervention model employed for the monthly rate of change in the U.S. Consumer Price Index (see Section 6.5). The time series was stored in the SCA workspace under the label RATECPI, and the intervention model used was

$$\mathrm{RATECPI}_t = \omega_1\,\mathrm{PHASE1}_t + \omega_2\,\mathrm{PHASE2}_t + \frac{1-\theta B}{1-B}\,a_t$$

or equivalently,

$$(1-B)\mathrm{RATECPI}_t = \omega_1 (1-B)\mathrm{PHASE1}_t + \omega_2 (1-B)\mathrm{PHASE2}_t + (1-\theta B)a_t \qquad (7.16)$$

where PHASE1 and PHASE2 were binary series generated to represent the periods at which Phase I and Phase II controls were in place. The model described in (7.16) was specified through the TSMODEL paragraph and given the label CPIMODEL (see Section 6.5.3). We can fit this model using the OESTIM paragraph by entering

    -->OESTIM CPIMODEL. METHOD IS EXACT.

    SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- CPIMODEL

    VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
               VARIABLE   OR CENTERED
    CPI        RANDOM     ORIGINAL      (1-B^1)
    PHASE1     BINARY     ORIGINAL      (1-B^1)
    PHASE2     BINARY     ORIGINAL      (1-B^1)

    PARAMETER   VARIABLE   NUM./    FACTOR  ORDER  CONS-    VALUE   STD     T
      LABEL       NAME     DENOM.                  TRAINT           ERROR   VALUE
    1 W1        PHASE1     NUM.        1      0    NONE
    2 W2        PHASE2     NUM.        1      0    NONE
    3 TH        CPI        MA          1      1    NONE

    SUMMARY OF OUTLIER DETECTION AND ADJUSTMENT

    TIME   ESTIMATE   T-VALUE   TYPE
                                 IO
                                 AO
                                 AO

    TOTAL NUMBER OF OBSERVATIONS
    EFFECTIVE NUMBER OF OBSERVATIONS
    RESIDUAL STANDARD ERROR (WITH OUTLIER ADJUSTMENT)       E-02
    RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT)    E-02

Three outliers are detected and jointly estimated with the model parameters. The estimates of $\omega_1$, $\omega_2$ and $\theta$ are virtually the same as those obtained in Section 6.5. However, by including the three detected outliers, the residual standard error decreases to .00200, a reduction of about 7% in $\hat{\sigma}_a$. Because of this reduction in the estimate of residual standard error, the t-value for the estimate of $\omega_1$ is now significant at the 5% level. By using OESTIM, a questionably significant estimate has become significant. This is a clear illustration of the need to account for all possible spurious observations. The results obtained above are more reliable than the ones obtained previously, as we have more confidence that the intervention effects are not confounded with outlier effects and that the residual standard error is appropriately estimated.

Example: Los Angeles ozone data

As a second illustration of the use of the OESTIM paragraph for the estimation of an intervention model, we consider the intervention model used for the monthly average of the ozone ($O_3$) level in downtown Los Angeles (see Section 6.6). The data are stored in the variable OZONE, and the intervention model employed was

$$\mathrm{OZONE}_t = \omega_1\,\mathrm{INT1}_t + \frac{\omega_2}{1-B^{12}}\,\mathrm{INT2S}_t + \frac{\omega_3}{1-B^{12}}\,\mathrm{INT2W}_t + \frac{(1-\theta_1 B)(1-\theta_2 B^{12})}{1-B^{12}}\,a_t$$

or equivalently,

$$(1-B^{12})\mathrm{OZONE}_t = \omega_1 (1-B^{12})\mathrm{INT1}_t + \omega_2\,\mathrm{INT2S}_t + \omega_3\,\mathrm{INT2W}_t + (1-\theta_1 B)(1-\theta_2 B^{12})a_t \qquad (7.17)$$

More information regarding this model can be found in Section 6.6. The above model was stored in the SCA workspace under the name OZONEMDL (see Section 6.6). To estimate this model using the OESTIM paragraph, we may enter

    -->OESTIM OZONEMDL. METHOD IS EXACT.

    THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 216

    SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- OZONEMDL

    VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
               VARIABLE   OR CENTERED
    OZONE      RANDOM     ORIGINAL      (1-B^12)
    INTV1      BINARY     ORIGINAL      (1-B^12)
    INTV2S     BINARY     ORIGINAL      NONE
    INTV2W     BINARY     ORIGINAL      NONE

    PARAMETER   VARIABLE   NUM./    FACTOR  ORDER  CONS-    VALUE   STD     T
      LABEL       NAME     DENOM.                  TRAINT           ERROR   VALUE
    1 W1        INTV1      NUM.        1      0    NONE
    2 W2        INTV2S     NUM.        1      0    NONE
    3 W3        INTV2W     NUM.        1      0    NONE
    4 TH1       OZONE      MA          1      1    NONE
    5 TH2       OZONE      MA          2     12    NONE

    SUMMARY OF OUTLIER DETECTION AND ADJUSTMENT

    TIME   ESTIMATE   T-VALUE   TYPE
                                 AO
                                 TC
                                 TC

    TOTAL NUMBER OF OBSERVATIONS
    EFFECTIVE NUMBER OF OBSERVATIONS
    RESIDUAL STANDARD ERROR (WITH OUTLIER ADJUSTMENT)       E+00
    RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT)    E+00

Three outliers are detected: an AO at t=21 and temporary changes at t=39 and t=43. The positive value for the AO at t=21 corresponds to the extremely high ozone concentration recorded in September 1956. The two TC outliers at t=39 and t=43 both have negative signs. These outliers correspond to the unusually low ozone levels recorded in 1958. The TC effects are observable in Figure 6.5. It is uncertain what may be responsible for the low ozone level recordings in 1958, but we are sure they are not caused by the interventions under study. We also observe that the inclusion of these three outliers reduces the estimate of $\sigma_a$ from .774 to .703, a reduction of about 10%.

If we compare the parameter estimates of (7.17) obtained using OESTIM above and ESTIM in Section 6.6, we observe that the point estimates of $\omega_2$, $\omega_3$, $\theta_1$ and $\theta_2$ are approximately the same. The estimates of $\omega_2$ and $\omega_3$ correspond to the second intervention, the regulations in engine designs. We see there is an approximate 0.24 unit reduction of ozone per year during the summer periods and a 0.10 unit reduction per year during the winter periods. These estimates are both significant at about the 5% level in the OESTIM results. The standard errors of these estimates are larger in the ESTIM results because $\hat{\sigma}_a$ is larger. Hence, the reduction in the winter period, $\hat{\omega}_3$, is not significant at the 5% level in the ESTIM results.
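The link between a smaller $\hat{\sigma}_a$ and a larger t-value can be made concrete with a rough calculation. The Python sketch below assumes that a standard error is proportional to $\hat{\sigma}_a$ and that the point estimate is unchanged; this is only an approximation, since outlier adjustment can also move the estimate itself. The residual standard errors .774 and .703 are from the text; the starting t-value of 1.85 and the function names are hypothetical.

```python
def percent_reduction(old, new):
    """Percentage decrease from old to new."""
    return 100.0 * (old - new) / old

def rescaled_t(t_old, sigma_old, sigma_new):
    """t-value after the residual standard error changes, assuming the
    standard error of the estimate scales in proportion to sigma and the
    point estimate itself does not move (a simplification)."""
    return t_old * sigma_old / sigma_new

sigma_estim, sigma_oestim = 0.774, 0.703       # values quoted in the text
reduction = percent_reduction(sigma_estim, sigma_oestim)

t_hypothetical = 1.85                          # hypothetical borderline t-value
t_after = rescaled_t(t_hypothetical, sigma_estim, sigma_oestim)
print(round(reduction, 1), round(t_after, 2))
```

Under these assumptions a t-value just below the usual 5% cutoff of about 1.96 moves just above it, which is the kind of change in significance described for $\hat{\omega}_3$.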

The magnitudes of the estimate of $\omega_1$ are different in the results of the OESTIM ($\hat{\omega}_1 = -1.53$) and of the ESTIM ($\hat{\omega}_1 = -1.34$) paragraphs. The results of the OESTIM paragraph indicate the first intervention effects a permanent level reduction in ozone of about 1.53 units. This level reduction is only 1.34 units for the ESTIM paragraph. A smaller value is obtained in the ESTIM paragraph because the TC at t=39 and t=43 are not accounted for. That is, the estimate of $\omega_1$ from the ESTIM paragraph is biased by the presence of outliers (the unusually low ozone levels) in 1958. In the OESTIM paragraph, these outlier effects are accounted for. A more detailed discussion of intervention analysis with outlier adjustment can be found in Liu and Chen (1991).

We see that the inclusion of the detection and adjustment of outliers in this example has a two-fold benefit. First, a possible flaw in the analysis (confounding the low ozone level recordings of 1958 with the effect of the first intervention) is avoided. Moreover, if outliers are not incorporated into the analysis, a potentially significant effect ($\hat{\omega}_3$) is not revealed.

7.5 Forecasting in the Presence of Outliers

Depending upon the timing and the nature of an event, an outlier can substantially affect the forecasts of a model. Forecasts are computed using the parameter estimates (obtained from all the data of the time series) and those observations near to the forecast origin that are necessary for the calculation of forecasts. As a result, outliers that most affect forecasts are those at the end, or near the end, of the series.

The OESTIM paragraph is useful for the detection and adjustment of outliers that can affect the parameter estimates of the underlying ARIMA model. However, the effectiveness of outlier detection is more limited if outliers occur near the end of a time series. Due to the nature of outliers, we often require a few observations after the time of the occurrence of an outlier in order to both detect it and identify its type. For example, suppose the last observation of a series is an outlier. We may be able to detect its presence (depending upon the size of its effect, $\omega_i$), but we cannot identify its type (i.e., AO, IO, LS, or TC) empirically, that is, based on the data alone, until we have one or more additional observations. The inability to empirically identify the type of the outlier at the end of a series will not affect parameter estimation for the ARMA model, but it can affect forecasting.

The OFORECAST paragraph extends the outlier detection and adjustment capabilities of the SCA System to the forecasting of a time series in the presence of outliers. Unlike other forecasting capabilities that simply utilize the current parameter estimates and the data on hand to compute forecasts, the OFORECAST paragraph also performs its own outlier detection and adjustment. As a result, it provides us with:

(1) a closer scrutiny of the last few observations of a series,
(2) the ability to incorporate our judgment on the nature of an outlier in the forecasting process, and
(3) the capability of effectively using updated information in forecasting without re-estimating a model.

More detailed discussions of forecasting with outliers can be found in Chen and Liu (1991).

7.5.1 Outlier detection at the end of a series

The OFORECAST paragraph uses the current estimates of the model parameters to derive the residuals of a series. It then detects and adjusts for outliers before the forecasts are computed. Forecasts are then computed using the estimated model with outlier adjustment. Usually the outliers detected are the same as those found by the OESTIM paragraph. However, the OFORECAST paragraph takes a more critical look at the end of the series than the OESTIM paragraph. The method used for outlier detection is the same in both paragraphs, but the OFORECAST paragraph reduces the critical value by 0.5 for the forecast origin (usually the end of the series) and the two observations preceding it. In this manner, the paragraph is more sensitive to outliers at the end of the series (or the forecast origin) than the OESTIM paragraph. We then have some assurance that forecasts are computed from both the best possible model and data.

7.5.2 Handling end effects

As noted above, when an outlier is the last observation of a series, it is not possible to identify its type. However, its type is crucial to the forecasts that are made. For example, an additive outlier will adversely affect the forecasts unless the last observation is properly adjusted for the AO effect. If the last observation is determined to be an LS outlier, a permanent effect in all future forecasts is caused.

The outlier type the OFORECAST paragraph assumes for the last observation of a series is specified in the TYPE sentence. The TYPE sentence specifies the types of outliers to detect and other special actions to take. The default is to detect all types of outliers (i.e., AO, IO, LS and TC). A keyword specified after the slash (/) in the TYPE sentence dictates the action to take if the last observation of a series is detected to be an outlier. If AO is specified, then an outlier at the end of a series is treated as an additive outlier. Similar actions are taken if IO, TC or LS is specified. If no specification is made, then the last observation is not treated as an outlier for forecasting purposes, even if it is detected as an outlier. This is the default employed for forecasting using the OFORECAST paragraph. In forecasting, treating an outlier at the end of a series as an ordinary observation is the same as assuming that it is an IO (see Ledolter 1989, Hillmer 1984, or Chen and Liu 1991).

It may be the case that we have relevant information of the type of outlier at the time of forecasting; or we may wish to compute forecasts under a particular type of outlier that represents a particular scenario. The OFORECAST paragraph permits us to specify how we want the outlier at the end of a series to be handled (see the description of the TYPE sentence in the syntax description at the end of this chapter).
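How much the assumed type matters can be seen in a small worked case. The Python sketch below is an illustration only, not the OFORECAST algorithm. It assumes an ARIMA(0,1,1) model without a constant and a known θ, starts the forecast recursion at the first observation, and takes the outlier effect at the last observation to be the last one-step-ahead residual (the only residual that carries information about it). The function name and the data are hypothetical; δ = 0.7 is the default dampening factor quoted earlier in this chapter.

```python
import numpy as np

def ima11_end_forecasts(y, theta, n_ahead, last_type=None, delta=0.7):
    """Forecasts from the end of y under (1-B)Y_t = (1-theta*B)a_t when the
    last observation is treated as an outlier of the given type."""
    y = np.asarray(y, dtype=float)
    level = y[0]                              # assumed start-up forecast
    for obs in y[1:-1]:                       # update through observation n-1
        level += (1.0 - theta) * (obs - level)
    e_last = y[-1] - level                    # residual at the forecast origin
    lead = np.arange(1, n_ahead + 1)
    if last_type is None or last_type == "IO":
        # ordinary observation: the usual forecast update
        return np.full(n_ahead, level + (1.0 - theta) * e_last)
    if last_type == "AO":
        # effect confined to the last observation: ignore it
        return np.full(n_ahead, level)
    if last_type == "LS":
        # permanent shift: carried into every forecast
        return np.full(n_ahead, level + e_last)
    if last_type == "TC":
        # temporary change: effect decays at rate delta
        return level + e_last * delta ** lead
    raise ValueError("last_type must be None, 'IO', 'AO', 'LS' or 'TC'")

y = [10.0, 10.0, 10.0, 10.0, 20.0]            # hypothetical series, jump at the end
forecasts = {kind: ima11_end_forecasts(y, theta=0.6, n_ahead=3, last_type=kind)
             for kind in (None, "AO", "LS", "TC")}
for kind, values in forecasts.items():
    print(kind, np.round(values, 2))
```

In this sketch the default (no type given) and IO produce the same forecasts, in line with the statement that treating the last observation as ordinary is equivalent to assuming an IO.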

7.5.3 Forecasts with updated data

Sometimes it is the case that forecasts are updated as new data become available, but we do not wish to re-estimate the parameters of the underlying model. The OFORECAST paragraph provides us with the capability to re-use the same estimated parameter values with updated data. We can use the paragraph to forecast from all periods since the model was last estimated. In this manner, the forecasts may be compared continually with the actual occurrences. The OFORECAST paragraph will make automatic adjustment for any new outliers detected based on the specified model before a forecast is made from the last time origin (i.e., the last available data point).

7.5.4 Example: Airline data

To illustrate the OFORECAST paragraph, we consider the monthly totals (in thousands) of international airline passengers from January 1949 through December 1960. The data are Series G of Box and Jenkins (1970), and were modeled previously in Section 3 of Chapter 5. We have 144 observations in this series, but for this illustration we will reserve the last 12 observations for post-sample comparisons. As in our previous ARIMA modeling of the series, we will use the natural logarithm of the monthly totals to obtain a more homogeneous variance. These values are stored in the SCA workspace in the variable LNAIRPAS.

In Section 5.3, we determined an appropriate model for these data to be an ARIMA(0,1,1)x(0,1,1)_12; that is,

$$(1-B)(1-B^{12})\mathrm{LNAIRPAS}_t = (1-\theta_1 B)(1-\theta_2 B^{12})a_t. \qquad (7.18)$$

The above model was specified using the TSMODEL paragraph and held in the SCA workspace under the model name AIRLINE. To estimate this model using the OESTIM paragraph (and only observations 1 through 132), we enter

    -->OESTIM AIRLINE. METHOD IS EXACT. SPAN IS 1,132.

    SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- AIRLINE

    VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
               VARIABLE   OR CENTERED
    LNAIRPAS   RANDOM     ORIGINAL      (1-B^1) (1-B^12)

    PARAMETER   VARIABLE   NUM./    FACTOR  ORDER  CONS-    VALUE   STD     T
      LABEL       NAME     DENOM.                  TRAINT           ERROR   VALUE
    1 TH1       LNAIRPAS   MA          1      1    NONE
    2 TH2       LNAIRPAS   MA          2     12    NONE

    SUMMARY OF OUTLIER DETECTION AND ADJUSTMENT

    TIME   ESTIMATE   T-VALUE   TYPE
                                 AO
                                 LS
                                 AO

    TOTAL NUMBER OF OBSERVATIONS
    EFFECTIVE NUMBER OF OBSERVATIONS
    RESIDUAL STANDARD ERROR (WITH OUTLIER ADJUSTMENT)       E-01
    RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT)    E-01

Three outliers are detected; none of them is near the forecast origin we will use (t=132). By correcting for these outliers, $\hat{\sigma}_a$ is reduced by about 14%.

We now use the OFORECAST paragraph to compute one-step-ahead forecasts for the forecast origins 132 through 143 by entering

    -->OFORECAST AIRLINE. ORIGINS ARE 132 TO 143. NOF IS 1.
    -->   TYPES ARE AO,IO,LS,TC/AO.

We include the TYPES sentence to specify that we wish to detect all possible types of outliers. The specification of AO after the slash (/) indicates that we want an outlier detected at the forecast origin to be treated as an additive outlier.

We are provided with a sequential summary of the detected outliers and adjustments that are made at each forecast origin before forecasts are made. For example, at our first forecast origin (t = 132), we obtain

    RESIDUAL STANDARD ERROR (USES DATA UP TO THE FIRST FORECAST ORIGIN)= .33223E-01

    TIME   ESTIMATE   T-VALUE   TYPE
                                 AO
                                 LS
                                 AO

    FORECASTS, BEGINNING AT
    TIME   FORECAST   STD. ERROR   ACTUAL IF KNOWN

Here the outliers detected are the same as those detected by the OESTIM paragraph. The output also reports the computed forecast for t = 133 (with the indicated adjustments). The information provided for forecast origin t = 133 is

    RESIDUAL STANDARD ERROR (USES DATA UP TO THE FIRST FORECAST ORIGIN)= .33223E-01

    TIME   ESTIMATE   T-VALUE   TYPE
                                 AO
                                 LS
                                 AO

    FORECASTS, BEGINNING AT
    TIME   FORECAST   STD. ERROR   ACTUAL IF KNOWN

We see that the same outliers are detected when the forecast origin is t = 134. However, an additional outlier is detected when the forecast origin is t = 135. Note that the detected outlier at t=135 has a t-statistic of 2.79 (in absolute value), which is greater than 2.5, but smaller than 3.0.

    RESIDUAL STANDARD ERROR (USES DATA UP TO THE FIRST FORECAST ORIGIN)= .33223D-01

    TIME   ESTIMATE   T-VALUE   TYPE
                                 AO
                                 LS
                                 AO
                                 AO

    FORECAST, BEGINNING AT
    TIME   FORECAST   STD. ERROR   ACTUAL IF KNOWN

The detected outlier is treated as an AO according to our specifications. The forecast for t = 136 is now based on the estimated model and the detected outliers. No additional outliers are detected in the subsequent forecast origins, and the outlier at t = 135 is continually detected as an additive outlier.

The summary of the one-step-ahead forecasts from the OFORECAST paragraph, the actual values, the forecast errors, and the resultant root mean squared error (RMSE) for the post-sample period are provided in Table 7.2. Also presented in Table 7.2 are the one-step-ahead forecasts we obtain for time indices 133 through 144 if we sequentially employ the ESTIM and FORECAST paragraphs in lieu of OESTIM and OFORECAST. The parameter estimates obtained by ESTIM differ slightly from those obtained in Section 5.3 (since only the first 132 observations are used here). The fitted model in the restricted time span has the form

$$(1-B)(1-B^{12})\mathrm{LNAIRPAS}_t = (1-\hat{\theta}_1 B)(1-\hat{\theta}_2 B^{12})a_t$$

together with an estimate $\hat{\sigma}_a$. The actual values for the forecast period are also listed.

Table 7.2 Forecasts of the airline data in the post-sample period

                 OFORECAST paragraph         FORECAST paragraph
    Actual       Step-ahead   Forecast       Step-ahead   Forecast
    value        forecast     error          forecast     error

    RMSE

We see that the post-sample RMSE for the forecasts from the OFORECAST paragraph is about 17.5% less than that from the FORECAST paragraph. The difference is almost entirely caused by the result of the one-step-ahead forecast for t = 136. We were informed by the OFORECAST paragraph that an outlier occurs at t = 135. As a result, the one-step-ahead forecast from either the OFORECAST or FORECAST paragraph is larger than the actual value by about the same amount. However, by detecting the outlier at t=135, the OFORECAST for t = 136 is much more accurate than that from the FORECAST paragraph. Hence the OFORECAST paragraph is able to adapt to the occurrence of a new outlier and improve the accuracy of the forecasts.

7.6 Outlier Detection with a Known Model: The OFILTER Paragraph

The OFILTER paragraph detects and adjusts for outliers based on a model that has been estimated previously. The parameter estimates are not revised in this paragraph. The OFILTER paragraph can then be used for a number of purposes including:

(1) Derivation of an adjusted residual series or adjusted observed series

The OFILTER paragraph permits us to obtain an adjusted residual series or an adjusted observed series without the re-estimation of a model. This can save computer time, particularly in the case when new data are acquired for the same series. The results from the OFILTER paragraph and the adjusted residual series can be used to check for outliers and the validity of the model.

(2) Outlier detection for a fitted model

As noted previously, the OFILTER paragraph can be used in lieu of the OUTLIER paragraph to detect outliers in a model estimated using the ESTIM paragraph. Thus, the OFILTER paragraph can be used as a diagnostic tool, much like the OUTLIER paragraph. In this way we do not need to expend the computation time required to detect, adjust, and estimate the parameters using the OESTIM paragraph. In addition, the OFILTER paragraph can detect a TC that the OUTLIER paragraph cannot.

(3) Quality control of a time dependent process

In some situations a time dependent process may be monitored to assure that the attributes or the yield of a process are in a state of statistical control. In most situations, it is not necessary to continually re-fit a model as new data are acquired. As a result, a time series model may be estimated infrequently, but it may be continually employed for control purposes. Alwan and Roberts (1988) discuss how the residuals from a fitted time series model can be used to highlight special causes (Deming, 1982) of a process. The OFILTER paragraph provides for the application of a fitted model as more data are acquired. We may then be able to locate the occurrence of a special cause in the newly acquired data by examining any new outliers that are detected. We can also obtain an adjusted residual series for further study.

Example: Airline data

To illustrate the OFILTER paragraph, we will consider the airline data used in the previous section. The model AIRLINE was fit using the OESTIM paragraph based on the first 132 observations of the series LNAIRP. Three outliers were identified at t=29, 54 and 62. We can now apply this model to the entire time series. We will store the residuals derived from the OFILTER paragraph in the variable ADJRES and the adjusted observed series (adjusted for detected outliers) in the variable ADJY. We can obtain this by entering

    -->OFILTER AIRLINE. NEW ARE ADJRES, ADJY. METHOD IS EXACT.

    THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 144

    SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- AIRLINE

    VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
               VARIABLE   OR CENTERED
    LNAIRP     RANDOM     ORIGINAL      (1-B^1) (1-B^12)

    PARAMETER   VARIABLE   NUM./    FACTOR  ORDER  CONS-    VALUE   STD     T
      LABEL       NAME     DENOM.                  TRAINT           ERROR   VALUE
    1 TH1       LNAIRP     MA          1      1    NONE
    2 TH2       LNAIRP     MA          2     12    NONE

    SUMMARY OF OUTLIER DETECTION AND ADJUSTMENT

    TIME   ESTIMATE   T-VALUE   TYPE
                                 AO
                                 LS
                                 LS
                                 AO
                                 AO

    TOTAL NUMBER OF OBSERVATIONS
    EFFECTIVE NUMBER OF OBSERVATIONS
    RESIDUAL STANDARD ERROR (WITH OUTLIER ADJUSTMENT)       E-01
    RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT)    E-01

In addition to the previously detected outliers at t=29, 54, and 62, outliers are detected at t=135 and t=39. The outlier at t=39 is marginal (t-value = -3.06) and is caused by the reduction in the estimate $\hat{\sigma}_a$. This example shows that the OFILTER paragraph can be used to detect outliers as new data are added to the time series.

7.7 Modeling and Forecasting Time Series in the Presence of Missing Observations

One common assumption of time series analysis is that the series to be analyzed has no missing observations. In practice, missing data may occur in a time series. For example, there may be occasions in which no data are generated (e.g., occasional production line shut downs due to equipment malfunctions, re-tooling, or the like), or data may simply not be recorded, or may be lost. Often the actual effect of missing data may be slight. A simple time series plot of the data may indicate a likely (small) range of values for a missing data point, based either on the values assumed by neighboring points or points of the same periodicity. However, most modeling procedures tacitly assume all data are present. The procedures will still be usable if the missing observations are patched appropriately. As a result, ad hoc methods are often employed to recode missing observations with suitable replacement values.

Unfortunately, software usually does not possess the visual extrapolation ability of a time series analyst. Many packages are limited to modeling or estimating only the longest sequential run of non-missing observations. The SCA System provides the PATCH paragraph for the ad hoc replacement of missing observations before the use of traditional modeling, estimation and forecasting procedures. More complete information on the PATCH paragraph can be found in Appendix C.
As noted previously in Section 5.4.2, new capabilities of the SCA System permit a direct analysis of a time series with missing data. Information necessary for model identification can be obtained using the ACF and PACF paragraphs provided the logical sentence MISSING is included in the paragraph. In this manner, the SCA System anticipates the presence of missing observations and makes proper accommodations whenever missing data are encountered. A tentatively identified model can then be estimated

and forecasted using the OESTIM and OFORECAST paragraphs, respectively. The OESTIM paragraph will provide estimates of missing values and will also automatically detect and estimate outliers in the time series jointly with model parameters. In the remainder of this section, we will illustrate the handling of missing data by the OESTIM and OFORECAST paragraphs.

Characterization and estimation of missing data

A natural characterization for a missing value is as an additive outlier (AO). The AO characterization has been employed by a number of authors including Ljung (1989a, 1989b) and Liu and Chen (1991). Recall (see Section 7.1.1) if we assume that an outlier occurs at time t=T, we can represent the series we observe by the model

$$Y_t = Z_t + \omega_A P_t^{(T)}. \qquad (7.18)$$

The value $\omega_A$ represents the amount of deviation from the true value of $Z_T$. In this case Chen and Liu (1990) have shown that the adjusted value $\tilde{Y}_T$ (i.e., after removing the outlier effect from $Y_T$) is:

$$\tilde{Y}_T = -\,\frac{\displaystyle\sum_{j=1}^{T-1}\sum_{k=j}^{n-T+j} \pi_k \pi_{k-j}\, Y_{T-j} \;+\; \sum_{j=1}^{n-T}\sum_{k=j}^{n-T} \pi_k \pi_{k-j}\, Y_{T+j}}{\displaystyle\sum_{j=0}^{n-T} \pi_j^2} \qquad (7.19)$$

The adjusted value in (7.19) is an interpolated value based on the observations of the series preceding and following $Y_T$. The adjusted value has nothing to do with the observation $Y_T$. This suggests we may be able to estimate missing data in a time series by treating any missing value as an AO.

The procedure of Chen and Liu (1991) that utilizes (7.19) is iterative. To begin the iteration, tentative values are assigned to the missing data. Equation (7.19) is then employed to estimate the missing value. The estimated missing value is only dependent upon the estimates of the model parameters and the observed values before and after it, but is not dependent upon the patching value itself. It can be shown that the estimate given in (7.19) is the conditional expectation of the missing value given the observed values and the model parameters. This implies that the procedure optimally employs all the relevant information to estimate the missing value.
When a consecutive sequence of missing data occurs, the estimated missing values may also be obtained based on the observed values and the estimated model parameters.

As noted above, the iterative estimation procedure requires a tentative initial value for a missing observation. The SCA System uses an intuitive initial patching value. If $Y_T$ is missing, the average of $Y_{T-1}$ and $Y_{T+1}$ is used if the series is stationary or if only nonseasonal (i.e., first order) differencing is needed. If seasonal differencings are needed, then the average value of $Y_{T-i}$ and $Y_{T+i}$ is used, where i is the minimum value of the seasonal differencing orders employed. A similar patching scheme is used if consecutive observations are missing.
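The claim that the estimate does not depend on the patching value can be verified in the simplest case. The Python sketch below is an illustration only, not the SCA procedure. It assumes a zero-mean AR(1) model with a known parameter φ and a single interior missing point. Treating that point as an AO, the effect is estimated by regressing the two affected residuals on the AO pattern (1, -φ), and the adjusted value reduces to $\phi\,(Y_{T-1}+Y_{T+1})/(1+\phi^2)$ whatever patch is used. The function name and data are hypothetical.

```python
import numpy as np

def ao_adjusted_value_ar1(y, T, phi):
    """Adjusted value at 0-based index T for a zero-mean AR(1) with known phi,
    treating the value currently stored at T as an additive outlier."""
    e_T = y[T] - phi * y[T - 1]          # residual at the outlier time
    e_T1 = y[T + 1] - phi * y[T]         # next residual, also affected
    # least-squares AO effect: residual pattern is (1, -phi)
    omega = (e_T - phi * e_T1) / (1.0 + phi**2)
    return y[T] - omega

phi = 0.6                                 # assumed known AR(1) parameter
y = np.array([0.5, -0.2, 1.1, np.nan, 0.9, 0.3])
T = 3                                     # position of the missing value

neighbor_average = 0.5 * (y[T - 1] + y[T + 1])   # initial patch described in the text
results = []
for patch in (neighbor_average, 0.0, 25.0):      # very different tentative values
    trial = y.copy()
    trial[T] = patch
    results.append(ao_adjusted_value_ar1(trial, T, phi))

closed_form = phi * (y[T - 1] + y[T + 1]) / (1.0 + phi**2)
print(np.round(results, 5), round(closed_form, 5))
```

All three tentative values lead to the same adjusted value, which is the conditional expectation of the missing observation for this model. In practice φ is itself estimated, which is why the full procedure iterates.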

Example: Airline data

We now illustrate the modeling of a time series with missing observations, and contrast results with those obtained when no data are missing. To accomplish this, we will use a data set that has no missing values, then insert missing values in various positions. Specifically, we consider the monthly totals (in thousands) of international airline passengers from January 1949 through December 1960. The data are Series G of Box and Jenkins (1970), and have been used previously in Section 5.3 and elsewhere in this manual. The logged values of this series are held in the SCA workspace in the variable LNAIRPAS. As in Section 7.5.4, we will reserve the last 12 observations for a post-sample comparison of forecasts.

Analysis with no missing data

In Section 5.3 we showed that an appropriate model for this time series is an ARIMA $(0,1,1)\times(0,1,1)_{12}$; that is,

$$(1-B)(1-B^{12})\,\mathrm{LNAIRPAS}_t = (1-\theta_1 B)(1-\theta_2 B^{12})\,a_t. \qquad (7.20)$$

The identification of the above model was based on the ACF of $(1-B)(1-B^{12})Y_t$. This ACF will be shown later, together with the ACF of the series with inserted missing observations. In Table 7.3 we summarize the estimation results of this model. In using the OESTIM paragraph, we both detect outliers in the series and then estimate their effects jointly with ARMA parameters.

Table 7.3 Estimation results for the airline model (7.20) using conditional and exact likelihood functions and the ESTIM and OESTIM paragraphs (standard errors of estimates are in parentheses).

                                                          Outlier summary (if any)
Paragraph  Method       θ̂1       θ̂2       σ̂a      t     Type   Estimate   t-value
ESTIM      Conditional  (.087)   (.079)
OESTIM     Conditional  (.090)   (.083)                  AO
                                                   54    LS
                                                         AO
ESTIM      Exact        (.086)   (.073)
OESTIM     Exact        (.088)   (.077)                  AO
                                                   54    LS
                                                         AO

For both the conditional and exact likelihood methods, the use of OESTIM reduces $\hat{\sigma}_a$ by 9%. Three outliers are detected at time indices 29, 54 and 62.

Analysis with missing data

To illustrate the modeling of a univariate time series in the presence of missing data, we will re-analyze the above airline data after we recode the values of LNAIRPAS at t=48, 70 and 110 to the SCA internal missing value code (the actual recoding is not shown). A tentative model for LNAIRPAS can be identified based on the ACF of the series. We can obtain the ACF of this modified series (i.e., with missing data) by simply entering

-->ACF LNAIRPAS. DFORDER IS 1, 12. MISSING.

The PACF can be obtained in like fashion. The logical sentence MISSING is included so that the ACF is computed in the usual manner except that terms involving missing values are excluded. In so doing, the effective number of observations used in the computation of a lagged autocovariance depends on the number of terms available at that specific lag. If the MISSING sentence is not specified, the ACF is computed using the data span that begins with the first non-missing observation and ends with the observation that precedes the first missing value encountered (here t=48). The ACF patterns for all data and for the modified series are given in Figure 7.9.

Figure 7.9 ACF of $(1-B)(1-B^{12})$ LNAIRPAS. Left panel: no missing data. Right panel: missing data at t = 48, 70 and 110.

We observe that the ACFs of both time series provide the same information for the identification of a tentative model. We can now specify the airline model (7.20) and use the OESTIM paragraph for its estimation. That is, we enter (some SCA output is suppressed for presentation purposes)

-->TSMODEL AIRLINE. MODEL IS LNAIRPAS(1,12)=(1-TH1*B)(1-TH2*B**12)NOISE
-->OESTIM AIRLINE. SPAN 1,132.

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 132

THE 48-TH OBSERVATION IS RECODED TO
THE 70-TH OBSERVATION IS RECODED TO
THE 110-TH OBSERVATION IS RECODED TO
THE AVERAGE OF THE OBSERVATIONS THAT ARE 12 TIME PERIOD(S) APART
ARE USED AS AN INITIAL PATCH FOR THE MISSING VALUE(S)

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- AIRLINE

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
                                        1       12
LNAIRPAS   RANDOM     ORIGINAL      (1-B ) (1-B  )

PARAMETER   VARIABLE   NUM./    FACTOR   ORDER   CONS-    VALUE   STD     T
LABEL       NAME       DENOM.                    TRAINT           ERROR   VALUE
1 TH1       LNAIRPAS   MA       1        1       NONE
2 TH2       LNAIRPAS   MA       2        12      NONE

SUMMARY OF MISSING OBSERVATION ADJUSTMENT
TIME   ESTIMATE

SUMMARY OF OUTLIER DETECTION AND ADJUSTMENT
TIME   ESTIMATE   T-VALUE   TYPE
                            AO
                            LS
                            AO

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR (WITH OUTLIER ADJUSTMENT)       E-01
RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT)    E-01

We are informed that the initial estimate of $Y_{48}$ is the average of $Y_{36}$ and $Y_{60}$ (since the only seasonal difference is 12). Similarly, $Y_{70}$ and $Y_{110}$ are recoded to the averages of the observations 12 periods before and after them. The final estimates for $Y_{48}$ and $Y_{110}$ are 5.265 and 5.762, respectively; the actual values for these observations are 5.268 and 5.762. Hence the missing values have been estimated appropriately. The conditional estimates of $\theta_1$ and $\theta_2$ are .336 and .532, respectively, and are in agreement with the conditional estimates displayed in Table 7.3. The outliers detected are the same as before, and $\hat{\sigma}_a$ is reduced by 9%, as before.

Using OESTIM with the conditional algorithm accomplishes two tasks. First, we obtain good initial parameter estimates if we ultimately wish to use the exact algorithm. Second, all missing data of LNAIRPAS are now estimated and recoded to the estimated values indicated in the above output. We can now use the exact algorithm to obtain estimates of $\theta_1$ and $\theta_2$ by entering

-->OESTIM AIRLINE. METHOD IS EXACT. SPAN 1,132.

We obtain the following results:

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 132

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- AIRLINE

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
                                        1       12
LNAIRPAS   RANDOM     ORIGINAL      (1-B ) (1-B  )

PARAMETER   VARIABLE   NUM./    FACTOR   ORDER   CONS-    VALUE   STD     T
LABEL       NAME       DENOM.                    TRAINT           ERROR   VALUE
1 TH1       LNAIRPAS   MA       1        1       NONE
2 TH2       LNAIRPAS   MA       2        12      NONE

SUMMARY OF OUTLIER DETECTION AND ADJUSTMENT
TIME   ESTIMATE   T-VALUE   TYPE
                            AO
                            LS
                            AO

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR (WITH OUTLIER ADJUSTMENT)       E-01
RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT)    E-01

The results are in accord with those presented in Table 7.3.
Note that no missing values are estimated here since they have already been estimated and recoded to non-missing values

in the previous use of OESTIM. If we desire, we can employ the EXACT algorithm directly after the model specification paragraph (TSMODEL) instead of the sequential use of the conditional and exact algorithms. In such a case, we may affect the ARMA estimates slightly, and may also affect the outliers detected, their type, and their estimated effects. Differences are due to the fact that the exact algorithm is more sensitive to initial patching values and outliers.

It is also possible to obtain the estimates of missing data without performing outlier adjustment in the OESTIM paragraph. To accomplish this task, the sentence OADJUSTMENT IS NONE must be included in the OESTIM paragraph.

Forecasting with missing data

Once a model has been estimated using the OESTIM paragraph, we can compute forecasts from it using the OFORECAST paragraph. The OFORECAST paragraph provides us with the capability to re-use estimated parameter values with updated data. In this manner, forecasts may be compared continually with actual occurrences, and the OFORECAST paragraph will make automatic adjustment for any new outliers detected based on the specified model (see Section 7.5).

As in Section 7.5.4, to illustrate the effectiveness of the OESTIM paragraph in handling and recoding missing data, we consider one-step-ahead forecasts from time origins 132 through 136 for the estimated model of the airline data. We use both the original series and the modified series with estimated missing data. We compute forecasts, and perform outlier detection and adjustment during the post-sample period, using the OFORECAST paragraph. To obtain these results, we may enter

-->OFORECAST AIRLINE. ORIGINS ARE 132 TO 136. NOFS IS 1.
--> TYPES ARE AO,IO,LS,TC/AO.

The output produced by the above paragraph for this data set is similar to that shown previously and is not shown here.
The differences between the results for outlier detection and adjustment using the estimated model (7.20) with the original airline data and the modified airline data (after recoding the missing observations with their estimates) are slight, and are due to the different estimates obtained for $\theta_1$ and $\theta_2$. A summary of outliers detected and one-step-ahead forecasts for both time series is given in Table 7.4.

Table 7.4 Summary of outlier detection and forecasts for the airline model of LNAIRPAS (original series and modified series)

(A) Outliers detected up to the forecast origins at t=132, 133 and 134

For original LNAIRPAS              For modified LNAIRPAS
TYPE   ESTIMATE   t-value          TYPE   ESTIMATE   t-value
AO                                 AO
LS                                 LS
AO                                 AO

(B) Outliers detected up to the forecast origins at t=135 and 136

For original LNAIRPAS              For modified and recoded LNAIRPAS
TYPE   ESTIMATE   t-value          TYPE   ESTIMATE   t-value
AO                                 AO
LS                                 LS
AO                                 AO
AO                                 AO

(C) One-step-ahead forecast summary (standard error = .0332 in all cases)

Forecast   Actual    Forecasted value for LNAIRPAS using
Origin     Value     Original Data     Modified Data

Other Related Topics

This section provides a brief overview of topics related to outlier detection and adjustment. The material presented in this section can be considered advanced or of occasional use. As a consequence, this section can be skipped, and referenced as necessary. The topics presented, in order, are:

Effect of an outlier on a filtered residual series when ARMA parameters are known
Outline of the outlier detection and adjustment procedure of the OESTIM paragraph

Effect of an outlier on a filtered residual series when ARMA parameters are known

In Section 7.2.1, we consider the case that the parameters of an underlying ARIMA model are known. In order to observe the effect of an outlier on a residual series, the following filtered series was considered:

$$e_t = \pi(B)\,Y_t,$$

where $\pi(B)$ is the polynomial operator in the π-weights of the ARIMA model. The values of $e_t$ become the residuals of the fitted model if the above π-weights are computed from an estimated ARIMA model rather than from known parameters. If we have a single outlier at time $t = T$, then $e_t$ can be re-written according to the type of outlier present. Specifically,

$$
\begin{aligned}
e_t &= \omega_A\,\pi(B)\,P_t^{(T)} + a_t, && \text{for an AO} \\
e_t &= \omega_I\,P_t^{(T)} + a_t, && \text{for an IO} \\
e_t &= \frac{\omega_L}{1-B}\,\pi(B)\,P_t^{(T)} + a_t, && \text{for an LS} \\
e_t &= \frac{\omega_C}{1-\delta B}\,\pi(B)\,P_t^{(T)} + a_t, && \text{for a TC}
\end{aligned}
\qquad (7.20)
$$

The $e_t$ series can also be expressed as

$$e_t = \omega\,x_t + a_t, \qquad \text{where } \omega = \begin{cases} \omega_A, & \text{for an AO} \\ \omega_I, & \text{for an IO} \\ \omega_L, & \text{for an LS} \\ \omega_C, & \text{for a TC} \end{cases} \qquad (7.21)$$

In equation (7.21) above, the series $x_t$ assumes the value 0 for $t < T$; the value 1 for $t = T$; and for $t = T+k$ ($k = 1, 2, \ldots, n-T$) the value of $x_{T+k}$ is

$$
\begin{aligned}
\text{for an AO:}\quad & -\pi_k \\
\text{for an IO:}\quad & 0 \\
\text{for an LS:}\quad & 1 - \sum_{j=1}^{k} \pi_j \\
\text{for a TC:}\quad & \delta^k - \sum_{j=1}^{k-1} \delta^{k-j}\pi_j - \pi_k
\end{aligned}
\qquad (7.22)
$$

More information regarding the values in (7.22) can be found in Chen and Liu (1990).
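The regressor series $x_t$ of (7.22) is easy to tabulate once the π-weights are available. The Python sketch below is illustrative only: the names `outlier_regressor` and `effect_estimate`, the 0-based indexing, and the treatment of unspecified π-weights as zero are assumptions. The estimate of ω shown is the ordinary least-squares fit of (7.21) through the origin, which is one natural reading of that equation rather than a statement of what SCA computes internally.

```python
def outlier_regressor(kind, n, T, pi, delta=0.7):
    """Series x_t of (7.22) for a unit outlier at 0-based index T.

    pi[j] is the j-th pi-weight for j >= 1 (pi[0] is unused); weights beyond
    the end of `pi` are assumed to be zero.
    """
    if kind not in ("AO", "IO", "LS", "TC"):
        raise ValueError("kind must be one of AO, IO, LS, TC")

    def w(j):
        return pi[j] if 1 <= j < len(pi) else 0.0

    x = [0.0] * n
    x[T] = 1.0
    for k in range(1, n - T):
        if kind == "AO":
            x[T + k] = -w(k)
        elif kind == "LS":
            x[T + k] = 1.0 - sum(w(j) for j in range(1, k + 1))
        elif kind == "TC":
            x[T + k] = (delta ** k
                        - sum(delta ** (k - j) * w(j) for j in range(1, k))
                        - w(k))
        # for an IO the series stays at 0 after time T
    return x


def effect_estimate(e, x):
    """Least-squares estimate of omega in e_t = omega * x_t + a_t."""
    return sum(a * b for a, b in zip(e, x)) / sum(b * b for b in x)


# AR(1) with phi = 0.5: pi_1 = 0.5, all other pi-weights zero.
x_ao = outlier_regressor("AO", 5, 1, [0.0, 0.5])
x_ls = outlier_regressor("LS", 5, 1, [0.0, 0.5])
x_tc = outlier_regressor("TC", 5, 1, [0.0, 0.5])
```

For this AR(1) example the TC regressor equals the expansion of $(1 - 0.5B)/(1 - 0.7B)$, i.e. 1, 0.2, 0.14, 0.098 from time T onward.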

Outline of the outlier detection and adjustment procedure of the OESTIM paragraph

A summary of the steps employed in the outlier detection and adjustment procedure used in the OESTIM paragraph is given below. A more complete discussion of this detection and estimation procedure is found in Chen and Liu (1990).

Stage 1 (initial detection and estimation)

(1.1) Estimate the identified ARMA model using the most recently adjusted observed series. (The procedure begins with no adjustment.) Compute a residual series.

(1.2) Employ the outlier detection method described earlier in this chapter to determine if there is an outlier in the current residual series.

(1.3) If a potential outlier is discovered, remove its postulated effect from the residuals and repeat step (1.2). Otherwise, proceed to (1.4).

(1.4) If no outlier has been discovered in the residuals of the original data, then we are done and the series is free from outlier effects. However, if an outlier has been found, then adjust the observed data and repeat (1.1) - (1.3). Continue to adjust the data and repeat (1.1) - (1.3) until no new outliers are found. Now proceed to Stage 2.

Stage 2 (joint estimation of outlier effects)

(2.1) Estimate the effects of the existing identified outliers using a multiple regression model.

(2.2) Standardize the estimated effects. If the smallest (in absolute value) of these standardized effects is less than the critical level used in outlier detection (1.2), then delete the outlier from the existing set and return to (2.1). Otherwise proceed to (2.3).

(2.3) Obtain an adjusted set of observations based only on those outliers that are still significant.

(2.4) Use the adjusted observations to estimate ARMA parameters. If the model contains a constant term (or if requested by the user), compute a residual standard error and check to see if the relative change in its estimate exceeds a specified value. If so, return to (2.1). Otherwise (or if this check is not used), proceed to Stage 3.

Stage 3 (final estimation of parameters and effects)

(3.1) The last set of parameter estimates computed in (2.4) are the final estimates of the ARMA parameters.

(3.2) Use the parameter estimates of (3.1) and the original set of observations to compute a residual series.

(3.3) Repeat Stage 1 except that no ARMA parameters are re-estimated.

(3.4) Repeat (2.1) and (2.2) of Stage 2 as necessary. The estimates obtained in the final iteration of (2.1) are those of the outlier effects.

Stage 1 is essentially the procedure of Chang, Tiao and Chen (1988) described earlier in this chapter. The stepwise procedure of Stage 2 is used to evaluate outlier effects jointly and remove any spurious effects. Once true outliers are determined and estimated, the series is adjusted and the ARMA parameters can be more properly estimated. Now the residual series should be closer to $e_t$ (described above) and outliers can be detected, estimated jointly and re-evaluated. Hence when Stage 1 is repeated at step (3.3), it begins assuming no outliers are present and essentially re-discovers them (and any that may have been masked). The re-application of Stage 2 re-estimates the effects.
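Steps (1.2) and (1.3), and the repetition of Stage 1 in (3.3) with the ARMA parameters held fixed, can be sketched in Python as a search-and-remove loop on a residual series. This is a simplified illustration under several assumptions, not the OESTIM algorithm itself: the function names are hypothetical, the test statistic is taken to be the standardized least-squares effect $\hat{\omega}\,(\sum x_t^2)^{1/2} / \hat{\sigma}_a$, $\hat{\sigma}_a$ is re-estimated on each pass by a median absolute deviation, and the cap on detected outliers mimics the 10% default of MXOUTLIERS. Parameter re-estimation and the joint regression of Stage 2 are not shown.

```python
from math import sqrt
from statistics import median


def regressor(kind, n, T, pi, delta=0.7):
    """Effect x_t of a unit outlier at 0-based index T on the residuals, per (7.22)."""
    def w(j):
        return pi[j] if 1 <= j < len(pi) else 0.0
    x = [0.0] * n
    x[T] = 1.0
    for k in range(1, n - T):
        if kind == "AO":
            x[T + k] = -w(k)
        elif kind == "LS":
            x[T + k] = 1.0 - sum(w(j) for j in range(1, k + 1))
        elif kind == "TC":
            x[T + k] = delta ** k - sum(delta ** (k - j) * w(j) for j in range(1, k)) - w(k)
    return x  # for "IO" the series is zero after T


def sigma_mad(e):
    """Robust scale: 1.483 times the median absolute deviation about the median."""
    m = median(e)
    return 1.483 * median([abs(v - m) for v in e])


def detect_outliers(e, pi, critical=3.0, kinds=("IO", "AO", "TC", "LS")):
    """Repeatedly find the most significant outlier and remove its effect."""
    e = list(e)
    n = len(e)
    max_outliers = max(1, n // 10)       # assumption modelled on the MXOUTLIERS default
    found = []
    while len(found) < max_outliers:
        sigma = sigma_mad(e)
        if sigma == 0.0:
            break
        best = None
        for T in range(n):
            for kind in kinds:
                x = regressor(kind, n, T, pi)
                sxx = sum(v * v for v in x)
                omega = sum(a * b for a, b in zip(e, x)) / sxx
                tau = omega * sqrt(sxx) / sigma
                if best is None or abs(tau) > abs(best[3]):
                    best = (T, kind, omega, tau, x)
        if abs(best[3]) <= critical:     # nothing exceeds the critical value
            break
        T, kind, omega, tau, x = best
        e = [a - omega * b for a, b in zip(e, x)]   # step (1.3): remove the effect
        found.append((T, kind, omega, tau))
    return found, e


# Residuals of an AR(1) model (pi_1 = 0.5) with an AO of size 4 at index 10.
pi = [0.0, 0.5]
noise = [0.1 if t % 2 == 0 else -0.1 for t in range(30)]
x_true = regressor("AO", 30, 10, pi)
resid = [a + 4.0 * b for a, b in zip(noise, x_true)]
found, cleaned = detect_outliers(resid, pi)
```

In this toy example the loop reports a single AO at index 10 and then stops because no remaining statistic exceeds the critical value.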

SUMMARY OF THE SCA PARAGRAPHS IN CHAPTER 7

This section provides a summary of those SCA paragraphs employed in this chapter. The syntax for the paragraphs is presented in both a brief and full form. The brief display of the syntax contains the most frequently used sentences of a paragraph, while the full display presents all possible modifying sentences of a paragraph. In addition, special remarks related to a paragraph may also be presented with the description. It is recommended that the brief form of the syntax of a paragraph be used before employing any System capability that can be accessed only through the use of the full form of the paragraph syntax.

Each SCA paragraph begins with a paragraph name and is followed by modifying sentences. Sentences that may be used as modifiers for a paragraph are shown below and the types of arguments used in each sentence are also specified. Sentences not designated required may be omitted as default conditions (or values) exist. The most frequently used required sentence is given as the first sentence of the paragraph. The portion of this sentence that may be omitted is underlined. This portion may be omitted only if this sentence appears as the first sentence in a paragraph. Otherwise, all portions of the sentence must be used. The last character of each line except the last line must be the continuation character.

The paragraphs to be explained in this summary are OESTIM, OFORECAST, OFILTER, and OUTLIER.

Legend
v : variable or model name
i : integer
r : real value
w : keyword

OESTIM Paragraph

The OESTIM paragraph is used to estimate jointly the model parameters and outlier effects in an ARIMA or transfer function model (see Chapter 8). This paragraph also creates a number of variables which are useful for further analyses.

Syntax for the OESTIM paragraph

Brief syntax

OESTIM MODEL model-name.
       TYPES ARE w1, w2, ---.
       DELTA IS r.
       OSTOP ARE MXOUTLIERS(i1), CRITICAL(r).
       NEW-SERIES IN v1, v2.
       HOLD RESIDUALS(v), FITTED(v), VARIANCE(v).

Required sentence: MODEL

Full syntax

OESTIM MODEL model-name.
       TYPES ARE w1, w2, ---.
       DELTA IS r.
       OSTOP ARE MXOUTLIERS(i1), CRITICAL(r), MXESTIM(i2).
       NEW-SERIES IN v1, v2, v3, v4, v5.
       METHOD IS w.
       STOP ARE MAXIT(i), LIKELIHOOD(r1), ESTIMATE(r2), STDEV(r3).
       OADJUSTMENT IS w.
       STDEV IS w(r).
       SPAN IS i1, i2.
       OUTPUT IS LEVEL(w), PRINT(w1, w2, ---), NOPRINT(w1, w2, ---).
       HOLD RESIDUALS(v), FITTED(v), VARIANCE(v).

Required sentence: MODEL

Sentences used in the OESTIM paragraph

MODEL sentence
The MODEL sentence is used to specify the label (name) of the model to be estimated. The label must be one specified in a previous TSMODEL paragraph. It is a required sentence.

TYPE sentence
The TYPE sentence is used to specify types of outliers to be detected. The valid keywords are IO (innovative outlier), AO (additive outlier), LS (level shift), and TC (temporary change). The default is IO, AO, TC, and LS.

DELTA sentence
The DELTA sentence is used to specify the δ value employed for the TC outlier (see Section 7.2.4). The default is δ=0.7.

OSTOP sentence
The OSTOP sentence is used to specify the stopping criterion for outlier detection. Parameter estimation and outlier detection and adjustment are done iteratively. If any outlier is detected after a parameter estimation, the time series is adjusted for outliers and parameters are re-estimated. The iteration stops if the maximum number of outliers that may be adjusted is reached; if the maximum number of re-estimations of parameters is reached; or if all outlier statistics are smaller than a specified critical value.

The argument for the keyword MXOUTLIERS (i1) specifies the maximum number of outliers permitted to be detected and adjusted. The default for i1 is equal to 10% of the number of observations.

The argument for the keyword CRITICAL (r) specifies a critical value for testing the presence of outliers. The recommended value for r is 3.50 for low sensitivity, 3.00 for medium sensitivity, and 2.70 for high sensitivity. The default for r is 3.0.

The argument for the keyword MXESTIM (i2) specifies the maximum number of re-estimations of model parameters within each estimation. The default for i2 is 3.

NEW-SERIES sentence
The NEW-SERIES sentence is used to specify the labels (names) of variables to be created for saving information of the outlier detection process. Only those results desired to be retained need be named. The default is that no variable is retained after the paragraph is executed.
The variables that may be retained (and the position a label must occupy in the sentence) are:

v1: the name used to store the residuals after all outlier adjustments
v2: the name used to store the adjusted series (i.e., the resultant series after removing detected outlier effects from the original observations)
v3: the name used to store an indicator variable designating the types of outliers, if any, found during the outlier detection process. The value of the t-th observation of this variable is 0 if the t-th value of the time series is not an outlier; 2 if it is an innovative outlier; 3 if it is an additive outlier; 4 if it is a temporary change; 5 if it is a level shift; and 1 if its value is missing.
v4: the name used to store the estimates of any detected outliers
v5: the name used to store the effects of detected outliers on residuals
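Once the indicator variable v3 has been exported from the SCA workspace, its coding can be decoded with a small lookup. The Python sketch below is only an illustration of the coding scheme listed above; the function name `decode_indicator` and the idea of holding the exported values in a Python list are assumptions.

```python
# Coding of the v3 indicator variable as listed for the NEW-SERIES sentence.
OUTLIER_CODES = {0: "none", 1: "missing", 2: "IO", 3: "AO", 4: "TC", 5: "LS"}


def decode_indicator(indicator):
    """Return (time index, label) pairs for every flagged observation.

    Time indices are 1-based to match the time indices used in SCA output.
    """
    return [(t + 1, OUTLIER_CODES[int(code)])
            for t, code in enumerate(indicator)
            if int(code) != 0]


flags = decode_indicator([0, 3, 0, 1, 5])
```

Here the second observation is flagged as an additive outlier, the fourth as a missing value, and the fifth as a level shift.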

METHOD sentence
The METHOD sentence is used to specify the method for the computation of the likelihood function used in model estimation. The keyword may be CONDITIONAL for the conditional likelihood or EXACT for the exact likelihood function. The default is CONDITIONAL.

STOP sentence
The STOP sentence is used to specify the stopping criterion for the nonlinear estimation of parameters. This estimation is conditional on the most recent outlier adjustment. Estimation is terminated when the relative change in the value of the likelihood function or parameter estimates between two successive iterations is less than or equal to the convergence criterion, or if the maximum number of iterations is reached.

The argument, i, for the keyword MAXIT specifies the maximum number of iterations. The default is i=10.

The argument, r1, for the keyword LIKELIHOOD specifies the value of the relative convergence criterion on the likelihood function. A default value is used if r1 is not specified.

The argument, r2, for the keyword ESTIMATE specifies the value of the relative convergence criterion on the parameter estimates. A default value is used if r2 is not specified.

The argument, r3, for the keyword STDEV specifies the value of the relative convergence criterion on the estimate of the standard deviation $\sigma_a$ in the iteration. This last criterion (r3) is employed by the SCA System to provide further control of accuracy in parameter estimates. The default is r3=0.001 when a constant term is present, and the criterion is disabled otherwise. The criterion can be disabled by the user by specifying a negative value for r3. The criterion is enabled if a positive value is specified for r3 even if no constant term is present.

OADJUSTMENT sentence
The OADJUSTMENT sentence is used to specify the method of outlier estimation and adjustment. The keyword may be SEQUENTIAL for the detection and adjustment of outliers sequentially from largest effect to smallest (see Chang, Tiao and Chen 1988). JOINT specifies the detection and joint estimation of outlier effects (the default). The use of NONE is equivalent to using ESTIM (except missing data are estimated).
STDEV sentence
The STDEV sentence is used to specify a method for the estimation of $\sigma_a$. TRIM(r) specifies that an r×100% trimmed standard deviation is used (i.e., the top r×100% largest observations, according to absolute values, are excluded from the computation). A specification of TRIM(0.0) indicates that $\sigma_a$ is computed at each observation (residual) using all data except the current observation. TRIM(0.0) is the default. MAD(r) specifies that the median absolute deviation is used for the estimation of $\sigma_a$ ($\hat{\sigma}_a$ = 1.483 × median absolute deviation). For further information, see Chen and Liu (1990).
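The two robust scale estimates named in the STDEV sentence can be sketched in Python as follows. This is an illustration under stated assumptions rather than the SCA computation: the function names are hypothetical, the median absolute deviation is taken about the sample median, and the trimmed estimate is computed as a root mean square about zero after discarding the largest residuals in absolute value. The special omit-one behaviour of TRIM(0.0) is not reproduced.

```python
from math import sqrt
from statistics import median


def sigma_mad(residuals):
    """1.483 times the median absolute deviation (about the median; an assumption)."""
    m = median(residuals)
    return 1.483 * median([abs(e - m) for e in residuals])


def sigma_trim(residuals, r):
    """Root mean square of the residuals after dropping the r*100% largest in size.

    Centring at zero and dividing by the retained count are illustrative choices.
    """
    n_drop = int(r * len(residuals))
    kept = sorted(residuals, key=abs)[:len(residuals) - n_drop]
    return sqrt(sum(e * e for e in kept) / len(kept))


mad_value = sigma_mad([1.0, 2.0, 3.0, 4.0, 100.0])
trim_value = sigma_trim([3.0, -4.0, 100.0, 0.0], 0.25)
```

Both estimates are largely unaffected by the single extreme value of 100, which is the reason such estimates are preferred to the ordinary standard deviation when outliers may be present.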

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, for which data are analyzed. The default is the maximum span available for the series.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output displayed for selected statistics. Control is achieved in a two stage procedure. First, a basic LEVEL of output (default NORMAL) is designated. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). The keywords for LEVEL and output displayed are:

BRIEF    : estimates and their related statistics only
NORMAL   : RCORR
DETAILED : ITERATION, CORR, and RCORR

where the keywords on the right denote:

ITERATION : the parameter and covariance estimates for each iteration
CORR      : the correlation matrix for the parameter estimates
RCORR     : the reduced correlation matrix for the parameter estimates (i.e., a display in which all values have no more than two decimal places and those estimates within two standard errors of zero are displayed as dots, ".").

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace until the end of the session. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. The default is that none of the values of the above statistics will be retained after the paragraph is used. The values that may be retained are:

RESIDUALS : the residual series without outlier adjustment
FITTED    : the one-step-ahead forecasts (fitted values) of the series
VARIANCE  : the variance of the noise

OFORECAST Paragraph

The OFORECAST paragraph is used to compute the forecast of future values of a time series based on a specified ARIMA or transfer function model. Unlike the FORECAST paragraph, the OFORECAST paragraph handles outliers that may exist in the output time series. The OFORECAST paragraph should be used in conjunction with a model estimated using the OESTIM paragraph.

Syntax of the OFORECAST paragraph

Brief syntax

OFORECAST MODEL model-name.
          NOFS ARE i1, i2, ---.
          TYPES ARE w1, w2, ---/w.
          DELTA IS r.
          OSTOP IS MXOUTLIERS(i), CRITICAL(r1, r2).
          HOLD FORECASTS(v1, v2, ---), STD_ERRS(v1, v2, ---).

Required sentence: MODEL

Full syntax

OFORECAST MODEL model-name.
          ORIGINS ARE i1, i2, ---.
          NOFS ARE i1, i2, ---.
          TYPES ARE w1, w2, ---/w.
          DELTA IS r.
          OSTOP IS MXOUTLIERS(i), CRITICAL(r1, r2), MXESTIM(i2).
          METHOD IS w.
          OADJUSTMENT IS w.
          STDEV IS w(r).
          HOLD FORECASTS(v1, v2, ---), STD_ERRS(v1, v2, ---).

Required sentence: MODEL

Sentences used in the OFORECAST paragraph

MODEL sentence
The MODEL sentence is used to specify the label (name) of the model for the series to be forecasted. The label must be one specified in a previous TSMODEL paragraph.

ORIGINS sentence
The ORIGINS sentence is used to specify the time origins for forecasts. The default is one origin, the last observation.

NOFS sentence
The NOFS sentence is used to specify for each time origin the number of time periods ahead for which forecasts are generated. The number of arguments in this sentence must be the same as that in the ORIGINS sentence. The default is 24 forecasts for each time origin.

TYPE sentence
The TYPE sentence is used to specify types of outliers to be detected and how to treat the last observation should it be detected as an outlier. The valid keywords are AO (additive outlier), IO (innovative outlier), LS (level shift), and TC (temporary change). Those keywords specified before a slash ( / ) indicate the types of outlier to be detected. The keyword, if any, specified after the slash indicates the type of outlier at the end of the series, should one be detected, for forecasting purposes. If no keyword is specified after the slash, then the last observation is not treated as an outlier in the computation of forecasts. The default is AO, IO, LS, TC, and the last observation is not treated as an outlier even if it has a significant test statistic.

DELTA sentence
The DELTA sentence is used to specify the δ value employed for the TC outlier (see Section 7.2.4). The default is δ=0.7.

OSTOP sentence
The OSTOP sentence is used to specify the stopping criterion for outlier detection. Parameter estimation and outlier detection and adjustment are done iteratively. If any outlier is detected after a parameter estimation, the time series is adjusted for outliers and parameters are re-estimated. The iteration stops if the maximum number of outliers that may be adjusted is reached; if the maximum number of re-estimations of parameters is reached; or if all outlier statistics are smaller than a specified critical value.

The argument for the keyword MXOUTLIERS (i1) specifies the maximum number of outliers permitted to be detected and adjusted. The default for i1 is equal to 10% of the number of observations.

The argument for the keyword CRITICAL (r1, r2) specifies critical values for testing the presence of outliers. One or two values may be specified.
The critical value r1 is used for all observations except the forecast origin and the two observations preceding it. The critical value r2 is used for these three observations. If r2 is not specified, then the value r1-0.5 will be used. The default value for r1 is 3.0; a minimum permitted value also applies to r2. The recommended value for r1 is 3.50 for low sensitivity, 3.00 for medium sensitivity, and 2.70 for high sensitivity.

METHOD sentence
The METHOD sentence is used to specify the likelihood function used in the calculation of residuals. The keyword may be CONDITIONAL for the conditional likelihood or EXACT for the exact likelihood function. The default is EXACT.
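The rule for CRITICAL(r1, r2) described in the OSTOP sentence can be made concrete with a short Python sketch. It only illustrates which critical value applies at each time index; the function name `critical_values` is hypothetical, and the lower limit that the manual imposes on r2 is not enforced here because its value is not reproduced.

```python
def critical_values(origin, r1=3.0, r2=None):
    """Critical value applied at each time index 1..origin.

    r1 applies everywhere except at the forecast origin and the two
    observations preceding it, where r2 applies. When r2 is omitted,
    r1 - 0.5 is used.
    """
    if r2 is None:
        r2 = r1 - 0.5
    return [r2 if t > origin - 3 else r1 for t in range(1, origin + 1)]


values = critical_values(6)
```

With a forecast origin at t=6 and the defaults, indices 1 to 3 are tested against 3.0 and indices 4 to 6 against 2.5, so an unusual observation near the end of the series is flagged more readily.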

OADJUSTMENT sentence
The OADJUSTMENT sentence is used to specify the method of outlier estimation and adjustment. The keyword may be SEQUENTIAL for the detection and adjustment of outliers sequentially from largest effect to smallest (see Chang, Tiao and Chen 1988). JOINT specifies the detection and joint estimation of outlier effects (the default). The use of NONE is equivalent to using ESTIM (except missing data are estimated).

STDEV sentence
The STDEV sentence is used to specify a method for the estimation of $\sigma_a$. TRIM(r) specifies that an r×100% trimmed standard deviation is used (i.e., the top r×100% largest observations, according to absolute values, are excluded from the computation). A specification of TRIM(0.0) indicates that $\sigma_a$ is computed at each observation (residual) using all data except the current observation. TRIM(0.0) is the default. MAD(r) specifies that the median absolute deviation is used for the estimation of $\sigma_a$ ($\hat{\sigma}_a$ = 1.483 × median absolute deviation).

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace until the end of the session. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. The default is that none of the values of the above statistics will be retained after the paragraph is used. The values that may be retained are:

FORECASTS : a new variable that stores the original values of the series up to the forecast origin, and the forecasts after the origin.
STD_ERRS  : a new variable that stores the value 0.0 up to the forecast origin, and the standard errors of the forecasts after the forecast origin.

Note that if the number of variables specified (say m) is fewer than the number of forecasting time origins, then only the forecasts and standard errors for the first m time origins will be held.

OFILTER Paragraph

The OFILTER paragraph is used to perform outlier detection and to generate the residual time series with outlier adjustment and the adjusted output time series. This paragraph can be used in conjunction with fitted models from either the OESTIM or ESTIM paragraph.

Syntax for the OFILTER paragraph

Brief syntax

OFILTER MODEL model-name.
        NEW-SERIES IN v1, v2, v3, v4, v5.
        TYPES ARE w1, w2, ---.

Required sentences: MODEL, NEW-SERIES

Full syntax

OFILTER MODEL model-name.
        NEW-SERIES IN v1, v2, v3, v4, v5.
        TYPES ARE w1, w2, ---.
        DELTA IS r.
        OSTOP IS MXOUTLIERS(i1), CRITICAL(r), MXESTIM(i2).
        METHOD IS w.
        OADJUSTMENT IS w.
        STDEV IS w(r).
        SPAN IS i1, i2.

Required sentences: MODEL, NEW-SERIES

Sentences used in the OFILTER paragraph

MODEL sentence
The MODEL sentence is used to specify the label (name) of the model to be estimated. The label must be one specified in a previous TSMODEL paragraph. It is a required sentence.

NEW-SERIES sentence
The NEW-SERIES sentence is used to specify the labels (names) of variables to be created for saving information of the outlier detection process. Only those results desired to be retained need be named. The default is that no variable is retained after the paragraph is executed. The variables that may be retained (and the position a label must occupy in the sentence) are:

v1: the name used to store the residuals after all outlier adjustments

v2: the name used to store the adjusted series (i.e., the resultant series after removing detected outlier effects from the original observations)
v3: the name used to store an indicator variable designating the types of outliers, if any, found during the outlier detection process. The value of the t-th observation of this variable is 0 if the t-th value of the time series is not an outlier; 2 if it is an innovative outlier; 3 if it is an additive outlier; 4 if it is a temporary change; 5 if it is a level shift; and 1 if its value is missing.
v4: the name used to store the estimates of any detected outliers
v5: the name used to store the effects of detected outliers on residuals

TYPES sentence
The TYPES sentence is used to specify types of outliers to be detected. The valid keywords are AO (additive outlier), IO (innovative outlier), LS (level shift), and TC (temporary change). The default is IO, AO, TC, and LS.

DELTA sentence
The DELTA sentence is used to specify the δ value employed for the TC outlier (see Section 7.2.4). The default is δ=0.7.

OSTOP sentence
The OSTOP sentence is used to specify the stopping criterion for outlier detection. Parameter estimation and outlier detection and adjustment are done iteratively. If any outlier is detected after a parameter estimation, the time series is adjusted for outliers and parameters are re-estimated. The iteration stops if the maximum number of outliers that may be adjusted is reached; if the maximum number of re-estimations of parameters is reached; or if all outlier statistics are smaller than a specified critical value.

The argument for the keyword MXOUTLIERS (i1) specifies the maximum number of outliers permitted to be detected and adjusted. The default for i1 is equal to 10% of the number of observations.

The argument for the keyword CRITICAL (r) specifies a critical value for testing the presence of outliers. The recommended value for r is 3.50 for low sensitivity, 3.00 for medium sensitivity, and 2.70 for high sensitivity. The default for r is 3.0.
The argument for the keyword MXESTIM (i2) specifies the maximum number of re-estimations of model parameters within each estimation. The default for i2 is 3.

METHOD sentence
The METHOD sentence is used to specify the method for the computation of the likelihood function used in model estimation. The keyword may be CONDITIONAL for the conditional likelihood or EXACT for the exact likelihood function. The default is CONDITIONAL.

OADJUSTMENT sentence
The OADJUSTMENT sentence is used to specify the method of outlier estimation and adjustment. The keyword may be SEQUENTIAL for the detection and adjustment of outliers sequentially from largest effect to smallest (see Chang, Tiao and Chen 1988). JOINT specifies the detection and joint estimation of outlier effects (the default). The use of NONE is equivalent to using ESTIM (except that missing data are estimated).

STDEV sentence
The STDEV sentence is used to specify a method for the estimation of σ_a. TRIM(r) specifies that an r×100% trimmed standard deviation is used (i.e., the top r×100% largest observations, according to absolute values, are excluded from the computation). A specification of TRIM(0.0) indicates that σ_a is computed at each observation (residual) using all data except the current observation. TRIM(0.0) is the default. MAD(r) specifies that the median absolute deviation is used for the estimation of σ_a (σ_a = 1.483 × median absolute deviation).

SPAN sentence
The SPAN sentence is used to specify the span of time indices, from i1 to i2, for which data are analyzed. The default is the maximum span available for the series.

OUTLIER Paragraph

The OUTLIER paragraph is used for the detection of outliers in a time series using the detection procedure of Chang (1982) (as described in Section 7.2.2). The OUTLIER paragraph can be used in conjunction with fitted models from the ESTIM paragraph. This paragraph can be used for the detection of AO, IO and LS outliers only. The OFILTER paragraph employs a procedure of Chen and Liu (1990), and may be used in lieu of the OUTLIER paragraph.

Syntax for the OUTLIER paragraph

Brief syntax

OUTLIER MODEL model-name.
        TYPES ARE w1, w2,
        INDICATOR IN v.

Required sentence: MODEL

Full syntax

OUTLIER MODEL model-name.
        TYPES ARE w1, w2,
        OLD IN v.
        RESIDUAL IN v.
        INDICATOR IN v.
        STOP IS MAXIT(i), CRITICAL(r).
        VARIANCE IS TRIMMED(r).
        SPAN IS i1, i2.

Required sentence: MODEL

Sentences used in the OUTLIER paragraph

MODEL sentence
The MODEL sentence is used to specify the label (name) of a previously defined univariate time series model that will be used in the detection of outliers associated with the output variable of the model or with the variable(s) specified in the OLD or RESIDUAL sentence.

TYPES sentence
The TYPES sentence is used to specify the types of outliers to be detected. The valid keywords are AO (additive outlier), IO (innovative outlier), and LS (level shift). The default is AO and IO.

OLD sentence
The OLD sentence is used to specify the name of the series for which outlier detection will be performed. If this sentence is omitted, the output variable of the univariate model specified in the MODEL sentence will be used in outlier detection.

RESIDUAL sentence
The RESIDUAL sentence is used to specify the name of a residual series for which outlier detection will be performed. Computationally, when this sentence is used, the specified residual series, rather than that derived from the output series and the model, will be used for outlier detection. However, some computations are still based on the specified model.

INDICATOR sentence
The INDICATOR sentence is used to specify the label (name) for an indicator variable designating the types of outliers, if any, that are determined during the outlier detection process. The value of the t-th observation of this variable is 0 if the t-th value of the time series is not an outlier, 2 if it is an additive outlier, 3 if it is an innovative outlier, and 4 if it is a level shift.

STOP sentence
The STOP sentence is used to specify the stopping criterion for the outlier detection. MAXIT(i) specifies the maximum number of iterations (i) to be performed, and

CRITICAL(r) specifies a critical value for testing the presence of outliers. The iteration stops if the maximum number of iterations is reached or if all outlier statistics are smaller than this critical value. The recommended value for r is 3.50 for low sensitivity, 3.00 for medium sensitivity, and 2.50 for high sensitivity. The default is 3.00 for r.

VARIANCE sentence
The VARIANCE sentence is used to specify the amount of trimming to be performed in the computation of the robust residual variance. For the ordered values of the residual series, r percent of both the smallest and largest values is removed in the computation of the variance. The default is r=0.0, no trimming.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, from i1 to i2, for which the data are analyzed. The default is the maximum span available for the variables.

REFERENCES

Alwan, L.C. and Roberts, H.V. (1985). Time Series Modeling for Statistical Process Control. Journal of Business & Economic Statistics 6.

Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control. San Francisco: Holden Day. (Revised edition published in 1976.)

Box, G.E.P. and Tiao, G.C. (1975). Intervention Analysis with Applications to Economic and Environmental Problems. Journal of the American Statistical Association 70.

Chang, I. (1982). Outliers in Time Series. Unpublished Ph.D. Dissertation, University of Wisconsin-Madison, Department of Statistics.

Chang, I., Tiao, G.C. and Chen, C. (1988). Estimation of Time Series Parameters in the Presence of Outliers. Technometrics 30.

Chen, C. and Liu, L.-M. (1990). Joint Estimation of Model Parameters and Outlier Effects in Time Series. Working Paper Series, Scientific Computing Associates, P.O. Box 625, DeKalb, Illinois. To appear in the Journal of the American Statistical Association (1993).

Chen, C. and Liu, L.-M. (1991). Forecasting Time Series with Outliers. Working Paper Series, Scientific Computing Associates, P.O. Box 625, DeKalb, Illinois. To appear in the Journal of Forecasting.

Deming, W.C. (1982).
Quality, Productivity and Competitive Position. Cambridge, MA: MIT Center for Advanced Engineering Study.

Fox, A.J. (1972). Outliers in Time Series. Journal of the Royal Statistical Society, Series B 34.

Hillmer, S.C. (1984). Monitoring and Adjusting Forecasts in the Presence of Additive Outliers. Journal of Forecasting 3.

Hillmer, S.C., Bell, W.R. and Tiao, G.C. (1983). Modeling Considerations in the Seasonal Adjustment of Economic Time Series. Applied Time Series Analysis of Economic Data, Washington, D.C.: US Bureau of the Census.

Ledolter, J. (1987). The Effect of Outliers on the Estimates in and the Forecasts from ARIMA Time Series Models. American Statistical Association 1987 Proceedings of the Business and Economic Statistics Section.

Ledolter, J. (1989). The Effect of Additive Outliers on the Forecasts from ARIMA Models. International Journal of Forecasting 5.

Liu, L.-M. and Chen, C. (1991). Recent Developments of Time Series Analysis in Intervention in Environmental Impact Studies. Journal of Environmental Science and Health A 26.

Ljung, G.M. (1989a). A Note on the Estimation of Missing Values in Time Series. Communications in Statistics, B 17.

Ljung, G.M. (1989b). Outliers and Missing Observations in Time Series. American Statistical Association 1989 Proceedings of the Business and Economic Statistics Section.

Pankratz, A. (1991). Forecasting with Dynamic Regression Models. New York: John Wiley & Sons.

Tsay, R.S. (1988). Outliers, Level Shifts, and Variance Changes in Time Series. Journal of Forecasting 7.

Wei, W.W.S. (1990). Time Series Analysis: Univariate and Multivariate Methods. Redwood City, CA: Addison-Wesley.

CHAPTER 8

TRANSFER FUNCTION MODELING

In Chapter 4, we discussed relating a response variable to one or more explanatory variables using linear regression models. We observed the deficiency of regression analysis when the error terms of the model were serially correlated. In order to account for the correlated structure of time series data, autoregressive-integrated moving average (ARIMA) models were introduced. In Chapters 5 through 7, we presented aspects of the modeling and forecasting of a single time series. Chapter 5 laid the foundations of ARIMA modeling. In Chapter 6, we extended the ARIMA model to incorporate (deterministic) intervention components into the model. Chapter 7 discussed the handling of outliers and missing data that may be present in a time series.

The univariate modeling methods presented in Chapters 5 through 7 are useful for the analysis of a single time series. In such a case, we basically limit our modeling to the information contained in the series' own past, and we do not explicitly use the information contained in other related (stochastic) time series. In many cases, we may be able to relate the response (i.e., the observed value) of one series to its own past values, and also to the past and present values of other time series. In this manner we effectively merge the basic concepts of the regression model with those of ARIMA models.

In this chapter we introduce a class of models known as transfer function models. As will be seen, transfer function models are flexible time series models that can be used for a variety of applications. A simple scheme for transfer function modeling is also presented. An alternative to this simple scheme, the classical method for transfer function modeling, is contained in Section .

8.1 Extending the Linear Regression Model: Regression with Serially Correlated Errors

As an introduction to transfer function models, we begin with the linear regression model. A brief overview of linear regression is found in Section 4.1.
In Section 4.3, we illustrated the use of regression models for time dependent data in an analysis of three series related to the stock market. The data consist of monthly observations (from January 1976 through 1990) of

(1) The monthly average of the Standard and Poor's 500 stock index,
(2) The monthly average of long term government security interest rates, and
(3) The monthly composite index of leading indicators.

The data, listed in Table 4.2 and shown in Figure 4.1, are stored in the SCA workspace under the labels SP500, LONGTERM and LINDCTR, respectively. In Chapter 4 we limited our analysis to only the first 141 observations. The natural logarithms of the series were used in order to provide a more convenient interpretation. Plots of the log transformed series used in the analysis are given in Figure 8.1. The data analyzed are stored in the SCA workspace under the labels LNSP500, LNLONG and LNLEAD.

Figure 8.1 Logged stock market data (January 1976 through September 1987)

Using the regression model to incorporate serial correlation

In Chapter 4, a regression of LNSP500 on LNLONG and LNLEAD was performed. Serial correlation was found in the residual series (see Section 4.3.1). In an effort to account for serial correlation, a dynamic regression was considered (see Section 4.3.2). Specifically, we indicated that we could regress the current monthly observation of LNSP500 on the current values of LNLONG and LNLEAD and on the values of LNLONG, LNLEAD and

LNSP500 that were observed in the prior month. The fitted equation for this model can be written as

$LNSP500_t = b_0 + b_1 LNLONG_t + b_2 LNLONG_{t-1} + b_3 LNLEAD_t + b_4 LNLEAD_{t-1} + b_5 LNSP500_{t-1}$   (8.1)

In (8.1) the serial correlation, or the memory maintained by the response variable LNSP500, is accounted for through the inclusion of the most recently observed value of LNSP500 as a regressor (i.e., an explanatory variable) in the regression.

We can create the three lagged explanatory variables by using the LAG paragraph (see Appendix C). SCA output is suppressed.

-->LAG LNSP500. NEW IS LNSP1.
-->LAG LNLONG. NEW IS LNLONG1.
-->LAG LNLEAD. NEW IS LNLEAD1.

We can obtain the fit for the model of (8.1) by entering

-->REGRESS LNSP500,LNLONG,LNLONG1,LNLEAD,LNLEAD1,LNSP1. DW.
--> HOLD RESIDUALS(RES).

The residuals are maintained for diagnostic checking, and the Durbin-Watson statistic (see Section 4.3.1) is printed as a check for first-order serial correlation in the residuals. We obtain

REGRESSION ANALYSIS FOR THE VARIABLE LNSP500

PREDICTOR    COEFFICIENT    STD. ERROR    T-VALUE
INTERCEPT
LNLONG
LNLONG1
LNLEAD
LNLEAD1
LNSP1

CORRELATION MATRIX OF REGRESSION COEFFICIENTS
(rows and columns: LNLONG, LNLONG1, LNLEAD, LNLEAD1, LNSP1)

S = .0294    R**2 = 99.3%    R**2(ADJ) = 99.3%

ANALYSIS OF VARIANCE TABLE
SOURCE        SUM OF SQUARES    DF    MEAN SQUARE    F-RATIO
REGRESSION
RESIDUAL
ADJ. TOTAL

SOURCE        SEQUENTIAL SS     DF    MEAN SQUARE    F-RATIO
LNLONG

LNLONG1
LNLEAD
LNLEAD1
LNSP1

DURBIN-WATSON STATISTIC = 1.73

The value of the Durbin-Watson statistic does not indicate any first-order serial correlation in the residual series. The ACF of the residuals (not shown here) is relatively clean. The fitted equation obtained is

$LNSP500_t = 0.12 + (-0.34) LNLONG_t + (0.34) LNLONG_{t-1} + (0.66) LNLEAD_t + (-0.63) LNLEAD_{t-1} + (1.00) LNSP500_{t-1}$   (8.2)

If we collect like terms and use the backshift operator, we can re-write (8.2) as

$(1 - 1.00B) LNSP500_t = 0.12 - (0.34 - 0.34B) LNLONG_t + (0.66 - 0.63B) LNLEAD_t,$   (8.3)

or approximately

$(1 - B) LNSP500_t = 0.12 - (0.34)(1 - B) LNLONG_t + (0.66)(1 - B) LNLEAD_t.$   (8.4)

Equation (8.4) suggests that we model series comprised of the differences of the logged data rather than the original series. We fit just such a model previously (see Section 4.3.3) and obtained almost identical estimates for the parameters associated with LNLONG and LNLEAD.

A time series model for regression

With a slight generalization, equation (8.4) can also be interpreted as a fit of the model

$(1 - \phi B) LNSP500_t = \beta_0 + \beta_1 (1 - \phi B) LNLONG_t + \beta_2 (1 - \phi B) LNLEAD_t + a_t.$   (8.5)

If we treat $(1 - \phi B)$ as a mathematical operator, we can divide all terms of (8.5) by it to obtain

$LNSP500_t = C + \beta_1 LNLONG_t + \beta_2 LNLEAD_t + \frac{1}{1 - \phi B} a_t,$   (8.6)

where $C = \beta_0 / (1 - \phi)$. We can also represent the error component by $N_t$, where $N_t = \frac{1}{1 - \phi B} a_t$, or equivalently $(1 - \phi B) N_t = a_t$.

Equation (8.6) is of the same form as an intervention model (see Chapter 6), except that LNLONG and LNLEAD are not deterministic binary series. Here both LNLONG and

LNLEAD are stochastic series, that is, the series exhibit random variation. We can fit the model specified in (8.6) using the TSMODEL and ESTIM paragraphs as follows (SCA output is edited for presentation purposes):

-->TSMODEL STOCKMDL. MODEL IS LNSP500 = CNST + (B1)LNLONG + (B2)LNLEAD
--> + 1/(1-PHI*B)NOISE.
-->ESTIM STOCKMDL

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- STOCKMDL

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
LNSP500    RANDOM     ORIGINAL      NONE
LNLONG     RANDOM     ORIGINAL      NONE
LNLEAD     RANDOM     ORIGINAL      NONE

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 CNST               CNST    1       0      NONE
2 B1       LNLONG    NUM.    1       0      NONE
3 B2       LNLEAD    NUM.    1       0      NONE
4 PHI      LNSP500   D-AR    1       1      NONE

TOTAL SUM OF SQUARES                E+02
TOTAL NUMBER OF OBSERVATIONS
RESIDUAL SUM OF SQUARES             E+00
R-SQUARE
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL VARIANCE ESTIMATE          E-03
RESIDUAL STANDARD ERROR             E-01

The results above are virtually identical to those of the above regression. The only perceived difference is the estimate of the constant term, which is not significant. The estimate of φ is close to 1. As a result, the constant term in (8.6) may assume any value. Moreover, we may be better served by using a model involving differenced series. Fitting such a model yields results that are identical to the regression fit given in Section and is not presented here.

By using the time series model representation of (8.6) for this example, we achieve a number of valuable results. These include:

(1) Maximum likelihood estimates of the regression parameters together with an AR(1) adjustment of the disturbance term;
(2) A model that is easy to interpret; and
(3) A clear indication that we should analyze differenced series rather than the original (log transformed) series.

Although the model in (8.6) has advantages, it also has some limitations. The most obvious limitations are

(1) the use of only contemporaneous (i.e., lag 0) information from the explanatory series; and
(2) restricting the disturbance term to that of an AR(1) process only.

It would be beneficial if we could appropriately extend the model. We do so in the next section.

8.2 The transfer function model

The basic form of the model given in (8.6) above is

$Y_t = C + \beta_1 X_{1t} + \beta_2 X_{2t} + N_t,$   (8.7)

where $N_t$ represents a stationary ARMA process. To avoid any notational confusion, we will develop the transfer function model from equation (8.7), but will restrict our discussion to a single explanatory variable. Hence we first consider the model

$Y_t = C + \beta_1 X_t + N_t.$   (8.8)

In equation (8.8), the response (output) variable is related to the current (contemporaneous) value of the explanatory (input) variable $X_t$. We can extend (8.8) by replacing $\beta_1$ with either a linear polynomial or a rational polynomial operator. Specifically, if we assume that the input and output variables are both stationary time series, the general form of the single-input, single-output transfer function model can be expressed as

$Y_t = C + \frac{\omega(B)}{\delta(B)} X_t + N_t,$   (8.9)

where $N_t$ follows an ARMA model (i.e., $N_t = \frac{\theta(B)}{\phi(B)} a_t$, or $\phi(B) N_t = \theta(B) a_t$), and

$\omega(B) = (\omega_0 + \omega_1 B + \omega_2 B^2 + \cdots + \omega_{s-1} B^{s-1}) B^b,$   (8.10)

$\delta(B) = (1 - \delta_1 B - \delta_2 B^2 - \cdots - \delta_r B^r).$   (8.11)

In practice, the number of terms in $\omega(B)$ is small and the value of r in (8.11) is usually 0 or 1. We can also represent the rational polynomial operator $\omega(B)/\delta(B)$ with a linear operator $v(B)$, where

$v(B) = v_0 + v_1 B + v_2 B^2 + \cdots.$   (8.12)

The polynomial operators are related according to

$v(B) = \frac{\omega(B)}{\delta(B)}.$

Since we assume the transfer function is stable (i.e., not explosive), the coefficients $v_0, v_1, v_2, \ldots$ diminish to zero regardless of the order of the $\delta(B)$ polynomial. If the linear operator $v(B)$ is used, the model given in (8.9) can be written as

$Y_t = C + v(B) X_t + N_t.$   (8.13)

In the event that $\delta(B) = 1$ (i.e., r=0), we have $v(B) = \omega(B)$ and $v(B)$ has a finite number of terms. In the case that $\delta(B) \neq 1$ (i.e., r>0), $v(B)$ has an infinite number of terms. For convenience, we will often use $v(B)$ to denote either the linear or rational form of the transfer function in the remainder of this chapter. A discussion of the terms of these operators is given in Section .

$N_t$ in the above models is referred to as the disturbance of the transfer function model. It has the same interpretation as the disturbance of the intervention model of Chapter 6.

The representation in (8.9) can be extended directly to the case of multiple-input transfer function models as

$Y_t = C + \frac{\omega_1(B)}{\delta_1(B)} X_{1t} + \cdots + \frac{\omega_m(B)}{\delta_m(B)} X_{mt} + N_t.$   (8.14)

We can also use the linear form of the transfer function by writing (8.13) as

$Y_t = C + v_1(B) X_{1t} + v_2(B) X_{2t} + \cdots + v_m(B) X_{mt} + N_t.$   (8.15)

Interpreting the terms of the transfer function operators

The value b in (8.10) represents the delay of response in the process. The parameters of the numerator polynomial $\omega(B)$ describe the initial effects of the input (as well as any effects that follow no specific pattern). The denominator polynomial $\delta(B)$ characterizes the decay pattern of initial effects in the response. As noted previously, the operators $\omega(B)$ and $\delta(B)$ usually consist of only a few terms. The most frequent representations of $\delta(B)$ are either $\delta(B) = 1$ or $\delta(B) = 1 - \delta B$.

The values $v_0, v_1, v_2, \ldots$ are referred to either as the transfer function (TF) weights or as the impulse response weights for the input series $X_t$ (see Chapter 9 of Box and Jenkins, 1970). These weights provide a measure of how the input series affects the output series, and the weight given to each time lag.
That is, v0 is a measure of how the current response is affected by the current value of the input series; v1 is a measure of how the current response is affected by the value of the input series one period ago; v2 is a measure of how the current response is affected by the value of the input series two periods ago; and so on. The sum of all weights, usually represented by g, is called the steady state gain and represents the total change in the mean level of the response variable if we maintain the input at a single unit increase above its mean level.
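
As a rough illustration of the weights and the gain just described, the following Python sketch expands a rational transfer function ω(B)/δ(B) into its impulse response weights and compares their partial sum with ω(1)/δ(1). The function names and the numerical values (ω0 = 2.0, δ1 = 0.5, delay b = 1) are assumptions chosen for illustration only; they are not SCA commands and are not estimates taken from this manual. The sign conventions follow (8.10) and (8.11).

```python
def tf_weights(omega, delta, b=0, n_weights=20):
    """Return v_0, ..., v_{n_weights-1} such that v(B) * delta(B) = omega(B).

    omega -- [w_0, ..., w_{s-1}], numerator coefficients as in (8.10)
    delta -- [d_1, ..., d_r], with delta(B) = 1 - d_1*B - ... - d_r*B**r as in (8.11)
    b     -- delay, i.e. the power of B that multiplies the numerator
    """
    v = []
    for j in range(n_weights):
        # Numerator term that enters at lag j (zero outside lags b .. b+s-1).
        value = omega[j - b] if 0 <= j - b < len(omega) else 0.0
        # Denominator recursion: v_j = d_1*v_{j-1} + ... + d_r*v_{j-r} + numerator term.
        for i, d in enumerate(delta, start=1):
            if j - i >= 0:
                value += d * v[j - i]
        v.append(value)
    return v


def steady_state_gain(omega, delta):
    """Gain g = omega(1) / delta(1), obtained by setting B = 1."""
    return sum(omega) / (1.0 - sum(delta))


# Hypothetical example: omega(B) = 2.0 * B and delta(B) = 1 - 0.5 * B.
weights = tf_weights(omega=[2.0], delta=[0.5], b=1, n_weights=20)
gain = steady_state_gain(omega=[2.0], delta=[0.5])
print(weights[:4])         # first impulse response weights
print(sum(weights), gain)  # the partial sum approaches the gain
```

With a delay of one period the first weight is zero, the numerator coefficient appears at lag 1, and the later weights shrink by the factor δ1 at each lag, so a two-parameter rational form stands in for an infinite sequence of weights.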

Assumptions of the transfer function model

As noted previously, the general form of the transfer function model is

$Y_t = C + v(B) X_t + N_t,$

where $v(B)$ describes the transfer function between $X_t$ and $Y_t$ (either in a linear form or as a rational polynomial). There are two principal assumptions of this model:

(1) The input series can affect the response variable, but not conversely (i.e., the relationship between $X_t$ and $Y_t$ is unidirectional); and
(2) The input series is assumed to be independent of the disturbance.

Another tacit assumption of the model is that the system being modeled is stable. This is usually manifested as assuming that the input and output series are stationary time series, and that the sum of the TF weights is finite.

The assumption that the output series does not affect the input series is often appropriate for physical or engineering processes. In these cases the input may be viewed as a controller mechanism that is used to maintain a certain level in the response variable. If we model economic and business data, we may wish to use more dynamic models that allow for bidirectional (or feedback) relationships. Examples of such models include simultaneous transfer function (STF) models, vector ARMA models and numerous econometric models. These are not discussed here. However, although the assumption of a unidirectional relationship may not be strictly true, transfer function models can still be used effectively in modeling business and economic data.

Relationship of transfer function models to regression models

As seen above, there are many similarities between transfer function and linear regression models. The models differ in two important respects:

(1) The assumption regarding the disturbance (or error) term, and
(2) The complexity of the parameter representations.

The first of these differences has been discussed. Transfer function models are more general than regression models since they permit an ARMA representation for the disturbance component of the model. The second difference is also important.
If we consider the rational polynomial representation of the transfer function (i.e., $\omega(B)/\delta(B)$), then when $\delta(B) = 1$ we obtain the typical lagged regression model (that allows for lagged relationships and correlated errors). If $\delta(B) \neq 1$, we permit a nonlinear representation of the model, and may have a more effective utilization of parameters in a model.

Some special cases of the transfer function model

We have already indicated how the transfer function model is an extension of various other models. For the sake of completeness, we now summarize some special cases of the transfer function model. We relate these cases to the multiple-input transfer function model shown in (8.14).

(A) Simple linear regression

If we let $C = \beta_0$; $\omega_j(B) = \beta_j$ and $\delta_j(B) = 1$ for each transfer function; and $N_t = a_t$, we have the classic linear regression model

$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \cdots + \beta_m X_{mt} + a_t.$

(B) First-order autoregressive models

If we assume $N_t = \{1/(1 - \phi B)\} a_t$ (equivalently, $(1 - \phi B) N_t = a_t$) in the representation above, then we have a multiple linear regression with a first-order autoregressive error process. Cochrane and Orcutt (1949) and Hildreth and Lu (1960) proposed procedures for the estimation of φ in such a situation.

(C) Distributed lag and Koyck distributed lag model

The transfer function representation with $N_t = a_t$ is also known as a distributed lag model. A special case of this model was considered by Koyck (1954). We can obtain the Koyck model from the rational polynomial representation of a single-input equation by letting $\omega(B) = \omega_0$ and $\delta(B) = 1 - \delta B$. Using this representation, we have

$\nu(B) = \frac{\omega(B)}{\delta(B)} = \frac{\omega_0}{1 - \delta B}.$   (8.16)

If we now multiply both sides of (8.16) by $(1 - \delta B)$ we obtain

$\nu(B)(1 - \delta B) = \omega_0$

or

$(\nu_0 + \nu_1 B + \nu_2 B^2 + \cdots)(1 - \delta B) = \omega_0.$   (8.17)

By expanding the left-hand side of (8.17), we obtain

$\nu_0 + (\nu_1 - \delta \nu_0) B + (\nu_2 - \delta \nu_1) B^2 + \cdots = \omega_0.$   (8.18)

From (8.18), we see $\nu_0 = \omega_0$ and $\nu_j = \delta \nu_{j-1}$ for $j \geq 1$. Hence

$\nu_0 = \omega_0, \ \nu_1 = \delta \omega_0, \ \nu_2 = \delta^2 \omega_0, \ \ldots, \ \nu_k = \delta^k \omega_0, \ \ldots$

In the Koyck model, there is a contemporaneous effect that then decays exponentially. The steady state gain of this model is

$\nu_0 + \nu_1 + \nu_2 + \cdots = \nu(1) = \frac{\omega_0}{1 - \delta}.$   (8.19)

We see from (8.19) that the steady state gain may be obtained by letting B=1 in the polynomial operators. Hence

$g = \nu(1) = \frac{\omega(1)}{\delta(1)}.$

(D) ARIMA models

If there are no explanatory variables, then the transfer function model is the ARIMA model discussed in Chapter 5.

(E) Intervention models

The intervention models discussed in Chapter 6 can be obtained directly if all input series are binary series (that is, series consisting of only the values 0 and 1).

8.3 Transfer Function Modeling

As in the case of intervention analysis (see Chapter 6), there are two distinct components in a transfer function model. One component consists of the explanatory variables and the transfer function for each variable. The disturbance term is the other component. For intervention models, we need to identify a model for the disturbance while we postulate models for the rest. However, for transfer function models, models are identified for both components based on the data.

The iterative modeling strategy

As in the case of ARIMA model building (see Chapter 5), there are three stages for transfer function modeling: identification, estimation, and diagnostic checking. Here the most difficult of these stages is the identification of one or more reasonable transfer function models.

Some preliminary modeling ordinarily precedes the determination of the form of the transfer function and the ARIMA model of the disturbance term. Plots of the series are useful to detect any potential spurious observations, the need for a variance stabilizing

transformation, the possibility of the use of a stationary inducing operation (e.g., differencing), and perhaps the nature of the transfer function.

Pankratz (1991, page 169) also states that it is good practice to construct separate ARIMA models for all series of our proposed model. Such ARIMA modeling may be viewed as part of the preliminary analysis in transfer function modeling. An ARIMA model for the output (response) is particularly useful, and provides a measure for the relative performance of a transfer function model. Models for all input series are necessary if we intend to compute forecasts from our estimated model. In addition, separate ARIMA models may provide useful modeling information.

The linear transfer function (LTF) identification method

The identification stage of transfer function modeling can be divided into three parts:

(1) the estimation of a set of TF (transfer function) weights;
(2) the determination of the form of the ARMA model for the disturbance; and
(3) the determination of the form of a rational polynomial to represent the estimated TF weights if these weights display a die-out pattern.

Two procedures have evolved for the realization of parts (1) and (2) above. One procedure utilizes a cross correlation function and a filtering technique known as prewhitening. This procedure has been termed the CCF method, and is discussed in Section . The other procedure, discussed below, directly utilizes the linear form of the transfer function model and has been termed the LTF method. The underlying rationale for each can be found in Box and Jenkins (1970). However, Box and Jenkins only provided a comprehensive procedure for single-input transfer function modeling using the CCF method. As a result, the CCF method has been the only method discussed in most subsequent texts.

The LTF method follows an approach proposed by Liu and Hanssens (1982) and is detailed in Liu et al. (1986), Liu and Hudak (1985), Liu (1986, 1987), and Pankratz (1991).
The LTF approach is appealing because it can be easily explained (as an extension to regression) and simplifies the identification stage by reducing the steps necessary to obtain the required information. Moreover, the LTF method can be generalized to multiple-input transfer function modeling easily. Such a generalization using the CCF method is difficult.

Since we assume the transfer function relationship to be stable, in practice the rational transfer function model in (8.9) can be approximated by the following linear model:

$Y_t = C + (v_0 + v_1 B + v_2 B^2 + \cdots + v_k B^k) X_t + N_t,$   (8.20)

where k is a sufficiently large number. The above linear transfer function model is the basis of the LTF method. Whenever we estimate (8.20) we obtain information on both the TF weights and the series $N_t$. Information on the latter can be used to identify an ARMA model for the disturbance process. Hence it is possible to reduce our modeling steps if we exploit

(8.20).

The general scheme of the LTF method consists of the steps given below. The complete set of steps assumes that the input and output series are stationary. Hence, the method includes a check for stationarity.

(1) Initially estimate (8.20) for a sufficiently large value of k and a reasonable approximation for $N_t$. These are discussed below.

(2) Examine the estimates of the parameters in the model for $N_t$ and the residuals from the fitted equation. The estimated parameters may indicate that differencing is necessary (see Section 8.3.3). The residuals are used to discover any gross discrepancies in the model.

(3) Use the estimated TF weights to determine the form of the transfer function (see Section 8.3.5). In addition, examine the disturbance from the fitted model, that is,

$\hat{N}_t = Y_t - \hat{C} - (\hat{v}_0 + \hat{v}_1 B + \hat{v}_2 B^2 + \cdots + \hat{v}_k B^k) X_t,$   (8.21)

where $\hat{C}, \hat{v}_0, \ldots, \hat{v}_k$ are estimated values. We may now use standard ARMA techniques to determine an appropriate ARMA model for $N_t$.

If, in step (2), it is determined that differencing is necessary, then the complete set of steps is repeated for the differenced data. Step (3) is only valid if the series are stationary.

There are two key elements in the LTF method: the choice of the number of TF weights and the proxy used for the disturbance term. The latter is discussed in more detail in Section below. The choice of the number of TF weights is somewhat arbitrary, but can be based on practical considerations. There should be enough weights to account for the longest lagged response between input and output. This may be known from prior knowledge, theory, or physical properties (e.g., seasonality) of the process under study. Ultimately, the sample size will limit our choice of the number of weights. A small sample size dictates that relatively few weights be used.

Useful approximations for the disturbance term in the LTF method

In the LTF method outlined above, the disturbance term should not be assumed to be white noise. That is, the approximation used for the model of $N_t$ should not be $N_t = a_t$.
If we use reasonable approximations for $N_t$, we can both obtain more efficient estimates of the TF weights and obtain useful information regarding differencing in certain cases. In particular, two useful representations of the disturbance term are:

(a) An AR(1) approximation when there is no seasonality present. That is,

$N_t = \frac{1}{1 - \phi B} a_t.$

(b) A multiplicative AR approximation when we have seasonality (with seasonal period s). Specifically, we use

$N_t = \frac{1}{(1 - \phi_1 B)(1 - \phi_2 B^s)} a_t.$

The usefulness of these approximations becomes clear after a short inspection. For example, consider the AR(1) approximation for the nonseasonal case. The approximation is useful since:

(1) it is correct if the disturbance is actually an AR(1) process;
(2) it is a reasonable approximation if the disturbance actually follows a pure MA process of low order;
(3) it provides an indication of differencing if $\hat{\phi} \approx 1$ or if the ACF of $\hat{N}_t$ consists of positive values that die out slowly; and
(4) it validates a white noise representation for $N_t$ if $\hat{\phi} \approx 0$.

A similar argument is true for the use of a multiplicative AR model when seasonality is present.

An example of the LTF method: Stock market data

To briefly illustrate the LTF method, we will continue to model the stock market data of Section 8.1 using a transfer function model. Here the LTF method is applied in a multiple-input model (two input variables in this case).

We begin the analysis by extending the model used in Section . Instead of limiting ourselves to contemporaneous terms only, we will first fit the model

$LNSP500_t = C + (v_0 + v_1 B + v_2 B^2 + v_3 B^3 + v_4 B^4) LNLONG_t + (w_0 + w_1 B + w_2 B^2 + w_3 B^3 + w_4 B^4) LNLEAD_t + \frac{1}{1 - \phi B} a_t.$

The above model is an illustration of the first step in the LTF method. Since the data are nonseasonal, an AR(1) approximation is used. We fit 5 weights for each linear transfer function (i.e., k=4). We can specify this model by entering

-->TSMODEL STOCKLTF. MODEL IS LNSP500 = CONST +
--> (0 TO 4; V0 TO V4)LNLONG + (0 TO 4; W0 TO W4)LNLEAD + 1/(1)NOISE.

The specification above uses a shorthand notation for all operators (see Sections 5.4.5, and 8.7.6). Note that no variable label is used to maintain the estimate of φ. We do this deliberately to force the initial value used for φ to be 0.1 whenever the model is fit. In this way, we will not begin the estimation process with a value of φ that may be inappropriate.
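
The first step of the LTF method can be imitated outside SCA in a few lines. The following Python sketch is a stand-in built on stated assumptions, not the estimation algorithm used by the ESTIM paragraph: it fits a single-input linear transfer function with an AR(1) disturbance by a grid search over φ, applying ordinary least squares to the quasi-differenced series at each grid value (in the spirit of the Hildreth-Lu procedure). The simulated data, the function names and the "true" parameter values are all hypothetical.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def ols(rows, y):
    """Least squares via the normal equations; returns (coefficients, SSE)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    sse = sum((yi - sum(bi * ri for bi, ri in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return beta, sse


def fit_ltf_ar1(y, x, k, grid_step=0.05):
    """Fit y_t = C + (v_0 + ... + v_k B^k) x_t + N_t with (1 - phi B) N_t = a_t.

    Returns (phi, [C, v_0, ..., v_k], SSE) for the grid value of phi with
    the smallest residual sum of squares.
    """
    best = None
    steps = int(round(0.95 / grid_step))
    for g in range(-steps, steps + 1):
        phi = g * grid_step
        rows, ys = [], []
        for t in range(k + 1, len(y)):
            # Apply (1 - phi B) to both sides so the error becomes a_t.
            row = [1.0 - phi]
            row += [x[t - j] - phi * x[t - 1 - j] for j in range(k + 1)]
            rows.append(row)
            ys.append(y[t] - phi * y[t - 1])
        beta, sse = ols(rows, ys)
        if best is None or sse < best[2]:
            best = (phi, beta, sse)
    return best


# Hypothetical simulated data: C = 2.0, weights (1.5, 0.8, 0.0), phi = 0.6.
random.seed(0)
n, phi_true, v_true = 400, 0.6, [1.5, 0.8, 0.0]
x = [random.gauss(0, 1) for _ in range(n)]
noise, y = 0.0, []
for t in range(n):
    noise = phi_true * noise + random.gauss(0, 0.1)
    signal = sum(v_true[j] * x[t - j] for j in range(3) if t - j >= 0)
    y.append(2.0 + signal + noise)

phi_hat, beta_hat, sse_hat = fit_ltf_ar1(y, x, k=2)
print(phi_hat, beta_hat)
```

A grid estimate of φ near 1 in such a fit would play the same role as the SCA estimate discussed in the text: it would suggest repeating the fit on differenced series.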

However, it is useful to maintain estimates of the TF weights (see Sections and 8.7.5). The SCA output for this specification has been suppressed. We can estimate the model STOCKLTF, and retain the residuals and the estimated disturbance term, by entering the following command (SCA output is edited for presentation purposes):

-->ESTIM STOCKLTF. HOLD RESIDUALS(RES), DISTURBANCE(NT)

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- STOCKLTF

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
LNSP500    RANDOM     ORIGINAL      NONE
LNLONG     RANDOM     ORIGINAL      NONE
LNLEAD     RANDOM     ORIGINAL      NONE

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
 1 CONST             CNST    1       0      NONE
 2 V0      LNLONG    NUM.    1       0      NONE
 3 V1      LNLONG    NUM.    1       1      NONE
 4 V2      LNLONG    NUM.    1       2      NONE
 5 V3      LNLONG    NUM.    1       3      NONE
 6 V4      LNLONG    NUM.    1       4      NONE
 7 W0      LNLEAD    NUM.    1       0      NONE
 8 W1      LNLEAD    NUM.    1       1      NONE
 9 W2      LNLEAD    NUM.    1       2      NONE
10 W3      LNLEAD    NUM.    1       3      NONE
11 W4      LNLEAD    NUM.    1       4      NONE
12         LNSP500   D-AR    1       1      NONE

TOTAL SUM OF SQUARES                E+02
TOTAL NUMBER OF OBSERVATIONS
RESIDUAL SUM OF SQUARES             E+00
R-SQUARE
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL VARIANCE ESTIMATE          E-03
RESIDUAL STANDARD ERROR             E-01

The estimate of φ is virtually 1 (in accord with previous estimations). Hence we will now re-specify and estimate the same model as above, with all series differenced. We may enter the following sequence of commands (SCA output is edited for presentation purposes):

-->TSMODEL STOCKLTF. MODEL IS LNSP500(1) = CONST +
--> (0 TO 4; V0 TO V4)LNLONG(1) + (0 TO 4; W0 TO W4)LNLEAD(1) +
--> 1/(1)NOISE.
-->ESTIM STOCKLTF. HOLD RESIDUALS(RES), DISTURBANCE(NT).

    SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- STOCKLTF

    VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
               VARIABLE   OR CENTERED
                                             1
    LNSP500    RANDOM     ORIGINAL      (1-B )
                                             1
    LNLONG     RANDOM     ORIGINAL      (1-B )
                                             1
    LNLEAD     RANDOM     ORIGINAL      (1-B )

    PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-    VALUE   STD     T
    LABEL      NAME      DENOM.                 TRAINT           ERROR   VALUE
     1  CONST            CNST    1       0      NONE
     2  V0     LNLONG    NUM.    1       0      NONE
     3  V1     LNLONG    NUM.    1       1      NONE
     4  V2     LNLONG    NUM.    1       2      NONE
     5  V3     LNLONG    NUM.    1       3      NONE
     6  V4     LNLONG    NUM.    1       4      NONE
     7  W0     LNLEAD    NUM.    1       0      NONE
     8  W1     LNLEAD    NUM.    1       1      NONE
     9  W2     LNLEAD    NUM.    1       2      NONE
    10  W3     LNLEAD    NUM.    1       3      NONE
    11  W4     LNLEAD    NUM.    1       4      NONE
    12         LNSP500   D-AR    1       1      NONE

    TOTAL SUM OF SQUARES                     E+02
    TOTAL NUMBER OF OBSERVATIONS
    RESIDUAL SUM OF SQUARES                  E+00
    R-SQUARE
    EFFECTIVE NUMBER OF OBSERVATIONS
    RESIDUAL VARIANCE ESTIMATE               E-03
    RESIDUAL STANDARD ERROR                  E-01

We have achieved a fitted stationary model. Before we use the results of this model, we should examine the ACF of the residuals to see if there are any gross discrepancies that still need to be corrected. We obtain the ACF for the residuals (stored in RES) by entering

    -->ACF RES. MAXLAG IS 12.

    TIME PERIOD ANALYZED                  TO 141
    NAME OF THE SERIES                    RES
    EFFECTIVE NUMBER OF OBSERVATIONS
    STANDARD DEVIATION OF THE SERIES
    MEAN OF THE (DIFFERENCED) SERIES
    STANDARD DEVIATION OF THE MEAN
    T-VALUE OF MEAN (AGAINST ZERO)

    AUTOCORRELATIONS
    ST.E
    Q

    (plot of the sample autocorrelations of RES, lags 1 through 12)

No anomalies are apparent. Hence, we can use the estimated response weights and estimated disturbance term to determine a form for the transfer function model. Note we should not always expect an ACF pattern as clean as the one above. Since we are roughly approximating $N_t$, we may anticipate some significant lags in the ACF of the residuals. We only need to be concerned when the residual series is grossly different from a white noise process.

We see that the TF weights associated with LNLEAD cut off after the contemporaneous lag (i.e., lag 0). Moreover, only the estimate of the weight associated with the contemporaneous lag for LNLONG is significant at the 5% level. The t-value of V3 is near significance. Since the transfer function weights for both inputs cut off, there is no need to incorporate the denominator polynomial $\delta(B)$. In this case $\delta_i(B) = 1$ for each transfer function and $\omega_i(B) = v_i(B)$. Because the value of V3 is near significance, we may wish to explore either of the models

$$(1-B)\,\mathrm{LNSP500}_t = C + (v_0)(1-B)\,\mathrm{LNLONG}_t + (w_0)(1-B)\,\mathrm{LNLEAD}_t + N_t$$

or

$$(1-B)\,\mathrm{LNSP500}_t = C + (v_0 + v_3B^3)(1-B)\,\mathrm{LNLONG}_t + (w_0)(1-B)\,\mathrm{LNLEAD}_t + N_t .$$

The former model is more plausible than the latter, unless there is a possible reason that the current percent change in the S&P's 500 index is influenced by the percent change in long term government security interest rates three months ago.

We can now use the estimated disturbance term, stored in the variable NT, to determine a model for $N_t$. We can compute the ACF and PACF by entering

    -->IDEN NT. MAXLAG IS

    TIME PERIOD ANALYZED                  TO 141
    NAME OF THE SERIES                    NT
    EFFECTIVE NUMBER OF OBSERVATIONS
    STANDARD DEVIATION OF THE SERIES
    MEAN OF THE (DIFFERENCED) SERIES
    STANDARD DEVIATION OF THE MEAN
    T-VALUE OF MEAN (AGAINST ZERO)

    AUTOCORRELATIONS
    ST.E
    Q

    (plot of the sample autocorrelations of NT, lags 1 through 12)

    PARTIAL AUTOCORRELATIONS
    ST.E

    (plot of the sample partial autocorrelations of NT, lags 1 through 12)

Both the ACF and PACF cut off after the first lag. Hence we can consider using either an MA(1) or AR(1) representation for $N_t$. We can also observe the EACF for NT by entering

    -->EACF NT. MAXLAG IS 12.

    TIME PERIOD ANALYZED                  TO 141
    NAME OF THE SERIES                    NT
    EFFECTIVE NUMBER OF OBSERVATIONS
    STANDARD DEVIATION OF THE SERIES
    MEAN OF THE (DIFFERENCED) SERIES
    STANDARD DEVIATION OF THE MEAN
    T-VALUE OF MEAN (AGAINST ZERO)

    THE EXTENDED ACF TABLE

    (Q-->)
    (P= 0)
    (P= 1)
    (P= 2)
    (P= 3)
    (P= 4)
    (P= 5)
    (P= 6)

    SIMPLIFIED EXTENDED ACF TABLE (5% LEVEL)

    (Q-->)   0  1  2  3  4  5  6  7  8  9 10 11 12
    (P= 0)   X  O  O  O  O  O  O  O  O  O  O  O  O
    (P= 1)   X  O  O  O  O  O  O  O  O  O  O  O  O
    (P= 2)   O  O  O  O  O  O  O  O  O  O  O  O  O
    (P= 3)   O  X  X  O  O  O  O  O  O  O  O  O  O
    (P= 4)   X  X  X  O  O  O  O  O  O  O  O  O  O
    (P= 5)   X  O  X  O  O  O  O  O  O  O  O  O  O
    (P= 6)   X  X  X  O  O  O  O  O  O  O  O  O  O

The EACF seems to support an MA(1) representation for $N_t$. Hence we may consider fitting either the model

$$(1-B)\,\mathrm{LNSP500}_t = C + (v_0 + v_3B^3)(1-B)\,\mathrm{LNLONG}_t + (w_0)(1-B)\,\mathrm{LNLEAD}_t + (1-\theta B)\,a_t ,$$

or simplified variations of this model. Estimation results for various models are presented below.

Estimates (and t-values) for various transfer function models of (1-B)LNSP500

               Constant    (1-B)LNLONG                  (1-B)LNLEAD     theta      sigma_a
                           v0          v3               w0
    Model 1    (2.21)      (-4.92)     .145 (2.11)      .724 (3.26)     (-1.64)    .0285
    Model 2    (2.41)      (-5.20)     .142 (2.12)      .826 (3.73)                .0288
    Model 3    (2.71)      (-5.31)                      .700 (3.26)                .0291

The ACF of the residuals of all the models above are relatively clean. Due to the similar values of $\hat\sigma_a$, we may likely choose the simplest model, Model 3:

$$(1-B)\,\mathrm{LNSP500}_t = \hat C + \hat v_0\,(1-B)\,\mathrm{LNLONG}_t + \hat w_0\,(1-B)\,\mathrm{LNLEAD}_t + a_t .$$

The above model is virtually identical to that obtained in Section
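The first step of the LTF method used in this example (several TF weights for an input, plus an AR(1) approximation for the disturbance) can be sketched outside the SCA System. The following is only a minimal illustration on simulated data and is not the algorithm used by the ESTIM paragraph: the function names `lag_matrix` and `fit_ltf_ar1`, the iterative quasi-differencing scheme (in the style of Cochrane and Orcutt), and the simulated series are all assumptions made for the example.

```python
import numpy as np

def lag_matrix(x, k):
    """Return columns x[t], x[t-1], ..., x[t-k] for t = k, ..., n-1."""
    n = len(x)
    return np.column_stack([x[k - j : n - j] for j in range(k + 1)])

def fit_ltf_ar1(y, x, k, n_iter=20):
    """Fit y[t] = c + sum_j v[j]*x[t-j] + N[t], with N[t] = phi*N[t-1] + a[t].

    Alternates ordinary least squares on quasi-differenced data with a
    re-estimate of phi from the implied disturbance (an assumed scheme,
    chosen here for brevity).
    """
    X = np.column_stack([np.ones(len(y) - k), lag_matrix(x, k)])
    yk = y[k:]
    phi = 0.0
    for _ in range(n_iter):
        ys = yk[1:] - phi * yk[:-1]
        Xs = X[1:] - phi * X[:-1]
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        dist = yk - X @ beta                       # estimated disturbance
        phi = float(dist[1:] @ dist[:-1] / (dist[:-1] @ dist[:-1]))
    return beta[0], beta[1:], phi

# Hypothetical data: weights at lags 2 and 3 only, AR(1) disturbance.
rng = np.random.default_rng(0)
n = 600
x = rng.normal(size=n)
a = rng.normal(scale=0.1, size=n)
noise = np.zeros(n)
for t in range(1, n):
    noise[t] = 0.5 * noise[t - 1] + a[t]
y = 2.0 + noise
y[2:] += 1.5 * x[:-2]
y[3:] += 0.75 * x[:-3]

c_hat, v_hat, phi_hat = fit_ltf_ar1(y, x, k=4)
print(np.round(v_hat, 2), round(phi_hat, 2))
```

In this sketch the estimated weights at lags 0, 1 and 4 should be close to zero, which is how a delay and a cut-off pattern would show up in the estimated TF weights.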

Example: Series M of Box and Jenkins

As an illustration of the complete transfer function modeling procedure using the LTF method, we consider the data of Series M of Box and Jenkins (1970). The output series (response) consists of sales data, and the input series (explanatory variable) is a leading indicator. There are 150 observations in each series. The data are listed in Table 8.1 and are displayed in Figure 8.2. The data are stored in the SCA workspace under the labels SALES and LEADING.

Table 8.1 Data of Series M of Box and Jenkins (1970)

Output variable: Sales data (read across a line)

Input series: A leading indicator (read across a line)

Figure 8.2 Sales data of SERIES M of Box and Jenkins (1970). Panels: Leading indicator data of SERIES M; Sales data of SERIES M.

The sales data were modeled previously using an ARIMA model (see Section 5.2). In this section, we will only use the first 126 observations for model building and estimation. In Section we will provide the revised estimates of the ARIMA model for SALES for this span of data. Estimated models for both SALES alone and the transfer function model involving SALES and LEADING will be used to compute one-step-ahead forecasts from time origins 126 through 149. We can then compare transfer function and ARIMA results with actual values.

Preliminary modeling phase

As noted in Section 8.3.1, some preliminary exploratory analysis and modeling should precede the construction of a transfer function model. This preliminary stage involves inferences drawn from plots or other sources and the development of separate ARIMA models for (possibly) all series involved in our proposed model.

We can use the plots of Figure 8.2 to make some initial observations regarding SALES and LEADING. We see that

(1) there are no apparent aberrations in either series,
(2) the variation present in each series appears to be constant over time,
(3) both series display non-stationary behavior as there is no fixed mean level,

(4) the LEADING series appears to be a good indicator for the SALES series as its peaks, valleys and turning points are seen in SALES after a short delay.

ARIMA models for SALES and LEADING

An ARIMA model was constructed for SALES in Section 5.2. This model was based on all 150 observations of the series. The same model is found if only the first 126 observations are used (details not shown here). The fitted model obtained in this case is

$$(1 - 0.89B)(1 - B)\,\mathrm{SALES}_t = (1 - 0.64B)\,a_t , \qquad (8.22)$$

with $\sigma_a$ = . These results are almost identical to those obtained in Section 2 of Chapter 5.

An ARIMA model is now constructed for LEADING using only the first 126 observations, but only the results are given here. The fitted model for LEADING is found to be

$$(1 - B)\,\mathrm{LEADING}_t = (1 - 0.44B)\,e_t , \qquad (8.23)$$

with $\sigma_e$ = . The error series associated with the ARIMA model for LEADING is distinct from the error series associated with the disturbance of the transfer function model since the series $\mathrm{LEADING}_t$ and $N_t$ are assumed to be independent. The model information for LEADING is stored in the SCA workspace under the label LEADMDL.

From the time series plots of LEADING and SALES and the results from individual ARIMA model building, we may conclude that differencing will be used in our transfer function model and that the underlying disturbance for SALES (i.e., $N_t$) may contain a moving average term. We will verify this in the identification stage using the LTF method.

Transfer function identification using the LTF method

We will now use the LTF method to identify a transfer function model. Since there is no apparent seasonality in the data, we will use an AR(1) approximation for $N_t$. Although we suspect that differencing is necessary, we will initially examine the original series. Based on the plots in Figure 8.2, we may detect a delay in the process (of about 2 to 5 time periods). We will begin the LTF method with 11 TF weights (i.e., the 0th through 10th lags inclusive). We may decide to adjust the number of weights later. Hence the model we will fit is

$$\mathrm{SALES}_t = C + (v_0 + v_1B + \cdots + v_{10}B^{10})\,\mathrm{LEADING}_t + \frac{1}{1-\phi B}\,a_t . \qquad (8.23)$$

We can specify this model by entering

    -->TSMODEL SALESMDL. MODEL IS SALES = CNST + (0 TO 10; V0 TO V10)LEADING
    -->   + 1/(1)NOISE.

We used a shorthand notation in the above model specification (see Section 5.4.5). That is, 1/(1)NOISE indicates an AR(1) representation for $N_t$; and (0 TO 10; V0 TO V10) in the above specification is equivalent to entering

    (V0 + V1*B + V2*B**2 + V3*B**3 + V4*B**4 + V5*B**5 + V6*B**6 + V7*B**7 + V8*B**8 + V9*B**9 + V10*B**10)

We suppressed the SCA output generated by the above paragraph. To estimate this model, we may enter

    -->ESTIM SALESMDL. HOLD RESIDUALS(RES), DISTURBANCE(NT).

The estimates of all parameters will be held in the SCA workspace under the labels designated in the previous TSMODEL paragraph. The HOLD sentence is used above to designate that the residuals of the fitted model will be retained in the variable RES and the estimated disturbance (i.e., $\hat N_t$) will be retained in the variable NT. We obtain the following (the SCA output is edited for presentation purposes)

    SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- SALESMDL

    VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
               VARIABLE   OR CENTERED
    SALES      RANDOM     ORIGINAL      NONE
    LEADING    RANDOM     ORIGINAL      NONE

    PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-    VALUE   STD     T
    LABEL      NAME      DENOM.                 TRAINT           ERROR   VALUE
     1  CNST             CNST    1       0      NONE
     2  V0     LEADING   NUM.    1       0      NONE
     3  V1     LEADING   NUM.    1       1      NONE
     4  V2     LEADING   NUM.    1       2      NONE
     5  V3     LEADING   NUM.    1       3      NONE
     6  V4     LEADING   NUM.    1       4      NONE
     7  V5     LEADING   NUM.    1       5      NONE
     8  V6     LEADING   NUM.    1       6      NONE
     9  V7     LEADING   NUM.    1       7      NONE
    10  V8     LEADING   NUM.    1       8      NONE
    11  V9     LEADING   NUM.    1       9      NONE
    12  V10    LEADING   NUM.    1      10      NONE
    13         SALES     D-AR    1       1      NONE

    TOTAL SUM OF SQUARES                     E+05
    TOTAL NUMBER OF OBSERVATIONS
    RESIDUAL SUM OF SQUARES                  E+01
    R-SQUARE
    EFFECTIVE NUMBER OF OBSERVATIONS
    RESIDUAL VARIANCE ESTIMATE               E-01
    RESIDUAL STANDARD ERROR                  E+00

Our attention is drawn immediately to the estimate of the AR parameter. This value is essentially 1, as we anticipated. Hence we may conclude that we should employ differencing to achieve stationarity. We can also confirm this by computing the ACF of the estimated disturbance, NT.

    -->ACF NT. MAXLAG IS 12.

    TIME PERIOD ANALYZED                  TO 126
    NAME OF THE SERIES                    NT
    EFFECTIVE NUMBER OF OBSERVATIONS
    STANDARD DEVIATION OF THE SERIES
    MEAN OF THE (DIFFERENCED) SERIES
    STANDARD DEVIATION OF THE MEAN
    T-VALUE OF MEAN (AGAINST ZERO)

    AUTOCORRELATIONS
    ST.E
    Q

    (plot of the sample autocorrelations of NT, lags 1 through 12: all are large and positive, and they die out slowly)

We will now alter the model being fit to include differencing in the disturbance. That is, we want to consider the model

$$\mathrm{SALES}_t = C + (v_0 + v_1B + \cdots + v_{10}B^{10})\,\mathrm{LEADING}_t + \frac{1}{(1-\phi B)(1-B)}\,a_t . \qquad (8.24)$$

We cannot fit the model of (8.24) directly since a differencing operator may not be specified in a denominator (see Section 6.5.3). The above model is equivalent to

$$(1-B)\,\mathrm{SALES}_t = C + (v_0 + \cdots + v_{10}B^{10})(1-B)\,\mathrm{LEADING}_t + \frac{1}{1-\phi B}\,a_t . \qquad (8.25)$$

We can specify and estimate this revised model in the same manner as we used above. That is, we can enter the following commands (SCA output is edited for presentation purposes).

    -->TSMODEL SALESMDL. MODEL IS SALES(1) = CNST +
    -->   (0 TO 10; V0 TO V10)LEADING(1) + 1/(1)NOISE.
    -->ESTIM SALESMDL. HOLD RESIDUALS(RES), DISTURBANCE(NT).

    SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- SALESMDL

    VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
               VARIABLE   OR CENTERED
                                             1
    SALES      RANDOM     ORIGINAL      (1-B )
                                             1
    LEADING    RANDOM     ORIGINAL      (1-B )

    PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-    VALUE   STD     T
    LABEL      NAME      DENOM.                 TRAINT           ERROR   VALUE
     1  CNST             CNST    1       0      NONE
     2  V0     LEADING   NUM.    1       0      NONE
     3  V1     LEADING   NUM.    1       1      NONE
     4  V2     LEADING   NUM.    1       2      NONE
     5  V3     LEADING   NUM.    1       3      NONE
     6  V4     LEADING   NUM.    1       4      NONE
     7  V5     LEADING   NUM.    1       5      NONE
     8  V6     LEADING   NUM.    1       6      NONE
     9  V7     LEADING   NUM.    1       7      NONE
    10  V8     LEADING   NUM.    1       8      NONE
    11  V9     LEADING   NUM.    1       9      NONE
    12  V10    LEADING   NUM.    1      10      NONE
    13         SALES     D-AR    1       1      NONE

    TOTAL SUM OF SQUARES                     E+05
    TOTAL NUMBER OF OBSERVATIONS
    RESIDUAL SUM OF SQUARES                  E+01
    R-SQUARE
    EFFECTIVE NUMBER OF OBSERVATIONS
    RESIDUAL VARIANCE ESTIMATE               E-01
    RESIDUAL STANDARD ERROR                  E+00

The estimates of the first three TF weights (V0, V1 and V2) cannot be statistically distinguished from 0. Hence there is a delay of three time periods in the process. This is consistent with what was observed in the time series plots. The values of the estimated TF weights for the remaining lags are significant, but exhibit a die-out pattern. As a result, we may be able to use a rational polynomial representation for the transfer function. If we use a linear transfer function form, we will need to include many lags.

The above speculation regarding the TF weights is only valid if the estimated weights are to some degree correct. One means to assess the validity of the fit is to compute the ACF of the residuals from this fit. It is important to note that the residuals are distinct from the estimated disturbance. The estimated disturbance is what is left over after accounting for a constant term and transfer function components. The residual series represents the error remaining after accounting for all components of the model. We have the following

    -->ACF RES. MAXLAG IS 12.

    TIME PERIOD ANALYZED                  TO 126
    NAME OF THE SERIES                    RES
    EFFECTIVE NUMBER OF OBSERVATIONS
    STANDARD DEVIATION OF THE SERIES
    MEAN OF THE (DIFFERENCED) SERIES
    STANDARD DEVIATION OF THE MEAN

    T-VALUE OF MEAN (AGAINST ZERO)

    AUTOCORRELATIONS
    ST.E
    Q

    (plot of the sample autocorrelations of RES, lags 1 through 12)

The above ACF indicates that the residuals are consonant with white noise. Hence the estimated transfer function weights may be used to identify the form of the transfer function. We will do this in Section . Before we do that we will obtain a model for the disturbance term.

Obtaining a model for the disturbance term

So far, we have used an AR(1) approximation of the disturbance term, $N_t$. Although the residual series of our last fit appears to be white noise and the estimate of the AR parameter is statistically significant, we may not have the most appropriate model for $N_t$. We can now use the estimated disturbance series, maintained in the variable NT, to determine an ARIMA model for $N_t$. If we compute the ACF and PACF (not shown here), we will see that both the ACF and the PACF cut off after lag 1. Due to the relatively small magnitude of the value of the lag 1 ACF and lag 1 PACF, we may conclude we can represent $N_t$ either as an AR(1) or MA(1) process. However, if we compute the EACF of $\hat N_t$ we obtain

    TIME PERIOD ANALYZED                  TO 126
    NAME OF THE SERIES                    NT
    EFFECTIVE NUMBER OF OBSERVATIONS
    STANDARD DEVIATION OF THE SERIES
    MEAN OF THE (DIFFERENCED) SERIES
    STANDARD DEVIATION OF THE MEAN
    T-VALUE OF MEAN (AGAINST ZERO)

    THE EXTENDED ACF TABLE

    (Q-->)
    (P= 0)
    (P= 1)
    (P= 2)
    (P= 3)
    (P= 4)
    (P= 5)

    (P= 6)

    SIMPLIFIED EXTENDED ACF TABLE (5% LEVEL)

    (Q-->)   0  1  2  3  4  5  6  7  8  9 10 11 12
    (P= 0)   X  O  O  O  O  O  O  O  O  O  O  O  O
    (P= 1)   O  O  O  O  O  O  O  O  O  O  O  O  O
    (P= 2)   X  X  O  O  O  O  O  O  O  O  O  O  O
    (P= 3)   X  X  O  O  O  O  O  O  O  O  O  O  O
    (P= 4)   X  O  X  O  O  O  O  O  O  O  O  O  O
    (P= 5)   X  X  X  O  X  O  O  O  O  O  O  O  O
    (P= 6)   X  O  O  O  X  O  O  O  O  O  O  O  O

The EACF supports an MA(1) process. As a result, we will use the MA(1) representation in the remainder of this analysis.

Obtaining a model for the transfer function

As noted in Section 8.4.2, if we wish to use the linear form of the transfer function, then we will need a relatively large number of terms, beginning with lag 3. However, a decay pattern appears to be present in the TF weights. As a result, we may consider using $\delta(B) = 1 - \delta B$ in the denominator of the rational polynomial to represent this decay. There are many significant transfer function weights beginning at lag 3. It may not be clear how many terms to include in $\omega(B)$. For example, if we use only one term in $\omega(B)$ we have

$$\omega(B) = \omega B^3, \quad\text{and}\quad \delta(B) = 1 - \delta B . \qquad (8.26)$$

In this case, we have a form of the Koyck (1954) distributed lag model (see Section and Section 4.2 of Pankratz 1991) with a three period delay.

Often a visual inspection of the estimated TF weights is sufficient to determine a reasonable and parsimonious representation for the transfer function, $v(B)$. Any delay in the process can be seen in any initial estimated weights that are statistically indistinguishable from 0. If there is a cut-off pattern in the estimated weights, then we are well served by using the linear transfer function form for $v(B)$. That is, we can use $v(B) = V(B)$, where $V(B)$ is comprised of only the significant terms that have been estimated. If the estimated weights have a die-out pattern, then we may be well served by using the rational polynomial representation for $v(B)$. In some cases it may be relatively easy to determine appropriate forms for $\omega(B)$ and $\delta(B)$. However, often it is difficult to read a pattern in the weights. In such cases, the corner method proposed by Liu and Hanssens (1982) can be used to determine the orders of these operators.
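To see why a rational form such as (8.26) produces a delay followed by a die-out pattern, the TF weights implied by $\omega(B)/\delta(B)$ can be generated by matching coefficients in $\delta(B)\,v(B) = \omega(B)$. The sketch below is a minimal illustration: the function name `rational_tf_weights`, the convention that the numerator is passed as plain coefficients by lag, and the numerical values used are assumptions for the example, not estimates from the SALES data.

```python
def rational_tf_weights(omega, delta, n_weights):
    """Return v_0, ..., v_{n_weights-1} of v(B) = omega(B) / delta(B).

    omega : numerator coefficients by lag, so omega[j] multiplies B**j.
    delta : parameters d_1, ..., d_r of delta(B) = 1 - d_1*B - ... - d_r*B**r.
    """
    v = []
    for j in range(n_weights):
        w_j = omega[j] if j < len(omega) else 0.0
        # delta(B) v(B) = omega(B)  implies  v_j = omega_j + sum_i d_i * v_{j-i}
        v_j = w_j + sum(d * v[j - i]
                        for i, d in enumerate(delta, start=1) if j - i >= 0)
        v.append(v_j)
    return v

# Hypothetical values: omega(B) = 2 B**3 and delta(B) = 1 - 0.5 B.
weights = rational_tf_weights([0.0, 0.0, 0.0, 2.0], [0.5], 8)
print(weights)
```

With a single numerator term at lag 3 and one denominator parameter, the first three weights are zero and each later weight is the previous one multiplied by the denominator parameter, which is the geometric decay of the Koyck form.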

The corner method

When a set of estimated TF weights exhibits a die-out pattern, we can use the corner method to identify the orders in a corresponding rational transfer function, $\omega(B)/\delta(B)$. The method employs a corner table that we will now describe.

The corner table proposed by Liu and Hanssens (1982) consists of determinants of matrices composed of the TF weights. For the row f (f = 0, 1, 2, ...) and column g (g = 1, 2, 3, ...) the value in position (f,g) is the determinant of a matrix using $v_{f-g+1}$ through $v_{f+g-1}$. The specific form of this matrix can be found in Liu and Hanssens (1982) or Appendix 5A of Pankratz (1991). If the orders associated with $\omega(B)$ and $\delta(B)$ are b, s, and r (as defined in equations (8.13) and (8.14)), then the corner table has the following pattern:

      f \ g     1    2   ...   r   r+1  r+2  ...
        0       0    0   ...   0    0    0
        1       0    0   ...   0    0    0
       ...
       b-1      0    0   ...   0    0    0
        b       x    x   ...   x    x    x
       ...
      s+b-1     x    x   ...   x    x    x
       s+b      x    x   ...   x    0    0
      s+b+1     x    x   ...   x    0    0
       ...

The symbol x denotes a term that may be different from 0, while the symbol 0 denotes a term that is not significantly different from 0. Note that in the above table, the elements in the first b rows and in the lower right-hand corner (beginning at the row labeled s+b and column r+1) are all zeros.

The CORNER paragraph produces a table of values (not symbols). The values are normalized so that the largest value of the first column is . In practice, the estimated values of the TF weights are subject to random error. As a result, we will usually find some

small values instead of the indicated zero values. However, we will note either sudden increases in values (in going from the row labeled b-1 to the row labeled b) or sudden decreases (in going into the lower right-hand corner). Further, because of sampling fluctuations, the corner table may not have a clear-cut pattern. However, we may still be able to determine some good candidates for b, s and r. We should always apply the principle of parsimony in such cases and try to rely on a small number of parameters in whatever models we determine from the table. It is useful to note that in practice the order of $\delta(B)$ (i.e., the r value) is seldom greater than 1.

To illustrate the use of the corner table in the SCA System, we will construct a table from the estimated weights V0 through V10. The CORNER paragraph will construct and display the table. In the CORNER paragraph, the TF weights need to be the values of a single variable. We can append the estimates V0 through V10 together using the JOIN paragraph (see Appendix B) and then request a corner table by sequentially entering the following commands (SCA output is edited for presentation purposes, and lines are superimposed in the corner table displayed):

    -->JOIN OLD ARE V0 TO V10. NEW IS TFWEIGHTS.
    -->CORNER TFWEIGHTS

    CORNER TABLE FOR THE TRANSFER FUNCTION WEIGHTS IN TFWEIGHT

    NOTE: "*****" (IF ANY) MEANS THAT THE ENTRY CANNOT BE COMPUTED

We observe three rows of zero values, indicating a delay of b=3. The rows of non-zero values begin in the row labeled 3. A corner begins in the row labeled 4 and column labeled 2. As a result, the value of r is 1, and the value of s is 4-3 = 1. Hence the operators in the rational polynomial representation are $\omega(B) = \omega B^3$ and $\delta(B) = 1 - \delta B$. These operators are the same as those in (8.26). Moreover, V3 provides an initial estimate for the parameter $\omega$.

Specifying and estimating the identified model

In Sections and above, we have identified the following model

$$(1-B)\,\mathrm{SALES}_t = C + \frac{\omega B^3}{1-\delta B}\,(1-B)\,\mathrm{LEADING}_t + (1-\theta B)\,a_t . \qquad (8.27)$$
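The corner table described in the preceding section can be sketched as follows for readers who want to see the arithmetic. This is a minimal illustration and not the CORNER paragraph itself. It assumes that the entry in row f and column g is the determinant of the g x g matrix whose (i, j) element is $v_{f+i-j}$ (with $v_k = 0$ for $k < 0$), which uses exactly the weights $v_{f-g+1}$ through $v_{f+g-1}$. It also assumes that the weights are first scaled so that the largest first-column entry has magnitude one. The function name `corner_table` and the weights used are hypothetical.

```python
import numpy as np

def corner_table(weights, n_rows, n_cols):
    """Corner table of determinants built from TF weights v_0, v_1, ...

    Entries that would need weights beyond those supplied are left as NaN.
    """
    v = np.asarray(weights, dtype=float)
    v = v / np.max(np.abs(v))          # scaling convention assumed here
    table = np.full((n_rows, n_cols), np.nan)
    for f in range(n_rows):
        for g in range(1, n_cols + 1):
            if f + g - 1 >= len(v):
                continue               # entry cannot be computed
            m = np.array([[v[f + i - j] if f + i - j >= 0 else 0.0
                           for j in range(g)] for i in range(g)])
            table[f, g - 1] = np.linalg.det(m)
    return table

# Hypothetical weights V0..V10: delay of 3, then geometric decay (ratio 0.5).
tf_weights = [0.0, 0.0, 0.0] + [2.0 * 0.5 ** k for k in range(8)]
table = corner_table(tf_weights, n_rows=7, n_cols=3)
print(np.round(table, 3))
```

For these artificial weights the first three rows are zero, the row labeled 3 is non-zero, and zeros reappear from the row labeled 4 and column labeled 2 onward, which is the same reading (b = 3, with a corner at row 4 and column 2) as in the example above.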

We have reasonable initial estimates of C and $\omega$ in CNST and V3, respectively. We can specify (8.27) in a straightforward manner (and utilize the estimates obtained previously) by entering

    -->TSMODEL SALESMDL. MODEL IS SALES(1) = CNST +
    -->   (V3*B**3)/(1-D1*B)LEADING(1) + (1-THETA*B)NOISE.

    SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- SALESMDL

    VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
               VARIABLE   OR CENTERED
                                             1
    SALES      RANDOM     ORIGINAL      (1-B )
                                             1
    LEADING    RANDOM     ORIGINAL      (1-B )

    PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-    VALUE   STD     T
    LABEL      NAME      DENOM.                 TRAINT           ERROR   VALUE
     1  CNST             CNST    1       0      NONE
     2  V3     LEADING   NUM.    1       3      NONE
     3  D1     LEADING   DENM    1       1      NONE
     4  THETA  SALES     MA      1       1      NONE     .1000

We can estimate this model using the EXACT method for $\theta$ (and retain the residual and estimated disturbance terms for diagnostic checking purposes) by entering the following commands (SCA output is edited, and only the final estimation summary is provided):

    -->ESTIM SALESMDL.
    -->ESTIM SALESMDL. METHOD IS EXACT. HOLD RESIDUALS(RES).

    SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- SALESMDL

    VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
               VARIABLE   OR CENTERED
                                             1
    SALES      RANDOM     ORIGINAL      (1-B )
                                             1
    LEADING    RANDOM     ORIGINAL      (1-B )

    PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-    VALUE   STD     T
    LABEL      NAME      DENOM.                 TRAINT           ERROR   VALUE
     1  CNST             CNST    1       0      NONE
     2  V3     LEADING   NUM.    1       3      NONE
     3  D1     LEADING   DENM    1       1      NONE
     4  THETA  SALES     MA      1       1      NONE

    TOTAL SUM OF SQUARES                     E+05
    TOTAL NUMBER OF OBSERVATIONS
    RESIDUAL SUM OF SQUARES                  E+01
    R-SQUARE
    EFFECTIVE NUMBER OF OBSERVATIONS
    RESIDUAL VARIANCE ESTIMATE               E-01
    RESIDUAL STANDARD ERROR                  E+00

Diagnostic checks of a transfer function model

The fitted values for $\omega$ and $\delta$ are consistent with our prior estimates or conjectures. In addition, the estimate of $\theta$ is highly significant. At this time, we need to diagnostically check our model. We do this in much the same manner as for an ARIMA model (see Section 5.1.5). Our two basic concerns remain the same as before. That is,

(1) Is the model statistically consonant with our assumptions, and
(2) Does the model make sense?

Since we have more model assumptions than in the ARIMA case, our checks relating to (1) increase (as discussed below). Moreover, since we are using more variables in our model, it is also useful to consider

(3) Does our model perform better than either a simple ARIMA model or other simple alternative models?

The checks under (2) and (3) relate to model interpretation and model performance. They may not relate directly to the more basic checks for adherence to model assumptions, but they can be important when more than one model is considered for a problem. Often the checks employed here relate to specific concerns of the practitioner. If inferences based on the structure of the process are important, then appropriate checks may include the signs and magnitudes of estimates, or how well a model adheres to known or assumed axioms that apply to the problem at hand. If forecasting is a concern, then post-sample forecasts may be made from various models. A post-sample check can be conducted when we withhold a portion of data (at the end of the series) from modeling, then examine how well a model forecasts these values.

Regardless of how we treat (2) and (3) above, we must be concerned with how well a model adheres to the assumptions of the model. As in the case of ARIMA models, we employ two basic tools for this purpose:

(a) Visual inspection of residuals (i.e., plots of the residuals), and
(b) Checks for correlation.

In a diagnostic check of an ARIMA model, the only check for correlation was the ACF of the residual series. This again is an important check of a transfer function model. In addition, we need to check for the presence of correlation between our explanatory variables and the residual series. This check is necessary due to our assumption of independence between the explanatory variables and the disturbance. The cross correlation function is used as a check here. It is discussed in more detail below.

Another natural check (related to (a) above) is a check for outliers that may have affected the form of our model or biased the estimates. Other diagnostic checks are discussed in Section 11.3 of Box and Jenkins (1970) and Chapter 6 of Pankratz (1991).
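As a rough illustration of the cross correlation check mentioned above, the sample cross correlation at a given lag can be computed directly. The sketch below is not the CCF paragraph of the SCA System: the function name `cross_correlation`, the convention that a positive lag pairs y at time t with x at time t minus the lag, and the simulated series are assumptions made for the example.

```python
import numpy as np

def cross_correlation(y, x, lag):
    """Sample correlation between y[t] and x[t - lag].

    A negative lag pairs y[t] with x[t + |lag|], so the value at lag l for
    (y, x) equals the value at lag -l for (x, y).
    """
    y = np.asarray(y, dtype=float) - np.mean(y)
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(y)
    if lag >= 0:
        cov = np.sum(y[lag:] * x[: n - lag])
    else:
        cov = np.sum(y[: n + lag] * x[-lag:])
    return cov / np.sqrt(np.sum(y ** 2) * np.sum(x ** 2))

# Hypothetical white-noise input x that leads y by two periods.
rng = np.random.default_rng(1)
n = 400
x = rng.normal(size=n)
y = 0.5 * rng.normal(size=n)
y[2:] += 0.8 * x[:-2]

for lag in range(-3, 4):
    print(lag, round(cross_correlation(y, x, lag), 2))
```

In a diagnostic check the two series would be the residuals of the transfer function model and the residuals of the ARIMA model of an input series. A value standing out clearly at some lag, as at lag 2 in this artificial example, would point to structure that the fitted model has not captured.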

For the current example, we will focus on the following three important checks: a visual inspection of the residuals, checks of correlation, and an outlier check. A plot of the residual series for our current model is shown in Figure 8.3. No obvious pattern, nor spurious observation, is readily apparent.

Figure 8.3 Time plot of residuals from transfer function model for SALES

Correlation functions involving the residual series

The ACF of the residuals is computed and displayed below to examine if there is any overall inadequacy of the transfer function model. Since all sample autocorrelations are within a 95% confidence limit of zero, this part of diagnostic checking reveals no model inadequacy.

    -->ACF VARIABLE IS RES. MAXLAG IS 12.

    TIME PERIOD ANALYZED                  TO 126
    NAME OF THE SERIES                    RES
    EFFECTIVE NUMBER OF OBSERVATIONS
    STANDARD DEVIATION OF THE SERIES
    MEAN OF THE (DIFFERENCED) SERIES
    STANDARD DEVIATION OF THE MEAN
    T-VALUE OF MEAN (AGAINST ZERO)

    AUTOCORRELATIONS
    ST.E
    Q

    (plot of the sample autocorrelations of RES, lags 1 through 12)

The ACF provides a measure of how currently observed values (or residuals) of a time series are related to values at prior time periods (lags). We can also construct a measure of association between the currently observed values (or residuals) of one series with the values of another series at current and prior time periods. One such measure is the cross correlation function (CCF). For an integer l, the lag l cross correlation between $Y_t$ and $X_t$ is the correlation between $Y_t$ and $X_{t-l}$.

According to the definition of the CCF, it should be immediately apparent that there is a difference between the lag l cross correlation between $X_t$ and $Y_t$ and the lag l cross correlation between $Y_t$ and $X_t$. For positive values of l, the lag l cross correlation between $Y_t$ and $X_t$ is a measure of how the series $X_t$ is a leading indicator for $Y_t$, while the lag l cross correlation between $X_t$ and $Y_t$ is a measure of how the series $Y_t$ is a leading indicator for $X_t$. We may note that the lag l cross correlation between $Y_t$ and $X_t$ is the same as the lag -l cross correlation between $X_t$ and $Y_t$. Hence we can compute the CCF between two series for both positive and negative lags. The difference is in what is perceived to be the leading and lagging series. When the computation of the CCF for two variables is requested, the SCA System computes both the lag -l and lag l values of the CCF.

Since the (stationary) input variables of a transfer function are assumed to be independent of the disturbance, the CCF between such a series and $a_t$ should have no significant values. This provides us with a diagnostic check of our fitted model. If the model is adequate, then there should be no significant cross correlations between an input series and the residuals, except for those attributable to sampling variation. If significant cross correlations are found, especially at low lags, then we have an indication of an inadequate model.

In practice, we compute the CCF between the residuals of the transfer function model (stored here in RES) and the residuals of the ARIMA model of an input series. In this way we are certain of computing the CCF between two (assumed) stationary series. As noted in Section 8.4.1, an ARIMA model was fit for LEADING. The residuals of this model were stored in RESLEAD. We can compute the CCF between RES and RESLEAD by entering

    -->CCF RES, RESLEAD. MAXLAG IS 12.

    TIME PERIOD ANALYZED                  TO 126
    NAMES OF THE SERIES                   RES       RESLEAD
    EFFECTIVE NUMBER OF OBSERVATIONS
    STANDARD DEVIATION OF THE SERIES
    MEAN OF THE (DIFFERENCED) SERIES
    STANDARD DEVIATION OF THE MEAN
    T-VALUE OF MEAN (AGAINST ZERO)

    CORRELATION BETWEEN RESLEAD AND RES IS -.09

    CROSS CORRELATION BETWEEN RES(T) AND RESLEAD(T-L)
    ST.E

    CROSS CORRELATION BETWEEN RESLEAD(T) AND RES(T-L)
    ST.E

    (plot of the sample cross correlations between RES and RESLEAD, lags -12 through 12)

Note we obtain summary information on both series, RES and RESLEAD, the lag 0 correlation (i.e., a measure of any contemporaneous association), and the lagged correlations when RESLEAD leads RES and when RES leads RESLEAD. The CCF gives no reason to doubt the adequacy of the model.

We can obtain the ACF and CCF simultaneously by computing cross correlation matrices (CCM). The CCM paragraph, applied to the residual series, produces a sequence of such matrices. The diagonal elements are the values of the ACF of each series and the off-diagonal elements of these matrices are the values of the CCF (presented according to which series leads the other). We expect all values of these matrices to be insignificant. Additional information concerning the CCM paragraph may be found in Liu et al. (1986).

Outlier detection and estimation

Another valuable diagnostic tool is a check for outliers in the model. As noted in Chapter 7, outliers can have an important effect in an analysis. We should be aware of any outliers, and take appropriate actions.

If we desire, we can use the OESTIM paragraph in lieu of the ESTIM paragraph in the fitting of our transfer function models. If the OESTIM paragraph is used, then the SCA System will automatically check for outliers and then estimate their effects jointly with the parameters of the model. If the OESTIM paragraph is

used to estimate (8.27), we obtain the following (SCA output is edited for presentation purposes):

    -->OESTIM SALESMDL.
    -->OESTIM SALESMDL. METHOD IS EXACT. HOLD RESIDUALS(RES).

    THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 126

    SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- SALESMDL

    VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
               VARIABLE   OR CENTERED
                                             1
    SALES      RANDOM     ORIGINAL      (1-B )
                                             1
    LEADING    RANDOM     ORIGINAL      (1-B )

    PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-    VALUE   STD     T
    LABEL      NAME      DENOM.                 TRAINT           ERROR   VALUE
     1  CNST             CNST    1       0      NONE
     2  V3     LEADING   NUM.    1       3      NONE
     3  D1     LEADING   DENM    1       1      NONE
     4  THETA  SALES     MA      1       1      NONE

    SUMMARY OF OUTLIER DETECTION AND ADJUSTMENT

    TIME    ESTIMATE    T-VALUE    TYPE
                                   AO

    TOTAL NUMBER OF OBSERVATIONS
    EFFECTIVE NUMBER OF OBSERVATIONS
    RESIDUAL STANDARD ERROR (WITH OUTLIER ADJUSTMENT)       E+00
    RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT)    E+00

An additive outlier is detected at t=92. Since there is only one outlier and its effect is not large, we obtain essentially the same parameter estimates as before.

We can also use the OFILTER or OUTLIER paragraph to detect outliers in a fitted model. If we use the OFILTER paragraph for the above fitted model, we obtain the same outlier (and the same effect) as shown above. However, if we use the OUTLIER paragraph after the ESTIM paragraph, we do not detect any outlier if only AO and IO are considered, and obtain the following result if AO, IO, and LS are considered.

    -->OUTLIER SALESMDL. TYPES ARE AO,IO,LS.

    INITIAL RESIDUAL STANDARD ERROR =

    TIME    ESTIMATE    T-VALUE    TYPE
                                   LS

    ADJUSTED RESIDUAL STANDARD ERROR = .23197

280 TRANSFER FUNCTION MODELING 8.35 The discrepancies occur because he OESTIM and OFILTER paragraphs use a more elaborae algorihm han he OUTLIER paragraph, and he oulier a =92 is only marginally greaer han 3. The level shif a =10 deeced by he above OUTLIER paragraph is no reliable since i is oo close o he beginning of he series. When ouliers have large effecs, he OUTLIER paragraph usually produces similar resuls o hose of he OESTIM and OFILTER paragraphs. When he resuls are differen, hose obained from he OESTIM and OFILTER paragraphs are usually more reliable Forecasing from a ransfer funcion model Our curren esimaed model appears o be adequae, we may now wish o use i for forecasing. In he case of an ARIMA model, we are able o use he FORECAST paragraph direcly for his purpose since only one variable is involved. In he case of an inervenion model, we can also use he FORECAST paragraph direcly assuming ha informaion is provided for all necessary binary (inervenion) series. As in he case of an inervenion model, he forecass of he oupu (response) variable are dependen on he forecass (or known values) of any inpu variables. In our curren example, we have reained he las 24 observaions of boh series for he purpose a pos-sample check of forecass for SALES from he ARIMA model alone and from he ransfer funcion model. This is presened in Secion In he remainder of his secion we will preend ha here are only 126 observaions for each series. In order o obain forecass of SALES for his case, we also mus obain forecass for our inpu variable, LEADING. Values forecased for LEADING will be used o forecas SALES according o he esimaed ransfer funcion model. In order o forecas LEADING, we need o consruc an ARIMA model for i. This was done in he preliminary modeling phase of our ransfer funcion model building process (see Secion 8.4.1). I was found ha LEADING was well represened as an ARIMA(0,1,1) process. The model informaion for LEADING is sored under he label LEADMDL. 
We can use the SFORECAST paragraph to produce forecasts from both the model LEADMDL and SALESMDL. The SFORECAST paragraph (for the computation of forecasts from a simultaneous transfer function model) is discussed in Liu et al. (1986). Here we will use the FORECAST paragraph twice in order to forecast SALES. First, we will forecast 24 values from the end of the LEADING series. Since we will use these values in the forecast of SALES, we will append the forecasted values to the end of LEADING. We can accomplish this by entering

-->FORECAST LEADMDL. NOFS IS 24. JOIN.

NOTE: THE EXACT METHOD FOR COMPUTING RESIDUALS IS USED

FORECASTS, BEGINNING AT

TIME   FORECAST   STD. ERROR   ACTUAL IF KNOWN

Now that we have computed forecasts for LEADING, we can forecast SALES using the model SALESMDL by entering

-->FORECAST SALESMDL. IARIMA IS LEADING(LEADMDL). NOFS IS 24.

NOTE: THE EXACT METHOD FOR COMPUTING RESIDUALS IS USED

FORECASTS, BEGINNING AT

TIME   FORECAST   STD. ERROR   ACTUAL IF KNOWN

The IARIMA sentence is used to specify the name of the ARIMA model associated with each input series. Here we specify that the name of the ARIMA model associated with LEADING is LEADMDL. Since we have already appended forecasted values to LEADING, it may appear that the IARIMA sentence is redundant. This is not the case for two important reasons. First, we are able to distinguish stochastic series from deterministic series (since we can also incorporate interventions into our transfer function model if we so desire). If we do not specify an ARIMA model for an input series through the IARIMA sentence, then that series will be treated as a deterministic series. A second reason for the use of IARIMA is to provide the SCA System with necessary information for the computation of the standard errors of the forecasts. These standard errors will depend on the transfer function model, its residual standard error, and the residual standard error of each ARIMA model of the input series. The values of SALES, its forecasts and standard error limits are displayed in Figure 8.4.

Figure 8.4 Forecast plot of SALES from a transfer function model

Comparing forecasts of SALES from an ARIMA and transfer function model

We have constructed two models for the series SALES. One is an ARIMA(0,1,1) model and the other is the transfer function model obtained above. It is useful to compare the forecasting performance of these two models. Since we have reserved the last 24 observations of SALES, we may conduct a post-sample check of forecasting performance. In Table 8.2, we list the one-step-ahead forecasts made from origins 126 through 149 obtained for SALES using both the ARIMA(0,1,1) model and the transfer function model. A plot of these forecasts, together with the actual values of SALES in the period, is given in Figure 8.5.

The one-step-ahead forecasts of SALES using the ARIMA(0,1,1) model may be obtained by entering

-->FORECAST SALESM. NOFS IS 1. ORIGINS ARE 126 TO 149.

The output from this paragraph is suppressed. Similarly, we can use either the SFORECAST paragraph or sequentially obtain one-step-ahead forecasts of LEADING and SALES using the transfer function model SALESMDL from origins 126 through 149. The SCA commands and output for the latter are not presented here.

From Table 8.2 and Figure 8.5, it is immediately evident that the transfer function forecasts track the data better than the ARIMA forecasts. The better transfer function forecasts occur because the auxiliary information enables the model to better anticipate the movement of sales. Since the univariate model for sales has no leading indicator, it cannot anticipate its own changes; hence its forecasts amount to a reflection of the amount of sales in the prior historical data.

Table 8.2 Comparison of one-step-ahead forecasts of SALES using different methods

Time Index | Actual Sales | ARIMA forecast | Forecast error | T.F. forecast | Forecast error

Root mean squared error =        Maximum absolute error =
Root mean squared error =        Maximum absolute error = 0.33
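The two summary measures reported with Table 8.2 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not SCA code: the function names `rmse` and `max_abs_error` are hypothetical, and the numbers in the example are made up rather than taken from the SALES data.

```python
from math import sqrt


def rmse(actual, forecast):
    """Root mean squared error of forecasts (error = actual - forecast)."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return sqrt(sum(e * e for e in errors) / len(errors))


def max_abs_error(actual, forecast):
    """Largest absolute forecast error."""
    return max(abs(a - f) for a, f in zip(actual, forecast))


# Made-up illustration values (not the SALES series).
actual = [10.0, 12.0, 11.0, 13.0]
forecast = [9.0, 12.5, 11.0, 15.0]
print(rmse(actual, forecast), max_abs_error(actual, forecast))
```

Applying both functions to the one-step-ahead forecasts of each model over the same origins gives a like-for-like comparison of the kind summarized in the table.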

Figure 8.5 Comparison of one-step-ahead forecasts of SALES using different methods

8.5 Transfer Function Identification With Several Explanatory Variables

A modeling strategy was presented in Section 8.3 and illustrated for the case of a single-input transfer function. In the case of a multiple-input transfer function, we assume a model of the general form

$$Y_t = C + v_1(B)X_{1t} + v_2(B)X_{2t} + \cdots + v_m(B)X_{mt} + N_t, \qquad (8.28)$$

where $v_i(B)$ is the transfer function (either in linear or rational polynomial form) for the input series $X_{it}$, and $N_t$ is the disturbance.

The general modeling strategy for multiple inputs is the same as that of a single input. In particular, the preliminary investigation, estimation, diagnostic checking and forecasting portions of the process are exactly the same as those outlined in Sections 8.3 and 8.4. The LTF method for model identification for two inputs was illustrated earlier.

In addition to the assumption of uni-directional relationships, a basic assumption of transfer function models is that all input series are independent of the disturbance term. However, the input series (explanatory variables) themselves may be correlated. If the input

series are uncorrelated with one another, then the estimated TF weights for each input series will be virtually the same whether they are obtained separately or jointly (using either the LTF method employed earlier or the CCF method described in Section 8.7.1).

If the input series are correlated, then model identification can become more complicated. The CCF method is extremely difficult to use in the multiple-input case when the input series are correlated. The LTF method can still be applied, but some care should be taken. The number of transfer function weights estimated for each series should not be too large, as these estimates can be highly correlated. Also, it is a good idea to delete insignificant terms or input variables whenever possible. Altering components of a transfer function model is discussed in Section 8.7.3.

8.6 Adjustments for Trading Days and Calendar Variation

In this section, we demonstrate an application of a transfer function model. Many economic and business data are compiled and reported on a monthly basis. Such data are often subject to variation related to the composition of the calendar, as well as the occurrence of traditional festivals or holidays. The first phenomenon is known as trading day variation (Young, 1965). This type of variation arises because the activity of a time series varies with the days of the week. Examples of such monthly series are retail and wholesale sales, and telephone or traffic volumes. The second phenomenon, a holiday effect, occurs because consumer behavior patterns and business activities vary depending upon whether a particular month contains a specific holiday or not. Some traditional holidays (e.g., Easter, Chinese New Year and Passover) are set according to lunar calendars and their occurrences typically vary between two adjacent months from year to year. Information to adjust for these effects must be incorporated in the model. Other fixed date holiday effects (e.g., Christmas and New Year's) can be accounted for in a model with the inclusion of a seasonal component.
If calendar variation is not considered in the modeling process, unsatisfactory results may occur. A number of authors including Hillmer, Bell and Tiao (1981), Hillmer (1982), Cleveland and Grupe (1983), Salinas (1983), Bell and Hillmer (1983), and Salinas and Hillmer (1987b) have proposed simple methods to account for trading day variation in ARIMA modeling. Liu (1980, 1986) suggested modifications of ARIMA models to account for calendar variation and recommended the LTF method for model identification. The following model has been employed for a time series $Y_t$ (possibly transformed), subject to calendar variation:

$$Y_t = f(\boldsymbol{\omega}, \mathbf{X}_t) + N_t, \qquad (8.29)$$

where $f$ is a function of $\boldsymbol{\omega}$, a vector of parameters, and $\mathbf{X}_t$, a vector of fixed independent variables observed at time $t$, and $N_t$ is the disturbance term of the model. We can see that the form of (8.29) is similar to that of a transfer function model (depending on the functional form of $f$), except the function is specified rather than identified.

Trading Day Effects

Trading day effects can be handled in a straightforward manner. If we let $W_{it}$, $i = 1, 2, \ldots, 7$, represent the number of times day $i$ occurs in month $t$, then the function $f$ can be written as

$$f(\boldsymbol{\omega}, \mathbf{X}_t) = \omega_1 W_{1t} + \omega_2 W_{2t} + \cdots + \omega_7 W_{7t}. \qquad (8.30)$$

If we substitute (8.30) into (8.29), we see that we have a transfer function model in the form of a regression with serially correlated error terms (see Section 8.1). It has been shown that the above representation can result in multicollinearity and that a transformation of the values $W_{it}$ should be used (Hillmer, 1982 or Bell and Hillmer, 1983). One useful transformation is

$$D_{it} = W_{it} - W_{7t}, \quad i = 1, 2, \ldots, 6,$$
$$D_{7t} = W_{1t} + W_{2t} + \cdots + W_{7t}.$$

In this transformation, $D_{it}$ ($i = 1, 2, \ldots, 6$) reflects the number of occurrences of a day of a week relative to the number of Sundays in the month, while $D_{7t}$ reflects the total number of days in the month. Further discussions regarding the parameters associated with these terms can be found in Hillmer (1982), Bell and Hillmer (1983), and Liu (1986).

Holiday Effects

If the effect due to a specific holiday is relatively constant over the years, then the function $f$ can be represented as

$$f(\boldsymbol{\omega}, \mathbf{X}_t) = \omega_1 H_{1t}, \qquad (8.31)$$

where $H_{1t}$ represents the proportion of the holiday in the $t$-th month. If the holiday effect increases or decreases linearly over time, then

$$f(\boldsymbol{\omega}, \mathbf{X}_t) = \omega_1 H_{1t} + \omega_2 H_{2t}, \qquad (8.32)$$

where $H_{2t} = H_{1t} \cdot K$. K is 1 for observations in the first year, 2 for observations in the second year, and so on. Again, the use of either (8.31) or (8.32) in (8.29) is a representation of regression with correlated errors.

Example: Monthly Outward Station Movements

To illustrate the incorporation of trading days into an ARIMA model, we consider the monthly outward station movements (i.e., disconnections) of the Wisconsin Telephone Company from January 1951 through October 1968. The data are listed in Table 8.3 and a plot of the data is given in Figure 8.6. The series was studied by Thompson and Tiao (1971) and Liu (1986). The data are stored in the SCA workspace under the name CALLOUT.
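The trading day transformation described above can be sketched in Python using only the standard `calendar` module. This is a hedged illustration, not the SCA DAYS paragraph: the function names are hypothetical, and the numbering of days (day 1 = Monday, ..., day 7 = Sunday) is an assumption chosen so that day 7 is Sunday, as the text implies.

```python
import calendar


def day_counts(year, month):
    """Return [W_1, ..., W_7]: how often each weekday occurs in the month.

    Assumed ordering: index 0 = Monday, ..., index 6 = Sunday.
    """
    n_days = calendar.monthrange(year, month)[1]
    counts = [0] * 7
    for day in range(1, n_days + 1):
        counts[calendar.weekday(year, month, day)] += 1
    return counts


def trading_day_regressors(year, month):
    """Return [D_1, ..., D_7] with D_i = W_i - W_7 (i = 1..6), D_7 = sum of W_i."""
    w = day_counts(year, month)
    d = [w[i] - w[6] for i in range(6)]  # counts relative to Sundays
    d.append(sum(w))                     # total number of days in the month
    return d


# January 1951 begins on a Monday and has 31 days.
print(trading_day_regressors(1951, 1))
```

For January 1951 the first three weekdays occur five times and the remaining four occur four times, so the first three regressors equal 1, the next three equal 0, and the last equals 31.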

Table 8.3 Monthly outward station movements of the Wisconsin Telephone Company (January 1951 through October 1968) (Read data across a line)

Figure 8.6 Monthly outward station movements of the Wisconsin Telephone Company (January 1951 through October 1968)

All but the last two years of data are used for modeling. The last two years of data are reserved for evaluation of forecasting performance. Thompson and Tiao (1971) analyzed the natural logarithm of the data in order to obtain a more homogeneous variance. We can use an analytic statement (see Appendix A) to transform the data (not shown here). The logged data are stored in LNCALL. The ACF of LNCALL for the first 190 observations depicts nonstationary behavior (output not shown). We now consider the ACF using both first and twelfth differencing for the first 190 observations. SCA output shown below is edited.

288 TRANSFER FUNCTION MODELING >ACF LNCALL. DFORDER IS 1, 12. SPAN IS 1, DIFFERENCE ORDERS (1-B ) (1-B ) TIME PERIOD ANALYZED TO 190 NAME OF THE SERIES LNCALL EFFECTIVE NUMBER OF OBSERVATIONS STANDARD DEVIATION OF THE SERIES MEAN OF THE (DIFFERENCED) SERIES STANDARD DEVIATION OF THE MEAN T-VALUE OF MEAN (AGAINST ZERO) I XXXXXXXXXX+XXXI XXXI IXXXX+XXXXXX XXXX+XXXXI IX IXXXXXX XXXXXXI I IXXXXX+XX XXXX+XXXXXI IXXXXX+X I X+XXXXXXI IXXXXXX+XX XXXI XXXI IXXXXX XI XXXXI IXXXXXX XXI XXXXXI IXXXXXXX+XX XXXXXXXI XI IXXXXXXX XXXXXXI IX IXXXX XXXXI I IXXXXXX XXXXXXI IXX IXXXX XXXXXXXXI + The differencing operaors appear o achieve saionariy, bu he paern of he ACF is confusing. The final model deermined and fi by Thompson and Tiao (1971) had he form (1 φ. (8.33) 1B )(1 φ 2B )Y = (1 θ1b θ2b θ3b )a This model is no easy o inerpre. Thompson and Tiao (1971) sugges ha φ1 and θ1 may be due o he accouning procedure adoped by he Wisconsin Telephone Company. However, hey also remark ha an analys of Bell Canada hough ha hese may be he resul of he variaion in he number of working days in he monhs covered by he daa. As a

result, it may be informative to model LNCALL in the presence of possible trading days (i.e., working days). We will work with the model of equation (8.29) with the functional representation given in (8.30) and the transformed number of trading days per month. We can generate the necessary trading day information through the DAYS paragraph (see Appendix C.1.1) by entering

-->DAYS VARIABLES ARE D1 TO D7. BEGIN 1951, 1. END 1968,10. TRANSFORM.

The trading day information is stored in the SCA workspace in the variables D1 through D7 (the SCA output to the above command is not shown here). Because of our prior knowledge regarding differencing, we postulate that our transfer function model has the form

$$(1-B)(1-B^{12})Y_t = (1-B)(1-B^{12})(\omega_1 D_{1t} + \omega_2 D_{2t} + \cdots + \omega_7 D_{7t}) + N_t. \qquad (8.34)$$

Our model identification procedure is an application of the LTF method for the case of multiple inputs. Here it is reasonable to assume the TF weights for each input involve only the contemporaneous term. Hence the purpose of using the LTF method is to verify the existence of the trading day effects and to determine a model for $N_t$. We have no direct knowledge of the model to use for $N_t$. Following the LTF method outlined in Section 8.3.2, we will estimate (8.34) and initially approximate $N_t$ with a multiplicative AR(1) and AR(12) model. We can then examine the estimated disturbance term to construct an ARIMA model for $N_t$. We may proceed with the following SCA commands (although the output from these commands is suppressed for presentation purposes)

-->TSMODEL CALLMDL. MODEL IS LNCALL(1,12) = (0)D1(1,12) + (0)D2(1,12)
--> + (0)D3(1,12) + (0)D4(1,12) + (0)D5(1,12) + (0)D6(1,12)
--> + (0)D7(1,12) + 1/(1-PHI1*B)(1-PHI2*B**12)NOISE.
-->ESTIM CALLMDL. SPAN IS 1,190. HOLD RESIDUALS(RES), DISTURBANCE(NT).

The model specification used in the TSMODEL paragraph above uses a shorthand notation (see Section 8.7.6). The ACF of RES (not shown) is not clean, but no severe anomalies are found. Hence we have some confidence in the estimated TF weights for each input series.
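The differencing operators applied to the output and to each input in (8.34) can be illustrated with a short Python sketch. The function names are hypothetical and the code is not part of the SCA System; it only shows what applying (1-B) followed by (1-B**12) does to a series, including the loss of 13 observations.

```python
def difference(series, lag=1):
    """Apply (1 - B**lag): return series[t] - series[t - lag] for t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def difference_1_12(series):
    """Apply (1 - B)(1 - B**12), i.e. first then twelfth differencing."""
    return difference(difference(series, 1), 12)


# A linear trend plus a fixed 12-month pattern is removed entirely.
pattern = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
series = [t + pattern[t % 12] for t in range(190)]
differenced = difference_1_12(series)
print(len(differenced), set(differenced))
```

With 190 observations the doubly differenced series has 177 values, which is the sense in which the effective number of observations is smaller than the total.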
We now examine the ACF of the estimated disturbance series (held in the SCA workspace under the name NT). The SCA output has been edited slightly.

-->ACF NT

TIME PERIOD ANALYZED TO 190
NAME OF THE SERIES NT
EFFECTIVE NUMBER OF OBSERVATIONS
STANDARD DEVIATION OF THE SERIES
MEAN OF THE (DIFFERENCED) SERIES
STANDARD DEVIATION OF THE MEAN
T-VALUE OF MEAN (AGAINST ZERO)

[ACF plot of the series NT]

The above ACF is that of the classic airline model (see Section 5.3). Hence we should model $N_t$ with multiplicative MA(1) and MA(12) factors. We will now use the TSMODEL paragraph to change our NOISE component, then estimate the model. Since we have MA parameters, we will first use the conditional likelihood algorithm, then the exact method. We show all SCA commands below, but only provide the results from the final fitted model.

-->TSMODEL CALLMDL. CHANGE (1-THETA1*B)(1-THETA2*B**12)NOISE.
-->ESTIM CALLMDL. SPAN IS 1,190.
-->ESTIM CALLMDL. SPAN IS 1,190. METHOD IS EXACT. HOLD RESIDUALS(RES).

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- CALLMDL

PARAMETER  VARIABLE  NUM./    FACTOR  ORDER  CONSTRAINT  VALUE  STD ERROR  T VALUE
LABEL      NAME      DENOM.
           D1        NUM.     1       0      NONE
           D2        NUM.     1       0      NONE
           D3        NUM.     1       0      NONE
           D4        NUM.     1       0      NONE

           D5        NUM.     1       0      NONE
           D6        NUM.     1       0      NONE
           D7        NUM.     1       0      NONE
THETA1     LNCALL    MA       1       1      NONE
THETA2     LNCALL    MA       2       12     NONE

TOTAL SUM OF SQUARES
TOTAL NUMBER OF OBSERVATIONS
RESIDUAL SUM OF SQUARES
R-SQUARE
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL VARIANCE ESTIMATE
RESIDUAL STANDARD ERROR

The diagnostic checks of this model reveal no inadequacies. This fitted model is simpler than that of Thompson and Tiao, and accounts for working days in a month.

To compare the forecasting performance of the above model with that of Thompson and Tiao, the root mean squared error (RMSE) of one-step-ahead forecasts during the post-sample period (November 1966 through October 1968) is considered. In order to compute this value, we need to use the FORECAST paragraph to compute 24 one-step-ahead forecasts. We can accomplish this, and retain the necessary forecasts in the SCA workspace, by entering

-->FORECAST CALLMDL. ORIGINS ARE 190 TO 213. NOFS ARE 1 FOR
--> HOLD FORECASTS(F1 TO F24).

We obtain the following (SCA output is edited)

TIME   FORECAST   STD. ERROR   ACTUAL IF KNOWN

The RMSE for the above model and the RMSE for the model of Thompson and Tiao (1971) are close in value. Hence the forecasting performances of the two models are similar, but the model with trading day effects is simpler and easier to interpret.

8.7 Other Transfer Function Related Topics

This section provides an overview of topics related to transfer function models and modeling. Much of the material presented in this section can be considered advanced or of occasional use. As a result, this section can be skipped, and selected topics referenced as required. The material presented, and the section containing it, are:

Section   Topic
8.7.1     CCF method for transfer function identification
8.7.2     Determining what is wrong in a transfer function model
8.7.3     Modifying a transfer function model
8.7.4     Constraints on model parameters
8.7.5     Estimation of transfer functions containing a denominator polynomial
8.7.6     Notational shorthands
8.7.7     Simulation of a transfer function model
8.7.8     Computing the transfer function weights of a transfer function model

8.7.1 The CCF method for transfer function identification

Earlier we noted that there are two distinct procedures for the determination of the TF weights of an input series and the form of the disturbance term $N_t$. We explained the LTF method and used it throughout the preceding analysis. Box and Jenkins (1970) proposed a method for the single-input case. We refer to this procedure as the CCF method, and it is now discussed. This procedure has a number of significant difficulties and should be used with caution. However, since it was the only procedure detailed by Box and Jenkins (1970) and has often been cited in subsequent texts, it remains a frequently used method.

The CCF method is based on the cross correlation function (described in Section 8.4.6). The method was employed by Box and Jenkins (1970, page 370) as a means to obtain necessary modeling information without estimating many parameters. Information in this method is obtained sequentially, rather than jointly as in the LTF method.

An important basis of the CCF method is the fact that if the input series is a white noise process, then the values of positive lags of the CCF between $Y_t$ and $X_t$ are proportional to the TF weights of $v(B)$. Since $X_t$ is usually not a white noise process, we need to create one.

We assume we can represent $X_t$ with an ARIMA model. That is, (ignoring a constant term for simplicity) we have

$$\phi_x(B)X_t = \theta_x(B)e_t. \qquad (8.35)$$

If we let $\alpha(B)$ be a rational polynomial filter where $\alpha(B) = \{\phi_x(B)/\theta_x(B)\}$, then from (8.35) we have

$$\alpha(B)X_t = \frac{\phi_x(B)}{\theta_x(B)}X_t = e_t. \qquad (8.36)$$

The filter $\alpha(B)$ effectively transforms $X_t$ to a white noise process. Suppose we now apply this filter to all series in the transfer function model (again omitting the constant term for simplicity)

$$Y_t = v(B)X_t + N_t. \qquad (8.37)$$

We obtain the following

$$\alpha(B)Y_t = v(B)[\alpha(B)X_t] + \alpha(B)N_t, \qquad (8.38)$$

or

$$y_t = v(B)e_t + n_t, \qquad (8.39)$$

where $y_t = \alpha(B)Y_t$ and $n_t = \alpha(B)N_t$.

In equation (8.39) we have created a new transfer function model having the same transfer function form as in (8.37) (i.e., $v(B)$), but with an input series that is approximately a white noise process. Hence if we compute the CCF of $e_t$ and $y_t$, then we obtain direct information on the TF weights of $v(B)$. If we multiply the values of the non-negative lags of this CCF by the standard error of $y_t$ and then divide the result by the standard error of $e_t$ (i.e., multiply the CCF values by $\sigma_y/\sigma_e$), then we obtain estimates for the transfer function weights of $v(B)$.

The process of creating a white noise series from the input series in (8.36) is known as prewhitening. The component $e_t$ of (8.39) is referred to as the prewhitened input series, and the component $y_t$ is referred to as the filtered output series. Unfortunately, novices to transfer function modeling often confuse the series that is prewhitened with the series that is filtered. As a result, sometimes it is believed that both series represent white noise processes.

In its application, $e_t$ is the residual series for the ARIMA model built for the input series $X_t$. If we replace $X_t$ by $Y_t$ in such an ARIMA model, we will filter $Y_t$ in the same manner as $X_t$. The resultant series is used with the previously obtained residual series to compute estimates of the TF weights. The filtered series is not used thereafter. The estimated TF weights are used to determine a rational polynomial representation for $v(B)$. A transfer function model is then fitted using this rational polynomial representation and with $N_t = a_t$. The residuals from this fit are examined in order to determine a model for $N_t$.
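The scaling step described above (CCF values at non-negative lags multiplied by the ratio of standard errors) can be sketched in Python. This is an illustrative sketch under stated assumptions, not the SCA CCF paragraph: the function names are hypothetical, the input is simulated white noise so that no prewhitening filter is needed, and the true weights (0, 2, 0.5, 0) are made up for the demonstration.

```python
import random
from math import sqrt


def _mean(xs):
    return sum(xs) / len(xs)


def _std(xs):
    m = _mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def ccf(y, e, lag):
    """Sample cross correlation between y[t] and e[t - lag], for lag >= 0."""
    n = len(y)
    my, me = _mean(y), _mean(e)
    cov = sum((y[t] - my) * (e[t - lag] - me) for t in range(lag, n)) / n
    return cov / (_std(y) * _std(e))


def tf_weights_from_ccf(y, e, max_lag):
    """Scale CCF values by sigma_y / sigma_e to estimate v_0, ..., v_max_lag."""
    scale = _std(y) / _std(e)
    return [ccf(y, e, k) * scale for k in range(max_lag + 1)]


def simulate(n=5000, seed=0):
    """Simulate y_t = 2 e_{t-1} + 0.5 e_{t-2} + noise, with e_t white noise."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = []
    for t in range(n):
        value = rng.gauss(0.0, 0.5)
        if t >= 1:
            value += 2.0 * e[t - 1]
        if t >= 2:
            value += 0.5 * e[t - 2]
        y.append(value)
    return y, e


y, e = simulate()
print([round(w, 2) for w in tf_weights_from_ccf(y, e, 3)])
```

With a long simulated series the scaled CCF values should fall close to the weights used in the simulation. In practice the series passed as `e` would be the residuals of the ARIMA model for the input, and `y` the output filtered by that same model.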
To illustrate prewhitening and the estimation of the TF weights using the CCF method, we will consider the Series M data used in Section 8.4. Standard ARIMA modeling techniques

indicate that an ARIMA(0,1,1) model may be appropriate for LEADING. We can specify and estimate this model by sequentially entering (SCA output is suppressed or edited)

-->TSMODEL LEADMDL. MODEL IS LEADING(1) = (1-TH*B)NOISE.
-->ESTIM LEADMDL. HOLD RESIDUALS(RESLEAD).

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- LEADMDL

VARIABLE   TYPE OF VARIABLE   ORIGINAL OR CENTERED   DIFFERENCING
LEADING    RANDOM             ORIGINAL               (1-B)

PARAMETER  VARIABLE  NUM./    FACTOR  ORDER  CONSTRAINT  VALUE  STD ERROR  T VALUE
LABEL      NAME      DENOM.
TH         LEADING   MA       1       1      NONE

TOTAL SUM OF SQUARES
TOTAL NUMBER OF OBSERVATIONS
RESIDUAL SUM OF SQUARES
R-SQUARE
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL VARIANCE ESTIMATE
RESIDUAL STANDARD ERROR

The residuals from the above fit are stored in the SCA workspace under the label RESLEAD. We can now filter the output variable SALES using the above model, LEADMDL, by entering (SCA output is suppressed)

-->FILTER LEADMDL. OLD IS SALES. NEW IS FSALES.

As a result of the above command, our filtered series is stored in the SCA workspace under the name FSALES. We can compute the cross correlation function for FSALES and RESLEAD by entering

-->CCF FSALES, RESLEAD. MAXLAG IS 12. HOLD CCF(VCCF).

The HOLD sentence is used in order to retain the values of the CCF that are computed in the SCA workspace. In the above command, we specify that these values be stored under the label VCCF. We obtain

TIME PERIOD ANALYZED TO 126
NAMES OF THE SERIES FSALES RESLEAD
EFFECTIVE NUMBER OF OBSERVATIONS
STANDARD DEVIATION OF THE SERIES
MEAN OF THE (DIFFERENCED) SERIES
STANDARD DEVIATION OF THE MEAN
T-VALUE OF MEAN (AGAINST ZERO)

CORRELATION BETWEEN RESLEAD AND FSALES IS .09

CROSS CORRELATION BETWEEN FSALES(T) AND RESLEAD(T-L)
ST.E.

CROSS CORRELATION BETWEEN RESLEAD(T) AND FSALES(T-L)
ST.E.

[CCF plot of FSALES and RESLEAD for lags -12 through 12]

We note that the values of the CCF when FSALES leads RESLEAD (i.e., the negative lags above) are all statistically indistinguishable from zero. This confirms the validity of a uni-directional representation for our model. In addition, the summary information of FSALES and RESLEAD provides us with the standard error of each series. If we multiply the CCF values by the quotient of these standard errors (2.0464/0.2918), we will have estimates for the TF weights. We can use an SCA analytic statement (see Appendix A) for this purpose. We can then print the last 13 values of the resultant variable, as these are the estimated values of $v_0$ through $v_{12}$.

-->WEIGHTS = VCCF*(2.0464/0.2918)
-->PRINT WEIGHTS. SPAN IS 13,25.

Earlier we used the LTF method to estimate the above values from the model

$$(1-B)\,\mathrm{SALES}_t = C + (v_0 + v_1 B + \cdots + v_{10}B^{10})(1-B)\,\mathrm{LEADING}_t + \frac{1}{1-\phi B}\,a_t.$$

The estimates of the 11 TF weights using this model were

The two sets of estimates are in good agreement. We can use the estimated weights obtained using the CCF method in the CORNER paragraph to determine orders for the rational polynomial representation of the transfer function. Again, we only wish to use the last 13 estimated weights. The variable containing the estimated weights, here named WEIGHTS, is edited using the SELECT paragraph (see Appendix B) before we construct a corner table. The SCA output that follows has been edited, and lines have been superimposed on the corner table.

-->SELECT WEIGHTS. SPAN IS (13,25).
-->CORNER WEIGHTS

CORNER TABLE FOR THE TRANSFER FUNCTION WEIGHTS IN WEIGHTS
NOTE: "*****" (IF ANY) MEANS THAT THE ENTRY CANNOT BE COMPUTED

The above table indicates that b=3 and r=s=1, the same as was determined previously.

The CCF method has produced estimates for the TF weights associated with the input series LEADING that are similar to those of the LTF method. The effort required to produce these values using the LTF method (see Section 8.4.2) consisted of fitting two models. The results from the first fitted model indicated the need for differencing (based on the estimate of the AR parameter and the ACF of the estimated disturbance term). The second fit (with differencing) provided us with more refined estimates of the transfer function weights. Moreover, the estimated disturbance term from the fit can be immediately used to determine an ARIMA model for $N_t$.

More effort was required for transfer function identification using the CCF method. First an ARIMA model was constructed and estimated for the input series LEADING. Next the output series SALES was filtered by this model. The CCF of the filtered output and prewhitened input series (i.e., the residuals of the ARIMA model for LEADING) was produced. The values of the CCF were then scaled to obtain the estimated TF weights. Moreover, the CCF method has not yet provided any information on $N_t$. We still must estimate a transfer function to obtain a useful series for the identification of a model for $N_t$.
We have stated that there are some significant drawbacks with the CCF method, as compared to the LTF method, for the identification of a transfer function model. Clearly, the effort required is a drawback of the CCF method. Another important obstacle for the CCF method is its sequential approach. Any misstep in the process (e.g., incorrect ARIMA model for the input series, inadequate prewhitening of the input series, or determining a less than adequate representation of the transfer function) affects all future work. Moreover, and

perhaps most importantly, the CCF method cannot be extended directly to the multiple-input case. For these reasons, we recommend the use of the LTF method for transfer function modeling.

8.7.2 Determining what is wrong in a transfer function model

Some diagnostic checks of an estimated transfer function model were given previously. Such checks provide us with information on whether a fitted model is adequate or not. In the event that our model is not adequate, it is useful to know which component(s) of the model require corrective action. In this section we provide some insight into the diagnostic measures that direct us toward this end.

(A) Structure for the transfer function is correct, but the structure for the disturbance term is not

In single-equation transfer function models, the input series is assumed to be independent of the errors of the disturbance term. As a result the CCF between the series should be clean (i.e., insignificant). In the case of the sales data, we observed that the CCF between the residuals of the ARIMA model for LEADING (i.e., RESLEAD) and the residuals of the fitted model had no significant values.

Suppose the structure of the transfer function for the input series is correct, but the ARIMA model for $N_t$ is not correct. In such a case the CCF between the residuals of the ARIMA model for the input series, $\hat{e}_t$, and the residuals of the transfer function fit, $\hat{a}_t$, will not show significant values. However, the series $\hat{a}_t$ will exhibit significant autocorrelations. We can then use the estimated disturbance term to construct a more appropriate model for $N_t$.

(B) Structure for the transfer function is incorrect

If we do not have an adequate representation of the transfer function, then both the CCF between $\hat{e}_t$ and $\hat{a}_t$ and the ACF of $\hat{a}_t$ will exhibit significant values or systematic patterns. This will be true regardless of whether the ARIMA model for $N_t$ is correct or not. It may be possible to use the information contained in the CCF between $\hat{e}_t$ and $\hat{a}_t$ to correct the deficiency in the model for the transfer function.
However, it may be more convenient to re-examine the transfer function weights and revise the structure of the transfer function accordingly. More information on this can be found in Section 11.3 of Box and Jenkins (1970) and in Section 12.4 of Vandaele (1983).

8.7.3 Modifying a transfer function model

A specified transfer function may be modified in the same manner as an intervention model (see Section 6.7.1). Specifically, a model may be modified by adding or deleting input series as well as changing the existing transfer functions or the disturbance term. This is

accomplished through the inclusion of the ADD, CHANGE, or DELETE sentence in the TSMODEL paragraph. To illustrate these capabilities, suppose that we have already specified the following modified version of the transfer function model used in this chapter (only a portion of the MODEL sentence is given below).

SALES(1) = C0 + (V3*B**3)/(1-D1*B)LEADING(1) + (W0 + W1*B)PRICES(1) + (1-TH*B)NOISE   (8.40)

As in the rest of this chapter, we assume the name SALESMDL was used to hold the model.

The ADD sentence

The ADD sentence is used in the TSMODEL paragraph to modify an existing transfer function model by the addition of new input series. Any new explanatory term must be represented completely. For example, if the component (WW1*B)(1-B)ORDERS is to be added to SALESMDL, then the following command suffices

TSMODEL SALESMDL. ADD (WW1*B)ORDERS(1).

It is important that the labels of parameters used in the ADD sentence, as well as the label of the input series, be different from any labels in the existing model. More than one variable may be added to an existing model by joining each term with an addition symbol (+).

The CHANGE sentence

The CHANGE sentence is used in the TSMODEL paragraph to modify operators of existing components within a transfer function model. In the SALESMDL of (8.40), there are three components, associated with the variable names LEADING, PRICES and NOISE. The change is made by a complete re-specification of affected components. Hence the sentence has a syntax similar to that of the ADD sentence. For example, if the ARMA operator of the disturbance in (8.40) is to be changed to $\{1/(1-\phi B)\}a_t$, then the following TSMODEL paragraph suffices

TSMODEL SALESMDL. CHANGE 1/(1-PHI*B)NOISE.

It is important to emphasize that only operators of existing components of a transfer function model are affected by the CHANGE sentence. As in the ADD sentence, if more than one component is to be changed, then each component must be joined with an addition symbol (+). The SCA System will not process a CHANGE sentence involving variables not present in the existing model.
The CHANGE sentence neither adds nor deletes components from the model; it only changes existing components. The CHANGE sentence may be used to modify a component specified in an ADD sentence when both sentences are used within the same TSMODEL paragraph. In such a

situation, the SCA System first processes the ADD sentence and then the CHANGE sentence regardless of the order in which they are written.

The DELETE sentence

The DELETE sentence is used in a TSMODEL paragraph to modify an existing transfer function model by deleting specified explanatory variables or the constant term from the model. The former is accomplished by simply specifying the name(s) of the explanatory variable(s) to be deleted. For example, if the variable PRICES is to be removed from the model SALESMDL, the following command suffices

TSMODEL SALESMDL. DELETE PRICES.

To delete the constant term from SALESMDL, we simply enter

TSMODEL SALESMDL. DELETE CONSTANT.

Here we do not need to enter the variable name; the keyword CONSTANT is recognized as the constant term. A constant term can only be added by re-specification of a model through the MODEL sentence.

8.7.4 Constraints on model parameters

Constraints on the parameters of a transfer function model are accommodated in the same manner as in an ARIMA or intervention model. If we include the FIXED-PARAMETER sentence in the TSMODEL paragraph, we can specify the names of parameters that we wish to remain at their currently specified values during estimation. For example, if we wish to fix the value of $\delta$ in the SALESMDL of Section 8.4 to its most recently estimated value, we should include the sentence

FIXED-PARAMETER IS D1.

in the TSMODEL paragraph. A parameter can be fixed to any value in this manner. This may require the use of an analytic statement (see Appendix A) to define a value and the use of the logical sentence UPDATE within the TSMODEL paragraph to clear a model's memory of the parameter value and reset it to another. For example, if we wished to maintain the value of D1 as .70 during remaining estimations, we could sequentially enter

-->D1 = .70
-->TSMODEL SALESMDL. FIXED-PARAMETER IS D1. UPDATE.

In addition to holding parameter values at fixed levels, we can constrain one or more parameters to be equal to one another during estimation. The CONSTRAINT sentence is used for this purpose.
For example, if we wish o re-esimae he final fied model held in SALESMDL wih he δ parameer equal o he MA parameer we can ener

-->TSMODEL SALESMDL. CONSTRAINT IS (D1, TH).

All parameters whose names are specified within the same parentheses are held equal during estimation. More than one set of constraints can be specified, with commas used to separate sets of parentheses, but a parameter label can be specified only once. We can also constrain parameters to be held equal to other parameters during estimation by using the same label for the parameters. Thus, it is important to use different labels for model parameters if we do not want to impose an equality constraint.

Once a constraint is placed on a parameter, either fixed at a particular value or held equal to one or more parameters, the constraint remains in place during all subsequent estimations. A constraint can only be removed by re-specifying the model using the MODEL sentence of the TSMODEL paragraph.

Estimation of transfer functions containing a denominator polynomial

A transfer function can be either in linear form, $\omega(B)$, or in rational polynomial form, $\omega(B)/\delta(B)$. As in the case of intervention models, special attention is required in the estimation of transfer function models in which a denominator polynomial (i.e., $\delta(B)$) is present. The estimation procedure used by the SCA System is fairly robust, in that in most cases any non-zero initial estimates of parameters will lead to convergence to a final set. However, problems can arise in the case of a transfer function that contains a denominator polynomial (e.g., $\omega/(1 - \delta B)$). In these cases, it is often important that reasonable initial estimates of parameters in the numerator polynomial (i.e., $\omega(B)$) be provided. If reasonable initial estimates are not provided, the estimation process may result in an overflow error and terminate.

If the LTF method is being used for the identification of a transfer function, then the above problem can be easily avoided. The LTF method uses the linear form approximation, $V(B)$, to obtain the estimates for $v_0, v_1, \ldots, v_k$. If we find that the rational polynomial form is a preferable way to characterize the transfer function, we can use some of the estimated TF weights as initial estimates for the parameters of $\omega(B)$. For example, if we wish to use $\omega(B) = (\omega_0 + \omega_1 B)B^3$, then we should use the estimate of $v_3$ as an initial estimate of $\omega_0$ and the estimate of $v_4$ for $\omega_1$. Note that we are simply matching the lag orders to determine our initial estimates. We used this procedure when modeling the SALES data in Section 8.4. If the CCF method is used, then we should scale the values of the CCF (as done in Section 8.7.1) and then match lag orders as done above.
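The lag-matching rule above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not SCA code: the function name `initial_numerator_estimates` and the numeric weights are hypothetical, and the sketch assumes the numerator is written with plus signs, as in $\omega(B) = (\omega_0 + \omega_1 B)B^3$.

```python
def initial_numerator_estimates(tf_weights, delay, n_terms):
    """Pick initial estimates for omega_0, ..., omega_{n_terms-1}.

    tf_weights[j] holds the estimated linear-form weight v_j.
    delay is b in omega(B) = (omega_0 + omega_1*B + ...) * B**b.
    The estimate of v_{b+i} is used as the starting value of omega_i.
    """
    last = delay + n_terms
    if last > len(tf_weights):
        raise ValueError("not enough estimated TF weights for the requested lags")
    return list(tf_weights[delay:last])


# Hypothetical LTF estimates of v_0, ..., v_5 (illustrative numbers only).
v_hat = [0.02, -0.01, 0.03, 4.7, 3.4, 2.5]

# omega(B) = (omega_0 + omega_1*B) * B**3, so v_3 and v_4 are matched.
omega_init = initial_numerator_estimates(v_hat, delay=3, n_terms=2)
print(omega_init)  # [4.7, 3.4]
```

The starting values only need to be reasonable; the estimation step then refines them together with the denominator parameters.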

Notational shorthands

The notational shorthand available for ARIMA model specification (see Section 5.4.5) extends to transfer function model specification as well. To illustrate this shorthand, consider the transfer function model

$$(1 - B)(1 - B^{12})Y_t = C + (\omega_0 + \omega_1 B + \omega_2 B^2 + \omega_3 B^3)(1 - B)(1 - B^{12})X_t + (1 - \theta_1 B)(1 - \theta_2 B^{12})a_t. \tag{8.41}$$

Suppose the names of the series $Y_t$ and $X_t$ are YDATA and XDATA, respectively. A longhand transcription of (8.41) may be

YDATA((1-B)(1-B**12)) = CONST + (W0 + W1*B + W2*B**2 + W3*B**3)XDATA((1-B)(1-B**12)) + (1-THETA1*B)(1-THETA2*B**12)NOISE.     (8.42)

The basic information used by the SCA System from (8.42) consists of the orders of the backshift operators in each differencing, numerator, denominator, autoregressive or moving average operator, and the labels associated with all parameters. In fact, the labels are not essential unless we wish to maintain parameter estimates within variables or if constraints are used on parameters. Operators having parameters can also be specified using the form

(orders of backshift operators; parameter values or labels)

The portion "parameter values or labels" allows for either specific numeric values or labels of variables holding the initial estimates. This portion is optional if we only wish to specify the orders of the backshift operators. As a result, the following are all equivalent to (8.42), provided all parameters are estimated without constraint (see Section 8.7.4):

YDATA(1,12) = CONST + (0,1,2,3; W0, W1, W2, W3)XDATA(1,12) + (1; THETA1)(12; THETA2)NOISE.

YDATA(1,12) = CONST + (0 TO 3; W0 TO W3)XDATA(1,12) + (1 - THETA1*B)(12; THETA2)NOISE.

YDATA(1,12) = CONST + (0 TO 3; W0 TO W3)XDATA(1,12) + (1)(12)NOISE.

YDATA(1,12) = CONST + (0 TO 3)XDATA(1,12) + (1)(12)NOISE.

Note that we are also able to mix notational specifications, depending on which form is most convenient.
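To make the shorthand concrete, the following Python sketch expands a list of differencing orders such as (1,12) into the coefficients of the product $(1 - B)(1 - B^{12})$ and applies it to a series. It illustrates only the arithmetic implied by the notation; the helper names `multiply`, `differencing_polynomial` and `difference` are hypothetical and are not part of the SCA System.

```python
def multiply(p, q):
    """Multiply two polynomials in B given as ascending coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def differencing_polynomial(orders):
    """Coefficients of the product of (1 - B**d) over the given orders d."""
    poly = [1.0]
    for d in orders:
        factor = [1.0] + [0.0] * (d - 1) + [-1.0]  # 1 - B**d
        poly = multiply(poly, factor)
    return poly


def difference(series, orders):
    """Apply the differencing operator; the first max-lag values are lost."""
    poly = differencing_polynomial(orders)
    max_lag = len(poly) - 1
    return [sum(c * series[t - k] for k, c in enumerate(poly))
            for t in range(max_lag, len(series))]


# (1,12) stands for (1 - B)(1 - B**12) = 1 - B - B**12 + B**13.
poly = differencing_polynomial([1, 12])
nonzero = {k: c for k, c in enumerate(poly) if c != 0}
print(nonzero)
```

Repeating an order, as in (1,1), gives a power of the operator: `differencing_polynomial([1, 1])` returns the coefficients of $(1 - B)^2$.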

Simulation of a transfer function model

The SIMULATE paragraph may be used to simulate an ARIMA model or a transfer function model. Details regarding the use of the SIMULATE paragraph were presented with the simulation of an ARIMA model. The use of the SIMULATE paragraph for the simulation of a transfer function model is identical to its use for the simulation of an ARIMA model, except for the presence of input series.

The SIMULATE paragraph will first generate a noise sequence using a pseudo-random number generator. This sequence is then used according to a transfer function model specified previously using the TSMODEL paragraph. In the case of the simulation of a transfer function model, the data for all input series must already be present in the SCA workspace when the SIMULATE paragraph is executed. Hence the data of the input series must be provided in some fashion. An input series can be one that has been transmitted previously, or one that has been simulated in a previous use of the SIMULATE paragraph. Recall that the logical sentence SIMULATION must be included in the TSMODEL paragraph that specifies the model to be simulated. In addition, we must be certain that each input series of the model both exists and has enough data for the specified simulation.

To illustrate the simulation of a transfer function, we will simulate an input series and an output series. Specifically, we will simulate $X_t$ and $Y_t$ so that

$$(1 - 0.6B)X_t = 12.0 + e_t, \tag{8.43}$$

and

$$Y_t = 6.0 + \frac{0.4B}{1 - 0.8B}X_t + (1 - 0.75B)a_t, \tag{8.44}$$

with $\sigma_e = 2.5$ and $\sigma_a = 1.5$. We will simulate 200 observations for $X_t$ and $Y_t$ and store the data in XDATA and YDATA, respectively. We can specify the above models by entering:

-->TSMODEL XSIM. MODEL IS (1-0.6*B)XDATA = 12.0 + NOISE. SIMULATION.

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- XSIM

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
XDATA      RANDOM     ORIGINAL      NONE

    PARAMETER  VARIABLE  NUM./    FACTOR  ORDER  CONS-    VALUE   STD     T
    LABEL      NAME      DENOM.                  TRAINT           ERROR   VALUE
 1                       CNST     1       0      NONE
 2             XDATA     AR       1       1      NONE     .6000

-->TSMODEL YSIM. MODEL IS YDATA = 6.0 + (0.4*B)/(1-0.8*B)XDATA + @
-->   (1-0.75*B)NOISE. SIMULATION.

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- YSIM

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
YDATA      RANDOM     ORIGINAL      NONE
XDATA      RANDOM     ORIGINAL      NONE

    PARAMETER  VARIABLE  NUM./    FACTOR  ORDER  CONS-    VALUE   STD     T
    LABEL      NAME      DENOM.                  TRAINT           ERROR   VALUE
 1                       CNST     1       0      NONE
 2             XDATA     NUM.     1       1      NONE
 3             XDATA     DENM     1       1      NONE
 4             YDATA     MA       1       1      NONE     .7500

Note that we can directly specify values for all parameters of the above models. Also note that the logical sentence SIMULATION is included in both paragraphs. We can sequentially (1) specify a seed value for simulation purposes (see Section 5.4.3), (2) simulate XDATA, and (3) simulate YDATA by entering the following (SCA output is suppressed):

-->GSEED =
-->SIMULATE MODEL IS XSIM. NOBS IS 250. @
-->   NOISE IS N(0.0, 6.25). SEED IS GSEED.
-->SIMULATE MODEL IS YSIM. NOBS IS 250. @
-->   NOISE IS N(0.0, 2.25). SEED IS GSEED.
-->SELECT XDATA, YDATA. SPAN IS (51, 250).

The NOISE sentence is used to specify the variance of each of the error sequences (6.25 = 2.5² and 2.25 = 1.5²). We intentionally simulate more than 200 observations and then select only the last 200 values of XDATA and YDATA. We do this to ensure that any potential irregularities at the beginning of the recursive computation of values are eliminated.

We now have 200 values in both XDATA and YDATA. If we desire, we can check how consonant these series are with $X_t$ and $Y_t$ by computing the values of statistics based on (8.43) and (8.44). In particular:

(1) $\mu_x = 12.0/(1 - .6) = 30.0$;
(2) the ACF for $X_t$ is $(.6)^l$, $l = 1, 2, \ldots$;
(3) the steady state gain of the transfer function is $g = .4/(1 - .8) = 2$;
(4) $\mu_y = 6.0 + g\mu_x = 66.0$; and
(5) $v_0 = 0$ and the values of the remaining TF weights are $(.4)(.8)^{l-1}$, $l = 1, 2, 3, \ldots$

This is not done here. Instead, we will estimate

$$\mathrm{YDATA}_t = C + \frac{\omega B}{1 - \delta B}\,\mathrm{XDATA}_t + (1 - \theta B)a_t$$

to see how close our estimates are to the true model (8.44). A summary from an exact estimation of this model is given below.

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- YMODEL

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
YDATA      RANDOM     ORIGINAL      NONE
XDATA      RANDOM     ORIGINAL      NONE

    PARAMETER  VARIABLE  NUM./    FACTOR  ORDER  CONS-    VALUE   STD     T
    LABEL      NAME      DENOM.                  TRAINT           ERROR   VALUE
 1  CNST                 CNST     1       0      NONE
 2  V1         XDATA     NUM.     1       1      NONE
 3  D1         XDATA     DENM     1       1      NONE
 4  THETA      YDATA     MA       1       1      NONE

TOTAL SUM OF SQUARES . . . . . . . .       E+04
TOTAL NUMBER OF OBSERVATIONS . . . .
RESIDUAL SUM OF SQUARES. . . . . . .       E+03
R-SQUARE . . . . . . . . . . . . . .
EFFECTIVE NUMBER OF OBSERVATIONS . .
RESIDUAL VARIANCE ESTIMATE . . . . .       E+01
RESIDUAL STANDARD ERROR. . . . . . .       E+01

The estimated values of $C$, $\omega$, $\delta$ and $\theta$ (8.66, 0.40, 0.79 and 0.82, respectively) are in reasonable to good accord with the values used in the simulation. All diagnostic checks of this model support its validity.

Computing the TF weights of a transfer function model

In Section 5.4.8, we discussed the use of the WEIGHT paragraph to compute the pi or psi-weights of an ARIMA model. The WEIGHT paragraph can also be used to compute the TF weights ($v_0, v_1, v_2, \ldots$) for each transfer function of a model specified previously (using the TSMODEL paragraph). In the case of a transfer function model, the pi and psi-weights computed from the model are those corresponding to the disturbance term.

To illustrate the use of the WEIGHT paragraph for a transfer function model, we consider the final estimated model stored in SALESMDL (see Section 8.4.5). The fitted model is (approximately)

$$(1 - B)\,\mathrm{SALES}_t = .35 + \frac{4.726B^3}{1 - .724B}(1 - B)\,\mathrm{LEADING}_t + (1 - .626B)a_t$$

or, apart from the constant term,

$$\mathrm{SALES}_t = \frac{4.726B^3}{1 - .724B}\,\mathrm{LEADING}_t + N_t, \quad \text{where } (1 - B)N_t = (1 - .626B)a_t.$$

The TF weights of a transfer function are computed according to $v(B)\delta(B) = \omega(B)$. For the above model, we see $v_0 = v_1 = v_2 = 0$ and $v_j = 4.726(.724)^{j-3}$ for $j \ge 3$. The pi-weights for the model are computed from $\pi(B)(1 - .626B) = (1 - B)$; the psi-weights are computed from $\psi(B)(1 - B) = (1 - .626B)$. As a result, $\pi_0 = 1$, $\pi_1 = .374$, $\pi_j = .374(.626)^{j-1}$ for $j \ge 2$; and $\psi_0 = 1$, $\psi_j = .374$ for $j \ge 1$. We can compute the first 20 of the above TF, pi and psi-weights by entering

-->WEIGHT SALESMDL. PIWEIGHTS IN NTPI. PSIWEIGHTS IN NTPSI. @
-->   TFWEIGHTS IN SALESTF. MAXIMUM IS 20.

If our transfer function model has more than one input (explanatory) variable, then one variable label must be specified in the TFWEIGHTS sentence for each input variable of the model. We can display the stored information by entering

-->PRINT NTPI. NO LABEL. FORMAT IS '5F10.4'
-->PRINT NTPSI. NO LABEL. FORMAT IS '5F10.4'
-->PRINT SALESTF. NO LABEL. FORMAT IS '5F10.4'

The displayed values are those described above.
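The weight formulas quoted for the SALES model can be checked numerically. The Python sketch below expands a ratio of two polynomials in $B$ as a power series by long division. It is a hedged illustration rather than the algorithm used by the WEIGHT paragraph; the function name `rational_weights` is hypothetical, and the sign convention $\pi(B) = 1 - \pi_1 B - \pi_2 B^2 - \cdots$ is an assumption made here so that the pi-weights come out positive as in the text.

```python
def rational_weights(numerator, denominator, n_weights):
    """Power-series coefficients of numerator(B) / denominator(B).

    Both polynomials are coefficient lists in ascending powers of B,
    and denominator[0] must be non-zero.
    """
    weights = []
    for j in range(n_weights):
        acc = numerator[j] if j < len(numerator) else 0.0
        for k in range(1, min(j, len(denominator) - 1) + 1):
            acc -= denominator[k] * weights[j - k]
        weights.append(acc / denominator[0])
    return weights


# TF weights: v(B) = 4.726 B^3 / (1 - .724 B)
tf = rational_weights([0.0, 0.0, 0.0, 4.726], [1.0, -0.724], 20)

# psi-weights of the disturbance: psi(B) = (1 - .626 B) / (1 - B)
psi = rational_weights([1.0, -0.626], [1.0, -1.0], 20)

# pi-weights: (1 - B) / (1 - .626 B) = 1 - pi_1 B - pi_2 B^2 - ...
series = rational_weights([1.0, -1.0], [1.0, -0.626], 20)
pi = [-c for c in series[1:]]  # pi[0] is pi_1 under the assumed convention

print([round(v, 4) for v in tf[:6]])
print([round(w, 4) for w in psi[:4]])
print([round(w, 4) for w in pi[:3]])
```

Under these assumptions the first non-zero TF weight is $v_3 = 4.726$, every psi-weight after $\psi_0$ equals .374, and the pi-weights decay geometrically at rate .626.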

SUMMARY OF THE SCA PARAGRAPHS IN CHAPTER 8

This section provides a summary of those SCA paragraphs employed in this chapter. The syntax for many paragraphs is presented in both a brief and a full form. The brief display of the syntax contains the most frequently used sentences of a paragraph, while the full display presents all possible modifying sentences of a paragraph. In addition, special remarks related to a paragraph may also be presented with the description.

Each SCA paragraph begins with a paragraph name and is followed by modifying sentences. Sentences that may be used as modifiers for a paragraph are shown below, and the types of arguments used in each sentence are also specified. Sentences not designated as required may be omitted, as default conditions (or values) exist. The most frequently used required sentence is given as the first sentence of the paragraph. The portion of this sentence that may be omitted is underlined. This portion may be omitted only if this sentence appears as the first sentence in a paragraph. Otherwise, all portions of the sentence must be used. The last character of each line except the last line must be the continuation character, @.

The paragraphs to be explained in this summary are FILTER, CCF, CORNER, TSMODEL, ESTIM, FORECAST, SIMULATE and WEIGHT.

Legend (see Chapter 2 for further explanation)

v : variable or model name
i : integer
r : real value
w : keyword

FILTER Paragraph

The FILTER paragraph is used to filter a time series to a new series according to a specified time series model. The use of filtering is discussed earlier in this manual. A special case of this procedure is known as pre-whitening. Common filtering for all input and output series is also useful when the linear transfer function (LTF) method is employed.

Syntax for the FILTER Paragraph

FILTER MODEL model-name. @
       OLD ARE v1, v2, ---. @
       NEW ARE v1, v2, ---.

Required sentence: MODEL

Sentences Used in the FILTER Paragraph

MODEL sentence
The MODEL sentence is used to specify the label (name) of a previously defined univariate time series model that will be used to filter the variable(s) specified in the OLD sentence.

OLD sentence
The OLD sentence is used to specify the names of the series to be filtered. If this sentence is omitted, the output variable of the univariate model specified in the MODEL sentence will be filtered.

NEW sentence
The NEW sentence is used to specify the variable(s) where the filtered series are stored. The number of variables in this sentence must be the same as that in the OLD sentence, if specified. The default is the variable(s) of the OLD sentence.
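As a rough picture of what filtering a series by a model means, the Python sketch below applies a pure autoregressive operator $(1 - \phi_1 B - \cdots - \phi_p B^p)$ to a mean-corrected series, which is the pre-whitening special case. It is a minimal sketch under stated assumptions: the function name `ar_filter` is hypothetical, only AR models are handled, and the first $p$ values are simply dropped; how the SCA System treats start-up values is not described in this chapter.

```python
def ar_filter(series, phi, mean=0.0):
    """Filter a series with the AR operator 1 - phi[0]*B - phi[1]*B**2 - ...

    The series is first corrected for the given mean. The first len(phi)
    values cannot be filtered and are dropped in this sketch.
    """
    p = len(phi)
    z = [x - mean for x in series]
    return [z[t] - sum(phi[k] * z[t - 1 - k] for k in range(p))
            for t in range(p, len(z))]


# A noise-free AR(1) path with phi = 0.6 filters to (numerically) zero.
path = [1.0, 0.6, 0.36, 0.216]
residual = ar_filter(path, phi=[0.6])
print([round(r, 12) for r in residual])
```

Applying the same operator to both the input and the output series is the "common filtering" idea mentioned for the LTF method.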

CCF Paragraph

The CCF paragraph is used to compute the cross correlation function between two specified time series. The paragraph also displays, for each series, some descriptive statistics including the sample mean, standard deviation and a t-statistic on the significance of a constant term.

Syntax for the CCF Paragraph

Brief syntax

CCF VARIABLES ARE v1, v2. @
    DFORDERS ARE i1, i2, ---. @
    MAXLAG IS i.

Required sentence: VARIABLES

Full syntax

CCF VARIABLES ARE v1, v2. @
    DFORDERS ARE i1, i2, ---. @
    MAXLAG IS i. @
    SPAN IS i1, i2. @
    HOLD CCF(v), SDCCF(v).

Required sentence: VARIABLES

Sentences Used in the CCF Paragraph

VARIABLES sentence
The VARIABLES sentence is used to specify the names of the series to be analyzed. Two series names must be specified.

DFORDERS sentence
The DFORDERS sentence is used to specify the orders of differencing to be applied to each series when differencing is the stationarity-inducing transformation being used. For example, the order associated with the differencing operator $(1 - B)$ is 1 and that of $(1 - B^{12})$ is 12. If a power of an operator is to be used (for example, $(1 - B)^2$), then the differencing order must be repeated the appropriate number of times (in this example, 1, 1). The default is none.

MAXLAG sentence
The MAXLAG sentence is used to specify the maximum order of CCF to be computed. The default is 36.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, from i1 to i2, for which the data will be analyzed. The default is the maximum span available for the series.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. The default is that none of the values of the above statistics will be retained after the paragraph is used. The values that may be retained are:

CCF   : the sample CCF of the series
SDCCF : the standard deviations of the sample CCF of the series

CORNER Paragraph

The CORNER paragraph is used to compute the corner table for a sequence of TF (transfer function) weights. See the discussion of the corner method for more information.

Syntax for the CORNER Paragraph

CORNER VARIABLE IS v. @
       SIZE IS NROWS(i1), NCOLS(i2).

Required sentence: VARIABLE

Sentences Used in the CORNER Paragraph

VARIABLE sentence
The VARIABLE sentence is used to specify the name of the variable that contains the TF weights from which the corner table will be computed.

SIZE sentence
The SIZE sentence is used to specify the number of rows (NROWS) and columns (NCOLS) for the corner table. Assuming the number of TF weights is k, the default value for NROWS is (k+2)/2 and for NCOLS is k/2.
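For readers who want to see what the CCF paragraph summarized above computes, here is a minimal Python sketch of a sample cross correlation function. It uses the common textbook estimator (cross covariance with divisor n, scaled by the two sample standard deviations); whether the SCA System uses exactly this estimator is an assumption, and the function name `sample_ccf` and the data are hypothetical.

```python
import math


def sample_ccf(x, y, max_lag):
    """Return {k: r_xy(k)} for k = -max_lag, ..., max_lag.

    r_xy(k) estimates the correlation between x[t] and y[t + k].
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    ccf = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            cov = sum((x[t] - mx) * (y[t + k] - my) for t in range(n - k)) / n
        else:
            cov = sum((x[t - k] - mx) * (y[t] - my) for t in range(n + k)) / n
        ccf[k] = cov / (sx * sy)
    return ccf


# Illustrative data: y follows x with a delay of two periods.
x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0]
y = [0.0, 0.0] + x[:-2]
ccf = sample_ccf(x, y, max_lag=4)
for k in sorted(ccf):
    print(k, round(ccf[k], 3))
```

Any differencing requested through DFORDERS would be applied to both series before this computation.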

TSMODEL Paragraph

The TSMODEL paragraph is used to specify or modify a transfer function model. The paragraph is also used for the specification or modification of an ARIMA or intervention model. The syntax description for these usages is provided in Chapters 5 and 6, respectively.

For each model specified in a TSMODEL paragraph, a distinguishing label or name must also be given. A number of different models may be specified, each having a unique name, and subsequently employed at a user's discretion. Moreover, the label also enables the information contained under it to be modified.

Syntax for the TSMODEL Paragraph

Brief syntax

TSMODEL NAME IS model-name. @
        MODEL IS model.

Required sentence: NAME

Full syntax

TSMODEL NAME IS model-name. @
        MODEL IS model. @
        ADD components of a model. @
        CHANGE components of a model. @
        DELETE CONSTANT. @
        FIXED-PARAMETERS ARE v1, v2, ---. @
        CONSTRAINTS ARE (v1,v2,---), ---, (v1,v2,---). @
        VARIANCE IS v. @
        SHOW./NO SHOW. @
        CHECK./NO CHECK. @
        ROOTS./NO ROOTS. @
        SIMULATION./NO SIMULATION. @
        UPDATE./NO UPDATE.

Required sentence: NAME

Sentences Used in the TSMODEL Paragraph

NAME sentence
The NAME sentence is used to specify a unique label (name) for the model specified in the paragraph. This label is used to refer to this model in other time series related paragraphs or if the model is to be modified.

MODEL sentence
The MODEL sentence is used to specify a transfer function model.

ADD sentence
The ADD sentence is used to specify component terms that will be added to an existing model. More information is provided in the earlier discussion of this sentence.

CHANGE sentence
The CHANGE sentence is used to modify component terms of an existing model. More information is provided in the earlier discussion of this sentence.

DELETE sentence
The DELETE sentence is used to delete explanatory variables or the constant term from an existing transfer function model. An explanatory variable is deleted by simply listing its name. The constant term is deleted by specifying the keyword CONSTANT. Once the constant term is deleted, it can only be re-inserted using the MODEL sentence.

FIXED-PARAMETER sentence
The FIXED-PARAMETER sentence is used to specify the parameters whose values will be held constant during model estimation, where the v's are the parameter names. See the section on constraints on model parameters for a brief discussion of this sentence. The default condition is that no parameters are fixed.

CONSTRAINT sentence
The CONSTRAINT sentence is used to specify that the parameters within each pair of parentheses will be constrained to have the same value during model estimation. See the section on constraints on model parameters for a brief discussion of this sentence. The default condition is that no parameters are constrained to be equal.

VARIANCE sentence
The VARIANCE sentence is used to specify a variable where the value of the noise variance is or will be stored. If a value for the variable is known, this value will be used as the initial variance in estimation, and the final estimated value of the variance will be stored in this variable for future estimation or for forecasting. Otherwise the variance is calculated from the residual series derived from the specified model and parameter estimates. Note that the SCA System designates an internal variable for the VARIANCE sentence, so the specification of this sentence is optional.

SHOW sentence
The SHOW sentence is used to display a summary of the specified model. The default is SHOW. The summary includes the series name, differencing (if any), span for data, parameter labels (if any) and current values for parameters.

CHECK sentence
The CHECK sentence is used to check whether all roots of the AR, MA, and denominator polynomials lie outside the unit circle. The default is NO CHECK.

ROOTS sentence
The ROOTS sentence is used to display all roots of the AR, MA and denominator polynomials. The default is NO ROOTS.

SIMULATION sentence
The SIMULATION sentence is used to specify that the model will be used for simulation purposes. Ordinarily this sentence is not specified. See the sections on simulation for more details. The default is NO SIMULATION.

UPDATE sentence
The UPDATE sentence is used to specify that parameter values of the model are updated using the most current information available. The default is NO UPDATE. In the default case, parameter values are updated only after execution of the ESTIM paragraph rather than immediately.

ESTIM Paragraph

The ESTIM paragraph is used to control the estimation of the parameters of a transfer function.

Syntax of the ESTIM Paragraph

Brief syntax

ESTIM MODEL v. @
      HOLD RESIDUALS(v).

Required sentence: MODEL

Full syntax

ESTIM MODEL v. @
      METHOD IS w. @
      STOP-CRITERIA ARE MAXIT(i), LIKELIHOOD(r1), ESTIMATE(r2). @
      SPAN IS i1, i2. @
      HOLD RESIDUALS(v), FITTED(v), VARIANCE(v). @
      OUTPUT LEVEL(w), PRINT(w1, w2, ---), NOPRINT(w1, w2, ---).

Required sentence: MODEL

Sentences Used in the ESTIM Paragraph

MODEL sentence
The MODEL sentence is used to specify the label (name) of the model to be estimated. The label must be one specified in a previous TSMODEL paragraph.

METHOD sentence
The METHOD sentence is used to specify the likelihood function used for model estimation. The keyword may be CONDITIONAL for the conditional likelihood or EXACT for the exact likelihood function. These two likelihood functions are discussed earlier in this manual. The default is CONDITIONAL.

STOP sentence
The STOP sentence is used to specify the stopping criteria for nonlinear estimation. The argument, i, for the keyword MAXIT specifies the maximum number of iterations (default is i=10); the argument, r1, for the keyword LIKELIHOOD specifies the value of the relative convergence criterion on the likelihood function (default is r1=0.0001); and the argument, r2, for the keyword ESTIMATE specifies the value of the relative convergence criterion on the parameter estimates (default is r2=0.001). Estimation iterations will be terminated when the relative change in the value of the likelihood function or parameter estimates between two successive iterations is less than or equal to the convergence criterion, or if the maximum number of iterations is reached.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, from i1 to i2, for which the data will be analyzed. The default is the maximum span available for the series.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. The default is that none of the values of the above statistics will be retained after the paragraph is used. The values that may be retained are:

RESIDUAL    : the residual series
FITTED      : the one-step-ahead forecasts (fitted values) of the series
VARIANCE    : variance of the noise
DISTURBANCE : the estimated disturbance series of the model

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output displayed for selected statistics. Control is achieved in a two stage procedure. First, a basic LEVEL of output (default NORMAL) is designated. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). The keywords for LEVEL and the output displayed are:

BRIEF    : estimates and their related statistics only
NORMAL   : RCORR
DETAILED : ITERATION, CORR, and RCORR

where the keywords on the right denote:

ITERATION : the parameter and covariance estimates for each iteration
CORR      : the correlation matrix for the parameter estimates
RCORR     : the reduced correlation matrix for the parameter estimates (i.e., a display in which all values have no more than two decimal places and those estimates within two standard errors of zero are displayed as dots, ".")

FORECAST Paragraph

The FORECAST paragraph is used to compute the forecast of future values of a time series based on a specified transfer function model. All input variables used in the model must have data in the forecast period. If necessary, an explanatory variable must be forecasted before forecasting from the transfer function model (see Section 8.4.7).

The FORECAST paragraph requires the current estimate of the variance $\sigma^2$ to compute standard errors of forecasts. The variance for the estimated model is always stored internally during the execution of the ESTIM paragraph, but the internal estimate is overwritten at each subsequent execution of an ESTIM paragraph for the same model.

Syntax of the FORECAST Paragraph

Brief syntax

FORECAST MODEL v. @
         NOFS ARE i1, i2, ---. @
         ORIGINS ARE i1, i2, ---. @
         IARIMA ARE v1(model-name), v2(model-name), ---.

Required sentence: MODEL

Full syntax

FORECAST MODEL v. @
         NOFS ARE i1, i2, ---. @
         ORIGINS ARE i1, i2, ---. @
         IARIMA ARE v1(model-name), v2(model-name), ---. @
         JOIN./NO JOIN. @
         METHOD IS w. @
         HOLD FORECASTS(v1,v2,---), STD_ERRS(v1,v2,---). @
         OUTPUT PRINT(w), NOPRINT(w).

Required sentence: MODEL

Sentences Used in the FORECAST Paragraph

MODEL sentence
The MODEL sentence is used to specify the label (name) of the model for the series to be forecasted. The label must be one specified in a previous TSMODEL paragraph.

NOFS sentence
The NOFS sentence is used to specify, for each time origin, the number of time periods ahead for which forecasts will be generated. The number of arguments in this sentence must be the same as that in the ORIGINS sentence. The default is 24 forecasts for each time origin.

ORIGINS sentence
The ORIGINS sentence is used to specify the time origins for forecasts. The default is one origin, the last observation.

IARIMA sentence
The IARIMA sentence is used to specify the label associated with the ARIMA model of each stochastic input series of a transfer function model. The variable name of each input series must be listed and, in parentheses, the name (label) of its Box-Jenkins ARIMA model.

JOIN sentence
The JOIN sentence is used to specify that the forecasts calculated should be appended to the variable of the model relative to the specified origin. If more than one origin is specified, only the last will be used. The default is NO JOIN.

METHOD sentence
The METHOD sentence is used to specify the likelihood function used for the computation of the residual series employed in forecasting. The keyword may be CONDITIONAL for the conditional likelihood, or EXACT for the exact likelihood function. These two likelihood functions are discussed earlier in this manual. The default is EXACT.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. The default is that none of the values of the above statistics will be retained after the paragraph is used. The values that may be retained are:

FORECASTS : forecasts for each corresponding time origin
STD_ERRS  : standard errors of the forecasts at the last time origin

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output displayed for various statistics. The default condition is PRINT(FORECASTS); that is, to display forecast values for each time origin. To suppress this, specify NOPRINT(FORECASTS).

SIMULATE Paragraph

The SIMULATE paragraph is used to generate data according to a user specified univariate time series model. See the earlier discussion of simulation for more information on this paragraph. A transfer function model must have been specified previously using the TSMODEL paragraph. Data for all explanatory variables must have been either transmitted to the SCA workspace or simulated prior to the simulation of the response variable of the transfer function model.

The paragraph is also used to generate data according to a user specified distribution. More information on this can be found in Chapter 12 of The SCA Statistical System: Reference Manual for General Statistical Analysis.

Syntax for the SIMULATE Paragraph

SIMULATE VARIABLE IS v. @
         MODEL IS model-name. @
         NOISE IS distribution (parameters) or VARIABLE(v). @
         NOBS IS i. @
         SEED IS i.

Required sentences: MODEL, NOISE and NOBS

Sentences Used in the SIMULATE Paragraph

VARIABLE sentence
The VARIABLE sentence is used to specify the name of the variable to store the simulation results. The sentence is not required if a univariate time series is generated. If the sentence is not specified, the variable name used in the MODEL sentence of the TSMODEL paragraph is used to store the results.

MODEL sentence
The MODEL sentence is used to specify the name (label) of the model to be simulated. The model may be an ARIMA model specified in a TSMODEL paragraph. The sentence SIMULATION must also appear in the TSMODEL paragraph.

NOISE sentence
The NOISE sentence is used to specify the noise sequence for the simulated time series model. Either the distribution for generating the noise sequence or the name of a variable containing values to be used as the sequence is specified. The following distributions can be used:

U(r1,r2)  : uniform distribution between r1 and r2
N(r1,r2)  : normal distribution with mean r1 and variance r2
MN(v1,v2) : multivariate normal distribution with mean vector v1 and covariance matrix v2. Note that v1 and v2 must be names of variables defined previously.

NOBS sentence
The NOBS sentence is used to specify the number of observations to be simulated.

SEED sentence
The SEED sentence is used to specify an integer or the name of a variable for starting the random number generation. When a variable is used, the seven digit value is used as a seed if it is not defined yet, or the value of the variable is used if the variable is an existing one. After the simulation, the variable contains the seed last used. The seed must not have more than 8 digits. The default is
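The recursion behind the simulation example of this chapter, models (8.43) and (8.44), can be sketched in Python as follows. This is a hedged illustration and not the SCA algorithm: the function name `simulate_xy`, the seed and the start-up values are hypothetical, and the constants 12.0 and 6.0 are assumptions inferred from the stated means $\mu_x = 30.0$ and $\mu_y = 66.0$. Note that Python's `gauss` takes a standard deviation, whereas the NOISE sentence N(r1,r2) takes a variance.

```python
import random


def simulate_xy(n_keep=200, burn_in=50, seed=12345):
    """Simulate X and Y, discarding the first burn_in values of each.

    X_t = 12.0 + 0.6*X_{t-1} + e_t,                 sd(e) = 2.5
    U_t = 0.8*U_{t-1} + 0.4*X_{t-1}                 (0.4B / (1 - 0.8B) applied to X)
    Y_t = 6.0 + U_t + a_t - 0.75*a_{t-1},           sd(a) = 1.5
    """
    rng = random.Random(seed)
    # Start the recursions at the process means (assumed start-up values).
    x_prev, u_prev, a_prev = 30.0, 60.0, 0.0
    xs, ys = [], []
    for _ in range(n_keep + burn_in):
        x = 12.0 + 0.6 * x_prev + rng.gauss(0.0, 2.5)
        u = 0.8 * u_prev + 0.4 * x_prev
        a = rng.gauss(0.0, 1.5)
        y = 6.0 + u + a - 0.75 * a_prev
        xs.append(x)
        ys.append(y)
        x_prev, u_prev, a_prev = x, u, a
    return xs[burn_in:], ys[burn_in:]


xdata, ydata = simulate_xy()
print(len(xdata), len(ydata))
print(round(sum(xdata) / len(xdata), 2), round(sum(ydata) / len(ydata), 2))
```

Discarding the first 50 values mirrors the SELECT step with SPAN IS (51, 250); with only 200 retained values the sample means will scatter around 30 and 66 rather than match them exactly.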

WEIGHT Paragraph

The WEIGHT paragraph is used to compute the TF, pi and psi-weights of a transfer function model. The pi and psi-weights correspond to the disturbance term. The WEIGHT paragraph can also be used to compute the pi and psi-weights of an ARIMA model (see Section 5.4.8).

Syntax of the WEIGHT Paragraph

WEIGHT MODEL model-name. @
       PIWEIGHTS IN v. @
       PSIWEIGHTS IN v. @
       TFWEIGHTS IN v1, v2, ---. @
       MAXIMUM IS i. @
       CUTOFF IS r.

Required sentence: MODEL

Sentences Used in the WEIGHT Paragraph

MODEL sentence
The MODEL sentence is used to specify the label (name) of the transfer function model for which pi, psi or transfer function weights are to be computed. The label must be the one specified in a previous TSMODEL paragraph.

PIWEIGHTS sentence
The PIWEIGHTS sentence is used to specify the name of the variable to store the pi-weights associated with the disturbance term of the transfer function model.

PSIWEIGHTS sentence
The PSIWEIGHTS sentence is used to specify the name of the variable to store the psi-weights associated with the disturbance term of the transfer function model.

TFWEIGHTS sentence
The TFWEIGHTS sentence is used to specify the names of the variables to store the TF weights for the transfer function model. The number of variables specified in this sentence must be less than or equal to the number of transfer function components in the model. The weights associated with the first transfer function component are stored in the first variable, the weights associated with the second transfer function component are stored in the second variable, and so on.

MAXIMUM sentence
The MAXIMUM sentence is used to specify the maximum number of weights to be computed. The default is 100 for all weights to be computed.

CUTOFF sentence
The CUTOFF sentence is used to specify a cutoff value to limit the number of weights that will be stored. The last weight stored represents the last value greater than or equal to (in absolute value) the cutoff value. Note that the specification of a cutoff value will cause the variables that store the weights to have different lengths. The default cutoff value is 0; that is, all weights will be stored.

REFERENCES

Abraham, B. and Ledolter, J. (1983). Statistical Methods for Forecasting. New York: Wiley.

Bell, W.R. and Hillmer, S.C. (1983). Modeling Time Series with Calendar Variation. Journal of the American Statistical Association, 78:

Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control. San Francisco: Holden Day. (Revised edition published in 1976).

Cleveland, W.P. and Grupe, M.R. (1983). Modeling Time Series When Calendar Effects are Present. Applied Time Series Analysis of Economic Data (ed. Arnold Zellner). U.S. Department of Commerce:

Cochrane, D. and Orcutt, G.H. (1949). Application of Least Square Regression to Relations Containing Autocorrelated Error Terms. Journal of the American Statistical Association, 44:

Hildreth, G. and Lu, J.Y. (1960). Demand Relations with Autocorrelated Disturbances. Michigan State University Agricultural Experiment Station, Technical Report 276.

Hillmer, S.C. (1982). Forecasting Time Series with Trading Day Variation. Journal of Forecasting, 1:

Hillmer, S.C., Bell, W.R. and Tiao, G.C. (1981). Modeling Considerations in the Seasonal Adjustment of Economic Time Series. Proceedings of the Conference on Applied Time Series Analysis of Economic Data (ed. Arnold Zellner). U.S. Department of Commerce, Bureau of the Census:

Koyck, L.M. (1954). Distributed Lags and Investment Analysis. Amsterdam: North Holland.

Liu, L.-M. (1980). Analysis of Time Series with Calendar Effects. Management Science, 26:

Liu, L.-M. (1986). Identification of Time Series Models in the Presence of Calendar Variation. International Journal of Forecasting, 2:

Liu, L.-M. (1987). Sales Forecasting Using Multi-Equation Transfer Function Models. Journal of Forecasting, 6:

Liu, L.-M. and Hanssens, D.M. (1982). Identification of Multiple-Input Transfer Function Models. Communications in Statistics A, 11:

Liu, L.-M. and Hudak, G.B. (1985). Unified Econometric Model Building Using Simultaneous Transfer Function Equations. Time Series Analysis: Theory and Practice 7: Amsterdam: Elsevier Science Publishing.

Liu, L.-M., Hudak, G.B., Box, G.E.P., Muller, M.E. and Tiao, G.C. (1986). The SCA Statistical System: Reference Manual for Forecasting and Time Series Analysis. DeKalb, IL: Scientific Computing Associates.

Pankratz, A. (1991). Forecasting with Dynamic Regression Models. New York: Wiley.

Pierce, D.A. (1971). Least Square Estimation in the Regression Model with Autoregressive-Moving Average Errors. Biometrika, 64:

Salinas, T.S. (1983). Modeling Time Series with Trading Variation. PhD thesis, The University of Kansas.

Salinas, T.S. and Hillmer, S.C. (1987a). Multicollinearity Problems in Modeling Time Series with Trading-Day Variation. Journal of Business and Economic Statistics, 5:

Salinas, T.S. and Hillmer, S.C. (1987b). Time Series Model Identification in the Presence of Trading Day Variation. American Statistical Association 1987 Proceedings of the Business and Economic Statistics Section:

Thompson, H.E. and Tiao, G.C. (1971). Analysis of Telephone Data: A Case Study of Forecasting Seasonal Time Series. The Bell Journal of Economics and Management Science, 2:

Vandaele, W. (1983). Applied Time Series Analysis and Box-Jenkins Models. New York: Academic Press.

Wei, W.W.S. (1990). Time Series Analysis: Univariate and Multivariate Methods. Redwood City, CA: Addison-Wesley.

Young, A.H. (1965). Estimating Trading-Day Variation in Monthly Economic Time Series. Technical Paper 12, Bureau of the Census.


CHAPTER 9

FORECASTING USING GENERAL EXPONENTIAL SMOOTHING

In this chapter we discuss the use of various general exponential smoothing methods for forecasting. There are many possible ways to forecast a time series. The main emphasis of forecasting methods presented thus far is a model-based approach advocated by Box, Jenkins, Tiao, and others. Traditionally, however, forecasting has been performed using various empirical methods. Some of these methods were developed employing statistical theory, while others were developed mainly based on empirical experiences. These methods share a similar characteristic. That is, the forecasts are based essentially on smoothing (averaging) past values of a time series using some type of decreasing weighting scheme. In particular, these weights often follow an exponentially decreasing pattern. As a result, this method of forecasting is often referred to as general exponential smoothing.

We can access the exponential smoothing methods of the SCA System through the GFORECAST paragraph. We will only provide a cursory discussion of various methods in the remainder of this chapter. More complete information can be found in Abraham and Ledolter (1983), Harvey (1984), Makridakis and Wheelwright (1978), Makridakis, Wheelwright and McGee (1986), Montgomery and Johnson (1976), Box and Jenkins (1970), Bowerman and O'Connell (1987), Brown (1962), Brown and Meyer (1961), Muth (1960) and references contained therein.

In this chapter, Sections 9.1 through 9.7 provide basic information on the available general exponential smoothing methods in the SCA System. These methods are:

    (1) Simple exponential smoothing (Section 9.1),
    (2) Double exponential smoothing (Section 9.2),
    (3) Holt's two parameter exponential smoothing (Section 9.3),
    (4) Winters' additive seasonal exponential smoothing (Section 9.4),
    (5) Winters' multiplicative seasonal exponential smoothing (Section 9.5),
    (6) General exponential smoothing using seasonal indicators (Section 9.6), and
    (7) General exponential smoothing using harmonic (trigonometric) functions (Section 9.7).

One or more examples of each smoothing method are provided in each section. Section 9.8 presents some commentary on forecasting using general exponential smoothing methods.

The exponential smoothing methods (1) through (3) are most often used in forecasting non-periodic (non-seasonal) time series. Methods (4) through (7) are only appropriate for periodic (seasonal) time series, in particular monthly or quarterly time series. Computational algorithms and computer programs employed in the exponential smoothing capabilities of the SCA System are adapted from those presented in Abraham and Ledolter (1983).

Relationship between general exponential smoothing and ARIMA models

Abraham and Ledolter (1983, 1986) have investigated relationships between various exponential smoothing methods and ARIMA models. They show various equivalence relationships between forecasts from general exponential smoothing and forecasts from ARIMA models. As a result, in each of the next seven sections, corresponding ARIMA models are provided whenever possible for each smoothing method. This information may be useful in light of the discussion in Section 9.8 regarding forecasting using ARIMA models (or model-based approaches) and general exponential smoothing methods.

Missing data

In the modeling of time series using ARIMA or transfer function models, we are able to employ special computational algorithms or use other procedures to identify and estimate a model for time series with missing data (see Sections and 7.7). Although coded missing data can be identified, the computational algorithms in the SCA System's exponential smoothing capabilities have no special way of dealing with missing data. If missing data are present in a time series, we may wish to replace these values by some appropriate values (see Section 5.4.2) using analytic statements (see Appendix B), or the PATCH paragraph (see Appendix C). If missing data are present in a series, then the first occurrence of a non-missing value and the occurrence of the next missing data point are noted internally. Only the non-missing data in this span are used in the calculation of smoothed forecasts.
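The span rule just described can be sketched in a few lines of Python. This is an illustration only, not the SCA System's internal code: the function name `first_complete_span` is hypothetical, and missing values are assumed to be coded as `None` or NaN, which may differ from the SCA missing-value code.

```python
import math


def first_complete_span(series):
    """Return the observations from the first non-missing value up to
    (but not including) the next missing value.

    Assumption: missing data are coded as None or float NaN.
    """
    start = None
    for i, z in enumerate(series):
        missing = z is None or (isinstance(z, float) and math.isnan(z))
        if start is None:
            if not missing:
                start = i          # first non-missing observation
        elif missing:
            return series[start:i]  # stop at the next missing point
    return [] if start is None else series[start:]


# Only the values 1.0 and 2.0 would enter the smoothing calculations here.
span = first_complete_span([None, 1.0, 2.0, float("nan"), 3.0])
```

Under this reading, any observations after the first gap are ignored, which is one reason to patch missing values before smoothing.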
9.1 Simple (Single) Exponential Smoothing

Simple (or single) exponential smoothing is a forecasting method that assumes the mean of a series is constant over short periods of time (i.e., locally constant). The mean level is allowed to change slowly over time, but it is assumed that there is no overall trend in the series. In such a case, it is reasonable to forecast all future observations by giving more weight to the most recent observations and less to distant past observations. There are many choices for such a weighting scheme. One choice is to use weights that will decrease exponentially with the age of the observations. In this case the forecast of the future observation $Z_{n+l}$ made from time $t=n$ (denoted by $\hat{Z}_n(l)$) can be calculated from

    $\hat{Z}_n(l) = (1-\omega)[Z_n + \omega Z_{n-1} + \omega^2 Z_{n-2} + \cdots]$    (9.1)

where $\omega$ is called the discount coefficient ($-1 < \omega < 1$). We can also express (9.1) as

    $\hat{Z}_n(l) = \alpha[Z_n + (1-\alpha)Z_{n-1} + (1-\alpha)^2 Z_{n-2} + \cdots] = S_n$    (9.2)

where $\alpha = 1-\omega$ is used. The value $\alpha$ is called the smoothing constant, and $S_n$ is called the smoothed statistic at time $t=n$.

We may note that in the above derivation of simple exponential smoothing, the forecasts from a fixed time origin are the same. This is reasonable as simple exponential smoothing assumes a locally constant mean that is not subject to trends. This approach differs slightly from the traditional implementation of simple exponential smoothing in obtaining multi-step ahead forecasts (see Section 9.1.3).

We can also express the smoothed statistic $S_n$ as a function of $Z_n$ and $S_{n-1}$:

    $S_n = \alpha Z_n + (1-\alpha)S_{n-1}$    (9.3)

$S_n$ is also referred to as a single exponentially smoothed statistic and may be denoted as $S_n^{[1]}$. If we repeat the above smoothing procedure using $S_n^{[1]}$ in place of $Z_n$, we produce a new smoothed statistic

    $S_n^{[2]} = \alpha S_n^{[1]} + (1-\alpha)S_{n-1}^{[2]}$    (9.4)

called the double smoothed statistic (see Section 9.2). Repeated applications of the smoothing procedure produce exponential smoothed statistics of higher orders (i.e., $S_n^{[3]}$, triple smoothed statistic).

Calculation of $S_n$

Since $S_n = \alpha Z_n + (1-\alpha)S_{n-1}$, it is true that

    $S_n = \alpha[Z_n + (1-\alpha)Z_{n-1} + \cdots + (1-\alpha)^{n-1}Z_1] + (1-\alpha)^n S_0$.    (9.5)

Thus a value for $\alpha$ and an initial value for $S_0$ must be either specified or determined in order to begin the generation of the smoothed statistic. The SCA System does not estimate any model parameters for any of the general exponential smoothing methods. As a result, we must specify a value for $\alpha$. For information concerning the determination of $\alpha$, see Abraham and Ledolter (1983) or Makridakis, Wheelwright and McGee (1986).

Since $S_0$ is the level of the series at its beginning (i.e., time zero), it is reasonable to estimate it by averaging the first few observations. Some authors consider the average of the first two observations, $S_0 = (Z_1 + Z_2)/2$, while others (Makridakis and Wheelwright, 1978) advocate the choice of $S_0 = Z_1$. In practice the choice of $S_0$ is usually not important for a reasonably long series.
The default choice in the SCA System is $S_0 = (Z_1 + Z_2)/2$. This default can be changed with the inclusion of the START sentence (see the syntax at the end of this chapter).

Depending upon the assumptions made, $S_n$ can be the forecast for all future values or be used as part of the calculation of $\hat{Z}_n(l)$. We discuss this in more detail below.
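As a hedged illustration of the recursion (9.3) and its expanded form (9.5), the following Python sketch computes $S_n$ both ways. The function names are hypothetical and the code is not taken from the SCA System; it only assumes the default starting value $S_0 = (Z_1 + Z_2)/2$ stated above.

```python
def simple_smooth(z, alpha, s0=None):
    """Smoothed statistic S_n from the recursion S_t = alpha*Z_t + (1-alpha)*S_{t-1}.

    If s0 is not given, the average of the first two observations is used,
    mirroring the default described in the text.
    """
    s = (z[0] + z[1]) / 2.0 if s0 is None else s0
    for obs in z:
        s = alpha * obs + (1.0 - alpha) * s
    return s


def simple_smooth_expanded(z, alpha, s0):
    """Same statistic from the expanded sum in equation (9.5)."""
    n = len(z)
    weighted = sum((1.0 - alpha) ** k * z[n - 1 - k] for k in range(n))
    return alpha * weighted + (1.0 - alpha) ** n * s0


z = [2.0, 4.0, 3.0, 5.0]          # made-up data for illustration
s_rec = simple_smooth(z, 0.5)      # S_0 = (2 + 4) / 2 = 3
s_sum = simple_smooth_expanded(z, 0.5, 3.0)
```

Both routes give the same value; the recursion is the form that would be used in practice because it needs only the previous smoothed statistic.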

Relation to ARIMA models

The forecasts from simple exponential smoothing are equivalent to those from the ARIMA(0,1,1) model (Box and Jenkins, 1970)

    $(1-B)Z_t = (1-\theta B)a_t$    (9.6)

where $\theta = 1-\alpha = \omega$, with $\alpha$ and $\omega$ as defined above.

For this ARIMA(0,1,1) model, the minimum mean squared error forecasts (Box and Jenkins, 1970) of all future observations, $Z_{n+l}$ ($l = 1,2,\ldots$), are given by the latest exponentially weighted average, $S_n$.

Some remarks on multi-step ahead forecasts

The one-step-ahead forecast for simple exponential smoothing is given by

    $\hat{Z}_n(1) = S_n = \alpha Z_n + (1-\alpha)S_{n-1}$.    (9.7)

If we use equation (9.7) to obtain the two-step-ahead forecast, $\hat{Z}_n(2)$, we have

    $S_{n+1} = \alpha Z_{n+1} + (1-\alpha)S_n$.    (9.8)

If we now replace the unknown observation $Z_{n+1}$ with its forecast, $\hat{Z}_n(1) = S_n$, we obtain

    $\hat{Z}_n(2) = S_{n+1} = \alpha S_n + (1-\alpha)S_n = S_n$.    (9.9)

If we continue to use the above derivation for the three-step-ahead forecast, four-step-ahead forecast, and so on, we will see that the multi-step-ahead forecasts for simple exponential smoothing are all the same (i.e., $\hat{Z}_n(l) = S_n$). This is exactly what was presented earlier in (9.2).

Some authors (e.g., Makridakis and Wheelwright, 1978) and software packages proceed differently in the calculation of multi-step-ahead forecasts. In order to obtain the two-step-ahead forecast, $\hat{Z}_n(2)$, the unknown value $Z_{n+1}$ is replaced by the latest available observation, $Z_n$. As a result

    $\hat{Z}_n(2) = \alpha Z_n + (1-\alpha)S_n = \alpha Z_n + (1-\alpha)\hat{Z}_n(1)$.    (9.10)

Similarly, in the calculation of the three-step-ahead forecast, $Z_{n+2}$ is replaced by the last observation, $Z_n$, resulting in

    $\hat{Z}_n(3) = \alpha Z_n + (1-\alpha)\hat{Z}_n(2)$.    (9.11)

If we continue in this manner, we obtain

    $\hat{Z}_n(l) = \alpha Z_n + (1-\alpha)\hat{Z}_n(l-1)$,  $l = 2,3,\ldots$.    (9.12)

Using this approach, we see that the multi-step-ahead forecasts from a fixed origin will vary somewhat with the forecast lead time, $l$. This slight variation in forecasts has not been shown to be better than the fixed value forecasts based on minimum mean square error. It is interesting to observe that equation (9.12) is not valid for the first forecast (i.e., for $l=1$) since it becomes

    $\hat{Z}_n(1) = \alpha Z_n + (1-\alpha)\hat{Z}_n(0) = \alpha Z_n + (1-\alpha)Z_n = Z_n$.    (9.13)

This result is in conflict with those of simple exponential smoothing. However, the formulation described by (9.12) has become a traditional means to implement simple exponential smoothing.

In order to be consistent with the traditional results of simple exponential smoothing, the recursive formula employed in Makridakis and Wheelwright (1978) is used in the SCA System to generate multi-step-ahead forecasts. If we wish to be certain to obtain the multi-step forecasts $\hat{Z}_n(l) = S_n$, we should specify and forecast from an ARIMA(0,1,1) model (see Chapter 5). In addition to the computation of forecasts, the latter approach also provides us with standard errors of the forecasts. In using the ARIMA approach, we can also estimate the discount coefficient (since $\omega = \theta$) based on the series.

Examples of simple exponential smoothing

We now illustrate the use of the GFORECAST paragraph with two examples. In the first example, we explain the SCA output produced; and in the second example, we compare the forecasts obtained with those from an ARIMA(0,1,1) model.

Example: Growth rates of Iowa nonfarm income

For our first example, we consider the growth rates of Iowa nonfarm income. The data, Series 2 of Abraham and Ledolter (1983), are quarterly growth rates from the second quarter of 1948 through the fourth quarter of 1979. The data, listed in Table 9.1 and shown in Figure 9.1, are stored in the SCA workspace under the label GROWTH.

Table 9.1 Quarterly growth rates of Iowa nonfarm income, 1948/II - 1979/IV, from Abraham and Ledolter (1983) (Read data across the line)

Figure 9.1 Quarterly growth rates of Iowa nonfarm income (1948/II - 1979/IV)

Abraham and Ledolter (1983, page 93) determined that the minimum sum of squared errors of one-step-ahead forecasts occurs for $\alpha$ about 0.11. We can obtain forecasts for the next 5 quarters by entering

    -->GFORECAST GROWTH. METHOD IS SIMPLE. WEIGHT IS 0.11. NOFS ARE 5.

There are three required sentences in the above use of the GFORECAST paragraph. We need to specify the variable to forecast (GROWTH), the method to use (SIMPLE to indicate simple exponential smoothing), and a value for $\alpha$ (0.11). The NOFS sentence is used to specify that only 5 forecasts from the last observation are desired. The default number of forecasts produced is 24. The following output is produced

    SIMPLE EXPONENTIAL SMOOTHING FOR THE SERIES GROWTH
    SMOOTHING CONSTANT
    INITIAL S0 DERIVED FROM 2 OBSERVATIONS
    L STEP AHEAD FORECASTS FOR GROWTH FROM TIME ORIGIN 127
    MSE(ALPHA) = .92632

    TIME FORECAST

The SCA output includes a summary of how the forecasts are obtained. We see that simple exponential smoothing is used, with a smoothing constant of 0.11, and that S0 is based on the average of the first two observations (see Section for more information on S0). Five forecasts are computed and displayed. The forecast for n=128 is computed using equation (9.5). All remaining forecasts are based on equation (9.12). The value listed as MSE(ALPHA), .92632, is the sum of squared errors of the one-step-ahead forecasts (made from t=1,2,...,n-1) divided by the number of observations (here 127).

Example: Series A of Box and Jenkins

As a second example of the use of the GFORECAST paragraph for simple exponential smoothing, we use Series A of Box and Jenkins (1970). The data, stored in the SCA workspace under the label SERIESA, were modeled previously (see Section through 5.1.6) as an ARIMA(0,1,1) model. We found the estimate of the MA parameter to be approximately 0.7. Hence we should obtain about the same one-step-ahead forecast if we use the smoothing constant $\alpha$ = 0.3. We enter the following

    -->GFORECAST SERIESA. METHOD IS SIMPLE. WEIGHT IS 0.3. ORIGINS ARE 195, 196, 197. NOFS ARE 5.

The command above is similar to the one used in the previous example. An additional sentence, ORIGINS, is included. This sentence is used to specify the forecast origin(s) to use. The default origin is the last observation (here 197). We have specified that forecasts will be produced from the last 3 observations. As a result, we can compare forecast values to observed values, as well as compare the one-step-ahead forecast from 197 with that obtained previously. We obtain the following output

    SIMPLE EXPONENTIAL SMOOTHING FOR THE SERIES SERIESA
    SMOOTHING CONSTANT
    INITIAL S0 DERIVED FROM 2 OBSERVATIONS

    L STEP AHEAD FORECASTS FOR SERIESA FROM TIME ORIGIN 195
    MSE(ALPHA) = .99910E-01
    TIME FORECAST

    L STEP AHEAD FORECASTS FOR SERIESA FROM TIME ORIGIN 196
    MSE(ALPHA) = .10067

    TIME FORECAST

    L STEP AHEAD FORECASTS FOR SERIESA FROM TIME ORIGIN 197
    MSE(ALPHA) =
    TIME FORECAST

The output is similar to that of the previous example, except forecast information is provided for three separate origins. The three different MSE values are attributable to the three different sample sizes used to compute forecasts. We can compare the one-step-ahead forecasts obtained for the three origins ( from 195, from 196, and from 197) with the actual values (17.20 for 196 and for 197) and the forecasted value obtained from the model fitted previously ( from the 197 forecast origin). We see that the exponentially smoothed model slightly underforecasts both of the actual values. The two forecasts from the same origin are almost identical, as they should be.

9.2 Double Exponential Smoothing

Double exponential smoothing assumes that a time series follows a linear trend model near the observation $Z_n$, so that

    $Z_{n+j} = \beta_0 + \beta_1 j + a_{n+j}$.    (9.14)

The estimates for $\beta_0$ and $\beta_1$ are obtained through discounted least squares (see Abraham and Ledolter, 1983 or Montgomery and Johnson, 1976). It can be shown that

    $\hat{\beta}_0 = 2S_n^{[1]} - S_n^{[2]}$
    $\hat{\beta}_1 = \frac{\alpha}{1-\alpha}(S_n^{[1]} - S_n^{[2]})$    (9.15)

where $S_n^{[1]}$ and $S_n^{[2]}$ are single and double smoothed statistics respectively, as given in equations (9.3) and (9.4). If we substitute the estimates of (9.15) back into the linear trend model, we obtain the following forecasts

    $\hat{Z}_n(l) = \left(2 + \frac{\alpha}{1-\alpha}l\right)S_n^{[1]} - \left(1 + \frac{\alpha}{1-\alpha}l\right)S_n^{[2]}$  for $l = 1,2,\ldots$    (9.16)
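The relations (9.3), (9.4), (9.15) and (9.16) can be checked numerically with a short Python sketch. The function names and the input values below are hypothetical and chosen only for illustration; the sketch does not reproduce how the SCA System initializes the smoothed statistics.

```python
def update_smoothed(s1, s2, z, alpha):
    """One update of the single (9.3) and double (9.4) smoothed statistics."""
    s1_new = alpha * z + (1.0 - alpha) * s1
    s2_new = alpha * s1_new + (1.0 - alpha) * s2
    return s1_new, s2_new


def trend_estimates(s1, s2, alpha):
    """Intercept and slope estimates from equation (9.15)."""
    b0 = 2.0 * s1 - s2
    b1 = alpha / (1.0 - alpha) * (s1 - s2)
    return b0, b1


def double_forecast(s1, s2, alpha, lead):
    """l-step-ahead forecast written directly as in equation (9.16)."""
    ratio = alpha / (1.0 - alpha)
    return (2.0 + ratio * lead) * s1 - (1.0 + ratio * lead) * s2


# Illustrative values only: S1 = 10, S2 = 8, alpha = 0.2, lead time 4.
b0, b1 = trend_estimates(10.0, 8.0, 0.2)
f_direct = double_forecast(10.0, 8.0, 0.2, 4)
f_trend = b0 + b1 * 4
```

The two forecast expressions agree, which is simply the substitution of (9.15) into the linear trend model written out in code.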

Calculation of $\hat{Z}_n(l)$

As in the case of simple exponential smoothing, the GFORECAST paragraph requires that we specify a value for the smoothing constant, $\alpha$, for the calculation of the $l$-th step ahead forecast, $\hat{Z}_n(l)$. The SCA System calculates the initial values for the smoothed statistics $S_n^{[1]}$ and $S_n^{[2]}$ from the least squares estimates of $\beta_0$ and $\beta_1$ in the linear trend model. The SCA System provides an appropriate set of values. However, we can also specify the number of observations to be used in this regression (see the START sentence in the syntax at the end of this chapter). Details regarding the method of calculation may be found in Abraham and Ledolter (1983).

Relation to ARIMA models

The forecasts from double exponential smoothing are equivalent to those from the restricted ARIMA(0,2,2) model

    $(1-B)^2 Z_t = (1-\theta B)^2 a_t$    (9.16)

where $\theta = 1-\alpha$, with $\alpha$ the smoothing constant.

Examples of double exponential smoothing

We will use two examples to illustrate forecasting using double exponential smoothing. Both examples are discussed in Abraham and Ledolter (1983). The first example is also used in Section 9.3.

Example: Weekly thermostat sales

The first example uses 52 observations consisting of weekly thermostat sales. The data, first used by Brown (1962, page 431) and also by Abraham and Ledolter (1983, page 110), are listed in Table 9.2 and displayed in Figure 9.2. The data are stored in the SCA workspace under the label THERM.

Table 9.2 Weekly thermostat sales from Brown (1962) (Read data across the line)

Figure 9.2 Weekly thermostat sales

The plot in Figure 9.2 shows that there is an upward trend in the data. Hence the use of simple exponential smoothing (that assumes a mean level that is constant locally) is not appropriate. Abraham and Ledolter (1983, page 115) determine that the value for the smoothing constant should be approximately 0.14. We can forecast using this weight by entering

    -->GFORECAST THERM. METHOD IS DOUBLE. WEIGHT IS 0.14. NOFS ARE 10.

The command above is similar to that used for simple exponential smoothing, except that DOUBLE is specified as the method. We also limit the number of forecasts to 10 from the last observation. We obtain

    DOUBLE EXPONENTIAL SMOOTHING FOR THE SERIES THERM
    SMOOTHING CONSTANT
    INITIAL S0 AND T0 DERIVED FROM THE FIRST 2 OBSERVATIONS
    L STEP AHEAD FORECASTS FOR THERM FROM TIME ORIGIN 52
    MSE(ALPHA) =
    TIME FORECAST

The output is similar to that provided for simple exponential smoothing, except that two initial values, S0 and T0 (for $S_0^{[1]}$ and $S_0^{[2]}$, respectively), are determined. Please see Section for details.

Example: University enrollment

The second example forecasts the total annual student enrollment at the University of Iowa for the academic years beginning with the fall of 1951 through the spring of 1980. The data, listed in Table 9.3 and displayed in Figure 9.3, are used by Abraham and Ledolter (1983, page 116) and are stored in the SCA workspace under the label ENROLL.

Table 9.3 Total annual student enrollment at the University of Iowa, 1951/52 through 1979/80 (Read data across the line)

Figure 9.3 University of Iowa student enrollment

Again, a trend is evident in the data. Abraham and Ledolter (1983, page 117) find that the optimal smoothing constant is To produce forecasts for the next three academic years, we can enter

    -->GFORECAST ENROLL. METHOD IS DOUBLE. WEIGHT IS NOFS ARE 3.

    DOUBLE EXPONENTIAL SMOOTHING FOR THE SERIES ENROLL
    SMOOTHING CONSTANT
    INITIAL S0 AND T0 DERIVED FROM THE FIRST 2 OBSERVATIONS
    L STEP AHEAD FORECASTS FOR ENROLL FROM TIME ORIGIN 29
    MSE(ALPHA) = .74956E+06
    TIME FORECAST

9.3 Holt's Two Parameter Exponential Smoothing

An alternative method for forecasting in the presence of a linear trend was proposed by Holt (1957). In Holt's representation, we assume we have a linear trend model with time varying mean and slope. Thus forecasts from time $t=n$ are based on a model of the form

    $Z_{n+j} = \mu_n + \beta_n j + a_{n+j}$,    (9.17)

where $\mu_n$ and $\beta_n$ are the level and slope at time $t=n$. The forecasts of future observations at $t=n$ are given by

    $\hat{Z}_n(l) = \hat{\mu}_n + \hat{\beta}_n l$,    (9.18)

where

    $\hat{\mu}_n = \alpha_1 Z_n + (1-\alpha_1)(\hat{\mu}_{n-1} + \hat{\beta}_{n-1})$, and
    $\hat{\beta}_n = \alpha_2(\hat{\mu}_n - \hat{\mu}_{n-1}) + (1-\alpha_2)\hat{\beta}_{n-1}$.

The updating equations above (for $\mu$ and $\beta$) contain two smoothing constants. The value $\alpha_1$ is the smoothing constant for the level ($\mu$), and $\alpha_2$ is the smoothing constant for the slope ($\beta$).

Calculation of forecasts and relation to double exponential smoothing

As before, the GFORECAST paragraph requires that we provide the smoothing constants used in the calculation of the $l$-th step ahead forecast. Here we must specify two smoothing constants, $\alpha_1$ and $\alpha_2$. Estimates of $\mu_n$ and $\beta_n$ are calculated by the SCA System internally.

The Holt method of exponential smoothing is more general than double exponential smoothing since we use two smoothing constants. The two methods are equivalent if

    $\alpha_1 = 1-(1-\alpha)^2$  and  $\alpha_2 = \frac{\alpha}{2-\alpha}$.    (9.19)

Relation to ARIMA models

Forecasts derived using Holt's two parameter exponential smoothing are equivalent to those from the ARIMA model

    $(1-B)^2 Z_t = (1 - \theta_1 B - \theta_2 B^2)a_t$,    (9.20)

where

    $\theta_1 = 2(1-\alpha_1) + \alpha_1(1-\alpha_2)$  and  $\theta_2 = -(1-\alpha_1)$,

with $\alpha_1$ and $\alpha_2$ the smoothing constants of Holt's method.
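The updating equations and the equivalence condition (9.19) can be sketched in Python as follows. This is a minimal illustration with hypothetical function names; the starting level and slope are made-up numbers, since the text states only that the SCA System computes its own estimates internally.

```python
def holt_update(level, slope, z, alpha1, alpha2):
    """One step of Holt's updating equations for the level and slope."""
    new_level = alpha1 * z + (1.0 - alpha1) * (level + slope)
    new_slope = alpha2 * (new_level - level) + (1.0 - alpha2) * slope
    return new_level, new_slope


def holt_forecast(level, slope, lead):
    """l-step-ahead forecast from equation (9.18)."""
    return level + slope * lead


def holt_constants_from_double(alpha):
    """Holt constants that reproduce double smoothing, equation (9.19)."""
    return 1.0 - (1.0 - alpha) ** 2, alpha / (2.0 - alpha)


# With alpha = 0.14 this gives approximately (0.2604, 0.0753).
a1, a2 = holt_constants_from_double(0.14)

# Illustrative update: level 10, slope 1, new observation 12.
level, slope = holt_update(10.0, 1.0, 12.0, 0.5, 0.5)
two_step = holt_forecast(level, slope, 2)
```

Choosing `alpha1` and `alpha2` freely, rather than through (9.19), is what makes Holt's method the more general of the two.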

Example: Weekly thermostat sales

To illustrate the use of the GFORECAST paragraph to implement Holt's method, we will forecast the weekly thermostat sales of Brown (1962). Forecasts for this series, THERM, were computed previously using double exponential smoothing. Here we will use a value of 0.20 as the smoothing constant for the level, and 0.10 as the smoothing constant for the slope. As in Section 9.2.3, we will compute 10 forecasts from the last observation. To obtain the forecasts, we may enter

    -->GFORECAST THERM. METHOD IS HOLT. WEIGHTS ARE 0.2, 0.1. NOFS ARE 10.

The command is almost the same as before, with HOLT substituting for DOUBLE in the METHOD sentence. Since two smoothing constants are required for Holt's method, we specify two values in the WEIGHTS sentence. We obtain the following

    HOLT'S EXPONENTIAL SMOOTHING FOR THE SERIES THERM
    SMOOTHING CONSTANTS
    L STEP AHEAD FORECASTS FOR THERM FROM TIME ORIGIN 52
    MSE(ALPHA) = .12833E+08
    TIME FORECAST (Forecasts using double exponential smoothing)

The forecasts using double exponential smoothing have been superimposed on the SCA output. We note the forecasts are rather similar. The smoothing constant used for double exponential smoothing was 0.14. From equation (9.19), we know that the two methods are equivalent if

    $\alpha_1 = 1-(1-.14)^2 = .2604$, and $\alpha_2 = .14/(2-.14) = .0753$.

Since the smoothing constants used for Holt's method are .2 and .1, we should expect reasonable agreement in the forecasts.

9.4 Winters' Additive Seasonal Exponential Smoothing Method

Winters (1960) proposed two exponential smoothing methods to forecast time series that possess seasonal patterns: an additive and a multiplicative method. These methods differ in their assumption on how the seasonal component affects the time series. We present the multiplicative method in Section 9.5, and the additive method is discussed below.

Winters' additive method assumes that the data follow the model

    $Z_{n+j} = T_{n+j} + S_{n+j} + a_{n+j}$,    (9.21)

where $T_{n+j} = \mu_n + \beta_n j$ is a trend component, $S_{n+j}$ is an additive seasonal factor, and $\mu_n$ and $\beta_n$ are the level and slope of the series at time $t=n$. If the period (season) of the series is $s$ (e.g., 12 for monthly data, 4 for quarterly data, etc.), then the variation due to seasonal activity is accounted for through $s$ seasonal factors such that:

    (1) $S_i = S_{i+s} = S_{i+2s} = \cdots$,  $i = 1,2,\ldots,s$, and
    (2) $S_1 + S_2 + \cdots + S_s = 0$.    (9.22)

Winters' additive method is usually appropriate for a time series in which the amplitude of the seasonal effect does not depend on the mean level of the series (Montgomery and Johnson, 1976).

Winters' additive method is an extension of Holt's two parameter method (see Section 9.3) in which a seasonal term is included. Forecasts for Winters' additive method involve weighted updates of the level, the slope and the seasonal factors. Similar to Holt's method, three different smoothing constants may be employed for the updates of $\mu$, $\beta$ and the seasonal factors. The forecasts of future observations are

    $\hat{Z}_n(l) = \hat{\mu}_n + \hat{\beta}_n l + \hat{S}_{n+l-s}$,   $l = 1,2,\ldots,s$
    $\hat{Z}_n(l) = \hat{\mu}_n + \hat{\beta}_n l + \hat{S}_{n+l-2s}$,  $l = s+1,s+2,\ldots,2s$
    ...

where

    $\hat{\mu}_n = \alpha_1(Z_n - \hat{S}_{n-s}) + (1-\alpha_1)(\hat{\mu}_{n-1} + \hat{\beta}_{n-1})$
    $\hat{\beta}_n = \alpha_2(\hat{\mu}_n - \hat{\mu}_{n-1}) + (1-\alpha_2)\hat{\beta}_{n-1}$
    $\hat{S}_n = \alpha_3(Z_n - \hat{\mu}_n) + (1-\alpha_3)\hat{S}_{n-s}$

Calculation of $\hat{Z}_n(l)$

For Winters' additive method, we are required to specify three smoothing constants ($\alpha_1$, $\alpha_2$ and $\alpha_3$) in the calculation of the $l$-th step ahead forecast, $\hat{Z}_n(l)$. These correspond to smoothing constants for the level, trend and seasonal components, respectively. Estimates of other parameters are calculated by the SCA System internally. Details regarding the method of calculation may be found in Abraham and Ledolter (1983).
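A single update step and the forecast function of Winters' additive method can be sketched in Python as below. The function names, the list-based storage of seasonal factors, and the numerical inputs are all assumptions made for illustration; how the SCA System obtains its starting values is not shown here.

```python
def winters_additive_update(level, slope, seasonals, z, alpha1, alpha2, alpha3):
    """One update of level, slope and seasonal factor.

    seasonals holds the s most recent seasonal estimates, oldest first,
    so seasonals[0] plays the role of S_{n-s} when observation Z_n arrives.
    """
    s_old = seasonals[0]
    new_level = alpha1 * (z - s_old) + (1.0 - alpha1) * (level + slope)
    new_slope = alpha2 * (new_level - level) + (1.0 - alpha2) * slope
    new_seasonal = alpha3 * (z - new_level) + (1.0 - alpha3) * s_old
    return new_level, new_slope, seasonals[1:] + [new_seasonal]


def winters_additive_forecast(level, slope, seasonals, lead):
    """l-step-ahead forecast: level + slope*l + latest estimate for that season."""
    s = len(seasonals)
    # After an update, seasonals[k] estimates the factor for time n - s + 1 + k,
    # so lead l uses index (l - 1) mod s, covering l <= s and l > s alike.
    return level + slope * lead + seasonals[(lead - 1) % s]


# Illustrative values with period s = 2.
level, slope, seas = winters_additive_update(10.0, 1.0, [2.0, -2.0], 14.0, 0.5, 0.5, 0.5)
forecasts = [winters_additive_forecast(level, slope, seas, l) for l in (1, 2, 3)]
```

Setting `alpha3` to zero and all seasonal factors to zero reduces the update to Holt's method, which is the sense in which the additive method extends it.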

Relation to ARIMA models

Forecasts derived using Winters' additive exponential smoothing method are equivalent to those from the ARIMA model

    $(1-B)(1-B^s)Z_t = (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_s B^s - \theta_{s+1}B^{s+1})a_t$

where

    $\theta_1 = 1 - \alpha_1(1+\alpha_2)$,
    $\theta_j = -\alpha_1\alpha_2$,  $j = 2,3,\ldots,s-1$,
    $\theta_s = (1-\alpha_3) - \alpha_1(\alpha_2 - \alpha_3)$, and
    $\theta_{s+1} = -(1-\alpha_1)(1-\alpha_3)$,

with $\alpha_1$, $\alpha_2$, $\alpha_3$ the smoothing constants.

Examples of Winters' additive smoothing method

The use of the GFORECAST paragraph to compute forecasts using Winters' additive method is illustrated with two examples. The first example is also used in Sections 9.6 and 9.7. The second example is used to compare forecasts using Winters' additive method with those of a seasonal ARIMA model.

Example: Monthly car sales

In the first example, we consider the monthly car sales in Quebec in the period January 1960 through December 1968. The data, listed in Table 9.4 and displayed in Figure 9.4, are Series 4 of Abraham and Ledolter (1983) and are stored in the SCA workspace under the label CARS.

Table 9.4 Monthly car sales in Quebec, January 1960 to December 1968 (Read data across the line)

Figure 9.4 Monthly car sales in Quebec (January 1960 - December 1968)

Abraham and Ledolter (1983, page 170) determined the optimum values of the smoothing constants to be 0.17, 0.01, and CARS consists of 108 observations, but we will forecast from n=96. In this way we can see how well the forecasts match the last year of data. We will forecast from this origin in all subsequent uses of this data set. To compute the forecasts, we can enter

    -->GFORECAST CARS. METHOD IS AWINTERS. SEASONALITY IS 12. WEIGHTS ARE 0.17, 0.01, ORIGIN IS 96.

The number of seasonal factors is dependent on the seasonal period of the data. Hence we include the SEASONALITY sentence. The three smoothing constants are specified in the WEIGHTS sentence. We obtain the following

    WINTERS ADDITIVE SEASONAL EXPONENTIAL SMOOTHING FOR THE SERIES CARS
    SMOOTHING CONSTANTS
    L STEP AHEAD FORECASTS FOR CARS FROM TIME ORIGIN 96
    MSE(AL1,AL2,AL3) = .26856E+07
    TIME FORECAST (Observed)

The observed values have been superimposed. The post-sample RMSE for the forecasts is about

Example: Airline data

As a second illustration of forecasting using Winters' additive method, we consider Series G of Box and Jenkins (1970), airline passenger data. This series was modeled in Section 5.3. The natural logarithm of the data, LNAIRPAS, was used for modeling and forecasting. We found that an adequate model for the data was, approximately,

    $(1-B)(1-B^{12})Z_t = (1-.4B)(1-.6B^{12})a_t$.    (9.23)

If we multiply the MA operators of (9.23), we obtain the following

    $(1-B)(1-B^{12})Z_t = (1 - .4B - .6B^{12} + .24B^{13})a_t$.    (9.24)

Based on the relation given in Section 9.4.2, the forecasts obtained from the model given in (9.24) should be similar to those obtained from a Winters' additive model with smoothing constants 0.6, 0.01, and 0.4 (an exact correspondence is not possible here). We can obtain the latter forecasts by entering

    -->GFORECAST LNAIRPAS. METHOD IS AWINTERS. ORIGIN IS 132. WEIGHTS ARE 0.6, 0.01, 0.4. SEASONALITY IS 12. NOFS IS 12.

A forecast origin of 132 is used so that the Winters forecasts can be compared to those based on both the FORECAST and OFORECAST paragraphs. These forecasts are summarized in Table 7.2 of Chapter 7. We see that the forecasts are in reasonable accord.

    WINTERS ADDITIVE SEASONAL EXPONENTIAL SMOOTHING FOR THE SERIES LNAIRPAS
    SMOOTHING CONSTANTS
    L STEP AHEAD FORECASTS FOR LNAIRPAS FROM TIME ORIGIN 132
    MSE(AL1,AL2,AL3) = .17155E-02
    TIME FORECAST

9.5 Winters' Multiplicative Seasonal Exponential Smoothing Methods

We now consider the multiplicative analogue of Winters' additive exponential smoothing method. Winters' multiplicative method assumes that a time series follows the model

    $Z_{n+j} = (\mu_n + \beta_n j)S_{n+j} + a_{n+j}$,    (9.25)

where $(\mu_n + \beta_n j)$ is a trend component, $S_{n+j}$ is a multiplicative seasonal factor, and $\mu_n$ and $\beta_n$ are the level and slope of the series at time $t=n$. The multiplicative model is usually appropriate for a time series in which the amplitude of the seasonal pattern is proportional to the level of the series (Montgomery and Johnson, 1976).

As in the additive model, a number of seasonal factors are used, depending on the seasonal period. If the seasonal period for the model is $s$, there are $s$ seasonal factors such that:

    (1) $S_i = S_{i+s} = S_{i+2s} = \cdots$,  $i = 1,2,\ldots,s$, and
    (2) $S_1 + S_2 + \cdots + S_s = s$.    (9.26)

Forecasts using the multiplicative method are similar to those of the additive method, except that a ratio replaces an additive term in the seasonal weighting scheme. The forecasts of future observations are

    $\hat{Z}_n(l) = (\hat{\mu}_n + \hat{\beta}_n l)\hat{S}_{n+l-s}$,   $l = 1,2,\ldots,s$
    $\hat{Z}_n(l) = (\hat{\mu}_n + \hat{\beta}_n l)\hat{S}_{n+l-2s}$,  $l = s+1,s+2,\ldots,2s$
    ...

where

    $\hat{\mu}_n = \alpha_1\frac{Z_n}{\hat{S}_{n-s}} + (1-\alpha_1)(\hat{\mu}_{n-1} + \hat{\beta}_{n-1})$
    $\hat{\beta}_n = \alpha_2(\hat{\mu}_n - \hat{\mu}_{n-1}) + (1-\alpha_2)\hat{\beta}_{n-1}$
    $\hat{S}_n = \alpha_3\frac{Z_n}{\hat{\mu}_n} + (1-\alpha_3)\hat{S}_{n-s}$

Calculation of $\hat{Z}_n(l)$

As in the case of the additive model, we are required to specify three smoothing constants ($\alpha_1$, $\alpha_2$ and $\alpha_3$) in the calculation of the $l$-th step ahead forecast, $\hat{Z}_n(l)$.

Estimates of other parameters are calculated by the SCA System internally. Details regarding the method of calculation may be found in Abraham and Ledolter (1983).

Relation to ARIMA models

There is no exact equivalent ARIMA model corresponding to Winters' multiplicative method (Abraham and Ledolter, 1986). Although there is no equivalent ARIMA model, the ARIMA model

    $(1-B^s)^2 Z_t = (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_{2s}B^{2s})a_t$

leads to very similar forecast functions.

Example: Beer shipments

To illustrate the use of the GFORECAST paragraph to compute forecasts based on Winters' multiplicative method, we consider shipment data from a beer producer. The data, Series 8 in Abraham and Ledolter (1983), are the total shipments in consecutive four-week periods. As a result, the seasonality for the series is 13. The data, listed in Table 9.5 and displayed in Figure 9.5, are stored in the SCA workspace under the label BEERSHIP.

Table 9.5 Beer shipment data, four-week totals (Read data across the line)

Figure 9.5 Beer shipments (four-week totals)

Due to the limited number of observations, Abraham and Ledolter (1983, page 173) could not clearly decide whether an additive or multiplicative model would be more appropriate. For illustration, all smoothing constants were chosen to be 0.05. We will do the same here. To compute the forecasts, we may enter

    -->GFORECAST BEERSHIP. METHOD IS MWINTERS. SEASONALITY IS 13. WEIGHTS ARE 0.05, 0.05, 0.05. NOFS IS 26. ORIGIN IS 39.

    WINTERS MULTIPLICATIVE SEASONAL EXPONENTIAL SMOOTHING FOR THE SERIES BEERSHIP
    SMOOTHING CONSTANTS
    L STEP AHEAD FORECASTS FOR BEERSHIP FROM THE ORIGIN 39
    MSE(AL1,AL2,AL3) = .10791E+07
    TIME FORECAST (Observed)

The observed values of BEERSHIP have been superimposed on the SCA output.

9.6 General Exponential Smoothing Using Seasonal Indicators

In addition to Winters' methods, the SCA System provides two other general exponential smoothing methods for forecasting models of the form

    $Z_t = \mu + \beta t + S_t + a_t$.    (9.27)

In the general exponential smoothing method employing seasonal indicators, the seasonal component, $S_t$, is described by indicators for each of the $s$ seasonal periods:

    $S_t = \delta_1 I_{1t} + \delta_2 I_{2t} + \cdots + \delta_s I_{st}$

where

    $I_{jt} = 1$, if $t$ is in the $j$-th seasonal period
    $I_{jt} = 0$, otherwise.

It is assumed that $\delta_1 + \delta_2 + \cdots + \delta_s = 0$, as the seasonal components are defined as distances from the overall linear trend. For a seasonal time series, we may also be able to represent $S_t$ by fewer parameters using trigonometric (harmonic) functions. Such a method is given in Section 9.7.

Forecasts from the model

Forecasts are computed directly from equation (9.27). The parameters of the model are computed using discounted least squares (see Abraham and Ledolter 1983, Montgomery and Johnson 1976), in which past observations are discounted exponentially by the discount coefficient $\omega = 1 - \alpha$. We are required to specify a smoothing constant, $\alpha$, to use in the calculations as well as the seasonal period $s$.

Relation to ARIMA models

The forecasts from general exponential smoothing with seasonal indicators are equivalent to those from the ARIMA model

$$(1 - B)(1 - B^s) Z_t = (1 - \theta B)(1 - \theta^s B^s)\, a_t,$$

where $\theta = \omega = 1 - \alpha$.

Example: Monthly car sales

To illustrate the use of the GFORECAST paragraph to forecast a series using seasonal indicators, we consider the monthly car sales in Quebec from January 1960 through December. The data were used previously in Section 9.4.3 when the Winters additive method was employed. Abraham and Ledolter (1983) found that the effect of an observation died out slowly for this data. As a result, we will use 0.05 as our smoothing constant below. We can forecast the series by entering

-->GFORECAST CARS. METHOD IS SINDICATOR. SEASONALITY IS 12.
-->   WEIGHT IS 0.05. ORIGIN IS 96.

As in Section 9.4.3, the forecast origin is n=96. We can then compare the RMSE for the forecasts here with those obtained previously. We obtain

GENERAL EXPONENTIAL SMOOTHING FOR THE SERIES CARS
LINEAR TREND MODEL WITH SEASONAL INDICATORS
SMOOTHING CONSTANT

SMOOTHING VECTOR FINV*F(0)
L STEP AHEAD FORECASTS FOR CARS FROM TIME ORIGIN 96
MSE(ALPHA) = .22645E+07
TIME FORECAST (Observed)

As before, the observed values are superimposed. The post-sample RMSE for the forecasts is about 1555.6 (compared to 2005.8 using the Winters additive method). Although the reduction in RMSE can be attributed to the greater number of parameters in the model, we see the value of using seasonal indicators for this series.

9.7 General Exponential Smoothing Using Harmonic Functions

General exponential smoothing using harmonic functions provides forecasts for the model of equation (9.27); that is,

$$Z_t = \mu + \beta t + S_t + a_t,$$

where the seasonal component, $S_t$, is described as a linear combination of trigonometric functions. If $m$ harmonics are specified, $S_t$ is written as

$$S_t = A_1 \sin\!\left(\frac{2\pi t}{s} + \phi_1\right) + A_2 \sin\!\left(\frac{4\pi t}{s} + \phi_2\right) + \cdots + A_m \sin\!\left(\frac{2\pi m t}{s} + \phi_m\right),$$

where $A_i$ and $\phi_i$ are the amplitude and phase shift of the sine function with frequency $2\pi i/s$. For discrete time series, the largest number of harmonics that can be considered is $m = s/2$. In most applications only the first few harmonics are used, thus giving a more parsimonious representation of $S_t$ than the previous one using indicator functions.

Forecasts from the model

Forecasts are computed directly from equation (9.27). The parameters of the model are computed using discounted least squares (see Abraham and Ledolter 1983, or Montgomery and Johnson 1976), in which past observations are discounted exponentially by the discount coefficient $\omega = 1 - \alpha$. We are required to specify a smoothing constant, $\alpha$; seasonal period, $s$; and number of harmonics, $m$, to be used in the calculation of forecasts.

Relation to ARIMA models

The forecasts derived using general exponential smoothing with harmonic functions are equivalent to those of certain ARIMA models. The exact form of the ARIMA model is dependent upon the choice of $s$ and $m$. For example, for $s = 12$ and $m = 1$, the corresponding ARIMA model is given by

$$(1 - B)^2 (1 - \sqrt{3}\,B + B^2)\, Z_t = (1 - \theta B)^2 (1 - \sqrt{3}\,\theta B + \theta^2 B^2)\, a_t$$

with $\theta = 1 - \alpha$.

Example: Monthly car sales

To illustrate the use of the GFORECAST paragraph to forecast a series using harmonic functions, we again consider the car sales data (used previously in Sections 9.4.3 and 9.6.3). As in Section 9.6.3, we will use 0.05 as the smoothing constant and forecast from n=96. To forecast the series we may enter

-->GFORECAST CARS. METHOD IS HARMONIC. SEASONALITY IS 12, 3.
-->   WEIGHT IS 0.05. ORIGIN IS 96.

The command above is almost identical to that used in Section 9.6.3, but with HARMONIC replacing SINDICATOR. The only substantive change is the inclusion of a second value in the SEASONALITY sentence. The additional value, 3, indicates the number of harmonic functions to use. It is a required value. The choice of m=3 here will result in the use of 6 parameters in the seasonal component (compared to 12 in Section 9.6.3). It may be instructive to observe the effect on RMSE. It should be higher than before, but it will be interesting to observe the amount of increase, if any.

We obtain the following

GENERAL EXPONENTIAL SMOOTHING FOR THE SERIES CARS
LINEAR TREND MODEL WITH 3 ADDED HARMONICS
SMOOTHING CONSTANT
SMOOTHING VECTOR FINV*F(0)

L STEP AHEAD FORECASTS FOR CARS FROM TIME ORIGIN 96
MSE(ALPHA) = .23928E+07
TIME FORECAST (Observed)

As in the preceding examples, the observed values are superimposed on the SCA output. The post-sample RMSE for the forecasts falls between the RMSE for the forecasts using seasonal indicators (1555.6) and that using the Winters additive method (2005.8), as was expected.

9.8 Forecasting Using Exponential Smoothing Methods in Comparison to ARIMA Modeling

Since forecasting using exponential smoothing methods is equivalent to forecasting using certain corresponding ARIMA models (see Abraham and Ledolter 1983, 1986), there is a question of when to employ the GFORECAST paragraph. There are several reasons to employ ARIMA analysis rather than exponential smoothing methods:

(1) Selection of a particular exponential smoothing method is equivalent to the identification of an ARIMA model for a time series. However, there are only a limited number of exponential smoothing methods, and the selection of such methods is usually based on a visual inspection of the time series. ARIMA modeling provides more reliable tools for the identification of appropriate models.

(2) Smoothing constants in exponential smoothing methods are usually chosen arbitrarily, while the parameters in ARIMA models can be estimated with known statistical properties.

(3) Exponential smoothing forecasts are minimum mean square error forecasts only when the series follows the ARIMA process that corresponds to the smoothing method being used.

(4) It is difficult to compute standard errors of multi-step-ahead forecasts using exponential smoothing methods.

However, there may be several reasons why exponential smoothing methods may be considered:

(1) The time series to be forecast could be very short, hence parameter estimates from ARIMA models may not be reliable.

(2) The time series may have many outliers or interventions that will require considerable effort to account for in ARIMA modeling. (Such modeling efforts are reduced greatly by using the OESTIM and OFORECAST paragraphs, see Chapter 7.) Smoothing methods may be more robust to outliers since the smoothing constant(s) are pre-specified, rather than estimated from the time series data.

(3) A forecaster may be proficient enough to adequately choose a smoothing method by visual inspection of a time series, or there may be historical evidence to support use of a particular smoothing method.

The GFORECAST paragraph is provided in the SCA System in order to offer a more complete set of forecasting methods.
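The discounted least squares fit used in Sections 9.6 and 9.7 can be illustrated outside the SCA System. The Python sketch below is only an illustration under stated assumptions: it performs a single batch weighted regression with weights $\omega^{n-t}$, and it does not attempt to reproduce the recursive updating or the initial values used internally by GFORECAST. The function names `design_row` and `discounted_ls_forecast` are hypothetical.

```python
import numpy as np

def design_row(t, s, m=None):
    """Regressors for Z_t = mu + beta*t + S_t at time index t (1-based).

    m is None -> seasonal indicators (hypothetical coding: the last season
                 is coded -1 so that delta_1 + ... + delta_s = 0).
    m = int   -> m harmonics, one sine/cosine pair per frequency 2*pi*i/s,
                 which is equivalent to A_i*sin(2*pi*i*t/s + phi_i).
    """
    if m is None:
        season = (t - 1) % s
        seasonal = np.zeros(s - 1)
        if season < s - 1:
            seasonal[season] = 1.0
        else:
            seasonal[:] = -1.0
    else:
        freqs = 2.0 * np.pi * np.arange(1, m + 1) / s
        seasonal = np.concatenate([np.sin(freqs * t), np.cos(freqs * t)])
    return np.concatenate([[1.0, float(t)], seasonal])

def discounted_ls_forecast(z, alpha, s, m=None, nofs=12):
    """Fit by discounted least squares and forecast nofs steps ahead."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    omega = 1.0 - alpha                      # discount coefficient
    t = np.arange(1, n + 1)
    X = np.vstack([design_row(ti, s, m) for ti in t])
    w = omega ** (n - t)                     # most recent observation has weight 1
    sw = np.sqrt(w)
    # Weighted least squares via ordinary least squares on rescaled rows.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
    future = np.arange(n + 1, n + nofs + 1)
    Xf = np.vstack([design_row(ti, s, m) for ti in future])
    return Xf @ coef
```

Calling `discounted_ls_forecast(z, alpha=0.05, s=12, m=3)` corresponds loosely to the HARMONIC example (6 seasonal parameters), while omitting `m` corresponds loosely to SINDICATOR.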

SUMMARY OF THE SCA PARAGRAPH IN CHAPTER 9

This section provides a summary of the SCA paragraph employed in this chapter. The syntax is presented in both a brief and full form. The brief display of the syntax contains the most frequently used sentences of the paragraph, while the full display presents all possible modifying sentences of the paragraph. In addition, special remarks related to the paragraph may also be presented with the description.

Each SCA paragraph begins with a paragraph name and is followed by modifying sentences. Sentences that may be used as modifiers for a paragraph are shown below, and the types of arguments used in each sentence are also specified. Sentences not designated required may be omitted, as default conditions (or values) exist. The most frequently used required sentence is given as the first sentence of the paragraph. The portion of this sentence that may be omitted is underlined. This portion may be omitted only if this sentence appears as the first sentence in a paragraph. Otherwise, all portions of the sentence must be used. The last character of each line except the last line must be the continuation character.

In this section, we provide a summary of the GFORECAST paragraph.

Legend (see Chapter 2 for further explanation)
v : variable name
i : integer
r : real value
w : keyword

GFORECAST Paragraph

The GFORECAST paragraph is used to compute forecasts of a time series using one of the general exponential smoothing methods discussed in Sections 9.1 through 9.7. Although there is only one paragraph, the syntax presented below is divided for the forecast of nonseasonal and seasonal time series, and includes all of the methods discussed above.

Syntax for the GFORECAST Paragraph

(1) For non-seasonal time series (using simple or double exponential smoothing, or Holt's method)

Brief syntax

GFORECAST VARIABLES ARE v1, v2, ---.
          METHOD IS w.
          WEIGHTS ARE r1, r2.
          NOFS ARE i1, i2, ---.

Required sentences: VARIABLES, METHOD and WEIGHTS

Full syntax

GFORECAST VARIABLES ARE v1, v2, ---.
          METHOD IS w.
          WEIGHTS ARE r1, r2.
          NOFS ARE i1, i2, ---.
          ORIGINS ARE i1, i2, ---.
          START IS i.
          OUTPUT IS PRINT(w1, w2, ---), NOPRINT(w1, w2, ---).

Required sentences: VARIABLES, METHOD and WEIGHTS

(2) For seasonal time series (using Winters' methods, seasonal indicators or harmonic functions)

Brief syntax

GFORECAST VARIABLES ARE v1, v2, ---.
          METHOD IS w.
          WEIGHTS ARE r1, r2, r3.
          SEASONALITY IS i1, i2.
          NOFS ARE i1, i2, ---.

Required sentences: VARIABLES, METHOD, WEIGHTS and SEASONALITY

Full syntax

GFORECAST VARIABLES ARE v1, v2, ---.
          METHOD IS w.
          WEIGHTS ARE r1, r2, r3.
          SEASONALITY IS i1, i2.
          NOFS ARE i1, i2, ---.
          ORIGINS ARE i1, i2, ---.
          START IS i.
          OUTPUT IS PRINT(w1, w2, ---), NOPRINT(w1, w2, ---).

Required sentences: VARIABLES, METHOD, WEIGHTS and SEASONALITY

Sentences Used in the GFORECAST Paragraph

VARIABLES sentence
The VARIABLES sentence is used to specify the time series to be forecasted. One or more time series can be specified. All series specified will be forecasted using the same method. This is a required sentence.

METHOD sentence
The METHOD sentence is used to specify the exponential smoothing method to be employed in forecasting. The valid keywords are:
SIMPLE     : simple (single) exponential smoothing method
DOUBLE     : double exponential smoothing method
HOLT       : Holt's two parameter method
AWINTERS   : Winters' additive method
MWINTERS   : Winters' multiplicative method
SINDICATOR : smoothing using seasonal indicators
HARMONIC   : smoothing using harmonic functions
Only one method may be specified. This is a required sentence.

WEIGHT sentence
The WEIGHT sentence is used to specify values for the smoothing constant(s) for each method. The number of smoothing constants required is 1 for the methods SIMPLE, DOUBLE, SINDICATOR and HARMONIC, 2 for the HOLT method, and 3 for the AWINTERS and MWINTERS methods. This is a required sentence.

SEASONALITY sentence
The SEASONALITY sentence is used to specify the seasonal period, i1, for the time series to be forecasted. This sentence is required only if the method AWINTERS, MWINTERS, SINDICATOR, or HARMONIC is used. When the HARMONIC method is used, the value i2 is required to specify the number of harmonic functions, m, to be used in forecasting (see Section 9.7).

NOFS sentence
The NOFS sentence is used to specify the number of forecasts to be generated from each time origin. The number of arguments in this sentence must be the same as that in the ORIGINS sentence. The default is 24 forecasts from each time origin.

ORIGINS sentence
The ORIGINS sentence is used to specify the time origins for forecasts. The default is a single origin, the last observation.

START sentence
The START sentence is used to specify the number of observations used to determine the initial values for forecast computation (see, for example, Section 9.2.2). The System provides an appropriate value and the user does not need to specify this sentence.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output printed for computed statistics. Control is achieved by increasing or decreasing the basic level of output by use of PRINT or NOPRINT, respectively. The keywords for PRINT and NOPRINT are:
ESTIMATES : estimates for certain values in computing forecasts
FORECASTS : forecast values for each time origin
The default condition is PRINT(FORECASTS).

ACKNOWLEDGEMENT

Scientific Computing Associates gratefully appreciates the assistance of Professors Bovas Abraham and Johannes Ledolter in the development of the GFORECAST paragraph.
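To make the roles of the WEIGHTS, SEASONALITY, NOFS and ORIGINS sentences concrete, the following Python sketch implements the textbook Winters multiplicative recursions and a post-sample RMSE. It is a minimal illustration, not the SCA implementation: the initial level, trend and seasonal factors are taken from the first two seasons (an assumption made here for simplicity), and the order of the three weights (level, trend, seasonal) is likewise assumed. The names `winters_multiplicative` and `post_sample_rmse` are hypothetical.

```python
import math

def winters_multiplicative(z, weights, seasonality, nofs, origin=None):
    """Forecast nofs steps ahead from the given origin (default: last point).

    weights     -- (level, trend, seasonal) smoothing constants (assumed order)
    seasonality -- seasonal period s
    origin      -- number of observations used; forecasts start at origin + 1
    """
    a1, a2, a3 = weights
    s = seasonality
    n = len(z) if origin is None else origin
    if n < 2 * s:
        raise ValueError("need at least two full seasons before the origin")

    # Simple initial values from the first two seasons (an assumption).
    first = sum(z[:s]) / s
    second = sum(z[s:2 * s]) / s
    level = first
    trend = (second - first) / s
    seas = [z[j] / first for j in range(s)]

    # Update level, trend and seasonal factor for each later observation.
    for t in range(s, n):
        j = t % s
        prev_level = level
        level = a1 * z[t] / seas[j] + (1 - a1) * (prev_level + trend)
        trend = a2 * (level - prev_level) + (1 - a2) * trend
        seas[j] = a3 * z[t] / level + (1 - a3) * seas[j]

    # l-step-ahead forecast: (level + l * trend) times the seasonal factor.
    return [(level + h * trend) * seas[(n + h - 1) % s]
            for h in range(1, nofs + 1)]

def post_sample_rmse(forecasts, observed):
    """Root mean square error of forecasts against held-out observations."""
    errors = [(f - o) ** 2 for f, o in zip(forecasts, observed)]
    return math.sqrt(sum(errors) / len(errors))
```

Holding back the observations after the origin, as in the examples of this chapter, allows `post_sample_rmse` to be compared across smoothing methods.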

REFERENCES

Abraham, B. and Ledolter, J. (1983). Statistical Methods for Forecasting. New York: Wiley.
Abraham, B. and Ledolter, J. (1986). Forecast Functions Implied by Autoregressive Moving Average Models and Other Related Forecast Procedures. International Statistical Review 54.
Bowerman, B.L. and O'Connell, R.T. (1987). Time Series Forecasting: Unified Concepts and Computer Implementation, 2nd edition. North Scituate, MA: Duxbury.
Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control. San Francisco: Holden Day. (Revised edition published 1976).
Brown, R.G. (1962). Smoothing, Forecasting and Prediction of Discrete Time Series. Englewood Cliffs, NJ: Prentice-Hall.
Brown, R.G. and Meyer, R.F. (1961). The Fundamental Theorem of Exponential Smoothing. Operations Research 9.
Harvey, A.C. (1984). A Unified View of Statistical Forecasting Procedures. Journal of Forecasting 3.
Holt, C.C. (1957). Forecasting Trends and Seasonals by Exponentially Weighted Moving Averages. O.N.R. Memorandum, No. 52, Carnegie Institute of Technology.
Makridakis, S. and Wheelwright, S. (1978). Interactive Forecasting. San Francisco: Holden Day.
Makridakis, S., Wheelwright, S., and McGee, V. (1986). Forecasting Methods and Applications, 2nd edition. New York: Wiley.
Montgomery, D.C. and Johnson, L.A. (1976). Forecasting and Time Series Analysis. New York: McGraw-Hill.
Muth, J.F. (1960). Optimal Properties of Exponentially Weighted Forecasts. Journal of the American Statistical Association 55.
Winters, P.R. (1960). Forecasting Sales by Exponentially Weighted Moving Averages. Management Science 6.

APPENDIX A ANALYTIC FUNCTIONS AND MATRIX OPERATIONS

The SCA System provides a wide array of analytic functions and matrix operations to augment its statistical capabilities. This appendix provides basic information regarding these analytic capabilities. More complete information can be found in The SCA Statistical System: Reference Manual for Fundamental Capabilities.

A.1 Basic Operations

The SCA System treats a variable in its workspace as a matrix. For example, a scalar variable is stored as a 1x1 matrix, and a vector variable is stored as an nx1 matrix. By storing data in this manner, analytic operations can be computed more efficiently. To illustrate the use of some basic mathematical operations in the SCA System, suppose the following vectors are stored in the SCA workspace

[Display of the vectors XDATA, YDATA and ZDATA]

If we wish to add XDATA and YDATA together, storing the results in NEWDATA, we simply enter

-->NEWDATA = XDATA + YDATA

NEWDATA now contains the results. The SCA System will not display the result automatically. However, we can print the contents of NEWDATA by entering

-->PRINT NEWDATA

We also have access to common mathematical functions. For example

-->CDATA = LN(YDATA)
-->SDATA = SQRT(ZDATA)

stores the natural logarithm of each element of YDATA and the square root of each element of ZDATA in CDATA and SDATA, respectively. There is no limit on the number of operations used in an assignment statement. For example, suppose we enter

-->RESULT = ZDATA * SQRT(YDATA) - (LN(XDATA) + 2 )

For corresponding elements in XDATA, YDATA and ZDATA, we will take the natural logarithm of XDATA and add the value 2. This quantity is subtracted from the product of ZDATA and the square root of YDATA. The SCA System will follow the usual order of mathematical operations for an expression. The following order is observed

1st  Evaluation of a function
2nd  Exponentiation (**)
3rd  Multiplication or division
4th  Addition or subtraction

The above hierarchy is first applied to all parenthetical expressions. The order is applied again using resultant values, if any, as operations are read in a left to right fashion.

A.2 Trigonometric and Hyperbolic Functions

We have access to the following trigonometric and hyperbolic functions: sin, cos, tan (and their inverses), sinh, cosh, and tanh. We need to keep in mind that the arguments of sin, cos, tan, sinh, cosh, and tanh are in radians and results of the inverses of sin, cos, and tan will be in radians. For this reason, it is useful to know how to obtain π and the conversion factor between radians and degrees within the SCA System.

π = 2*ACOS(0)   (i.e., $2\cos^{-1}(0)$)
1° = π/180 radians = [ACOS(0) / 90] radians
1 radian = [90 / ACOS(0)] degrees

A.3 Statistical and Probability Distribution Functions

The SCA System provides a wide array of commonly used statistical functions and probability distribution functions. The distribution functions include the cumulative distribution (and inverse distribution) of the standard normal, Student's t, χ², F and Beta distributions.

Statistical Functions

To illustrate some statistical functions, suppose the variable X1 consists of the following 17 values

16, 22, 21, 20, 23, 21, 19, 15, 13, 23, 17, 20, 29, 18, 22, 16, 25

We can compute and retain the sample mean, median and the geometric mean of X1 by entering

-->X1MEAN = MEAN (X1)
-->X1MEDIAN = MEDN (X1)
-->X1GEOM = GMEN (X1)

We can display these values by entering

-->PRINT X1MEAN, X1MEDIAN, X1GEOM

X1MEAN IS A 1 BY 1 VARIABLE
X1MEDIAN IS A 1 BY 1 VARIABLE
X1GEOM IS A 1 BY 1 VARIABLE
VARIABLE X1MEAN X1MEDIAN X1GEOM
COLUMN-->
ROW

In similar fashion we can calculate and retain the variance or standard deviation of the data. Descriptive statistics can also be obtained through the DESCRIBE paragraph (see Chapter 4 of The SCA Statistical System: Reference Manual for General Statistical Analysis).

Probability Distribution Functions (CDF)

We can quickly determine the cumulative distribution of a value following a standard normal, t, χ², F, or Beta distribution. For example, the CDF for a value of 1.57 of a t-distribution with 16 degrees of freedom can be computed (and stored in the variable CVALUE) by entering

-->CVALUE = CDFT (1.57, 16)

Similarly, we can obtain values of critical levels from these distributions using the inverse cumulative distribution function. For example, the z-value used for a 90% confidence interval for a standard normal distribution is 1.645. We can confirm this by computing the inverse CDF of the standard normal for the value .95. We can obtain this by entering

-->ZSCORE = IDFN(.95)

A.4 Matrix Operations

To illustrate some of the available matrix operations in the SCA System, we will assume the following matrices are in the SCA workspace

[Display of the matrices ADATA and BDATA]

We can perform matrix multiplication using the symbol #. (Note that element by element multiplication occurs if we use the symbol *.) For example, if we enter

-->C1DATA = BDATA # ADATA
<result>

then C1DATA contains the above matrix product. (Note: To display C1DATA we need to employ the PRINT paragraph. We have inserted the values of the resultant matrix above for reference only. We shall continue to do this throughout this appendix.) The matrix product ADATA # BDATA is not defined, since the matrices are not conformable. However, the transpose of ADATA is conformable with BDATA, and we can compute this matrix product by entering

-->C2DATA = T(ADATA)#BDATA
<result>

We may also compute the Kronecker product of ADATA and BDATA, the trace of BDATA and the Cholesky decomposition of BDATA, among other operations. We can compute the determinant, inverse, and adjoint matrix of BDATA by entering

-->DETB = DET(BDATA)
<result> [5]

-->BINVERSE = INV(BDATA)
<result>

-->ADJOINTB = DETB * BINVERSE
<result>

Eigenvalues

We can compute the eigenvalues and eigenvectors of any real matrix. For example, suppose we have the following matrix in the SCA workspace

[Display of the matrix EDATA]

We can compute its eigenvalues and eigenvectors by entering

-->EIGEN EDATA. VALUES IN EVAL. VECTORS IN EVEC.

EIGENVALUES FOR THE MATRIX EDATA
EIGENVECTORS FOR THE MATRIX EDATA

The VALUES and VECTORS sentences were specified so that the computed eigenvalues and corresponding matrix of eigenvectors would be maintained in the SCA workspace (under the labels EVAL and EVEC, respectively).

A.5 Summary of Analytic Functions and Syntax for the EIGEN Paragraph

Listed below is a brief list of the analytic capabilities in the SCA System. More complete information is available in Chapter 4 of The SCA Statistical System: Reference Manual for Fundamental Capabilities.

ABS(A)      -- absolute value of each element in variable A
AND         -- A AND B; logical operator on binary scalars
ACOS(A)     -- inverse cosine of each element in variable A
ASIN(A)     -- inverse sine of each element in variable A
ATAN(A)     -- inverse tangent of each element in variable A

CDFB(X,A,B) -- cumulative distribution function of beta distribution with scale parameters A and B; X between 0 and 1
CDFC(X,N)   -- cumulative distribution function of chi-square distribution with N degrees of freedom; X positive
CDFF(X,M,N) -- cumulative distribution function of F-distribution with M and N d.f.; X positive
CDFN(X)     -- standard normal cumulative distribution function
CDFT(X,N)   -- cumulative distribution function of Student's t-distribution with N degrees of freedom
CDP(A,B)    -- column direct product of matrices A and B
CHOL(A)     -- Cholesky decomposition of matrix A
COS(A)      -- cosine of each element in variable A
COSH(A)     -- hyperbolic cosine of elements in variable A
DET(A)      -- determinant of matrix A
EQ          -- A EQ B; logical comparison over all elements
EIGEN       -- see the EIGEN paragraph
EXP(A)      -- exponential function applied to elements in A
FACT(A)     -- factorial value for each element in A
GAMA(A)     -- gamma function applied to elements in A
GE          -- A GE B; logical comparison over all elements
GMEN(A)     -- geometric mean of the elements in variable A
GT          -- A GT B; logical comparison over all elements
IDFB(X,A,B) -- inverse distribution function of beta distribution with scale parameters A and B; X between 0 and 1
IDFC(X,N)   -- inverse distribution function of chi-square distribution with N d.f.; X between 0 and 1
IDFF(X,M,N) -- inverse distribution function of F-distribution with M and N d.f.; X between 0 and 1
IDFN(X)     -- inverse distribution function of standard normal distribution (also known as the PROBIT function); X between 0 and 1
IDFT(X,N)   -- inverse distribution function of t-distribution with N d.f.; X between 0 and 1
INT(A)      -- largest integer value of each element of A
INV(A)      -- inverse of matrix A
KP(A,B)     -- Kronecker product of matrices A and B
LE          -- A LE B; logical comparison over all elements
LN(A)       -- natural logarithm of each element in A
LOG(A)      -- base 10 logarithm of each element in A
LT          -- A LT B; logical comparison over all elements
MAX(A)      -- maximum value of the elements in A
MEAN(A)     -- arithmetic mean of the elements of A
MEDN(A)     -- median value of the elements of A
MIN(A)      -- minimum value of the elements in A
MMAX(A,B)   -- element by element maximum value in A and B
MMIN(A,B)   -- element by element minimum value in A and B
MOD(A,B)    -- modular arithmetic; A(i,j) modulo B(i,j)
NCOL(A)     -- number of columns in matrix A

NE          -- A NE B; logical comparison over all elements
NMIS(A)     -- number of missing values in A
NOT         -- NOT A; logical operator on binary scalars
NROW(A)     -- number of rows in matrix A
OR          -- A OR B; logical operator on binary scalars
PACK(A)     -- append columns of matrix A into a single column vector
RDP(A,B)    -- row direct product of matrices A and B
SIGN(A,B)   -- transfer of the sign of an element of B to the absolute value of the corresponding element of A
SIN(A)      -- sine of each element in A
SINH(A)     -- hyperbolic sine of each element in A
SQRT(A)     -- square root of each element in A
STD(A)      -- sample standard deviation of elements of A
STD1(A)     -- unbiased sample std. dev. of elements of A
SUM(A)      -- arithmetic sum of all elements in A
T(A)        -- transpose of the matrix A
TAN(A)      -- tangent of each element in A
TANH(A)     -- hyperbolic tangent of each element in A
TR(A)       -- trace of the matrix A
VAR(A)      -- sample variance of the elements of A
VAR1(A)     -- unbiased sample variance of the elements of A
+           -- A + B; element by element addition
-           -- A - B; element by element subtraction
*           -- A * B; element by element multiplication
/           -- A / B; element by element division
**          -- A**B; element by element exponentiation
#           -- A # B; matrix multiplication

Syntax for the EIGEN Paragraph

The EIGEN paragraph is used to compute and display the eigenvalues and eigenvectors of any real matrix. The EIGEN paragraph begins with the paragraph name, EIGEN, and may be followed by various modifying sentences. Sentences that may be used as modifiers for this paragraph are shown below, and the types of arguments used in each sentence are also specified. Sentences not listed as required may be omitted, as default conditions (or values) exist. The most frequently used required sentence is given as the first sentence of the paragraph. The portion of this sentence that may be omitted is underlined. This portion may be omitted only if this sentence appears as the first sentence in a paragraph. Otherwise, all portions of the sentence must be used. The last character of each line, except the last line, must be the continuation character.

Legend (see Chapter 2 for further explanation):
v : variable name
w : keyword

EIGEN MATRIX IS v.
      VALUES IN v.
      VECTORS IN v.
      ORDER IS w.

Required sentence: MATRIX

Sentences Used in the EIGEN Paragraph

MATRIX sentence
The MATRIX sentence is used to specify the name of the matrix for which eigenvalues and eigenvectors will be computed.

VALUES sentence
The VALUES sentence is used to specify the name of the variable to store the computed eigenvalues of the matrix.

VECTORS sentence
The VECTORS sentence is used to specify the name of the variable to store the computed eigenvectors of the matrix. Eigenvectors are stored columnwise; that is, the first column corresponds to the first eigenvalue, and so on.

ORDER sentence
The ORDER sentence is used to specify the order in which the eigenvalues and their corresponding eigenvectors will be stored. The keyword may be DESCENDING or ASCENDING. The default is DESCENDING.
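For readers who want to check the matrix operations of this appendix outside the SCA System, the NumPy sketch below mirrors the `#`, `*`, `T`, `DET`, `INV` and EIGEN operations. The matrices `A` and `B` are made-up examples (they are not the ADATA and BDATA of the text), the helper `eigen` is hypothetical, and ordering eigenvalues by their real part is an assumption about what DESCENDING means for complex eigenvalues.

```python
import numpy as np

# Made-up example matrices (not the ADATA/BDATA of the text).
A = np.array([[1.0, 2.0], [0.0, 1.0], [4.0, 1.0]])   # 3 x 2
B = np.array([[2.0, 1.0], [1.0, 3.0]])               # 2 x 2

elementwise = B * B        # SCA "*": element by element multiplication
product = A @ B            # SCA "#": matrix multiplication, (3x2)(2x2)
crossprod = A.T @ A        # SCA T(A) # A: transpose, then matrix product

det_b = np.linalg.det(B)   # SCA DET(B)
inv_b = np.linalg.inv(B)   # SCA INV(B)
adjoint_b = det_b * inv_b  # adjoint = determinant times inverse

def eigen(matrix, order="DESCENDING"):
    """Eigenvalues and column eigenvectors, sorted by real part."""
    values, vectors = np.linalg.eig(matrix)
    idx = np.argsort(values.real)
    if order == "DESCENDING":
        idx = idx[::-1]
    # Reorder columns so that column k matches eigenvalue k.
    return values[idx], vectors[:, idx]

eval_b, evec_b = eigen(B)
```

As in the VECTORS sentence, the eigenvectors returned here are stored columnwise, so `B @ evec_b[:, k]` equals `eval_b[k] * evec_b[:, k]` up to rounding.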

APPENDIX B DATA GENERATION, EDITING AND MANIPULATION

The SCA System provides several capabilities to generate, edit and manipulate data stored in the SCA workspace. This appendix provides selected information on capabilities to generate and edit data that are not necessarily those of a time series. More complete information can be found in The SCA Statistical System: Reference Manual for Fundamental Capabilities. Features discussed in this appendix, and the sections containing them, are:

Section  Feature(s)
B.1      Generation of a vector or matrix variable
B.2      Modification of the existing values of a variable
B.3      Manipulation of variables

Appendix C provides information on the generation and editing of time series data.

B.1 Generating Data: the GENERATE Paragraph

We can use the GENERATE paragraph to create data, either by direct value specification or following one of two patterns, and store the data within a vector or matrix. We will illustrate the use of the paragraph with some examples.

B.1.1 Generating a vector

We will now create four variables, each stored in the SCA workspace as a 10x1 (column) vector of data. The variables created illustrate the various ways in which data can be created. First, we will generate and print the data. Afterwards, we will explain what has been created.

-->GENERATE VECTOR1. NROW ARE 10. VALUES ARE 0 FOR 5, 1 FOR 5.
THE SINGLE PRECISION VARIABLE VECTOR1 IS GENERATED

-->GENERATE VECTOR2. NROW ARE 10. VALUES ARE 0 FOR 5, 1 FOR 2, 0 FOR 3.
THE SINGLE PRECISION VARIABLE VECTOR2 IS GENERATED

-->GENERATE VECTOR3. NROW ARE 10. PATTERN IS STEP (1.0, 0.5).
THE SINGLE PRECISION VARIABLE VECTOR3 IS GENERATED

-->GENERATE VECTOR4. NROW ARE 10. PATTERN IS RATE (1.0, 2.0).
THE SINGLE PRECISION VARIABLE VECTOR4 IS GENERATED

-->PRINT VECTOR1, VECTOR2, VECTOR3, VECTOR4

VARIABLE VECTOR1 VECTOR2 VECTOR3 VECTOR4
COLUMN-->
ROW

In each use of the GENERATE paragraph, we specified the number of rows of data (NROW) to be created as 10. The default number of rows and columns to create is 1. Hence, unless we are creating a scalar, we need to specify the number of rows and/or columns in our variable.

In the above example, we directly entered the values that comprise VECTOR1 and VECTOR2. In VECTOR1, the VALUES of the first 5 points are set to 0 and the next 5 are set to 1. In VECTOR2, the first 5 points are set to 0, the next 2 are set to 1, and the remaining 3 are set to 0.

A PATTERN is used to generate the data in both VECTOR3 and VECTOR4. VECTOR3 follows a STEP function. Its first value is 1.0, and each successive value is 0.5 more than the last value. That is, for STEP (a, b) our data are described as

$$X_i = a + (i - 1)\,b, \qquad i = 1, 2, \ldots$$

The data in VECTOR4 follow a geometric pattern. The initial value is 1.0 and successive values are 2.0 times the previous value. Thus, when we specify the geometric RATE (a, b), our data follow the pattern

$$X_i = a\,b^{\,i-1}, \qquad i = 1, 2, \ldots$$

Use of analytic functions

We can use the GENERATE paragraph in conjunction with analytic functions or editing capabilities of the SCA System (see Appendix A, later sections of this Appendix, and The SCA Statistical System: Reference Manual for Fundamental Capabilities) to create variables with more intricate structure. For example, we could have also created VECTOR2 above by first generating a vector of zeros by entering

-->GENERATE VECTOR2. NROW ARE 10. VALUES ARE 0 FOR 10.
THE SINGLE PRECISION VARIABLE VECTOR2 IS GENERATED

Then we could recode the 6th and 7th observations as 1 using the simple assignments

-->VECTOR2(6) = 1.0
-->VECTOR2(7) = 1.0

As a more intricate illustration, suppose we are to study 15 years of quarterly sales data of a corporation. The end of the fiscal year is June, and some of the sales activity in the second quarter is related to end of year quotas or bonuses. We intend to isolate the second quarter by including an indicator variable that is 1 for a second quarter and 0 otherwise. We can use the GENERATE paragraph and the row direct product (RDP) analytic function for this purpose. First we will generate two vectors. One will describe the yearly pattern of the indicator (i.e., 0, 1, 0, 0). The second vector represents the number of times this pattern should be applied. We can enter

-->GENERATE VECTOR5. NROW ARE 4. VALUES ARE 0, 1, 0, 0.
THE SINGLE PRECISION VARIABLE VECTOR5 IS GENERATED

-->GENERATE VECTOR6. NROW ARE 15. VALUES ARE 1 FOR 15.
THE SINGLE PRECISION VARIABLE VECTOR6 IS GENERATED

We now compute the row direct product (see Appendix A and The SCA Statistical System: Reference Manual for Fundamental Capabilities) to create our desired indicator variable. We will call this variable INDC1.

-->INDC1 = RDP(VECTOR5, VECTOR6)

We have created a variable with 60 values. All are 0 except for the 2nd, 6th, 10th, and so on, which are all 1. We can see this by printing INDC1.

-->PRINT INDC1. FORMAT IS 8F10.2.
INDC1 IS A 60 BY 1 VARIABLE

B.1.2 Generating a matrix

We can also use the GENERATE paragraph to create matrices. In such cases, we must include information regarding the number of rows and columns of the matrix (NROW and NCOL, respectively) and the manner in which we want data stored. For example, we can create a 4 x 4 identity matrix by entering

363 B.4 DATA GENERATION, EDITING AND MANIPULATION -->GENERATE MATRIX1. NROW ARE 4. NCOL ARE 4. --> VALUES ARE 1 FOR 4. ORDER IS DIAGONAL. THE SINGLE PRECISION VARIABLE MATRIX1 IS GENERATED We have specified ha he ORDER o sore daa is along he DIAGONAL. In his manner, he four values specified are enered sequenially along he diagonal of he marix. All off diagonal elemens are se o zero. If no ORDER is specified, values are sored column by column. Tha is, daa is enered in he firs column from op o boom, hen he second column, hird column, and so on. Hence if we ener -->GENERATE MATRIX2. NROW ARE 4. NCOL ARE 4. PATTERN IS STEP(1.0, 2.0) we creae he following marix We can also choose o have daa sored row by row, symmerically or skew symmerically. In symmeric sorage, daa are sored row by row in he lower riangle of he marix and values of he upper riangle are se equal o heir corresponding lower riangular enry. Skew symmeric sorage is similar, excep he values of he upper riangle are se equal o he negaive of heir corresponding lower riangular enry. We illusrae his ype of daa sorage in he nex secion. Use of analyic funcions Analyic funcions (see Appendix A) can be used in conjuncion wih he GENERATE paragraph o creae marices of more complicaed srucure. For example, earlier we creaed an indicaor variable corresponding o he second quarer of each year in a fifeen year period. Now we will consruc a four-column marix whose columns consis of he indicaors for he firs, second, hird and fourh quarers of a year for he same fifeen year period. To accomplish his we will use he 4 x 4 ideniy marix generaed earlier and sored as MATRIX1. Each of is columns represens an indicaor associaed wih a quarer of a given year. We also need a marix equivalen o he number of imes his periodic paern should appear. We can hen use he RDP funcion as before o creae he desired marix. -->GENERATE MATRIX3. NROW ARE 15. NCOL ARE 4. VALUES ARE 1 FOR 60. 

THE SINGLE PRECISION VARIABLE MATRIX3 IS GENERATED

-->INDC2 = RDP(MATRIX1, MATRIX3)

We will print the first 11 rows of the resultant matrix, INDC2, to observe the pattern we have created.

-->PRINT INDC2. SPAN IS 1, 11.

INDC2 IS A 60 BY 4 VARIABLE
[Printed rows 1 through 11 of INDC2 are not reproduced here.]

To illustrate skew symmetric storage and analytic operations, we now create a 4x4 matrix whose lower triangular and diagonal elements are 1 and whose upper triangular elements are 0.

-->GENERATE MATRIX4. NROW ARE 4. NCOL ARE 4. VALUES ARE 1 FOR 16.

THE SINGLE PRECISION VARIABLE MATRIX4 IS GENERATED

-->GENERATE MATRIX5. NROW ARE 4. NCOL ARE 4.
-->   PATTERN IS STEP (1.0, 0.0). ORDER IS SKEWSYMMETRIC.

THE SINGLE PRECISION VARIABLE MATRIX5 IS GENERATED

-->MATRIX6 = (MATRIX4 + MATRIX5)/2 + MATRIX1

MATRIX4 is a 4x4 matrix of 1's. MATRIX5 is a 4x4 matrix whose lower triangular elements are 1's and whose other elements (including the diagonal) are -1's. Adding these matrices together zeroes out the upper triangle and the diagonal. All values in the resultant lower triangular matrix (excluding the diagonal) are 2. If we divide this result by 2 and add the identity matrix (MATRIX1) we obtain our desired matrix. We can observe MATRIX5 and the resultant MATRIX6 by entering

-->PRINT MATRIX5, MATRIX6. FORMAT IS (4F8.1,2X,4F8.1)

MATRIX5 IS A 4 BY 4 VARIABLE
MATRIX6 IS A 4 BY 4 VARIABLE
[Printed values of MATRIX5 and MATRIX6 are not reproduced here.]
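
The two constructions in this section can be mirrored outside SCA. The sketch below is Python, not SCA code, and rests on one assumption: that RDP, applied to a pattern column and a column of repeat counts, repeats the pattern once per entry of the second argument and scales it by that entry. This is what the description of INDC1 implies, but it is not taken from the RDP definition in Appendix A. The function name rdp_column is hypothetical.

```python
def rdp_column(pattern, repeats):
    """Repeat `pattern` once per entry of `repeats`, scaled by that entry.

    Assumed behaviour of RDP for two column vectors (see lead-in).
    """
    return [r * p for r in repeats for p in pattern]


# Second-quarter indicator: yearly pattern (0, 1, 0, 0) applied for 15 years.
vector5 = [0, 1, 0, 0]
vector6 = [1] * 15
indc1 = rdp_column(vector5, vector6)

# Lower triangular matrix of ones built as (MATRIX4 + MATRIX5)/2 + MATRIX1.
n = 4
matrix1 = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # identity
matrix4 = [[1] * n for _ in range(n)]                                 # all ones
# MATRIX5 as described in the text: 1 below the diagonal, -1 on and above it.
matrix5 = [[1 if i > j else -1 for j in range(n)] for i in range(n)]
matrix6 = [[(matrix4[i][j] + matrix5[i][j]) / 2 + matrix1[i][j]
            for j in range(n)] for i in range(n)]

for row in matrix6:
    print(row)
```

Applying rdp_column to each column of MATRIX1 in turn would give the four quarterly indicator columns of INDC2 under the same assumption.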

B.2 Modification of Data in a Variable

To illustrate the modification of data in a variable in the SCA workspace, we will suppose the data listed in the table below represent the percent concentration of a certain chemical in the yield of some process. The data are stored in the SCA workspace under the label CONC. The value -1.0 is used to denote a missing value.

Percent concentration of chemical in a process yield (Read data across a line)
[The data values are not reproduced here.]

Use of analytic statements

The value of the 9th observation stands out. It may be that this is a simple entry error that must be corrected. If the value should be 26.10, we can quickly change it by entering

-->CONC(9) = 26.10

We can do the same with data stored in matrix form. All we need to do is indicate the (i,j) position. Analytic statements are also convenient for scaling data. For example, suppose the independent variables of a regression are X1DATA and X2DATA, with the values of X1DATA between 1,000,000 and 5,000,000 and the values of X2DATA between 10 and 25. For computational purposes, it is useful to have these two variables on about the same scale. We can scale X1DATA by entering

-->X1DATA = X1DATA/

If we also want the data in our second variable to represent a percentage relative to the first term, we can enter

-->X2DATA = X2DATA/X2DATA(1) * 100

Recoding ranges of values

For the data of CONC, suppose we know that the minimum percent of concentration in the yield is 23 and the maximum is 30. Values outside these limits are due to measurement

errors, and it is important that the limits not be exceeded within our analysis. If we are using regression (see Chapter 4), we know that missing entries are excluded automatically, provided the internal missing value code is used for these values. Hence, we want to do the following:

Recode all values over 30.0 to 30.0,
Recode all values under 23.0 to 23.0, and
Assign the internal missing value code to any value that is presently -1.0.

We can accomplish this directly using the RECODE paragraph. If we enter

-->RECODE CONC. NEW IS CONC2. VALUES ARE (0.0, 23.0, 23.0), (30.0, 100.0, 30.0), (-1.0, -1.0, MISSING).

then all data within the range 0.0 to 23.0 are recoded to 23.0; all data within the range 30.0 to 100.0 are recoded to 30.0; and the value -1.0 is recoded to the internal missing value code. The altered data are stored in the new variable CONC2. If no NEW variable is specified, then the data are stored in the original variable, CONC.

B.3 Manipulation of Variables

To illustrate some of the capabilities to manipulate data within SCA, we will suppose the following variables are in the SCA workspace:

[Table with columns A1DATA, C1, C2 and C3; the data values are not reproduced here.]

A1DATA is stored as a 10x2 matrix, while C1, C2, and C3 are each vectors of data.

Selecting and omitting cases

We can select or omit cases of one or more variables according to either index or value. For example, suppose we only wish to work with the first 8 cases of C1, C2 and C3. We can enter either

-->SELECT C1,C2,C3. NEW ARE D1,D2,D3. SPAN IS (1,8).

or

-->OMIT C1,C2,C3. SPAN IS (9,10).

for this purpose. In the SELECT paragraph, data are stored in the new variables D1, D2, and D3. In the OMIT paragraph, data are stored in the original variables since no NEW variables are specified.

We can also select or omit cases based on the values assumed by the variable. For example, suppose we only want to use the data in C1 with values under 9.0, and the corresponding entries of C2 and C3. We can accomplish this by entering

-->SELECT C1, C2, C3. VALUES ARE (0.0, 8.9)

Here, all rows, except the 2nd and 3rd, are retained for all variables.

We can specify more than one range of indices or values. For example, suppose we wish to omit all values over 7.0 and under 4.0 from C3 (and accompanying cases in C1 and C2). If we enter

-->OMIT C3, C1, C2. VALUES ARE (7.0, 100.0), (0.0, 4.0).

then C1, C2, and C3 will consist of the following

[Table with columns C1, C2 and C3; the data values are not reproduced here.]

The five rows of C1, C2 and C3 in which C3 had values either over 7.0 or under 4.0 have been removed. We may observe that C1 and C2 still contain values in the excluded ranges. These values have not been deleted since the SELECT and OMIT paragraphs only apply the selection (or deletion) criteria to the first column of the first variable specified. Corresponding entries from all other specified variables are then either selected or omitted. If we want the values of C1 and C2 to be within designated ranges, we need to sequentially apply the OMIT or SELECT commands to the variables with C1, then C2, as the first variable.
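
The rule that only the first variable is tested can be made concrete with a short Python sketch. This is an illustration of the rule as described above, not SCA code. The data are hypothetical (the manual's own values for C1, C2 and C3 are not used), and treating both endpoints of a range as inclusive is an assumption.

```python
def omit_by_value(variables, ranges):
    """Drop every case whose FIRST variable falls in any (low, high) range.

    The other variables are not tested; they simply lose the same cases.
    Endpoints are treated as inclusive (an assumption).
    """
    first = variables[0]
    keep = [k for k, x in enumerate(first)
            if not any(low <= x <= high for low, high in ranges)]
    return [[var[k] for k in keep] for var in variables]


# Hypothetical data, listed with C3 first because C3 drives the criterion.
c3 = [5.0, 8.2, 3.1, 6.4, 7.5]
c1 = [9.3, 2.0, 4.4, 8.8, 1.1]
c2 = [10.0, 20.0, 30.0, 40.0, 50.0]

new_c3, new_c1, new_c2 = omit_by_value([c3, c1, c2],
                                       [(7.0, 100.0), (0.0, 4.0)])
print(new_c3, new_c1, new_c2)
```

In this example the retained entries of c1 still exceed 7.0, which mirrors the observation that C1 and C2 can keep values inside the excluded ranges.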

Appending data

C1, C2, and C3 are each 10x1 vectors. We can create one 30 x 1 vector by appending C2 to the end of C1 and C3 to the end of this result by entering

-->JOIN C1,C2,C3. NEW IS D1.

The resultant vector is stored in D1. If no NEW variable is specified, then the resultant vector is stored in the first variable specified.

We can also append matrices together, provided the number of columns of all matrices is the same. We cannot append vectors to the end of matrices. As an illustration, suppose we want to append C1 to the first column of A1DATA and C2 to the second column of A1DATA. We must first create a matrix consisting of columns C1 and C2. We can create this matrix, say CMAT, by entering

-->AUGMENT C1, C2. NEW IS CMAT.

We can now append CMAT to A1DATA by entering

-->JOIN A1DATA, CMAT

A1DATA will be changed to a 20x2 matrix.
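
The difference between the two operations can be sketched in Python, with matrices held as lists of rows. This is an illustration only, not SCA code. The function names join and augment are hypothetical stand-ins for the paragraphs, and the data are made up.

```python
def join(*variables):
    """Stack matrices (lists of rows) end to end; column counts must match."""
    ncol = len(variables[0][0])
    if any(len(row) != ncol for var in variables for row in var):
        raise ValueError("all variables must have the same number of columns")
    return [list(row) for var in variables for row in var]


def augment(*variables):
    """Place matrices side by side; row counts must match."""
    nrow = len(variables[0])
    if any(len(var) != nrow for var in variables):
        raise ValueError("all variables must have the same number of rows")
    return [[x for var in variables for x in var[k]] for k in range(nrow)]


# Hypothetical stand-ins: a 10x2 matrix and two 10x1 vectors.
a1data = [[float(k), float(-k)] for k in range(10)]
c1 = [[float(k + 100)] for k in range(10)]
c2 = [[float(k + 200)] for k in range(10)]

cmat = augment(c1, c2)        # 10x2: C1 and C2 side by side
joined = join(a1data, cmat)   # 20x2: CMAT appended below A1DATA
print(len(joined), len(joined[0]))
```

Calling join(a1data, c1) raises ValueError in this sketch, matching the restriction that a vector cannot be appended to the end of a matrix.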

SUMMARY OF THE SCA PARAGRAPHS IN APPENDIX B

This section provides a summary of those SCA paragraphs employed in this appendix. Each SCA paragraph begins with a paragraph name and is followed by modifying sentences. Sentences that may be used as modifiers for a paragraph are shown below and the types of arguments used in each sentence are also specified. Sentences not designated required may be omitted as default conditions (or values) exist. The most frequently used required sentence is given as the first sentence of the paragraph. The portion of this sentence that may be omitted is underlined. This portion may be omitted only if this sentence appears as the first sentence in a paragraph. Otherwise, all portions of the sentence must be used. The last character of each line except the last line must be the continuation character.

The paragraphs to be explained in this summary are GENERATE, RECODE, OMIT, SELECT, JOIN, and AUGMENT.

Legend (see Chapter 2 for further explanation)

v    : variable name
i    : integer
r    : real value
w(.) : keyword (with argument)

GENERATE Paragraph

The GENERATE paragraph can be used to create values of a new variable according to user specified conditions. A set of data may be generated in one of two ways. One technique is to specify completely every value of the set. Data may also be created according to a pattern that increases from a specified initial value according to a user specified step size, or rate. The two methods (VALUES and PATTERN) are mutually exclusive and they may not both be specified in the same paragraph. The generated values are then stored into a variable in a user specified order.

Syntax for the GENERATE Paragraph

GENERATE VARIABLE IS v.
         NROW IS i.
         NCOL IS i.
         ORDER IS w.
         VALUES ARE r1, r2, ---.
   or    PATTERN IS w1(r1,r2), w2(r1,r2).

Required sentences: VARIABLE, and either VALUES or PATTERN

Sentences Used in the GENERATE Paragraph

VARIABLE sentence
The VARIABLE sentence is used to specify the name of the vector or matrix to store values that are generated.

NROW sentence
The NROW sentence is used to specify the number of rows of values for the variable to be generated. The default is 1.

NCOL sentence
The NCOL sentence is used to specify the number of columns of values for the variable to be generated. The default is 1.

ORDER sentence
The ORDER sentence is used to specify the order for placing the generated values in a matrix. Keywords available are:

COLUMNWISE -- values are stored in column 1 first, then column 2, etc. (This is the default.)
ROWWISE -- values are stored in row 1 first, then row 2, etc.
DIAGONAL -- values are stored in the diagonal elements of the matrix, all off-diagonal elements are set to zero. The matrix must be square. That is, the value specified in the NROW sentence must be the same as that specified in the NCOL sentence.
SYMMETRIC -- values are stored in the lower triangular part of the matrix, row by row. Values in the upper triangular part are set equal to the corresponding lower triangular elements.
SKEWSYMMETRIC -- values are stored in the lower triangular part of the matrix row by row. Values in the upper triangular part are set equal to the negative of the corresponding lower triangular elements.
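
The storage orders listed above can be sketched in Python as follows. This is an illustration of the placement rules as worded in this summary, not SCA code. The function name generate and its signature are hypothetical. SKEWSYMMETRIC is left out because the summary does not state how diagonal entries are treated in that case.

```python
def generate(values, nrow, ncol, order="COLUMNWISE"):
    """Place `values` in an nrow x ncol matrix following the ORDER keyword."""
    m = [[0.0] * ncol for _ in range(nrow)]
    it = iter(values)
    if order == "COLUMNWISE":          # column 1 first, then column 2, ...
        for j in range(ncol):
            for i in range(nrow):
                m[i][j] = next(it)
    elif order == "ROWWISE":           # row 1 first, then row 2, ...
        for i in range(nrow):
            for j in range(ncol):
                m[i][j] = next(it)
    elif order == "DIAGONAL":          # diagonal only, zeros elsewhere
        if nrow != ncol:
            raise ValueError("DIAGONAL requires a square matrix")
        for i in range(nrow):
            m[i][i] = next(it)
    elif order == "SYMMETRIC":         # lower triangle row by row, mirrored
        if nrow != ncol:
            raise ValueError("SYMMETRIC requires a square matrix")
        for i in range(nrow):
            for j in range(i + 1):
                m[i][j] = m[j][i] = next(it)
    else:
        raise ValueError("unsupported order: " + order)
    return m


# The arithmetic sequence 1, 3, 5, ... stored column by column in a 4x4 matrix.
matrix2 = generate([1.0 + 2.0 * k for k in range(16)], 4, 4)
identity = generate([1.0] * 4, 4, 4, "DIAGONAL")
symmetric = generate([1, 2, 3, 4, 5, 6], 3, 3, "SYMMETRIC")
print(matrix2[0], symmetric)
```

The symmetric case consumes 3*(3+1)/2 = 6 values for a 3x3 matrix and the diagonal case consumes one value per row.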

VALUES sentence
The VALUES sentence is used to specify the values to be placed in the variable. The number of values to be specified is NROW*NCOL if the ORDER is COLUMNWISE or ROWWISE, NROW*(NROW+1)/2 if SYMMETRIC or SKEWSYMMETRIC, and NROW if DIAGONAL. Note that the VALUES and the PATTERN (defined below) sentences are mutually exclusive. Only one of them can appear in the paragraph.

PATTERN sentence
The PATTERN sentence is used to specify the pattern to be used to generate values. The keywords are STEP or RATE. The STEP option will generate an arithmetic sequence with initial value r1 and increment r2 (i.e., the sequence r1, r1+r2, r1+2*r2,...), and the RATE option will generate a geometric sequence with initial value r1 and rate r2 (i.e., r1, r1*r2, r1*r2^2,...). If both STEP and RATE are specified, the result will be the sum of the two sequences. The PATTERN sentence must be specified if the VALUES sentence is not specified.

RECODE Paragraph

The RECODE paragraph is used to modify or recode the values of an existing variable. Results may be stored in a new or existing variable. The entries of an existing old variable falling in a specified range of values are changed to another specified value. Values in a variable may also be modified using analytic statements (see Appendix A).

Syntax for the RECODE Paragraph

RECODE OLD IS v.
       NEW IS v.
       PRECISION IS w.
       VALUES ARE (r1,r2,r3),(r1,r2,r3), ---.

Required sentences: OLD and VALUES

Sentences Used in the RECODE Paragraph

OLD sentence
The OLD sentence is used to specify the name of the variable to be recoded.

NEW sentence
The NEW sentence is used to specify the name of the variable in which the edited results are stored. If a new name is not specified, the recoded variable will be stored under the old name.

VALUES sentence
The VALUES sentence is used to specify sets of values consisting of a range (r1,r2) and a recoding value, r3. All data values falling into the range are changed to the recoding value. The reserved word MISSING (that may be abbreviated as MIS) is used to denote the missing value code and can be used in the triple. To recode missing data to a specific value, the triple should be specified as (MISSING, MISSING, r) where r is an integer or real number.

PRECISION sentence
The PRECISION sentence is used to specify the precision of the storage of the recoded variable. The default is the precision of the old variable.

OMIT and SELECT Paragraphs

The OMIT and SELECT paragraphs are used to delete or retain elements of a variable according to range (span) or value criteria. Under the span criterion, elements are deleted or selected if the element's index falls within the specified range(s). The value criterion is used in a similar manner, except that the element's value is tested instead of its index. In addition, the OMIT or SELECT paragraph may operate on more than one variable at a time. If more than one variable is specified in the paragraph, the deletion or selection criterion is only applied to the elements of the first variable, while the elements in the corresponding position of all other specified variables are deleted or selected according to the action taken on the entry in the first variable. When more than one variable is specified, the variables need not have the same number of entries but the first variable must have the largest number of rows. Furthermore, values of a variable can be selected even if they have been coded with a missing value code.

Syntax for the OMIT and SELECT Paragraphs

OMIT OLD ARE v1, v2, ---.
     NEW ARE v1, v2, ---.
     SPANS ARE (i1,i2),(i3,i4), ---.
     VALUES ARE (r1,r2),(r3,r4), ---.
     MISSING.

Required sentence: OLD

SELECT OLD ARE v1, v2, ---.
       NEW ARE v1, v2, ---.
       SPANS ARE (i1,i2), (i3,i4), ---.
       VALUES ARE (r1,r2), (r3,r4), ---.
       MISSING.

Required sentence: OLD

Sentences Used in the OMIT and SELECT Paragraphs

OLD sentence
The OLD sentence is used to specify the name(s) of the variable(s) for which values will be deleted or selected.

NEW sentence
The NEW sentence is used to specify the name(s) of the variable(s) where the results of the deletion or selection operation are stored. The number of variables specified in this sentence must be the same as that in the OLD sentence. The results will be stored in the original variables if the NEW sentence is omitted.

SPANS sentence
The SPANS sentence is used to specify the span(s) to be used in the deletion or selection process. Indices falling in i1 to i2, i3 to i4, etc. will be omitted or selected.

VALUES sentence
The VALUES sentence is used to specify the range of values to be deleted or selected, values r1 to r2, r3 to r4, etc. This criterion applies to the values of the first variable only. Other variables are deleted or selected according to the action taken on the corresponding entry of the first variable.

MISSING sentence
The MISSING sentence is used to specify the deletion or the selection of the cases which have been coded with a missing data code. This criterion applies to the first variable. Other variables are deleted or selected according to the action taken on the corresponding entry of the first variable.

JOIN Paragraph

The JOIN paragraph is used to create a variable by appending the data of one or more variables to the end of a designated variable in the SCA workspace. If all presently defined variables are vectors, the resultant vector is created by appending the entries of the second vector to the last entry of the first, the third to the end of this, and so on. The number of entries in this resultant vector is equal to the sum of the entries of all the present vectors. This procedure is the same if all presently defined variables are matrices. However, each matrix must contain the same number of columns. Vectors may not be joined to matrices. The precision of the resultant variable may also be specified.

Syntax for the JOIN Paragraph

JOIN OLD ARE v1, v2, ---.
     NEW IS v.
     PRECISION IS w.

Required sentence: OLD

Sentences Used in the JOIN Paragraph

OLD sentence
The OLD sentence is used to specify the names of the variables to be joined.

NEW sentence
The NEW sentence is used to specify the name of the variable in which the results of the join operation are stored. If the NEW sentence is not specified, then the results of the join operation will be stored under the name of the first variable listed in the OLD sentence.

PRECISION sentence
The PRECISION sentence is used to specify the precision of the storage for the joined results. The keyword, w, may be either SINGLE or DOUBLE. The default is the precision of the first variable listed in the OLD sentence.

AUGMENT Paragraph

The AUGMENT paragraph is used to create a variable by appending the data of one or more variables side by side. All variables (either a vector or a matrix) must have the same number of rows. The number of columns in the resultant matrix is equal to the sum of the columns of all the present variables. The precision of the resultant variable may also be specified.

Syntax for the AUGMENT Paragraph

AUGMENT OLD ARE v1, v2, ---.
        NEW IS v.
        PRECISION IS w.

Required sentence: OLD

Sentences Used in the AUGMENT Paragraph

OLD sentence
The OLD sentence is used to specify the names of the variables to be augmented.

NEW sentence
The NEW sentence is used to specify the name of the variable in which the results of the augment operation are stored. If the NEW sentence is not specified, then the results of the augment operation will be stored under the name of the first variable listed in the OLD sentence.

PRECISION sentence
The PRECISION sentence is used to specify the precision of the storage for the augmented matrix. The keyword, w, may be either SINGLE or DOUBLE. The default is the precision of the first variable listed in the OLD sentence.

APPENDIX C
GENERATING AND EDITING TIME SERIES DATA

Appendix B provided a review of several SCA capabilities to generate, edit and manipulate data in the SCA workspace. This appendix concentrates on those SCA capabilities to create or edit time series data in the SCA workspace. Features discussed in this appendix, and the section containing them, are:

Section   Features
C.1       Generation of variables for the modeling of a time series subject to trading day variation or an Easter holiday effect.
C.2       Editing time series data by: recoding missing values; lagging or differencing data; temporal aggregation; and percent change in a series.

C.1 Generation of Some Useful Time Series

The SCA System provides capabilities for simulating ARIMA and transfer function models, and for generating some series useful in a time series analysis. Data simulation is discussed in Chapter 12 of The SCA Statistical System: Reference Manual for General Statistical Analysis. Simulated time series data are usually consonant with a specific ARIMA model (see Chapter 5) or transfer function model (see Chapter 8). We may also find generated data (that is, data completely specified in some manner) to be useful in data analyses. Such generated data include:

(a) Indicator variables. An indicator variable consists of binary data (i.e., data that are either 0 or 1) and may be used to represent the time period(s) at which an intervention occurs (see Chapter 6 for more information on intervention analysis). The GENERATE paragraph (see Appendix B.1) is very convenient for creating indicator variables.

(b) The number of Mondays, Tuesdays,..., Sundays in a month for a specified span of time. Variables with such information are useful when the effects of trading days are incorporated within an analysis of monthly time series. The generation and use of these variables are discussed in C.1.1.

(c) Weights representing the proportion of an Easter effect duration period that occurs in the months prior to Easter for a specified period of time. The generation and use of these variables are discussed in C.1.2.

C.1.1 Generating data for the modeling of trading day variation

As noted in Chapter 8, transfer function models can be used to model time series data in the presence of certain calendar variation. One of these phenomena is trading day variation, another is the Easter holiday effect. The latter effect is discussed in Section C.1.2.

The DAYS paragraph can be used to generate the variables $W_1, W_2, \ldots, W_7$ of the trading day term

$$\beta_1 W_1 + \beta_2 W_2 + \cdots + \beta_7 W_7 .$$

$W_i$, $i = 1, 2, \ldots, 7$, represents the number of times the $i$-th day of the week (1=Monday,..., 7=Sunday) occurs in the month. The DAYS paragraph can also provide a transformation of $W_i$. These variables are

$$D_i = W_i - W_7, \quad i = 1, 2, \ldots, 6$$
$$D_7 = W_1 + W_2 + \cdots + W_7,$$

where $D_i$ ($i = 1, 2, \ldots, 6$) reflects the number of times a day of the week occurs in a month relative to the number of Sundays in the month and $D_7$ is the total number of days in a month.

To illustrate the use of the DAYS paragraph, we will generate the number of Mondays, Tuesdays,..., Sundays in each month during the period January 1949 through December 1960. The time span used here corresponds to that of the airline passengers data (Series G) in Box and Jenkins (1970). The data are used in Chapter 5 and are stored in the SCA workspace under the label SERIESG. To generate the data, and store the values in the variables MON, TUE, WED, THU, FRI, SAT and SUN, respectively, we may enter

-->DAYS MON,TUE,WED,THU,FRI,SAT,SUN. BEGIN 1949,1. END 1960,12.

The sentences BEGIN and END are required sentences providing the year and month of the beginning and ending of the time span. We will now use the PRINT paragraph to display the first 12 observations of SERIESG and the above seven variables. Some of the output is edited for presentation purposes.

-->PRINT SERIESG,MON,TUE,WED,THU,FRI,SAT,SUN. SPAN IS 1,12.
-->   FORMAT IS 'F8.0, 7F4.0'.

[Printed table with columns SERIESG, MON, TUE, WED, THU, FRI, SAT and SUN; the values are not reproduced here.]
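
The counts and their transformation can be reproduced with the Python standard library. The sketch below is an illustration of the definitions of $W_i$ and $D_i$ given above, not SCA code. The function names weekday_counts, transform and days are hypothetical.

```python
import calendar


def weekday_counts(year, month):
    """Return [W1, ..., W7]: occurrences of Monday, ..., Sunday in the month."""
    first_weekday, ndays = calendar.monthrange(year, month)  # Monday is 0
    counts = [0] * 7
    for day in range(ndays):
        counts[(first_weekday + day) % 7] += 1
    return counts


def transform(counts):
    """Return [D1, ..., D7] with Di = Wi - W7 (i = 1..6) and D7 = sum of Wi."""
    return [w - counts[6] for w in counts[:6]] + [sum(counts)]


def days(begin, end):
    """Yield (year, month, counts) for each month from begin to end inclusive."""
    year, month = begin
    while (year, month) <= end:
        yield year, month, weekday_counts(year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


table = list(days((1949, 1), (1960, 12)))
for year, month, counts in table[:3]:
    print(year, month, counts, transform(counts))
```

January 1949 began on a Saturday, so that month contains five Saturdays, Sundays and Mondays and four of every other day.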

As a second illustration of the DAYS paragraph, we will generate the transformed series for the same time span as above. The transformed values are stored in D1 through D7, respectively. We will also specify an eighth variable, DATE. This optional variable will retain row labeling information (that is, year and month) corresponding to each year and month in the specified time span.

-->DAYS VARIABLES ARE D1 TO D7, DATE. BEGIN 1949,1. END 1960,12.
-->   TRANSFORM.

The logical sentence TRANSFORM is included to specify the generation of transformed data. Note the complete form of the VARIABLES sentence is used (i.e., with sentence name and verb) to enable us to abbreviate the list of variable names D1, D2, D3, D4, D5, D6, D7 by D1 TO D7. (For more information on abbreviations, please see The SCA Statistical System: Reference Manual for Fundamental Capabilities.) As before, we will display only the first 12 observations of SERIESG and the variables generated above. Again, some of the output is edited for presentation purposes.

-->PRINT VARIABLES ARE DATE, SERIESG, D1 TO D7. SPAN IS 1,12.
-->   FORMAT IS '2F8.0, 7F4.0'

[Printed table with columns DATE, SERIESG, D1, D2, D3, D4, D5, D6 and D7; the values are not reproduced here.]

C.1.2 Generating data for the modeling of an Easter holiday effect

As noted in Chapter 8, a type of calendar effect known as a holiday effect occurs when consumer patterns or business activities vary due to a holiday. A transfer function model can be used to incorporate a variable of weights associated with a holiday effect within a time series model. The EASTER paragraph is used to generate a variable consisting of monthly weights related to the Easter holiday.

The weights are based on the assumption that Easter has an effect on business activities in the period immediately preceding it. This effect is usually proportional to the amount of the Easter period that occurs in the months of March

and April each year. Proportions may differ between series, reflecting the variability of when Easter occurs and the duration of the Easter period for a series. The term duration denotes the amount of time (i.e., the number of days) prior to Easter in which a series is likely to be affected. For example, the duration period for clothing sales may be much longer than that of floral sales.

The variable of weights generated by the EASTER paragraph has the value 0 for all months, with the exceptions of March and April. The values for these months are the fractions of the duration period occurring in the months. We need to specify the length, in days, of this duration period. The SCA System also displays the date for Easter in each of the years within our designated time span.

To illustrate the use of the EASTER paragraph, we will generate weights during the period January 1949 through December 1960. This time span is the same as the one used in Section C.1.1. We will assume the duration period to be 10 days. To generate a variable, say EASTERWT, we enter

-->EASTER EASTERWT. BEGIN 1949,1. END 1960,12. DURATION IS 10.

THE DATES OF EASTER DURING THE REQUESTED TIME SPAN
[Listing of twelve dates, one per year from 1949 through 1960. The months that survive are, in order, APRIL, APRIL, MARCH, APRIL, APRIL, APRIL, APRIL, APRIL, APRIL, APRIL, MARCH, APRIL; of the days, only the last (17) is legible.]

The variable EASTERWT consists of the following weights

[The weights are not reproduced here.]

Non-zero values only occur in the months March and April. A weight has the value 1 when Easter occurs on or before April 1 or on or after April 11 (since we have defined the duration period prior to Easter to be 10 days). When Easter occurs between April 2 and April 10, inclusive, the weights for March and April are both non-zero (with the sum equal to 1.0), reflecting the proportion of the 10 day duration period occurring in each month.
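
The weighting rule can be sketched in Python. This is an illustration, not SCA code, and it rests on two assumptions: the duration period is taken to be the `duration` days immediately before Easter Sunday (Easter itself excluded), which is consistent with the April 1 and April 11 boundaries described above, and the Easter date is computed with the widely used anonymous Gregorian algorithm rather than whatever method SCA uses internally. The function names easter and easter_weights are hypothetical.

```python
from datetime import date, timedelta


def easter(year):
    """Gregorian Easter Sunday via the anonymous Gregorian algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


def easter_weights(year, duration):
    """Fractions of the `duration` days before Easter falling in (March, April)."""
    sunday = easter(year)
    window = [sunday - timedelta(days=k) for k in range(1, duration + 1)]
    march = sum(1 for d in window if d.month == 3)
    april = sum(1 for d in window if d.month == 4)
    return march / duration, april / duration


for year in range(1949, 1961):
    print(year, easter(year), easter_weights(year, 10))
```

Under these assumptions a year such as 1950, with Easter on April 9, has two of the ten days in March and eight in April, so both weights are non-zero.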

C.2 Editing Time Series Data

The SCA System provides many capabilities to edit or modify time series data. Missing data can be recoded and a new series can be created by lagging, differencing or aggregating the observations of an existing series. In addition, a series can be created by computing the percent change in the values of a series.

To illustrate some editing capabilities for time series data, we consider the first 40 observations of Series C of Box and Jenkins (1970). These data are assumed to be in the SCA workspace under the label SERIESC. In addition, we will omit a few values, replacing them with missing values, in order to illustrate patching capabilities. The altered data are stored in the SCA workspace under the label SERIESCP. The data are listed below.

Initial forty observations of Series C of Box and Jenkins (1970) (SERIESC) and series with missing data (SERIESCP). (Data are read across a line.)
[The data values are not reproduced here. Missing entries of SERIESCP are printed as ****.]

C.2.1 Patching missing data

Special actions need to be taken when a time series contains missing observations. The SCA System provides capabilities for dealing with such series. Both the ACF and PACF paragraphs make necessary computational adjustments for missing observations when the logical sentence MISSING is included in the paragraph. A precise method to estimate the values of missing data in a time series is employed by the OESTIM paragraph. If the OESTIM paragraph is not available to us, we need to first recode missing data before estimating the parameters of a time series model. The recoded values should be appropriate so that they do not adversely affect an analysis and may reasonably represent the missing data. In this section, we explain some ad hoc methods that are generally useful.

To illustrate the replacement of missing data, we consider series SERIESCP. SERIESCP has the missing data code for the value of the 14th, 15th, and 32nd observations.
We can recode a missing value directly using an analytic assignment statement (see Appendix B). Alternatively, we can employ some ad hoc methods through the PATCH paragraph. The PATCH paragraph provides us with some latitude in the recoding of time dependent data.

Since the values of the missing observations in SERIESCP are known to us, we are able to assess the validity of these methods in this case.

One simple scheme is to replace a missing value with the average of the values immediately adjacent to it. Adjacent averaging may be appropriate for nonstationary nonseasonal time series. To obtain adjacent averaging as a patching method for SERIESCP, we can enter

-->PATCH SERIESCP. METHOD IS ADJACENT(1).

All missing values are replaced by the average of the values of the observations one time period from it. If two or more missing observations are next to each other, a missing value is replaced by the average of its two nearest, and equidistant, non-missing observations. Here we have

THE 14-TH OBSERVATION IS RECODED TO
THE 15-TH OBSERVATION IS RECODED TO
THE 32-TH OBSERVATION IS RECODED TO

Here the 32nd observation is recoded to the average of the 31st and 33rd values. Since observation 15 is missing, the 14th observation is recoded to the average of the 12th and 16th values. Similarly the 15th value is recoded to the average of observations 13 and 17.

We can average the values of observations two time periods from each missing observation (or span of missing observations) by entering

-->PATCH SERIESCP. METHOD IS ADJACENT(2).

We are informed that

THE 14-TH OBSERVATION IS RECODED TO
THE 15-TH OBSERVATION IS RECODED TO
THE 32-TH OBSERVATION IS RECODED TO

The recoding for the 14th and 15th observations is as before. We can see that by changing the argument in the METHOD sentence we can average adjacent information that is farther and farther away from a missing data point. This may be appropriate if we want to average adjacent December or 1st quarter data in the case of single missing observations for seasonal or periodic data. In such cases the value of the required argument of ADJACENT may be 12 or 4, respectively.

We can also replace missing data by the mean of all data, or a periodic mean (for seasonal data). This method of patching may be appropriate for seasonal and nonseasonal time series that have no trend over time.
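
The two patching rules described so far, adjacent averaging and (periodic) mean replacement, can be sketched in Python with None standing for a missing value. This is an illustration of the rules as worded above, not the PATCH paragraph itself. The function names are hypothetical, the data are made up, and leaving a value missing when no usable pair or phase mean exists is an assumption.

```python
def patch_adjacent(series, k=1):
    """Replace each None by the mean of the nearest usable pair t-j*k, t+j*k."""
    patched = list(series)
    n = len(series)
    for t, value in enumerate(series):
        if value is not None:
            continue
        j = 1
        # Step outward in multiples of k until both neighbours are present.
        while t - j * k >= 0 and t + j * k < n:
            left, right = series[t - j * k], series[t + j * k]
            if left is not None and right is not None:
                patched[t] = (left + right) / 2
                break
            j += 1
    return patched


def patch_mean(series, period=1):
    """Replace each None by the mean of non-missing values in the same phase."""
    patched = list(series)
    for t, value in enumerate(series):
        if value is None:
            same_phase = [x for x in series[t % period::period] if x is not None]
            if same_phase:
                patched[t] = sum(same_phase) / len(same_phase)
    return patched


# Hypothetical series with two adjacent gaps and one isolated gap.
y = [10.0, 12.0, None, None, 18.0, 20.0, 22.0, None, 26.0]
print(patch_adjacent(y, 1))
print(patch_mean(y, 1))
```

With period 1 the phase covers the whole series, so patch_mean uses the overall mean of the non-missing data.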
We can use the mean of all non-missing data as our replacement value by entering

-->PATCH SERIESCP. METHOD IS MEAN(1).

As noted above, this is a reasonable way to recode missing data of a stationary time series. Since SERIESC is not stationary, and has a downward drift at its beginning, we observe this method of recoding is not appropriate.

THE 14-TH OBSERVATION IS RECODED TO
THE 15-TH OBSERVATION IS RECODED TO
THE 32-TH OBSERVATION IS RECODED TO

If SERIESCP represented quarterly data, we may wish to use the mean of similar quarters as a patch. We can specify this by entering

-->PATCH SERIESCP. METHOD IS MEAN(4).

THE 14-TH OBSERVATION IS RECODED TO
THE 15-TH OBSERVATION IS RECODED TO
THE 32-TH OBSERVATION IS RECODED TO

We may only specify one method in the PATCH paragraph. If different methods are appropriate (e.g., if the structure of the data changes over time), we can combine procedures by invoking the paragraph repeatedly but with different specifications in non-overlapping time spans. In addition, when we patch a series we can also create a binary indicator variable to highlight those time indices whose values were patched. If the PATCH paragraph is invoked repeatedly for the same series, using the same variable name for the binary indicator variable produces an indicator of all changes.

C.2.2 Lagging and differencing data

The time series capabilities of the SCA System (see Chapter 5) can incorporate differencing in the identification and estimation of time series models. However, it is sometimes useful to be able to lag or to difference data separately. The LAG and DIFFERENCE paragraphs provide these capabilities.

To illustrate the LAG paragraph, suppose we enter

-->LAG SERIESC. LAGS ARE 1, 2. NEW ARE LAGC1, LAGC2.

THE ORIGINAL SERIES IS SERIESC
THE LAG 1 SERIES IS STORED IN VARIABLE LAGC1, WHICH HAS 41 ENTRIES
THE LAG 2 SERIES IS STORED IN VARIABLE LAGC2, WHICH HAS 42 ENTRIES

We have generated two series, one stored in LAGC1 and the other in LAGC2. LAGC1 contains the first lag of SERIESC (that is, its first lag order). The i-th entry in LAGC1 is the (i-1)st entry of SERIESC. Hence,

LAGC1(5) = SERIESC(4), LAGC1(20) = SERIESC(19), LAGC1(41) = SERIESC(40)

The value of LAGC1(1) is necessarily undefined. In like manner, LAGC2 contains the second lag order of SERIESC. As a result, the contents of these variables are

[Table with columns SERIESC, LAGC1 and LAGC2; the values are not reproduced here. Undefined entries are printed as ***.]
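
The alignment produced by lagging can be sketched in Python, with None standing for the undefined leading entries. This is an illustration of the indexing described above, not the LAG paragraph itself. The function name lag is hypothetical and the 40 observations are made up.

```python
def lag(series, order=1):
    """Shift `series` down by `order` places, padding the front with None."""
    return [None] * order + list(series)


# Hypothetical stand-in for the 40 observations of SERIESC.
seriesc = [float(x) for x in range(1, 41)]
lagc1 = lag(seriesc, 1)   # 41 entries, first one undefined
lagc2 = lag(seriesc, 2)   # 42 entries, first two undefined

# Python lists are 0-based, so LAGC1(5) = SERIESC(4) reads lagc1[4] == seriesc[3].
print(len(lagc1), len(lagc2), lagc1[4] == seriesc[3])
```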

A first lag order is assumed if the LAG sentence is not specified. Lagged values are stored as indicated above so that information is properly aligned if we wish to investigate relationships between the currently observed value of one variable and a previous (lagged) observation of another variable.

We difference data in a manner similar to lagging. For example, the first-order differenced series of SERIESC is

$$(1 - B)\,\mathrm{SERIESC}_t = \mathrm{SERIESC}_t - B(\mathrm{SERIESC}_t), \quad \text{or} \quad = \mathrm{SERIESC}_t - \mathrm{SERIESC}_{t-1}$$

The subscript t has been included to indicate how values are obtained. We obtain this new series by entering

-->DIFFERENCE SERIESC. NEW IS DIFFC1.

                           1
DIFFERENCE ORDERS ARE (1-B )
SERIES SERIESC IS DIFFERENCED, THE RESULT IS STORED IN VARIABLE DIFFC1
SERIES DIFFC1 HAS 40 ENTRIES

Similarly we can calculate $(1 - B)(1 - B^4)\,\mathrm{SERIESC}_t$. This result is related to what we have calculated, since

$$(1 - B)(1 - B^4)\,\mathrm{SERIESC}_t = (1 - B^4)\,\mathrm{DIFFC1}_t = \mathrm{DIFFC1}_t - \mathrm{DIFFC1}_{t-4}$$

We can obtain this differenced series by entering

-->DIFFERENCE SERIESC. NEW IS DIFFC14. DFORDERS ARE 1, 4.

                           1      4
DIFFERENCE ORDERS ARE (1-B ) (1-B )
SERIES SERIESC IS DIFFERENCED, THE RESULT IS STORED IN VARIABLE DIFFC14
SERIES DIFFC14 HAS 40 ENTRIES

A partial listing of the values in these variables is given below

[Table with columns SERIESC, DIFFC1 and DIFFC14; the values are not reproduced here. Undefined entries are printed as ***.]

384 GENERATING AND EDITING TIME SERIES DATA C *** *** C.2.3 Temporal aggregaion Occasionally a ime series is recorded a one ime inerval (for example, monhly or quarerly), bu an analysis uilizes a longer ime inerval (for example, quarerly or yearly). The daa recorded a he more frequen ime inerval mus hen be ransformed by means of emporal aggregaion for he purpose of analysis. For more informaion on emporal aggregaion, please see Chaper 16 of Wei (1990). The AGGREGATE paragraph is used o generae a new ime series hrough he emporal aggregaion of a specified ime series. The generaed series will be calculaed from non-overlapping ime inervals of a lengh ha we specify. Aggregaion is based on eiher he aggregae sum or he aggregae mean of he daa values in each period. When here are fewer daa poins han he specified aggregaion period in eiher he firs or he las group, he mean of he daa available wihin he group is compued and used accordingly. If we choose he aggregae sum as he mehod for aggregaion, he SCA Sysem will firs compue an aggregae mean for each group, and hen muliply his mean by he designaed inerval lengh. This mehod of aggregaion may no be appropriae for a series ha is highly seasonal or has a rend, and ha has an incomplee group a is beginning or end. To illusrae he AGGREGATE paragraph, we will use he airline daa (SERIESG) used previously in his appendix and in Chaper 5. The monhly oals of airline passengers are aggregaed o quarerly oals. We can aggregae he series o a new series, QSERIESG, by enering -->AGGREGATE SERIESG. NEW IS QSERIESG. METHOD IS SUM (3). The mehod SUM(3) indicaes ha he sum of each non-overlapping se of 3 observaions is used for aggregaion. Since we have 144 observaions in SERIESG, we will have 48 observaions in QSERIESG wih no incomplee groups during aggregaion. The values of he variable QSERIESG are shown below
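As a rough illustration of the aggregation rule described in Section C.2.3, the following Python sketch (not SCA code) forms non-overlapping groups and, for the sum method, multiplies each group mean by the interval length. The function name `aggregate` and its arguments are hypothetical, and only an incomplete final group is handled here.

```python
def aggregate(series, period, method="sum"):
    """Aggregate non-overlapping groups of `period` observations.

    Assumption: an incomplete last group is summarized by the mean of
    the values available, and the "sum" method is mean * period, as the
    text describes. This is an illustration, not the SCA implementation.
    """
    if period < 1:
        raise ValueError("period must be a positive integer")
    if method not in ("sum", "mean"):
        raise ValueError("method must be 'sum' or 'mean'")
    result = []
    for start in range(0, len(series), period):
        group = series[start:start + period]
        mean = sum(group) / len(group)
        result.append(mean * period if method == "sum" else mean)
    return result


monthly = [1, 2, 3, 4, 5, 6]
quarterly_sum = aggregate(monthly, 3)            # two complete groups
quarterly_mean = aggregate(monthly, 3, "mean")
ragged = aggregate([1, 2, 3, 4], 3)              # last group has one value
```

With an incomplete final group, the sum method scales the one available value up to a full interval, which is why the text cautions against this method for trending or strongly seasonal series.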

C.2.4 Percentage change in a series

Often it is useful to analyze the percent change of the values of a time series rather than the originally recorded observations of the series. The PERCENT paragraph is used to compute the percent change of values of a time series and store the results in a new variable.

To compute percentages for a series, we need to specify a period and a method for computation. A period of length 1 bases the calculations on adjacent points. We can obtain a seasonal percent change in monthly data by using 12 as the length of the period. Two methods of computation are available. A simple percent change uses the previous observation as a base; that is,

$$\frac{\bigl(Y(t) - Y(t-i)\bigr) \times 100}{Y(t-i)}$$

A symmetric percent change computation uses an average of observed values as a base; that is,

$$\frac{\bigl(Y(t) - Y(t-i)\bigr) \times 100}{\bigl(Y(t) + Y(t-i)\bigr)/2},$$

where i is the specified period length. The default method of computation is a simple percent change of adjacent points (i.e., SIMPLE(1)).

To illustrate the use of different periods in computations, we will again consider the airline data, SERIESG. If we wish to use the default method of computation (i.e., the simple percent change of adjacent points), we can simply enter

-->PERCENT SERIESG. NEW IS SERIESG1.

The percent changes are stored in the variable SERIESG1. Since the airline data is seasonal, we can compute a simple seasonal percent change by entering

-->PERCENT SERIESG. NEW IS SERIESG2. METHOD IS SIMPLE (12).

The percent changes are stored in the variable SERIESG2. We will now use the PRINT paragraph to display the first 24 observations of each series (output edited for presentation purposes).

-->PRINT SERIESG, SERIESG1, SERIESG2. SPAN IS 1, 24.

(Listing of rows 1 through 24 of SERIESG, SERIESG1 and SERIESG2, in which entries that cannot be computed appear as ***.)

We see that the internal missing value code is used whenever a percent change cannot be computed.
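The two percent-change formulas of Section C.2.4 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the SCA implementation: the function name `percent_change` is hypothetical, `None` stands in for the internal missing value code, and a zero base is treated as a case where the change cannot be computed.

```python
def percent_change(series, period=1, method="simple"):
    """Percent change of each value relative to the value `period` steps back.

    method="simple":    base is Y(t-i)
    method="symmetric": base is (Y(t) + Y(t-i)) / 2
    None marks entries that cannot be computed (assumed convention).
    """
    if method not in ("simple", "symmetric"):
        raise ValueError("method must be 'simple' or 'symmetric'")
    result = []
    for t, current in enumerate(series):
        previous = series[t - period] if t >= period else None
        if current is None or previous is None:
            result.append(None)
            continue
        base = previous if method == "simple" else (current + previous) / 2
        result.append(None if base == 0 else (current - previous) * 100 / base)
    return result


adjacent = percent_change([100, 110, 99])                     # SIMPLE(1)
symmetric = percent_change([100, 300], method="symmetric")    # SYMMETRIC(1)
seasonal = percent_change(list(range(1, 25)), period=12)      # SIMPLE(12)
```

With a period of 12, the first twelve entries have no earlier observation to compare against, which corresponds to the missing value codes at the top of the SERIESG2 listing.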

SUMMARY OF THE SCA PARAGRAPHS IN APPENDIX C

This section provides a summary of the SCA paragraphs employed in this appendix. An SCA paragraph begins with a paragraph name and is followed by modifying sentences. Sentences that may be used as modifiers for a paragraph are shown below, and the types of arguments used in each sentence are also specified. Sentences not designated as required may be omitted, as default conditions (or values) exist. The most frequently used required sentence is given as the first sentence of the paragraph. The portion of this sentence that may be omitted is underlined. This portion may be omitted only if this sentence appears as the first sentence in a paragraph. Otherwise, all portions of the sentence must be used. The last character of each line except the last line must be the continuation character.

The paragraphs explained in this summary are DAYS, EASTER, PATCH, LAG, DIFFERENCE, AGGREGATE, and PERCENT.

Legend (see Chapter 2 for further explanation)

v : variable name
i : integer
r : real value
w( ) : keyword (with argument)

DAYS Paragraph

The DAYS paragraph is used to generate seven variables containing the number of Mondays, Tuesdays, ..., Sundays in a month for a given period of time. The number of rows generated corresponds to the number of months in the specified time span. An optional eighth variable may also be specified to retain row labeling information (year and month) corresponding to each year and month in the specified time span. The generated series may then be transformed according to

$$D_{it} = W_{it} - W_{7t}, \quad i = 1, 2, 3, \ldots, 6$$
$$D_{7t} = W_{1t} + W_{2t} + \cdots + W_{7t},$$

where $W_{it}$ is the number of occurrences of the i-th day (1=Monday, ..., 7=Sunday) in the t-th month.

Syntax for the DAYS Paragraph

DAYS VARIABLES ARE v1, v2, ---, v8.
     BEGIN IN i1, i2.
     END IN i1, i2.
     TRANSFORM./NO TRANSFORM.

Required sentences: VARIABLES, BEGIN, END

Sentences Used in the DAYS Paragraph

VARIABLES sentence
The VARIABLES sentence is used to specify the names of seven variables in which the number of days of a month will be stored. The first seven variables contain information regarding the number of days in a month in the following order: Mondays (v1), Tuesdays (v2), ..., Sundays (v7). An eighth variable may also be specified to store labeling information. The labeling information consists of the year and month corresponding to each row. The number of rows generated is dependent on the time span specified in the BEGIN and END sentences. One row will be generated for each month between the specified beginning and ending month, inclusive.

BEGIN sentence
The BEGIN sentence is used to specify the beginning year, i1 ( ), and month, i2 (1 for January, ..., 12 for December), from which monthly information on days will be generated.

END sentence
The END sentence is used to specify the ending year, i1 ( ), and month, i2 (1 for January, ..., 12 for December), through which monthly information on days will be generated. The year and month specified must be later than that specified in the BEGIN sentence.

TRANSFORM sentence
The TRANSFORM sentence specifies the generation of a set of transformed data according to the transformation defined above. The default is NO TRANSFORM. Transformed data replace the data stored in the variables specified in the VARIABLES sentence as follows:

v1($D_{1t}$), v2($D_{2t}$), ..., v7($D_{7t}$)
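The counts and the transformation used by the DAYS paragraph can be mimicked with Python's standard `calendar` module. The sketch below is only an illustration: the function names are hypothetical, and it makes no claim about the range of years that the SCA System itself accepts.

```python
import calendar


def day_counts(year, month):
    """Return [Mondays, Tuesdays, ..., Sundays] occurring in the month."""
    first_weekday, n_days = calendar.monthrange(year, month)  # 0 = Monday
    counts = [0] * 7
    for offset in range(n_days):
        counts[(first_weekday + offset) % 7] += 1
    return counts


def transform(counts):
    """D_i = W_i - W_7 for i = 1..6, and D_7 = W_1 + ... + W_7."""
    return [w - counts[6] for w in counts[:6]] + [sum(counts)]


def days(begin, end, transformed=False):
    """One row per month from begin=(year, month) through end, inclusive."""
    year, month = begin
    rows = []
    while (year, month) <= tuple(end):
        row = day_counts(year, month)
        rows.append(transform(row) if transformed else row)
        year, month = (year, month + 1) if month < 12 else (year + 1, 1)
    return rows


raw = days((2024, 1), (2024, 2))
contrasts = days((2024, 1), (2024, 1), transformed=True)
```

After the transformation, the seventh value is simply the length of the month, while the first six values are day-of-week counts measured relative to the number of Sundays.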

EASTER Paragraph

The EASTER paragraph is used to generate a variable consisting of monthly weights related to the Easter holiday. The weights (values between 0 and 1) indicate the proportion of the Easter effect occurring in each month during the specified time period. Thus, the weight is usually 0 for all months with the exceptions of March and April. An optional second variable may be created to retain row labeling information (year and month) corresponding to each year and month in the specified time span.

Syntax for the EASTER Paragraph

EASTER VARIABLES ARE v1, v2.
       BEGIN IN i1, i2.
       END IN i1, i2.
       DURATION IS i.

Required sentences: VARIABLES, BEGIN, END and DURATION

Sentences Used in the EASTER Paragraph

VARIABLES sentence
The VARIABLES sentence is used to specify the label of the variable in which the weights (between 0 and 1) related to the Easter holiday will be stored. If a second variable is specified, it will be used to store year and month labeling information. The length of the variable generated depends on the time span specified in the BEGIN and END sentences.

BEGIN sentence
The BEGIN sentence is used to specify the beginning year, i1 ( ), and month, i2 (1 for January, ..., 12 for December), from which monthly information on the Easter effect will be generated.

END sentence
The END sentence is used to specify the ending year, i1 ( ), and month, i2 (1 for January, ..., 12 for December), through which monthly information on the Easter effect will be generated. The year and month specified must be later than that specified in the BEGIN sentence.

DURATION sentence
The DURATION sentence is used to specify the duration (i.e., the number of days) of the Easter holiday effect prior to each Easter holiday. This is a required sentence.

PATCH Paragraph

The PATCH paragraph is used to recode missing data of a time series by replacing missing values with one of the following: (1) the average of the two observations that are i indices adjacent to it, (2) the mean of all observations or those non-missing observations i indices apart from the missing value, or (3) a specified value. In addition, a binary indicator variable can be created to provide a reference variable highlighting those time indices whose values were patched.

Syntax of the PATCH Paragraph

PATCH OLD IS v.
      NEW IS v.
      METHOD IS w(i).
      SPAN IS i1, i2.
      INDICATOR IS v.

Required sentence: OLD

Sentences Used in the PATCH Paragraph

OLD sentence
The OLD sentence is used to specify the name of the variable containing missing data.

NEW sentence
The NEW sentence is used to specify the name of the variable to store the patched series. The default is the name specified in the OLD sentence.

METHOD sentence
The METHOD sentence is used to specify the method used to recode missing data in the OLD variable. Keywords and associated arguments that may be used to specify the method are:

(1) ADJACENT(i): all missing data are recoded to the average of the values of the OLD series with indices (t-i) and (t+i), where t is the index of the missing value.

(2) MEAN(i): all missing data are recoded to the periodic average of the non-missing values of the OLD series. The argument i is used to specify the periodicity of the series. If i=1 then the overall average of all non-missing data will be used to recode the missing observations.

(3) VALUE(r): all missing data are recoded to the value r.

The methods are all mutually exclusive within the execution of a single paragraph. The default is ADJACENT(1).

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, in which a patch of missing data will be made. The default span is the whole series.

INDICATOR sentence
The INDICATOR sentence is used to specify a name (label) for an indicator variable associated with the patching. The indicator variable contains 1 for missing data that are replaced, and 0 otherwise. The length of the indicator variable is always the same as the old series regardless of the time periods specified in the SPAN sentence. This convention allows use of the same indicator variable for multiple patches of a series using different methods.

LAG Paragraph

The LAG paragraph is used to apply the lag (backshift) operator, B, to a variable to create a new lagged variable. For the variable X, the lag operation Y = B(X) is defined as $Y_t = X_{t-1}$ provided it exists (otherwise a missing value code is provided). This definition is for a lag one backshift. Various other lag orders may be specified (e.g., lag k, where $Y_t = X_{t-k}$ for various values of k), hence creating more than one new series.

Lagged values are stored in the following manner. If the variable YDATA stores the k-th order lagged values of the variable XDATA, then

YDATA(t) = the missing value code, t = 1, 2, ..., k
         = XDATA(t-k),             t = k+1, ..., k+n

where n is the index of the last observation (value) of XDATA. As a result, YDATA has (n+k) observations, the first k of which contain the missing value code, while XDATA has n observations.

Syntax for the LAG Paragraph

LAG OLD IS v.
    NEW ARE v1, v2, ---.
    LAGS ARE i1, i2, ---.

Required sentence: OLD
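The storage convention for lagged values, in which the lagged series is k entries longer than the original, can be pictured with a short Python sketch. This is not SCA code; the function name `lag` is hypothetical and `None` stands in for the missing value code.

```python
def lag(series, k=1):
    """Return the k-th order lag: k missing values followed by the series.

    The result has n + k entries, so that entry t of the result is
    entry t - k of the original series.
    """
    if k < 0:
        raise ValueError("lag order must be non-negative")
    return [None] * k + list(series)


x = [10, 20, 30, 40]
lag1 = lag(x, 1)
lag2 = lag(x, 2)

# Pair each current value with the value observed one period earlier.
pairs = list(zip(x, lag1))
```

Because the lagged values are padded at the front, position t of `lag1` lines up with position t of `x`, which is the alignment property the manual relies on when relating a current observation to a lagged one.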

Sentences Used in the LAG Paragraph

OLD sentence
The OLD sentence is used to specify the name of the series to lag.

NEW sentence
The NEW sentence is used to specify names of the new series. Results will be stored in the OLD series if the NEW sentence is omitted.

LAGS sentence
The LAGS sentence is used to specify the lags to be made on the old series. For example, if there are 3 specified lags, three new series will be generated. The default is 1, creating $Y_t = X_{t-1}$.

DIFFERENCE Paragraph

The DIFFERENCE paragraph is used to apply the operator $(1 - B^j)$ to a variable or a set of variables to create one or more variables. For a variable X, the operation $Y = (1 - B^j)X$ is defined as $Y_t = X_t - X_{t-j}$ provided t > j (otherwise a missing value is specified). This definition is given for one differencing order (DFORDER) in the backshift operator B. More than one differencing order may be specified. If differencing orders i1, i2, ..., im are specified, then the operator $(1 - B^{i1})(1 - B^{i2}) \cdots (1 - B^{im})$ will be applied to all designated variables. In such a case the missing value code is stored as the first (i1+i2+...+im) values of the resulting variable. If the COMPRESS sentence is specified, the missing values are deleted and the resulting variable is compressed to a series containing n-(i1+i2+...+im) values, where n is the number of observations of the original series.

Syntax for the DIFFERENCE Paragraph

DIFFERENCE OLD ARE v1, v2, ---.
           NEW ARE v1, v2, ---.
           DFORDERS ARE i1, i2, ---.
           COMPRESS./NO COMPRESS.

Required sentence: OLD
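A product of differencing operators such as $(1 - B)(1 - B^4)$ can be applied one factor at a time. The following Python sketch illustrates this, together with the effect of compressing the result. It is an illustration under assumptions, not the SCA implementation: the function name `difference` is hypothetical, `None` stands in for the missing value code, and compression here removes only the leading undefined values.

```python
def difference(series, orders=(1,), compress=False):
    """Apply (1 - B**j) for each j in `orders`, one factor after another."""
    result = list(series)
    for j in orders:
        if j < 1:
            raise ValueError("differencing orders must be positive")
        result = [
            None
            if t < j or result[t] is None or result[t - j] is None
            else result[t] - result[t - j]
            for t in range(len(result))
        ]
    if compress:
        # The first i1 + i2 + ... + im values are undefined; drop them.
        result = result[sum(orders):]
    return result


squares = [1, 4, 9, 16, 25, 36, 49]
first = difference(squares)                        # (1 - B)
both = difference(squares, (1, 4))                 # (1 - B)(1 - B**4)
short = difference(squares, (1, 4), compress=True)
```

Without compression the result keeps the length of the original series, which matches the 40 entries reported for the differenced series in Section C.2.2.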

Sentences Used in the DIFFERENCE Paragraph

OLD sentence
The OLD sentence is used to specify the name(s) of the series to be differenced.

NEW sentence
The NEW sentence is used to specify the variable name(s) where the differenced series are stored. The default is that the data will be stored in the names specified in the OLD sentence.

DFORDERS sentence
The DFORDERS sentence is used to specify the orders in the product of differencing operators to be made on the OLD series. The default is 1, the single operator (1-B). If i1, i2, ... are specified, the operator $(1 - B^{i1})(1 - B^{i2}) \cdots$ is applied to the old series.

COMPRESS sentence
The COMPRESS sentence is used to indicate whether the missing values caused by differencing will be deleted. When COMPRESS is specified, the resulting NEW variable will have fewer observations than its corresponding OLD variable. The first value of the NEW variable will be the first value for which the differencing operator is valid. The default is NO COMPRESS, i.e., the missing value code will be assigned to all undefined values and the NEW variable will have as many observations as its corresponding OLD variable.

AGGREGATE Paragraph

The AGGREGATE paragraph is used to generate a new time series through the temporal aggregation of a specified time series. More than one series can be aggregated at a time.

Syntax of the AGGREGATE Paragraph

AGGREGATE OLD IS v1, v2, ---.
          NEW IS v1, v2, ---.
          BEGINNING IS i.
          METHOD IS w(i).
          SPAN IS i1, i2.
          COMPRESS./NO COMPRESS.

Required sentences: OLD, METHOD

Sentences Used in the AGGREGATE Paragraph

OLD sentence
The OLD sentence is used to specify the name(s) of the time series variable(s) from which aggregated time series will be derived.

NEW sentence
The NEW sentence is used to specify the name(s) of the variable(s) to store the aggregated time series. The default is to use the names specified in the OLD sentence.

BEGINNING sentence
The BEGINNING sentence is used to specify the index at which aggregation will begin. The default is 1, i.e., the first period.

METHOD sentence
The METHOD sentence is used to specify the manner of, and period for, temporal aggregation. The associated keywords are SUM, in which the aggregate sum of each non-overlapping interval is recorded, and MEAN, in which the aggregate average of each non-overlapping interval is recorded. The integer argument for each keyword is used to specify the time period used in the temporal aggregation. Values 1 to i of each variable specified in the OLD sentence are aggregated to the first value stored in the corresponding variable specified in the NEW sentence; values i+1 to 2i are aggregated to the second value; and so on. Indexing values are shifted appropriately if the value specified in the BEGINNING sentence is not 1.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, from i1 to i2, in which aggregate time series will be generated. The default span is the whole series.

COMPRESS sentence
The COMPRESS sentence is used to specify that the generated series will be stored in condensed form, i.e., aggregated observations are not repeated, so that the total number of observations is less than that of the original series. The default is COMPRESS.

PERCENT Paragraph

The PERCENT paragraph is used to generate a new time series using the percent change in the observations of a specified time series. More than one series can be generated simultaneously.

Syntax for the PERCENT Paragraph

PERCENT OLD ARE v1, v2, ---.
        NEW ARE v1, v2, ---.
        METHOD IS w(i).
        SPAN IS i1, i2.

Required sentence: OLD

Sentences Used in the PERCENT Paragraph

OLD sentence
The OLD sentence is used to specify the name(s) of the time series variable(s) for which the percent change will be derived.

NEW sentence
The NEW sentence is used to specify the name(s) of the variable(s) to store the percent change time series. The default is to use the names specified in the OLD sentence.

METHOD sentence
The METHOD sentence is used to specify the method and the period used in the computation of the percent change. The keyword w can be either SIMPLE or SYMMETRIC. The associated period length (i) must also be specified. The SIMPLE method computes the percent change using the formula

$$\frac{\bigl(Y(t) - Y(t-i)\bigr) \times 100}{Y(t-i)}.$$

The SYMMETRIC method uses the formula

$$\frac{\bigl(Y(t) - Y(t-i)\bigr) \times 100}{\bigl(Y(t) + Y(t-i)\bigr)/2}.$$

The default is SIMPLE(1).

SPAN sentence
The SPAN sentence is used to specify the span of time indices, from i1 to i2, in which the computation of percent change will be made. The default span is the whole series.

REFERENCES

Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control. San Francisco: Holden Day. (Revised edition published in 1976.)

Wei, W.W.S. (1990). Time Series Analysis: Univariate and Multivariate Methods. New York: Addison-Wesley.


APPENDIX D

SCA MACRO PROCEDURES

The SCA System provides us with the capability to create and maintain computations, analyses or procedures specific to our needs. For example, we may find it useful to perform a special sequence of SCA operations with different data during an SCA session. It would simplify our work if such sequences could be written only once and then freely referred to subsequently. Many programming languages provide subprograms to help in this situation. SCA offers macro procedures to obtain such flexibility. The use of an SCA macro procedure enables us to store any set of SCA statements on a file which may be referenced at any point of an SCA session. This enables us to extend the capabilities of the SCA System.

D.1 SCA Macro Files and Macro Procedures

SCA macro procedures are maintained in files. These files are referred to as SCA macro files. Procedures contained on a macro file may be created by any text editor. An SCA macro procedure consists of a sequence of SCA statements, including both analytic and English-like statements. A macro procedure is handled as a subprogram within an SCA session.

To illustrate SCA macro files and SCA macro procedures, Tables D.1 and D.2 list the contents of two SCA macro files. These files will be used throughout this Appendix to illustrate SCA macro procedures. The records of Table D.1 and Table D.2 comprise the files APPENDX.DATA and MACRO.DATA, respectively. The names of the files are for illustration only and may be changed to names appropriate to a local computer.

Table D.1 Contents of the file APPENDX.DATA

==ALLMACRO
CALL APPENDXA
CALL APPENDXB
RETURN
==APPENDXA
--A MACRO PROCEDURE ILLUSTRATING THE MATRIX EXAMPLES
--OF APPENDIX A
INPUT ADATA, BDATA, EDATA. NCOL ARE 2, 3,
END OF DATA
C1DATA = BDATA # ADATA
C2DATA = T(ADATA) # BDATA
DETB = DET(BDATA)
PRINT C1DATA, C2DATA, DETB
BINVERSE = INV(BDATA)
ADJOINTB = DETB*BINVERSE
PRINT BINVERSE, ADJOINTB
EIGEN EDATA
RETURN
==APPENDXB
--A MACRO PROCEDURE OF THE SCA STATEMENTS IN SECTION B AND B.1.2 OF APPENDXB
GENERATE VECTOR1. NROW ARE 10. VALUES ARE 0 FOR 5, 1 FOR 5.
GENERATE VECTOR2. NROW ARE 10. VALUES ARE 0 FOR 5, 1 FOR 2, 0 FOR 3.
GENERATE VECTOR3. NROW ARE 10. PATTERN IS STEP(1.0, 0.5).
GENERATE VECTOR4. NROW ARE 10. PATTERN IS RATE(1.0, 2.0).
PRINT VECTOR1, VECTOR2, VECTOR3, VECTOR4
RETURN
//

Table D.2 Contents of the file MACRO.DATA

==SCORES
C
C AVERAGE ENGLISH SCORES
C
INPUT VARIABLE IS ENGLISH
END OF DATA
C
C AVERAGE PHYSICS SCORES
C
INPUT VARIABLE IS PHYSICS
END OF DATA
RETURN
==EXPLORE
C
C AS A MEANS TO GET A FEEL FOR A DATA SET,
C A CONFIDENCE INTERVAL AND PLOT OF DATA
C OVER TIME WILL BE INVOKED.
C DATA ARE ASSUMED TO BE STORED IN THE SCA WORKSPACE
C IN A VARIABLE NAMED X.
C
CINTERVAL X
TSPLOT X
RETURN
==LINREG
PARAMETER SYMBOLIC-VARIABLES ARE NINDEP, FILE(12).
C
C READ IN DATA
C
INPUT VARIABLES ARE Y,X. FILE IS &FILE. NCOLS ARE 1, &NINDEP.
C
C COMPUTE REGRESSION COEFFICIENTS, PREDICTED VALUES, RESIDUALS, ETC.
C
BETA = INV(T(X)#X)#T(X)#Y -- COMPUTE REGRESSION COEFFICIENTS
YHAT = X#BETA -- COMPUTE PREDICTED VALUES
RESI = Y-YHAT -- COMPUTE RESIDUALS
N = NROW(X)
P = NCOL(X)
NP = N-P
P1 = P-1
MEAN = SUM(Y)/N
SST = SUM((Y-MEAN)**2)
SSE = SUM(RESI**2)
SSB = SST-SSE
MSE = SSE/NP
MSB = SSB/P1
F = MSB/MSE
C
C PRINT REGRESSION COEFFICIENTS
C
DO 100 I=1,P
I1=I-1
IF(I1 LE 9) THEN NEXT ELSE GO FORWARD 80
DISPLAY TEXT IS T5,'BETA',I1('F1.0'),' = ',BETA('F12.4',I).
GO FORWARD 100
80 DISPLAY TEXT IS T5,'BETA',I1('F2.0'), ' = ',BETA('F12.4',I).
100 CONTINUE
C
C PRINT THE ANALYSIS OF VARIANCE TABLE
C
DISPLAY TEXT IS ///T5,'ANALYSIS OF VARIANCE TABLE'// T5,' SOURCE D.F. SUM OF SQUARES MEAN SQUARES F'/.
DISPLAY TEXT IS T5,'REGRESSION',P1('F6.0',1),SSB('C17.4',1), MSB('C15.4',1),F('C10.2',1).
DISPLAY TEXT IS T5,' ERROR ',NP('F6.0',1),SSE('C17.4',1), MSE('C15.4',1).
DISPLAY TEXT IS T5,' TOTAL ',N ('F6.0',1),SST('C17.4',1).
RETURN
//

D.2 Structure of an SCA Macro File

Both files APPENDX.DATA and MACRO.DATA have similar structure. A set of SCA commands, or data, is preceded by a record with double equal signs (i.e., ==) in columns 1 and 2, and is ended with the statement RETURN. The final entry of each file is //. The alphanumeric characters following == provide the name of the macro procedure. For example, the file APPENDX.DATA consists of the macro procedures named ALLMACRO, APPENDXA and APPENDXB, while MACRO.DATA contains the macro procedures SCORES, EXPLORE and LINREG.

The name of a macro procedure may contain from one to eight alphanumeric characters, with a letter as the mandatory first character. If more than eight characters are used as a macro procedure name, only the first eight characters are interpreted.

Any line with the letter C in the first column and a space in the second column is interpreted as a line of comments. Any line whose first non-blank entries are a double dash (--) is also interpreted as a line of comments. Lines beginning with -- are not printed, but those beginning with C will be printed as they are interpreted during an SCA session.

D.3 Invoking a Macro Procedure

If we enter the command

-->CALL APPENDXB. FILE IS 'APPENDX.DATA'.

then the following set of SCA commands will be interpreted and executed

GENERATE VECTOR1. NROW ARE 10. VALUES ARE 0 FOR 5, 1 FOR 5.
GENERATE VECTOR2. NROW ARE 10. VALUES ARE 0 FOR 5, 1 FOR 2, 0 FOR 3.
GENERATE VECTOR3. NROW ARE 10. PATTERN IS STEP(1.0, 0.5).
GENERATE VECTOR4. NROW ARE 10. PATTERN IS RATE(1.0, 2.0).
PRINT VECTOR1, VECTOR2, VECTOR3, VECTOR4

These commands will duplicate selected capabilities illustrated in Appendix B. Similarly, if we enter

-->CALL APPENDXA. FILE IS 'APPENDX.DATA'.

then selected capabilities illustrated in Appendix A will be computed and results displayed.

The macro procedure SCORES of MACRO.DATA will transmit two variables, stored as ENGLISH and PHYSICS, to the SCA workspace. The procedure named EXPLORE can be used for computing a confidence interval and a time series plot of a variable. For example, suppose we have three variables, SERIESA, SERIESB, and SERIESC, in the SCA workspace. We can repeatedly perform these operations by entering the sequence of commands

-->X = SERIESA
-->CALL EXPLORE. FILE IS 'MACRO.DATA'.
-->X = SERIESB
-->CALL EXPLORE
-->X = SERIESC
-->CALL EXPLORE

We may note that after the first CALL to the EXPLORE macro procedure the FILE sentence is omitted. Unless it is instructed otherwise, the SCA System assumes a macro procedure being called resides in the last referenced macro file. This default is implicit within the macro procedure ALLMACRO of the APPENDX.DATA file. If we enter

-->CALL ALLMACRO. FILE IS 'APPENDX.DATA'.

we see that calls to the remaining macro procedures of the file are invoked, hence all macro procedures of the file are executed.

Care must be taken if one or more macro procedures are nested within another. That is, an error can occur if one macro procedure calls another. Appropriate allocation and de-allocation of files is required. Please refer to Section 1 of Appendix E for further information.

D.4 Symbolic Variables in a Macro Procedure

Symbolic Variables

The term symbolic variables refers to any name used in a macro procedure to label a variable or entry that can be given a new value or connotation when a macro procedure is invoked. Symbolic variables add flexibility to macro procedures by labeling actual arguments that may change when a procedure is executed.
For example, it is desirable to be able to pass a different series name (or variable name) to the EXPLORE procedure in MACRO.DATA rather than requiring a variable to have X as its name for all uses of the procedure. To facilitate this convenience, the label X may be replaced by an expression, such as &SERIES, in the procedure. In this manner, SERIES is recognized by the system as a symbolic variable as explained below.

Within the body of a macro procedure a symbolic variable is denoted by preceding a string of alphanumeric characters by an ampersand (&). The first character of the string must be a letter and the last character may be a compound symbol (#). The compound symbol is used as a delimiter if the symbolic variable is immediately followed by a number or letter. The name of the symbolic variable is the character string excluding the compound symbol. The compound symbol can be omitted if the symbolic variable name is immediately followed by a special character such as a blank, a period or a comma. If the alphanumeric string denoting the symbolic variable has more than eight characters, only the first eight are interpreted.

The special character & is used to distinguish symbolic variables from other variables used in a macro procedure. The actual values used for the symbolic variables are supplied when the macro procedure is invoked (by the CALL paragraph, see syntax at the end of this Appendix), or may be those values supplied as default values within the procedure itself. In the SCA interactive mode, if a symbolic variable does not have a default value and is not supplied in the CALL paragraph, the SCA System will issue a prompt for a value. The response to the prompt must be enclosed in a pair of parentheses. A fatal error will occur if such a situation happens in the batch environment.

Symbolic Substitution

The SCA System scans each line in a macro procedure and replaces symbolic variables with their actual values in an action called symbolic substitution. An actual argument for a symbolic variable is always stored in its exact character form. For example, if a symbolic variable has a value 2.3, it is stored as a string of three characters 2.3, rather than a real number. Hence symbolic substitution will not lose any precision.
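Because symbolic substitution is a purely textual operation, it can be mimicked with a few lines of Python. The sketch below is an illustration under assumptions rather than the SCA implementation: it follows the manual's scanning rule for symbolic substitution by replacing the right-most symbolic variable first and repeating until none remain, it treats a trailing # as a delimiter, and it uses only the first eight characters of a name. The function name `substitute` and the `max_passes` guard are hypothetical.

```python
import re

# An ampersand, a letter, further alphanumerics, and an optional '#' delimiter.
_SYMBOL = re.compile(r"&([A-Za-z][A-Za-z0-9]*)#?")


def substitute(line, values, max_passes=100):
    """Replace symbolic variables in `line` using the strings in `values`."""
    for _ in range(max_passes):
        matches = list(_SYMBOL.finditer(line))
        if not matches:
            return line
        last = matches[-1]              # right-most symbolic variable
        name = last.group(1)[:8]        # only eight characters are significant
        line = line[:last.start()] + values[name] + line[last.end():]
    raise RuntimeError("substitution did not terminate")


nested = substitute("&&A", {"A": "JOHN", "JOHN": "BOY"})
joined = substitute("TSPLOT &SERIES#1", {"SERIES": "SALES"})
plain = substitute("NCOLS ARE 1, &NINDEP.", {"NINDEP": "4"})
```

Values are kept as strings throughout, so a value such as 2.3 is inserted exactly as typed. A name that is missing from `values` raises a `KeyError` here, whereas the SCA System would prompt for it or stop, depending on the mode.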
The rule governing symbolic substitution is simple: the SCA System scans a line in the macro procedure from right to left and substitutes the first symbolic variable encountered by its associated value (in character form). This scanning is repeated until all symbolic variables are substituted and resolved. This rule allows the user to concatenate symbolic variables to modify existing variable names, or to use multiple ampersands. For example, if &A has the symbolic argument JOHN, and &JOHN has the symbolic argument BOY, then &&A will have the value BOY after the completion of symbolic substitution. The symbolic variables may appear anywhere in a statement in an SCA macro procedure, although they usually appear in analytic expressions or argument lists of assignment sentences.

D.5 A Regression Macro Procedure

To illustrate both the use of symbolic variables and the ability to write our own procedures, we consider the macro procedure LINREG of the MACRO.DATA file (see Table D.2). LINREG performs a regression analysis using analytic expressions (see Appendix A). This procedure may be useful in teaching regression analysis, but a more computationally efficient means is available through the SCA REGRESS paragraph (see Chapter 4).
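For readers more familiar with a general-purpose language, the quantities computed by LINREG in Table D.2 can be sketched in plain Python. This is an illustration and not part of the SCA System: the function names are hypothetical, the normal equations are solved by Gauss-Jordan elimination rather than by forming INV(T(X)#X) explicitly, and the design matrix is assumed to carry a leading column of ones for the constant term.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-12:
            raise ValueError("matrix is singular")
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                factor = m[r][c] / m[c][c]
                m[r] = [u - factor * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def linreg(y, x):
    """Return coefficients and analysis-of-variance quantities."""
    n, p = len(x), len(x[0])
    columns = list(zip(*x))
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    beta = solve(xtx, xty)
    yhat = [sum(b * v for b, v in zip(beta, row)) for row in x]
    mean = sum(y) / n
    sst = sum((v - mean) ** 2 for v in y)
    sse = sum((v - f) ** 2 for v, f in zip(y, yhat))
    ssb = sst - sse
    mse, msb = sse / (n - p), ssb / (p - 1)
    return {"beta": beta, "sst": sst, "sse": sse, "ssb": ssb,
            "mse": mse, "msb": msb, "f": msb / mse}


# Small made-up data set: constant column plus one regressor.
fit = linreg([1, 3, 5, 8], [[1, 0], [1, 1], [1, 2], [1, 3]])
```

The names in the returned dictionary mirror the variables of the macro (SST, SSE, SSB, MSE, MSB and F), so the two listings can be read side by side.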

The macro procedure transmits data for the dependent and independent variables from a file. The symbolic argument FILE is used to designate the logical unit number for the file containing the data. If a unit number is not specified in the CALL paragraph, the default unit 12 is used. This default value is specified within the LINREG macro in the PARAMETERS paragraph (its complete syntax is provided at the end of this Appendix). Within the file FILE the first column of data is transmitted to the dependent variable labeled Y and the remaining p columns contain the independent variables, stored in the matrix X. The value p is represented in the macro by the symbolic argument NINDEP. This argument has no default value, hence we must specify it in our CALL of LINREG.

The data listed in Table D.3 is assumed to be on a file that has been associated with the logical unit 12. This assignment may have been accomplished before we invoked the SCA System or through the ASSIGN paragraph (see Appendix E).

Table D.3 Data used in the LINREG example

To invoke the LINREG procedure on this data set we can enter

-->CALL LINREG. FILE IS 'MACRO.DATA'. SYMBOLIC IS NINDEP(4).

We will obtain output similar to that given below.

REGRESSION COEFFICIENTS:
BETA0 =
BETA1 =
BETA2 =
BETA3 =

ANALYSIS OF VARIANCE TABLE
SOURCE D.F. SUM OF SQUARES MEAN SQUARES F
REGRESSION 3 135, ,
ERROR 12 4,
TOTAL ,

D.6 Global and Local Variables

A variable with a designated character as the first character of its name is treated as a local variable within a macro procedure. Others are regarded as global variables. The difference between a local and a global variable is that local variables are deleted from the workspace upon completion of a macro procedure, unless otherwise specified. A local variable may be retained in the workspace by using the RETAIN sentence in the RETURN paragraph (see the Syntax section at the end of this Appendix). Global variables may be used anywhere in a session, including in subsequent macro procedures.

SUMMARY OF THE SCA PARAGRAPHS IN APPENDIX D

This section provides a summary of those SCA paragraphs employed in this appendix. Each SCA paragraph begins with a paragraph name and is followed by modifying sentences. Sentences that may be used as modifiers for a paragraph are shown below, and the types of arguments used in each sentence are also specified. Sentences not designated as required may be omitted, as default conditions (or values) exist. The most frequently used required sentence is given as the first sentence of the paragraph. The portion of this sentence that may be omitted is underlined. This portion may be omitted only if this sentence appears as the first sentence in a paragraph. Otherwise, all portions of the sentence must be used. The last character of each line except the last line must be the continuation character.

The paragraphs to be explained in this summary are CALL, PARAMETERS, and RETURN.

Legend (see Chapter 2 for further explanation)

v : variable name
v(a) : variable name (with argument)
i : integer
c : character data (must be enclosed within single apostrophes)

CALL Paragraph

The CALL paragraph is used to invoke an SCA macro procedure. It is also used to specify the actual arguments for the symbolic variables in the macro procedure and the number of repetitions of the execution of the procedure.

Syntax of the CALL Paragraph

CALL PROCEDURE IS procedure-name.
     FILE IS 'c' (or i).
     SYMBOLIC-VALUES ARE v1(a), v2(a), ---.
     REPEAT IS i.

Required sentence: PROCEDURE

Sentences used in the CALL paragraph

PROCEDURE sentence
The PROCEDURE sentence is used to specify the name of the macro procedure to be executed.

FILE sentence
The FILE sentence is used to specify the name of the macro procedure file containing the called macro procedure. A logical unit number may be specified instead. The default unit is 8. More than one macro procedure file may be allocated for an SCA session.

SYMBOLIC-VALUES sentence
The SYMBOLIC-VALUES sentence is used to specify the actual values or arguments of the symbolic variables used in the procedure. The value of a symbolic variable need not be specified if the default value is desirable. If a symbolic variable does not have a default value and is not specified in this sentence, then when the PARAMETER paragraph is executed either execution of the macro procedure is aborted (in batch mode) or a prompt message requesting an appropriate value is issued (in interactive mode). The syntax for the arguments in this sentence is the same as that in the SYMBOLIC-VARIABLE sentence of the PARAMETER paragraph.

REPEAT sentence
The REPEAT sentence is used to specify the number of times the macro procedure should be executed. This sentence is useful when the macro procedure is used for simulation. The default value is 1.

PARAMETERS Paragraph

The PARAMETERS paragraph is used to specify the symbolic variables (and their possible default values) of an SCA macro procedure. This paragraph is not required in a macro procedure if the procedure does not have symbolic variables. The PARAMETERS paragraph must be executed before any symbolic variable is used. Usually, it is placed at the beginning of a macro procedure. Note that only one PARAMETERS paragraph may be specified in a macro procedure.

Syntax of the PARAMETERS paragraph

    PARAMETERS SYMBOLIC-VARIABLES ARE v1(a), v2(a), ---.

    Required sentence: SYMBOLIC

Sentence used in the PARAMETERS paragraph

SYMBOLIC-VARIABLE sentence
The SYMBOLIC-VARIABLE sentence is used to specify those variables that will be used as symbolic variables in a macro procedure. The arguments, v1(a), v2(a), ---, have the following syntax

    Symbolic-variable-name(default-symbolic-value)

Specification of a default symbolic value or argument is optional. If a symbolic variable is given no default argument, its argument must be specified in the CALL paragraph. Otherwise a fatal error results in batch mode, or a prompt is issued by the system in the interactive mode. All characters inside the parentheses, including the leading and trailing blanks, are interpreted as part of the argument. Therefore both NAME(A) and NAME(A ) are acceptable to define the default value of the symbolic variable NAME and are considered to be different. The argument for the former specification has one character, A; the latter has two characters, i.e., A and a trailing blank. Due to such differences, the response to a system prompt for the value of a symbolic variable of the paragraph must be enclosed in a pair of parentheses.

Note: The names specified in this sentence are the labels of those variables that are symbolic variables in the remainder of the macro procedure. Unlike the designation of symbolic variables in the remainder of the macro, these names must not be preceded by an ampersand (&).

RETURN Paragraph

The RETURN paragraph is used to signify the end of an execution flow for a set of instructions written as an SCA macro procedure. The paragraph also is used to specify actions to be taken with respect to variables created during the macro procedure.

Syntax of the RETURN paragraph

    RETURN RETAIN v1, v2, ---. COMPRESSION./NO COMPRESSION.

    Required sentences: none

Sentences used in the RETURN paragraph

RETAIN sentence
The RETAIN sentence is used to specify the name(s) of those local variables (i.e., ones that are for temporary use in the macro procedure) that should now be retained (i.e., not deleted) in the workspace after the execution of the macro procedure. Normally, all local variables are deleted from the workspace. All local variables may be retained by specifying RETAIN ALL.

COMPRESSION sentence
The COMPRESSION sentence is used to specify the compression of the SCA workspace after the execution of the macro procedure. Although all local variables are deleted after an SCA macro procedure is completed, the SCA System does not automatically compress the user workspace. That is, the deleted variables still occupy space in the memory. The keyword COMPRESSION must be specified in the RETURN paragraph if the workspace is to be compressed.
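The CALL, PARAMETERS, and RETURN syntax summarized above can be combined as in the following minimal sketch. It is an illustration only, not an example from the manual: the procedure name DROPVAR and the symbolic variable NAME are hypothetical, the file name MYPROC.DAT and the variable names BDATA and CDATA are borrowed from Appendix E, and the ==name / RETURN / END layout of the procedure is assumed to mirror the EXAMPLE1 listing shown in Section E.1. The procedure deletes one variable from the workspace and compresses the workspace.

```
==DROPVAR
PARAMETERS SYMBOLIC-VARIABLES ARE NAME(CDATA).
WORKSPACE DELETE &NAME. COMPRESSION.
RETURN
END
```

Assuming this procedure is stored in the macro procedure file MYPROC.DAT, it might be invoked with an explicit argument as

```
CALL PROCEDURE IS DROPVAR. FILE IS 'MYPROC.DAT'. SYMBOLIC-VALUES ARE NAME(BDATA).
```

Note that NAME appears without an ampersand in the PARAMETERS paragraph and as &NAME in the body of the procedure. If the SYMBOLIC-VALUES sentence were omitted, the default argument CDATA would be used.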

APPENDIX E

UTILITY RELATED INFORMATION

The SCA System provides a number of capabilities to manage files, internal workspace (memory), and other utility related tasks effectively within an SCA session. An overview of some of these features is presented in this Appendix. More information may be found in The SCA Statistical System: Reference Manual for Fundamental Capabilities.

Information for using the SCA System on specific computers usually accompanies the tapes or diskettes containing the SCA System. This information may have been retained by personnel in a computing center and may not be readily available to an SCA user. In such a case, SCA will furnish necessary document(s) upon request.

All information (data) used during an SCA session resides in the main memory of the computer. The SCA System refers to this memory as the workspace of the SCA session. In addition to user defined information, certain control blocks for the SCA System, and temporary work arrays required by some of the operations are also placed in the workspace as variables. The SCA System has a built-in dynamic storage allocator to manage the space available for variables during an SCA session.

Usually we do not need to be concerned about the management of external files or of the workspace; but occasionally certain actions may be necessary in order to use the SCA System or the workspace efficiently. We will first examine aspects of file management, then discuss how we can manage the workspace and the presentation of material in it.

E.1 File Allocation and De-allocation

A file may need to be designated when transmitting data to or from the SCA workspace, when executing a macro procedure (see Appendix D), or managing the SCA workspace (see Section E.2). The FILE sentence is used for this purpose. The syntax of this sentence is

    FILE IS 'file-name'.

where file-name is a valid file name. Please note that the file name specified must be enclosed within a pair of single quotes. File names with directory path are acceptable.

In some situations, it is necessary to associate (assign) a unit number with a file name. In such cases, the file unit number is an integer and should not be enclosed within single quotes. Some reasons to use unit numbers are provided below. The SCA ASSIGN paragraph can be used for this purpose.

When data are transmitted to or from the SCA workspace, the SCA System dynamically assigns (associates) unit number 7 with the file name specified. Since internal assignment of unit numbers is made in these paragraphs, we need not specify a file unit number when using these paragraphs.

The FREE paragraph releases a file from an SCA session and makes the unit number available to other files. However, ASSIGNing the same unit twice implicitly FREEs the first file before ASSIGNing the second one. Thus, it is not necessary to issue a FREE paragraph before re-using a unit number, though it certainly does not hurt.

The ASSIGN paragraph is seldom needed except when (1) recalling the contents of a workspace file with a name different from the default file (or default unit) employed, or (2) a macro procedure calls another macro procedure of a different file. An example is provided to illustrate each situation.

EXAMPLE 1: As an example of the ASSIGN and WORKSPACE (see Section E.3) paragraphs, the following SCA paragraphs may be used to allocate a file and save the SCA workspace to the file PROJECT1.WRK.

    ASSIGN FILE IS 9. EXTERNAL IS 'PROJECT1.WRK'. NEW. ATTRIBUTE FILEFORMAT(BINARY), ACCESS(WRITE).
    WORKSPACE MEMORY IS SAVED(9).

The specification ACCESS(WRITE) is not necessary since a NEW file is always writable. However, such specification is necessary if the file to be used is an existing file.

EXAMPLE 2: To recall the workspace saved previously, we may enter

    ASSIGN FILE IS 9. EXTERNAL IS 'PROJECT1.WRK'. ATTRIBUTE FILEFORMAT(BINARY).
    WORKSPACE MEMORY IS RECALLED(9).

The following example demonstrates the use of the ASSIGN paragraph within an SCA macro procedure (see Appendix D) that has an imbedded CALL to another file. In this example, we assume there are two macro procedure files. One is named MYDATA.DAT, a file consisting of procedures that will transmit data sets to the SCA System. One of the macro procedures of this file is assumed to have the name DATA1. Suppose there is a second macro procedure file, say MYPROC.DAT, consisting of a number of macro procedures useful for data analysis. In this file, we assume there is a macro procedure named EXAMPLE1 that reads the data contained in the macro procedure DATA1. The portion of this file related to EXAMPLE1 is given below.

    ...
    ==EXAMPLE1
    ASSIGN FILE IS 20. EXTERNAL IS 'MYDATA.DAT'.
    CALL DATA1. FILE IS
    RETURN
    END
    ==EXAMPLE2
    ...

The procedure above does the following:

(1) MYDATA.DAT is associated with the file unit 20.
(2) Data are transmitted through the call of the macro procedure DATA1 in the file MYDATA.DAT.
(3) Other analyses may follow after the data are transmitted.

The above steps are invoked by entering the statement

    CALL EXAMPLE1. FILE IS 'MYPROC.DAT'

(See Appendix D regarding the use of the CALL paragraph.) If MYDATA.DAT was not provided with a separate file unit number, then the macro CALL of DATA1 would cause an error. First the macro file MYPROC.DAT would be freed and replaced by MYDATA.DAT as the macro file in use. The SCA System would then be unable to return to EXAMPLE1 as it will have lost track of the file containing it.

E.2 Control of the SCA Environment: the PROFILE Paragraph

We can control our SCA environment through the use of the PROFILE paragraph. The PROFILE paragraph can be used to alter the prompting and display levels of an SCA session, direct output to an external file, or adjust the width of output displayed or assumed for data transmitted to the SCA workspace. More complete information can be found in Chapter 8 of The SCA Statistical System: Reference Manual for Fundamental Capabilities.

E.2.1 Directing output to a file and output review

In some situations, we may wish to simultaneously route to a file all, or portions, of SCA output that are displayed at our terminal screen.

When we enter the SCA System, the System automatically opens a file called SCAOUTP.OTP. This file remains attached for the remainder of our SCA session and is assigned an internal unit number of 10. To simultaneously route the output to this file, we simply enter

    PROFILE REVIEW

To stop this flow of output to the output file, we enter the SCA statement

    PROFILE NO REVIEW

Output will then be displayed at our screen only. If we re-specify PROFILE REVIEW at any point of the session, the output will again be directed to the file SCAOUTP.OTP. In the PC environment, any new output directed to the file is appended so that previous information will not be overwritten. However, previous output will be overwritten in the mainframe or workstation environment.

In the PC environment, we may review the output information on the file at any time by entering

    REVIEW

The current SCA session will be suspended temporarily and we can review what we have routed to the file. Scrolling instructions at this time are accessed through the movement keys on the numeric keypad (Pageup, Pagedown, Home, End, arrow up, arrow down). To terminate this review of output and continue with our SCA session, we press the ESC key.

In order to review this output information on a mainframe computer or workstation we can temporarily suspend the current SCA session by using the OS paragraph (see Section E.4). The file SCAOUTP.OTP can be viewed using a local editor.

If the SCA System is accessed through the SCA Windows/Graphics Package, output information is automatically stored on the file SCAOUTP.OTP on our PC and appears in the SCA output window. Output information can be reviewed at any time during the SCA session by scrolling the output window. The file SCAOUTP.OTP exists in the PC subdirectory \SCAWIN and is available at the end of an SCA session.

The file SCAOUTP.OTP is automatically opened and rewound when a new SCA session is started. Hence, if we want to keep a permanent copy of this file, we must either rename the file, or copy the file, before we invoke a new SCA session.
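As a minimal sketch of routing only a portion of a session to the output file, the statements below bracket a single request with PROFILE REVIEW and PROFILE NO REVIEW. Only paragraphs described in this Appendix are used; the choice of WORKSPACE CONTENT (see Section E.3.3) as the bracketed request is purely illustrative.

```
PROFILE REVIEW
WORKSPACE CONTENT
PROFILE NO REVIEW
```

Under the behavior described above, the workspace listing would appear both on the screen and in the file SCAOUTP.OTP, while any output produced after PROFILE NO REVIEW would be displayed on the screen only.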

E.2.2 Adjusting input and output width

The default display (output) width for the SCA System is 80 columns. Similarly the default input width is 72 columns. These defaults accommodate all input and output devices. We may find it convenient to re-adjust these defaults to better reflect the devices we are employing or the output we will generate. For example, we can extend the input width to 80 columns and display (output) width to 132 columns by entering

    PROFILE IWIDTH IS 80. OWIDTH IS 132.

To be certain that we have these widths throughout our session, we should make this the first command within our SCA session.

E.3 Managing the SCA Workspace: the WORKSPACE paragraph

Although the SCA System manages the workspace automatically, on occasion we may need to manage the workspace ourselves. This is especially true if we need to create more space in our workspace for large data sets (by deleting current variables from our workspace) or if we wish to copy (or retrieve) our workspace to (from) an external file.

E.3.1 Saving and retrieving a workspace

We can suspend an SCA session, and continue from where we were, by saving the contents of our current workspace to a file, and later retrieving it. The SCA System automatically assigns a workspace file as unit 9 when we start a session. To save workspace to this file, we can enter

    WORKSPACE MEMORY IS SAVED (9)

or simply

    WORKSPACE SAVED

To recall this workspace at some later time, we can enter

    WORKSPACE MEMORY IS RECALLED (9)

or simply

    WORKSPACE RECALLED

Note that if we use a file name other than the one assigned by the SCA System, we must use the ASSIGN paragraph to associate the file with the appropriate unit number (see Example 1 in Section E.1).

E.3.2 Deleting variables from the workspace

The WORKSPACE paragraph is used if we need to remove variables from the current workspace. For example, if we need to delete the variables A1DATA, BDATA, and CDATA, we can enter

    WORKSPACE DELETE A1DATA, BDATA, CDATA. COMPRESS.

The COMPRESS sentence is included to compress the space occupied by remaining variables. If we do not specify this sentence, then the SCA System may not compress the workspace automatically.

E.3.3 Workspace content

We can display the content of our workspace (i.e., variable and model names) and the amount of space occupied, by entering the command

    WORKSPACE CONTENT

E.3.4 Increasing the size of the SCA workspace

On occasions in an SCA session, especially when a large data set is involved or in the estimation of many parameters in a multivariate time series model, the amount of available workspace may not be sufficient. If we find that more workspace is necessary to continue an analysis, the following steps should be taken in an interactive SCA session:

(1) Save the contents of the current SCA workspace to an external file. This is accomplished by the WORKSPACE paragraph (using the SAVED option of the MEMORY sentence).
(2) Exit the SCA System (i.e., STOP).
(3) Re-execute the SCA load module with more workspace allocated. (See Appendix D of The SCA Statistical System: Reference Manual for Fundamental Capabilities or a local computer consultant for the instructions appropriate for the host computer environment.)
(4) Once in a new SCA session, we may recall the contents of the old SCA workspace back to the current session by the WORKSPACE paragraph (using the RECALLED option of the MEMORY sentence).

As a result of the above steps, we now have the contents of the previously saved SCA workspace but with a larger size at our disposal. In this way an analysis may continue from the point it was stopped. However, if the SCA System is exited before the current workspace is saved to an external file, the contents of the current memory are lost.
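The four steps above can be sketched as the following pair of sessions. This is an illustration only, assuming the default workspace file (unit 9) is used; step (3) depends on the host computer and is therefore not shown. In the first session we might enter

```
WORKSPACE MEMORY IS SAVED(9)
STOP
```

After the SCA System has been re-executed with more workspace allocated, the saved contents might be restored in the new session with

```
WORKSPACE MEMORY IS RECALLED(9)
```

If a workspace file other than the default is preferred, an ASSIGN paragraph such as the one in Example 1 of Section E.1 would be needed in both sessions before the WORKSPACE paragraph.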

E.4 Access to the Host Operating System, the OS Paragraph

Frequently it is desirable to be able to access the operating system commands of the host computer while still in an SCA session. The SCA System provides us with such a capability with the use of the OS (Operating System) paragraph. If we enter

    OS

during an SCA session, we temporarily enter the operating system environment. At this time, most of the operating system commands, such as text editing, file allocation, de-allocation (freeing), copying, and listing can be performed. However, some operating system commands may be inaccessible. For more information on what may be accessed, we may need to check with Appendix D of The SCA Statistical System: Reference Manual for Fundamental Capabilities or local consultants. We may return to the SCA session by issuing a QUIT or END statement (or exit statement on HP/UX).

E.5 The RESTART Paragraph

In some situations, we may work on several unrelated analyses during the same SCA session. It may be desirable to re-initialize the workspace once a task is completed. This can be achieved by issuing a RESTART statement. This effectively erases the current workspace.

SUMMARY OF THE SCA PARAGRAPHS IN APPENDIX E

This section provides a summary of those SCA paragraphs employed in this appendix. In most cases, the syntax presented for a paragraph reflects only a portion of the capabilities of the paragraph. More complete information may be found in Chapter 8 of The SCA Statistical System: Reference Manual for Fundamental Capabilities.

Each SCA paragraph begins with a paragraph name and is followed by modifying sentences. Sentences that may be used as modifiers for a paragraph are shown below and the types of arguments used in each sentence are also specified. Sentences not designated required may be omitted as default conditions (or values) exist. The most frequently used required sentence is given as the first sentence of the paragraph. The portion of this sentence that may be omitted is underlined. This portion may be omitted only if this sentence appears as the first sentence in a paragraph. Otherwise, all portions of the sentence must be used. The last character of each line except the last line must be the continuation character, .

The paragraphs to be explained in this summary are ASSIGN, PROFILE, WORKSPACE, OS, and RESTART.

Legend (see Chapter 2 for further explanation)

    v : variable name
    i : integer
    w : keyword
    c : character data (must be enclosed within single apostrophes)

ASSIGN Paragraph

Syntax of the ASSIGN Paragraph

(A) Assigning an existing file

    ASSIGN FILE IS i. EXTERNAL-NAME IS 'c'. ATTRIBUTE IS ACCESS(READ/WRITE/BOTH), SHARE(YES/NO).

    Required sentences: FILE and EXTERNAL

(B) Assigning a new file

    ASSIGN NEW-FILE. FILE IS i. EXTERNAL-NAME IS 'c'. ATTRIBUTES ARE ACCESS(READ/WRITE/BOTH), SHARE(YES/NO), FILE_FORMAT(FORMAT/BINARY), TRACKS(i), BLKSIZE(i), RECLENGTH(i), DISPOSITION(CATALOG/DELETE).

    Required sentences: FILE, EXTERNAL and NEW-FILE

Sentences Used in the ASSIGN Paragraph

FILE sentence
The FILE sentence is used to specify a file unit number for a new or an existing file in an SCA session. On some operating systems, this unit number may only be valid within the same SCA session.

EXTERNAL-NAME sentence
The EXTERNAL-NAME sentence specifies the file name used by the host computer's operating system. File name conventions may differ from computer to computer. The user should consult local documentation for external file name conventions.

NEW-FILE sentence
The NEW-FILE sentence is used to indicate that the file to be assigned is a new file. The default is NO NEW-FILE, i.e., the file exists.

ATTRIBUTE sentence
The ATTRIBUTE sentence is used to specify the characteristics of a file. The keywords in this sentence are:

    ACCESS : specifies whether the file is READ only, WRITE only, or both READ and WRITE (BOTH). The specification is only valid within the same SCA session. The default is READ only. Note that a file used for saving data, workspace, or output must be assigned as writable.
    SHARE : specifies whether the file will be used in sharing or exclusive mode. Sharing denotes the file may be used by more than one user at the same time. Exclusive denotes that the file may not be shared. The default is YES, i.e., sharing mode.
    FILE_FORMAT : specifies whether the file is a FORMATTED or BINARY file. The default is FORMATTED file.

    TRACKS : specifies the number of tracks to be initially assigned to the file. The default is 10 tracks.
    BLKSIZE : specifies the block size (in characters) of the file. The default is 1600 characters.
    RECLENGTH : specifies the logical record length (in characters) of the file. The default is 80 characters.
    DISPOSITION : specifies whether the file is to be CATALOGUED or DELETED after the file is freed. The default is CATALOG.

PROFILE Paragraph

The PROFILE paragraph is used to control key features of an SCA session, such as routing information to a file, the width of input/output devices, and the level of output desired.

Syntax for the PROFILE Paragraph

    PROFILE REVIEW/NO REVIEW. STYLE IS w. ECHO./NO ECHO. IWIDTH IS i. OWIDTH IS i. OUTPUT-LEVEL IS w.

    Required sentences: none

Sentences Used in the PROFILE Paragraph

REVIEW sentence
The REVIEW sentence is used to specify that output will be simultaneously displayed on the terminal device and routed to the file SCAOUTP.OTP. This dual routing is continued until the sentence NO REVIEW is specified.

STYLE sentence
The STYLE sentence is used to specify the level of prompting provided to the user during an SCA session. The style of an SCA session is either batch or interactive. The keyword BATCH must be specified if the system is used in batch mode. For the interactive mode, the style may be either ALL or PARTIAL. The default style is PARTIAL. In a PARTIAL session, required sentences and some other important sentences are prompted if they are not provided as basic instructions. All logical sentences and assignment sentences with defaults are not prompted. An ALL style will cause all sentences to be prompted unless the sentence is specified in the basic set of user instructions or the sentence is rarely used.

ECHO sentence
The ECHO sentence is used to specify the echo (display) of user's instructions. When ECHO is specified, the SCA System will display user instructions after they are entered. This option is also useful when the input instructions come from cards (e.g., in batch mode) rather than from the terminal or when a macro procedure (see Appendix D) is invoked. When the input instructions come from the terminal, the ECHO option is also useful since the communication line which connects the terminal and the computer may be noisy (defective) on occasions. This option allows the user to know what information the computer actually received. The NO ECHO instruction turns off the display of basic instructions. The default option is ECHO.

IWIDTH sentence
The IWIDTH sentence is used to specify the width (in number of characters) for the input device. The width may range from 60 to 80 characters. The IWIDTH also applies to statements from a macro procedure (see Appendix D) or data from a file. The width of records on a data file can also be specified in the INPUT paragraph (see Chapter 2). Since columns 73 to 80 on a record are usually reserved for sequence numbers, the default width is assumed to be 72.

OWIDTH sentence
The OWIDTH sentence is used to specify the width (in number of characters) of the output device. Both the analytic and English-like statements automatically adjust the output format according to the specified output device width. The default output width is 80 characters.

OUTPUT-LEVEL sentence
The OUTPUT-LEVEL sentence is used to indicate the overall output level desired in an SCA session. The keyword is NONE, BRIEF, NORMAL, or DETAILED. The default output amount is NORMAL. If NONE is specified, the echo of the basic instructions is also turned off. No output is displayed when an analytic statement is used, and the output from an English-like statement is the same as at the BRIEF level. The user is responsible for most of the output. This option is useful when the SCA System is used strictly as a programming language. If BRIEF, NORMAL, or DETAILED is specified, the SCA System sets a default level of output for each English-like statement according to the specified level. This default option may be modified in a particular paragraph by the OUTPUT sentence of the paragraph.
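Several PROFILE sentences may be given in one paragraph. The statement below is a sketch assembled only from the sentences and keywords listed in the PROFILE syntax summary; the particular combination of values is an arbitrary illustration, not a recommended setting.

```
PROFILE STYLE IS ALL. NO ECHO. OWIDTH IS 132. OUTPUT-LEVEL IS BRIEF.
```

Under the descriptions given for the individual sentences, this would request prompting for all sentences, turn off the display of entered instructions, set the output width to 132 characters, and reduce the default amount of output for each English-like statement.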

WORKSPACE Paragraph

The WORKSPACE paragraph is used to manage the user's workspace, such as displaying current status, deleting unneeded variables, saving or recalling the workspace, or consolidating the unused workspace.

Syntax for the WORKSPACE Paragraph

    WORKSPACE MEMORY IS SAVED(i), RECALLED(i). DELETE v1, v2, ---. COMPRESSION./NO COMPRESSION. NOVAR-REQUIRED IS i. CONTENT./NO CONTENT.

    Required sentences: none

Sentences Used in the WORKSPACE Paragraph

MEMORY sentence
The MEMORY sentence is used to save the contents of the current SCA workspace to a file or recall a previously saved SCA workspace from a file. The SAVED keyword specifies the logical unit of the file where the workspace will be saved, and RECALLED specifies the logical unit of the file containing the workspace to be recalled. If both SAVED and RECALLED are used, the current workspace is first saved to the designated file and then a previous workspace is recalled from another name. The default logical unit for a workspace file is 9. Therefore if the default file unit is used, the following two statements are both acceptable

    WORKSPACE MEMORY IS SAVED. (or simply WORKSPACE SAVED.)
    WORKSPACE MEMORY IS RECALLED. (or simply WORKSPACE RECALLED.)

DELETE sentence
The DELETE sentence is used to specify the names of the variables and/or models to be deleted. Note that the deletion does not increase the available workspace unless the workspace is compressed.

COMPRESSION sentence
The COMPRESSION sentence is used to specify the compression of the SCA workspace. When a variable is deleted, whether implicitly by the processor or explicitly by the user, the SCA System does not compress the workspace immediately. When the user runs out of workspace, unneeded variables and models may be deleted and the workspace compressed in order to release unused workspace. The default option is NO COMPRESSION.

NOVAR sentence
The NOVAR sentence is used to specify the number of additional variables desired in an SCA session beyond those already in the workspace. The SCA System initially allows up to 150 variables in the workspace. If the user requires more than 150 variables, the variable list may be expanded to meet the user's requirement.

CONTENT sentence
The CONTENT sentence requests the system to display the bookkeeping information of an SCA session. The bookkeeping information includes the names of the variables and models in the workspace, and the amount of workspace used. The default is NO CONTENT.

OS Paragraph

The OS paragraph is used to access the host computer's operating system commands during an SCA session. Most of the operating system commands, such as file allocation, deallocation (freeing), copying, listing, and text editing, can then be accessed. However, some operating system commands may be inaccessible. The OS paragraph does not have any modifying sentences. The user may return to the SCA session by issuing a QUIT statement.

RESTART Paragraph

The RESTART paragraph is used to initialize the SCA workspace and begin another SCA session. The RESTART paragraph has no modifying sentences.