11 Multiple Linear Regression




Multiple linear regression (MLR) is a method used to model the linear relationship between a dependent variable and one or more independent variables. The dependent variable is sometimes also called the predictand, and the independent variables the predictors. MLR is based on least squares: the model is fit such that the sum-of-squares of differences of observed and predicted values is minimized. MLR is probably the most widely used method in dendroclimatology for developing models to reconstruct climate variables from tree-ring series. Typically, a climatic variable is defined as the predictand and tree-ring variables from one or more sites are defined as predictors. The model is fit to a period, the calibration period, for which climatic and tree-ring data overlap. In the process of fitting, or estimating, the model, statistics are computed that summarize the accuracy of the regression model for the calibration period. The performance of the model on data not used to fit the model is usually checked in some way by a process called validation. Finally, tree-ring data from before the calibration period are substituted into the prediction equation to get a reconstruction of the predictand. The reconstruction is a prediction in the sense that the regression model is applied to generate estimates of the predictand variable outside the period used to fit the data. The uncertainty in the reconstruction is summarized by confidence intervals, which can be computed in various alternative ways.

Regression has long been used in dendroclimatology for reconstructing climate variables from tree rings. A few examples of dendroclimatic studies using linear regression are reconstruction of annual precipitation in the Pacific Northwest (Graumlich 1987), reconstruction of runoff of the White River, Arkansas (Cleaveland and Stahle 1989), reconstruction of an index of the El Niño Southern Oscillation (Michaelsen 1989), and reconstruction of a drought index for Iowa (Cleaveland and Duvick 1992).

MLR is not strictly a time series method. The most important point in application to time series is that observations are typically not independent of one another. As a consequence, special attention must be paid to the regression assumption about the independence of the residuals. The predictors in any regression problem might be intercorrelated. Intercorrelation of predictors does not invalidate the use of regression, but can make it difficult or impossible to assess the relative importance of individual predictors from the estimated coefficients of the regression equation. Extremely high intercorrelation of predictors, or multicollinearity, exacerbates any difficulty of interpreting the regression coefficients, and may call for combination of subsets of predictors into a new set of less-intercorrelated predictors.

Regression models are generally not intended to be applied to predictor data outside the range encountered in the calibration period. This presents a dilemma in dendroclimatology because some of the most interesting segments of tree-ring reconstructions portray extreme and sometimes unique climatic anomalies. The reconstruction for those periods is likely to be more uncertain than implied by regression statistics because the predictors are in a part of the multivariate predictor space not sampled by the data used to fit the model. The statistical aspects of this problem can be addressed by distinguishing predictions as extrapolations, as opposed to interpolations.

The MLR model is reviewed below, with emphasis on topics of particular interest for time series. More detailed information can be found in many standard references, for example, a statistical text on regression (Weisberg 1985), a chapter on regression as applied to the atmospheric sciences (Wilks 1995), and a monograph on regression in a time series context (Ostrom 1990).

Notes_11, GEOS 585A, Spring 2015

11.1 Model

Model equation. The model expresses the value of a predictand variable as a linear function of one or more predictor variables and an error term:

$$ y_i = b_0 + b_1 x_{i,1} + b_2 x_{i,2} + \cdots + b_K x_{i,K} + e_i \tag{1} $$

where
$x_{i,k}$ = value of the kth predictor in year i
$b_0$ = regression constant
$b_k$ = coefficient on the kth predictor
$K$ = total number of predictors
$y_i$ = predictand in year i
$e_i$ = error term

Prediction equation. The model (1) is estimated by least squares, which yields parameter estimates such that the sum of squares of errors is minimized. The resulting prediction equation is

$$ \hat{y}_i = \hat{b}_0 + \hat{b}_1 x_{i,1} + \hat{b}_2 x_{i,2} + \cdots + \hat{b}_K x_{i,K} \tag{2} $$

where the variables are defined as in (1) except that "^" denotes estimated values.

Residuals. The error term in equation (1) is unknown because the true model is unknown. Once the model has been estimated, the regression residuals are defined as

$$ \hat{e}_i = y_i - \hat{y}_i \tag{3} $$

where
$y_i$ = observed value of predictand in year i
$\hat{y}_i$ = predicted value of predictand in year i

The residuals measure the closeness of fit of the predicted values and actual predictand in the calibration period. The algorithm for estimating the regression equation (solution of the normal equations) guarantees that the residuals have a mean of zero for the calibration period. The variance of the residuals measures the size of the error, and is small if the model fits the data well.

11.2 Assumptions

The MLR model is based on several assumptions. Provided the assumptions are satisfied, the regression estimators are optimal in the sense that they are unbiased, efficient, and consistent. Unbiased means that the expected value of the estimator is equal to the true value of the parameter. Efficient means that the estimator has a smaller variance than any other estimator. Consistent means that the bias and variance of the estimator approach zero as the sample size approaches infinity.

Ostrom (1990, p. 14) lists six basic assumptions for the regression model:

1. Linearity: the relationship between the predictand and the predictors is linear. The MLR model applies to linear relationships. If relationships are nonlinear, there are two recourses: (1) transform the data to make the relationships linear, or (2) use an alternative statistical model (e.g., neural networks, binary classification trees). Scatterplots should be checked as an exploratory step in regression to identify possible departures from linearity.

2. Nonstochastic X: $E(e_i x_{i,k}) = 0$. The errors are uncorrelated with the individual predictors. This assumption is checked in residuals analysis with scatterplots of the residuals against individual predictors. Violation of the assumption might suggest a transformation of the predictors.

3. Zero mean: $E(e_i) = 0$. The expected value of the residuals is zero. This assumption cannot be checked because we have access to the estimated regression residuals, but not to the true (unknown) errors. The least-squares method used to estimate the regression equation guarantees that the mean of the estimated residuals is zero. (A check that the estimated residuals have zero mean is therefore pointless.)

4. Constant variance: $E(e_i^2) = \sigma_e^2$. The variance of the residuals is constant. In time series applications, a violation of this assumption is indicated by some organized pattern of dependence of the residuals on time. An example of violation is a pattern of residuals whose scatter (variance) increases over time. Another aspect of this assumption is that the error variance should not change systematically with the size of the predicted values. For example, the variance of errors should not be greater when the predicted value of the predictand is large than when the predicted value is small.

5. Nonautoregression: $E(e_i e_m) = 0$, $i \neq m$. The residuals are random, or uncorrelated in time. This assumption is the one most likely to be violated in time series applications. Several methods of checking the assumption are covered later.

6. Normality: the error term is normally distributed. This assumption must be satisfied for conventional tests of significance of coefficients and other statistics of the regression equation to be valid. It is also possible to make no explicit assumption about the form of the distribution and to appeal instead to the Central Limit Theorem to justify the use of such tests. The normality assumption is the least crucial of the regression assumptions.

11.3 Statistics

Sum-of-squares terms.
Several regression statistics are computed as functions of the sums-of-squares terms:

$$ SSE = \sum_{i=1}^{n} \hat{e}_i^2 \quad \text{(sum of squares, error)} $$
$$ SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 \quad \text{(sum of squares, total)} $$
$$ SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \quad \text{(sum of squares, regression)} \tag{4} $$

where $n$ is the sample size (number of observations in the calibration period).
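The least-squares fit of equation (1) and the sums-of-squares terms of equation (4) can be sketched in a few lines of code. The following is a minimal illustration in standard-library Python, not the method used in the course. The function names (`solve_linear_system`, `fit_mlr`, `sums_of_squares`) and the small data set are invented for this sketch. The coefficients are obtained by solving the normal equations directly, which is adequate for a toy example but is not how a numerical library would normally do it.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; rows are copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(predictors, y):
    """Return [b0, b1, ..., bK] from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1 carries the constant
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def sums_of_squares(predictors, y, coef):
    """Return (SSE, SST, SSR) as defined in equation (4)."""
    yhat = [coef[0] + sum(b * x for b, x in zip(coef[1:], p)) for p in predictors]
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssr = sum((fi - ybar) ** 2 for fi in yhat)
    return sse, sst, ssr


# Made-up calibration data: two predictors, eight "years".
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0),
     (5.0, 6.0), (6.0, 5.0), (7.0, 8.0), (8.0, 7.0)]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 12.2, 14.8, 16.1]

coef = fit_mlr(X, y)
sse, sst, ssr = sums_of_squares(X, y, coef)
print("coefficients:", coef)
print("SSE, SST, SSR:", sse, sst, ssr)
```

For a least-squares fit that includes a constant term, the printed SSR and SSE should add up to SST apart from rounding error, which is a quick sanity check on any implementation.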

Partitioning of variation. The regression equation is estimated such that the total sum-of-squares can be partitioned into components due to regression and residuals:

$$ SST = SSR + SSE \tag{5} $$

Coefficient of determination. The explanatory power of the regression is summarized by its R-squared value, computed from the sums-of-squares terms as

$$ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \tag{6} $$

$R^2$, also called the coefficient of determination, is often described as the proportion of variance accounted for, explained, or described by regression. It is important to keep in mind that, just as correlation does not imply causation, a high $R^2$ in regression does not imply causation. The relative sizes of the sums-of-squares terms indicate how good the regression is in terms of fitting the calibration data. If the regression is perfect, all residuals are zero, SSE is zero, and $R^2$ is 1. If the regression is a total failure, the sum-of-squares of residuals equals the total sum-of-squares, no variance is accounted for by regression, and $R^2$ is zero.

ANOVA table and definition of mean squared terms. The sums-of-squares terms and related statistics are often summarized in an Analysis of Variance (ANOVA) table:

Source        df           SS     MS
Total         n - 1        SST    MST = SST/(n - 1)
Regression    K            SSR    MSR = SSR/K
Residual      n - K - 1    SSE    MSE = SSE/(n - K - 1)

Source = source of variation
SS = sum-of-squares term
df = degrees of freedom for SS term
MS = mean squared terms

The mean squared terms are the sums-of-squares terms divided by the degrees of freedom.

Standard error of the estimate. The residual mean square (MSE) is the sample estimate of the variance of the regression residuals. The notation for the population value of the error variance is sometimes written as $\sigma_e^2$, while the sample estimate of that variance is given by

$$ s_e^2 = MSE \tag{7} $$

where MSE has been defined previously. The square root of the residual mean square is called the root-mean-square error (RMSE), or the standard error of the estimate:

$$ s_e = \sqrt{MSE} = RMSE_c \tag{8} $$

The subscript c is attached ($RMSE_c$) in (8) to distinguish the RMSE derived from calibration from the root-mean-square error derived by cross-validation (see later).

F ratio, or overall F. Recall that the explanatory power of a regression is given by the regression $R^2$, which is computed from sums-of-squares terms. The F-ratio, or overall F, which is computed from the mean squared terms in the ANOVA table, estimates the statistical significance of the regression equation. The F-ratio is given by

$$ F = \frac{MSR}{MSE} \tag{9} $$

The advantage of the F-ratio over $R^2$ is that the F-ratio takes into account the degrees of freedom, which depend on the sample size and the number of predictors in the model. A model can have a high $R^2$ and still not be statistically significant if the sample size is not large compared with the number of predictors in the model. The F-ratio incorporates sample size and number of predictors in an assessment of significance of the relationship. The significance of the F-ratio is obtained by referring to a table of the F distribution, using degrees of freedom {df1, df2}, where df1 and df2 are the degrees of freedom for the regression mean square and residual mean square from the ANOVA table.

Adjusted $R^2$. The $R^2$ value for a regression can be made arbitrarily high simply by including more and more predictors in the model. The adjusted $R^2$ is one of several statistics that attempt to compensate for this artificial increase in accuracy. The adjusted $R^2$ is given by

$$ \bar{R}^2 = 1 - \frac{MSE}{MST} \tag{10} $$

where MSE and MST are the mean squared terms previously defined in the ANOVA table. Referring to the ANOVA table shows that the ratio of mean squared terms is related to the ratio of sum-of-squares terms by

$$ \frac{MSE}{MST} = \frac{n - 1}{n - K - 1} \, \frac{SSE}{SST} \tag{11} $$

where $n$ is the number of observations, and $K$ is the number of predictors. Because $(n - 1)/(n - K - 1)$ must be greater than one, it can immediately be seen that adjusted $R^2$ must be smaller than $R^2$, and that the difference in the two statistics depends on both the sample size and the number of predictors in the model.

Confidence interval for estimated coefficients. If the regression assumptions on the residuals are satisfied, including the normality assumption, then the sampling distribution of an estimated regression coefficient is normal with a variance proportional to the residual mean square (MSE). The variance of the estimator also depends on the variances and covariances of the predictors. The idea is best illustrated for the case of simple linear regression (one predictor), for which the variance of the estimated regression coefficient is given by

$$ \mathrm{var}(\hat{b}_1) = \frac{s_e^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{12} $$

where $s_e^2$ is the residual mean square, $x_i$ is the value of the predictor in year i, $\bar{x}$ is the mean of the predictor, and the summation is over the $n$ years in the calibration period. The $100(1 - \alpha)\%$ confidence interval is $\hat{b}_1 \pm t_{\alpha/2} \sqrt{\mathrm{var}(\hat{b}_1)}$, where $t_{\alpha/2}$ is obtained from a t distribution with $n - 2$ degrees of freedom.
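The statistics built from the ANOVA table, together with the slope variance of equation (12), can be sketched for the one-predictor case as follows. This is a hedged illustration in standard-library Python. The function names (`anova_statistics`, `fit_simple`) and the data are made up for the sketch. The critical value 2.306 is the tabulated t value for $\alpha/2 = 0.025$ with 8 degrees of freedom and has to be changed for any other sample size.

```python
import math


def anova_statistics(y, yhat, n_predictors):
    """Statistics of equations (6) to (10) from observed and predicted values."""
    n = len(y)
    k = n_predictors
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssr = sst - sse                   # partition (5); assumes a least-squares fit
    mse = sse / (n - k - 1)           # residual mean square
    mst = sst / (n - 1)
    msr = ssr / k
    return {
        "r2": 1.0 - sse / sst,        # equation (6)
        "mse": mse,                   # equation (7)
        "rmse": math.sqrt(mse),       # equation (8)
        "f": msr / mse,               # equation (9)
        "adj_r2": 1.0 - mse / mst,    # equation (10)
    }


def fit_simple(x, y):
    """Least-squares constant, slope, and sum of squared deviations of x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1, sxx


# Made-up calibration data: one predictor, ten "years".
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.3, 3.8, 6.1, 8.2, 9.7, 12.4, 13.8, 16.1, 18.3, 19.9]

b0, b1, sxx = fit_simple(x, y)
yhat = [b0 + b1 * xi for xi in x]
stats = anova_statistics(y, yhat, n_predictors=1)

var_b1 = stats["mse"] / sxx           # equation (12)
t_crit = 2.306                        # t table: alpha/2 = 0.025, n - 2 = 8 df
half_width = t_crit * math.sqrt(var_b1)
ci = (b1 - half_width, b1 + half_width)

print(stats)
print("slope:", b1, "95% confidence interval:", ci)
```

The critical value is passed in by hand because the Python standard library has no t distribution; a statistics package would normally supply it.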

As an example of the confidence interval, if the sample size is 45 years, the number of degrees of freedom is 43. If the 95% confidence interval is desired, the appropriate $\alpha$-level is 0.05. A t-table for this sample size and $\alpha$-level gives $t_{0.025,43} = 2.02$. The corresponding confidence interval is

$$ \hat{b}_1 \pm 2.02 \sqrt{\mathrm{var}(\hat{b}_1)} \tag{13} $$

To an approximation, the 95% confidence interval for an estimated regression parameter is two standard deviations around the estimate.

For more than one predictor, the confidence intervals for regression can be computed similarly, but the equation is more complicated. The equation for the variances and covariances of estimated coefficients is expressed in matrix terms by

$$ \mathrm{var}(\hat{\boldsymbol{\beta}}) = s_e^2 \, (\mathbf{X}^T \mathbf{X})^{-1} \tag{14} $$

where $\mathbf{X}$ is the time series matrix of predictors. Equation (14) returns a matrix, with the variances of the parameters along the diagonal, and the covariances as the off-diagonal elements (Weisberg 1985, p. 44). The appropriate degrees of freedom of the t distribution is $df = n - K - 1$, where $K$ is the number of predictors in the model, and $n$ is the sample size.

11.4 Selecting predictors

General guidelines. The predictors for an MLR model are sometimes specified beforehand, and are sometimes selected by some automated procedure from a pool of potential predictors. Various schemes for automated variable screening are available. For example, the forward stepwise method enters additional predictors one by one depending on maximum reduction of the residual variance (variance not accounted for by predictors already in the model). The best subsets method tests all possible sets of 1, 2, 3, ... predictors and selects the set giving the best value of accuracy adjusted for loss of degrees of freedom as measured by any of several possible statistics. It is generally a good idea to restrict the pool of potential predictors to variables with some plausible physical link to the predictand. For example, if the predictand is Tucson seasonal precipitation, tree-ring indices from the Santa Catalina Mountains are physically reasonable predictors, while δ18O in layers of a speleothem from central China is not. It is also important that the predictor pool not be made unnecessarily large, for example by including all sorts of transformations of the original variables in the pool. This is important because $R^2$ can be seriously biased when the pool includes a large number of potential predictors, even if only a few of those predictors are selected for the final model (Rencher and Pun 1980).

Lagged predictors and prewhitened predictors.
Lagged regression models refer to models in which the relationship between the predictors and predictand is not constrained to be contemporaneous. In a distributed lag model, the predictors of $y_t$ include $x_{t+m}$, where the lag $m$ might take on some value besides zero. In econometrics terminology, this particular model is a distributed lag model with lagged exogenous variables (variables outside of or not dependent on the model). In dendroclimatology, positive lags on the predictors in distributed-lag models are easy to rationalize: the climate in year t affects tree growth not just in year t, but in succeeding years; thus the tree-ring indices for years t+m might inform on what the climate was in year t. The case for negative lags is less obvious, but plausible: the tree ring in year t holds information on the climate in year t, but the information is clouded by the preconditioning of the ring in year t by climate or biology of earlier years; thus including negative lags on the tree-ring index lets the model compensate for or remove the confounding effects of prior years' climate on the current year's ring.

The dendroclimatic strategy of prewhitening tree-ring indices as predictors in regression models to reconstruct climate is grounded in similar rationale to that for using negatively lagged predictors in distributed-lag models. With prewhitening, a time series model (say, an AR model) is fit to the full length of the tree-ring series and the model residuals are regarded as residual indices. These residual indices are then used as predictors in the climate reconstruction model. The rationale is that the current year's index is preconditioned to some extent by past conditions, including biology and possibly climate. The time-series modeling presumably adjusts for this distortion by removing the linear dependence of the tree-ring index on its past values.

11.5 Multicollinearity

The predictors in a regression model are often called the independent variables, but this term does not imply that the predictors are themselves independent statistically from one another. In fact, for natural systems, the predictors can be highly intercorrelated. Multicollinearity is a term reserved to describe the case when the intercorrelation of predictor variables is high. It has been noted that the variance of the estimated regression coefficients depends on the intercorrelation of predictors (equation (14)). Haan (2002) concisely summarizes the effects of multicollinearity on the regression model. Multicollinearity does not invalidate the regression model in the sense that the predictive value of the equation may still be good as long as the predictions are based on combinations of predictors within the same multivariate space used to calibrate the equation. But there are several negative effects of multicollinearity. First, the variance of the regression coefficients can be inflated so much that the individual coefficients are not statistically significant even though the overall regression equation is strong and the predictive ability good. Second, the relative magnitudes and even the signs of the coefficients may defy interpretation. For example, the regression weight on a tree-ring index in a multivariate regression equation to predict precipitation might be negative even though the tree-ring index by itself is positively correlated with precipitation. Third, the values of the individual regression coefficients may change radically with the removal or addition of a predictor variable in the equation. In fact, the sign of the coefficient might even switch.

Signs of multicollinearity.
Signs of multicollinearity include 1) high correlation between pairs of predictor variables, 2) regression coefficients whose signs or magnitudes do not make physical sense, 3) statistically nonsignificant regression coefficients on important predictors, and 4) extreme sensitivity of sign or magnitude of regression coefficients to insertion or deletion of a predictor variable.

Variance Inflation Factor (VIF). The Variance Inflation Factor (VIF) is a statistic that can be used to identify multicollinearity in a matrix of predictor variables. Variance inflation refers here to the mentioned effect of multicollinearity on the variance of estimated regression coefficients. Multicollinearity depends not just on the bivariate correlations between pairs of predictors, but on the multivariate predictability of any one predictor from the other predictors. Accordingly, the VIF is based on the multiple coefficient of determination in regression of each predictor in multivariate linear regression on all the other predictors:

$$ VIF_i = \frac{1}{1 - R_i^2} \tag{15} $$

where $R_i^2$ is the multiple coefficient of determination in a regression of the ith predictor on all other predictors, and $VIF_i$ is the variance inflation factor associated with the ith predictor. Note that if the ith predictor is independent of the other predictors, the variance inflation factor is one, while if the ith predictor can be almost perfectly predicted from the other predictors, the variance inflation factor approaches infinity. In that case the variance of the estimated regression coefficients is unbounded. Multicollinearity is said to be a problem when the variance inflation factors of one or more predictors become large. How large appears to be a subjective judgment. According to Haan (2002), some researchers use a VIF of 5 and others use a VIF of 10 as a critical threshold. These VIF values correspond, respectively, to $R_i^2$ values of 0.80 and 0.90. Some compute the average VIF for all predictors and declare that an average considerably larger than one indicates multicollinearity (Haan 2002). At any rate, it is important to keep in mind that multicollinearity requires strong intercorrelation of predictors, not just non-zero intercorrelation. The VIF is closely related to a statistic called the tolerance, which is 1/VIF. Some statistics packages report the VIF and some report the tolerance (Haan 2002).

11.6 Analysis of Residuals

Analysis of residuals consists of examining graphs and statistics of the regression residuals to check that model assumptions are satisfied. Some frequently used residuals tests are listed below.

Time series plot of residuals. The time series plot of residuals can indicate such problems as non-constant variance of residuals, and trend or autocorrelation in residuals. A time-dependent variance might show, say, as an increasing scatter of the residuals about the zero line with time. The slope of the scatter plot of residuals on time can be tested for significance to identify trend in residuals.

Scatterplot of residuals against predicted values. The residuals are assumed to be uncorrelated with the predicted values of the predictand. Violation is indicated by some noticeable pattern of dependence in the scatterplots. For example, the residuals might flare out (increased scatter) with increasing value of the predictand; the remedy might be a transformation (e.g., log transform) of the predictand.

Scatterplots of residuals against individual predictors. The residuals are assumed to be uncorrelated with the individual predictors. Violation of these assumptions would be indicated by some noticeable pattern of dependence in the scatterplots, and might suggest transformation of the predictors.

Histogram of residuals. The residuals are assumed to be normally distributed. Accordingly, the histogram of residuals should resemble a normal pdf.
But keep in mind that a random sample from a normal distribution will be only approximately normal, and so some departures from normality in the appearance of the histogram are expected, especially for small sample size.

Acf of residuals. The residuals are assumed not to be autocorrelated. If the assumption is satisfied, the acf of regression residuals should not be large at any non-zero lag. Special interest should be attached to the lowest lags, since physical systems are characterized by persistence from year to year.

Lag-1 scatterplot of residuals. This plot also deals with the assumption of independence of residuals. The residuals at time t should be independent of the residuals at time t-1. The scatterplot should therefore resemble a formless cluster of points. Alignment in some direction might be evidence of autocorrelation of residuals at lag 1.

Durbin-Watson. The Durbin-Watson (D-W) statistic tests for autocorrelation of residuals, specifically lag-1 autocorrelation. The D-W statistic tests the null hypothesis of no first-order autocorrelation against the alternative hypothesis of positive first-order autocorrelation. The alternative hypothesis might also be negative first-order autocorrelation. Assume the residuals follow a first-order autoregressive process, $e_t = \rho e_{t-1} + u_t$, where $u_t$ is random and $\rho$ is the first-order autocorrelation coefficient of the residuals. If the test is for positive autocorrelation of residuals, the hypotheses for the D-W test can be written

$$ H_0: \rho = 0 $$
$$ H_A: \rho > 0 $$

where $\rho$ is the population value of the first-order autocorrelation coefficient of residuals. The D-W statistic is given by

$$ d = \frac{\sum_{i=2}^{n} (\hat{e}_i - \hat{e}_{i-1})^2}{\sum_{i=1}^{n} \hat{e}_i^2} \tag{16} $$

For positive autocorrelation, the residuals at successive times will tend to be similar, such that the numerator will be small relative to the denominator; in the limit, as the first-order autocorrelation approaches 1, the numerator goes to zero and d goes to zero. It can be shown (Ostrom, 1990, p. 8) that if the residuals follow a first-order autoregressive process, d is related to the first-order autocorrelation coefficient, $\rho$, as follows:

$$ d = 2(1 - \rho) \tag{17} $$

The above equation implies that
d = 2 if there is no autocorrelation ($\rho = 0$)
d = 0 if the first-order autocorrelation is 1
d = 4 if the first-order autocorrelation is -1.

Durbin and Watson established upper ($d_u$) and lower ($d_l$) limits for the significance levels of a computed d. These limits are available in tables in many statistics texts, and are stored in a user-written lookup table in Matlab. The application of the D-W test is to compute d from the regression residuals, choose a significance level (e.g., 99%), look up the upper and lower limits from the table, and use a decision rule depending on the alternative hypothesis. For positive autocorrelation, the decision rule is

if $d < d_l$, reject $H_0$
if $d > d_u$, do not reject $H_0$
if $d_l \le d \le d_u$, inconclusive

[Figure 1. Five decision regions for values of Durbin-Watson d. Along the d axis from 0 to 4: reject $H_0$ below $d_l$; inconclusive between $d_l$ and $d_u$; accept $H_0$ (no autocorrelation) between $d_u$ and $4 - d_u$; inconclusive between $4 - d_u$ and $4 - d_l$; reject $H_0$ above $4 - d_l$.]
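The D-W statistic of equation (16), its approximate link to the lag-1 autocorrelation in equation (17), and the decision rule for positive autocorrelation can be sketched as below. This is an illustrative sketch in standard-library Python. The function names and the residual series are invented. The limits passed to `decide_positive` are placeholders and are not values from a Durbin-Watson table.

```python
def durbin_watson(residuals):
    """Durbin-Watson d of equation (16)."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def lag1_autocorrelation(residuals):
    """Sample first-order autocorrelation coefficient of the residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    return sum(dev[i] * dev[i - 1] for i in range(1, n)) / sum(v * v for v in dev)


def decide_positive(d, d_lower, d_upper):
    """Decision rule when the alternative hypothesis is positive autocorrelation."""
    if d < d_lower:
        return "reject H0"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"


# Made-up residuals with visible year-to-year persistence.
residuals = [0.8, 1.1, 0.9, 0.4, -0.2, -0.7, -1.0, -0.9, -0.5, 0.1,
             0.6, 0.9, 0.7, 0.2, -0.3, -0.8, -0.6, -0.4, 0.0, 0.3]

d = durbin_watson(residuals)
r1 = lag1_autocorrelation(residuals)
print("d =", d, "and 2(1 - r1) =", 2.0 * (1.0 - r1))

# d_lower and d_upper must be looked up in a published D-W table for the
# chosen significance level, sample size, and number of predictors.
# The two numbers below are placeholders for illustration only.
print(decide_positive(d, d_lower=1.0, d_upper=1.5))
```

In a finite sample the two printed quantities agree only approximately, because equation (17) is a large-sample relation.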

For negative autocorrelation, the decision rule is

if $d > 4 - d_l$, reject $H_0$
if $d < 4 - d_u$, do not reject $H_0$
if $4 - d_u \le d \le 4 - d_l$, inconclusive

According to Ostrom (1990, p. 9), some researchers reject the D-W statistic in favor of a simple rule of thumb for how much autocorrelation of residuals is tolerable in regression. This rule of thumb is that an alternative method to regression should be used if the first-order autocorrelation of residuals is greater than 0.3.

Portmanteau test. The portmanteau statistic, or Q statistic, is designed to test whether the regression residuals are purely random, or white noise (Ostrom 1990, p. 5). Unlike the D-W test, the portmanteau test does not restrict the possible form of autocorrelation to first-order autoregressive. The null hypothesis for the test is that the residuals are completely random; the alternative hypothesis is that the residuals are generated by an autoregressive or moving average model of some order. If the residuals are random, the acf of residuals should be zero at all nonzero lags. The Q statistic is computed as

$$ Q = N \sum_{k=1}^{K} r_k^2 \tag{18} $$

where $r_k$ is the lag-k autocorrelation coefficient of the regression residuals, $N$ is the length of the time series of residuals, and $K$ is chosen as the maximum anticipated order of autoregressive or moving-average process hypothesized under the alternative hypothesis to have generated the residuals. As a rule of thumb, $K$ should be chosen as no larger than about $N/4$, where $N$ is the length of the time series. If the null hypothesis is true, Q is distributed as chi-square with $K$ degrees of freedom. Large acf coefficients lead to a high computed Q. A high Q therefore indicates significant autocorrelation and rejection of the null hypothesis. The p-value for Q is the probability of obtaining as high a Q as computed when the null hypothesis is true. The p-value for a computed Q can be obtained from a chi-square table. In summary, rejection of the null hypothesis is indicated by large acf coefficients and high computed Q. The more significant the computed Q, the lower its p-value.

References

Cleaveland, M., and Duvick, D.N., 1992, Iowa climate reconstructed from tree rings, 1640-1982, Water Resources Research 28(10), 2607-2615.
Cleaveland, M.K., and Stahle, D.W., 1989, Tree ring analysis of surplus and deficit runoff in the White River, Arkansas, Water Resources Research 25(6), 1391-1401.
Graumlich, L.J., 1987, Precipitation variation in the Pacific Northwest (1675-1975) as reconstructed from tree rings, Annals of the Association of American Geographers 77(1), 19-29.
Haan, C.T., 2002, Statistical Methods in Hydrology, second edition: Iowa State University Press, Ames, Iowa.
Michaelsen, J., 1989, Long-period fluctuations in El Niño amplitude and frequency reconstructed from tree-rings, Geophysical Monograph 55, American Geophysical Union, 69-74.
Ostrom, C.W., Jr., 1990, Time Series Analysis, Regression Techniques, Second Edition: Quantitative Applications in the Social Sciences, v. 07-009: Newbury Park, Sage Publications.
Rencher, A.C., and Pun, Fu Ceayong, 1980, Inflation of R2 in best subset regression, Technometrics 22(1), 49-53.
Weisberg, S., 1985, Applied Linear Regression, 2nd ed., John Wiley, New York, 324 pp.
Wilks, D.S., 1995, Statistical methods in the atmospheric sciences: Academic Press, 467 p.
Woodhouse, C.A., 1999, Artificial neural networks and dendroclimatic reconstructions: An example from the Front Range, Colorado, USA: The Holocene, v. 9, no. 5, p. 521-529.
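Supplementary sketch for the portmanteau test of Section 11.6. The Q statistic of equation (18) can be computed from the sample autocorrelations of the residuals as shown below. This is a minimal standard-library Python illustration. The function names and the residual series are made up. The comparison value 11.07 is the tabulated chi-square value for 5 degrees of freedom at the 5% level and applies only to the choice K = 5 used here.

```python
def autocorrelation(series, lag):
    """Sample lag-k autocorrelation coefficient r_k."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    numerator = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    return numerator / sum(v * v for v in dev)


def portmanteau_q(residuals, max_lag):
    """Q = N * sum of squared autocorrelations for lags 1..max_lag (equation 18)."""
    n = len(residuals)
    return n * sum(autocorrelation(residuals, k) ** 2
                   for k in range(1, max_lag + 1))


# Made-up residuals with visible persistence; N = 20, so K = 5 respects K <= N/4.
residuals = [0.8, 1.1, 0.9, 0.4, -0.2, -0.7, -1.0, -0.9, -0.5, 0.1,
             0.6, 0.9, 0.7, 0.2, -0.3, -0.8, -0.6, -0.4, 0.0, 0.3]

q = portmanteau_q(residuals, max_lag=5)
critical = 11.07   # chi-square table value, 5 degrees of freedom, 5% level
print("Q =", q)
print("reject H0 of random residuals" if q > critical else "do not reject H0")
```

The p-value itself is not computed because the Python standard library has no chi-square distribution; a table or a statistics package would be used for that step.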