Everything You Always Wanted to Know about Copula Modeling but Were Afraid to Ask


Christian Genest (1) and Anne-Catherine Favre (2)

(1) Professor, Dépt. de mathématiques et de statistique, Univ. Laval, Québec QC, Canada G1K 7P4.
(2) Professor, Chaire en Hydrologie Statistique, INRS, Eau, Terre et Environnement, Québec QC, Canada G1K 9A9.

Note. Discussion open until December. Separate discussions must be submitted for individual papers. To extend the closing date by one month, a written request must be filed with the ASCE Managing Editor. The manuscript for this paper was submitted for review and possible publication on August 29, 2006; approved on August 29. This paper is part of the Journal of Hydrologic Engineering, Vol. 12, No. 4, July 2007. ASCE, ISSN 1084-0699.

Abstract: This paper presents an introduction to inference for copula models, based on rank methods. By working out in detail a small, fictitious numerical example, the writers exhibit the various steps involved in investigating the dependence between two random variables and in modeling it using copulas. Simple graphical tools and numerical techniques are presented for selecting an appropriate model, estimating its parameters, and checking its goodness-of-fit. A larger, realistic application of the methodology to hydrological data is then presented.

DOI: 10.1061/(ASCE)1084-0699(2007)12:4(347)

CE Database subject headings: Frequency analysis; Distribution functions; Risk management; Statistical models.

Introduction

Hydrological phenomena are often multidimensional and hence require the joint modeling of several random variables. Traditionally, the pairwise dependence between variables such as depth, volume, and duration of flows has been described using classical families of bivariate distributions. Perhaps the most common models occurring in this context are the bivariate normal, lognormal, gamma, and extreme-value distributions. The main limitation of this approach is that the individual behavior of the two variables (or transformations thereof) must then be characterized by the same parametric family of univariate distributions. Copula models, which avoid this restriction, are just beginning to make their way into the hydrological literature; see, e.g., De Michele and Salvadori (2002), Favre et al. (2004), Salvadori and De Michele (2004), and De Michele et al.

Restricting attention to the bivariate case for the sake of simplicity, the copula approach to dependence modeling is rooted in a representation theorem due to Sklar (1959). The latter states that the joint cumulative distribution function (c.d.f.) $H(x,y)$ of any pair $(X,Y)$ of continuous random variables may be written in the form

$$H(x,y) = C\{F(x), G(y)\}, \quad x, y \in \mathbb{R} \tag{1}$$

where $F(x)$ and $G(y)$ = marginal distributions; and $C:[0,1]^2 \to [0,1]$ = copula.

While Sklar (1959) showed that $C$, $F$, and $G$ are uniquely determined when $H$ is known, a valid model for $(X,Y)$ arises from Eq. (1) whenever the three "ingredients" are chosen from given parametric families of distributions, viz.

$$F \in (F_\delta), \quad G \in (G_\eta), \quad C \in (C_\theta)$$

Thus, for example, $F$ might be normal with (bivariate) parameter $\delta = (\mu, \sigma^2)$; $G$ might be gamma with parameter $\eta = (\alpha, \lambda)$; and $C$ might be taken from the Farlie-Gumbel-Morgenstern family of copulas, defined for each $\theta \in [-1,1]$ by

$$C_\theta(u,v) = uv + \theta uv(1-u)(1-v), \quad u, v \in [0,1] \tag{2}$$

The main advantage provided to the hydrologist by this approach is that the selection of an appropriate model for the dependence between $X$ and $Y$, represented by the copula, can then proceed independently from the choice of the marginal distributions.

For an introduction to the theory of copulas and a large selection of related models, the reader may refer, e.g., to the monographs by Joe (1997) and Nelsen (1999), or to reviews such as Frees and Valdez (1998) and Cherubini et al. (2004), in which actuarial and financial applications are considered. While the theoretical properties of these objects are now fairly well understood, inference for copula models is, to an extent, still under development. The literature on the subject is yet to be collated, and most of it is not written with the end user in mind, making it difficult to decipher except for the most mathematically inclined.
The aim of this paper is to present, in the simplest terms possible, the successive steps required to build a copula model for hydrological purposes. To this end, a fictitious data set of very small size will be used to illustrate the diagnostic and inferential tools currently available. Although intuition will be given for the various techniques to be presented, emphasis will be put on their implementation, rather than on their theoretical foundation. Therefore, computations will be presented in more detail than usual, at the expense of exhaustive mathematical exposition, for which the reader will only be given appropriate references.

The pedagogical data set to be used throughout the paper is introduced in the Dependence and Ranks section, where it will be explained why statistical inference concerning dependence structures should always be based on ranks. This will lead, in the Measuring Dependence section, to the description of classical nonparametric measures of dependence and tests of independence. Exploratory tools for uncovering dependence and measuring it will be reviewed in the Additional Graphical Tools for Detecting Dependence section. Point and interval estimation for dependence parameters from copula models will then be presented in the Estimation section. Recent goodness-of-fit techniques will be illustrated in the Goodness-of-Fit Tests section. The Application section will discuss in detail a concrete hydrological implementation of this methodology. This will lead to the consideration of additional tools for the treatment of extreme-value dependence structures in the Graphical Diagnostics for Bivariate Extreme-Value Copulas section. Final remarks will then be made in the Conclusion section.

JOURNAL OF HYDROLOGIC ENGINEERING ASCE / JULY/AUGUST 2007 / 347

Dependence and Ranks

Suppose that a random sample $(X_1,Y_1),\ldots,(X_n,Y_n)$ is given from some pair $(X,Y)$ of continuous variables, and that it is desired to identify the bivariate distribution $H(x,y)$ that characterizes their joint behavior. In view of Sklar's representation theorem, there exists a unique copula $C$ for which identity, Eq. (1), holds. Therefore, just as $F(x)$ and $G(y)$ give an exhaustive description of $X$ and $Y$ taken separately, the joint dependence between these variables is fully and uniquely characterized by $C$. It is easy to see, for example, that $X$ and $Y$ are stochastically independent if and only if $C = \Pi$, where $\Pi(u,v) = uv$ for all $u,v \in [0,1]$.

At the other extreme, it can also be shown that in order for $Y$ to be a deterministic function of $X$, $C$ must be one of the two copulas

$$W(u,v) = \max(0, u+v-1) \quad \text{or} \quad M(u,v) = \min(u,v)$$

which are usually referred to as the Fréchet-Hoeffding bounds in the statistical literature; see, e.g., Fréchet (1951) or Nelsen (1999, p. 9). When $C = W$, $Y$ is a decreasing function of $X$, while $Y$ is monotone increasing in $X$ when $C = M$. More generally, any copula $C$ represents a model of dependence that lies somewhere between these two extremes, a fact that translates into the inequalities

$$W(u,v) \le C(u,v) \le M(u,v), \quad u,v \in [0,1]$$

To get a feeling for the dependence between $X$ and $Y$, it is traditional to look at the scatter plot of the pairs $(X_1,Y_1),\ldots,(X_n,Y_n)$. Such a representation is given in Fig. 1(a) for the following fictitious random sample of size $n=6$ from the bivariate standard normal distribution with zero correlation. This example will be used for illustration purposes throughout the paper.

Fig. 1. (a) Conventional scatter plot of the pairs $(X_i,Y_i)$; (b) corresponding scatter plot of the pairs $(Z_i,T_i) = (e^{X_i}, e^{3Y_i})$

Learning Data Set

Table 1 shows six independent pairs of mutually independent observations $X_i$, $Y_i$ generated from the standard $N(0,1)$ distribution using the statistical freeware R (R Development Core Team). For simplicity, and without loss of generality, the pairs were labeled in such a way that $X_1 < \cdots < X_6$.

Table 1. Learning Data Set (columns: $i$, $X_i$, $Y_i$)

While there is nothing fundamentally wrong with looking at the pattern of the pairs $(X_i,Y_i)$, for example to look for linear association, it must be realized that this picture incorporates information not only about the dependence between $X$ and $Y$, but also about their marginal behavior. To drive this point home, consider the transformed pairs

$$Z_i = \exp(X_i), \quad T_i = \exp(3Y_i), \quad 1 \le i \le 6$$

whose scatter plot, shown in Fig. 1(b), is drastically different from the original one. In effect, both pictures are distortions of the dependence between the pairs $(X,Y)$ and $(Z,T)$, which is characterized by the same copula, $C$, whatever it may be.

More generally, if $\phi$ and $\psi$ are two increasing transformations with inverses $\phi^{-1}$ and $\psi^{-1}$, the copula of the pair $(Z,T)$ with $Z = \phi(X)$ and $T = \psi(Y)$ is the same as that of $(X,Y)$. Let

$$H^*(z,t) = C^*\{F^*(z), G^*(t)\} \tag{3}$$

be the Sklar representation of the joint distribution of the pair $(Z,T)$. Since the marginal distributions of $Z$ and $T$ are given by

$$F^*(z) = P(Z \le z) = P\{X \le \phi^{-1}(z)\} = F\{\phi^{-1}(z)\}$$

and

$$G^*(t) = P(T \le t) = P\{Y \le \psi^{-1}(t)\} = G\{\psi^{-1}(t)\}$$
one has

$$H^*(z,t) = P(Z \le z, T \le t) = P\{X \le \phi^{-1}(z), Y \le \psi^{-1}(t)\} = H\{\phi^{-1}(z), \psi^{-1}(t)\} = C[F\{\phi^{-1}(z)\}, G\{\psi^{-1}(t)\}] = C\{F^*(z), G^*(t)\} \tag{4}$$

for all choices of $z,t \in \mathbb{R}$. It follows at once from the comparison of Eqs. (3) and (4) that $C^* = C$.

Expressed in different terms, the above development means that the unique copula associated with a random pair $(X,Y)$ is invariant by monotone increasing transformations of the marginals. Since the dependence between $X$ and $Y$ is characterized by this copula, a faithful graphical representation of dependence should exhibit the same invariance property.

Among functions of the data that meet this requirement, it can be seen easily that the pairs of ranks $(R_1,S_1),\ldots,(R_n,S_n)$ associated with the sample are the statistics that retain the greatest amount of information; see, e.g., Oakes (1982). Here, $R_i$ stands for the rank of $X_i$ among $X_1,\ldots,X_n$, and $S_i$ stands for the rank of $Y_i$ among $Y_1,\ldots,Y_n$. These ranks are unambiguously defined, because ties occur with probability zero under the assumption of continuity for $X$ and $Y$. Pairs of ranks corresponding to the learning data set are given in Table 2.

Table 2. Ranks for the Learning Data Set of Table 1 (columns: $i$, $R_i$, $S_i$)

Displayed in Fig. 2(a) is the scatter plot of the pairs $(R_i,S_i)$ corresponding to these $(X_i,Y_i)$. Fig. 2(b) shows the graph of the pairs $(R_i^*,S_i^*)$ associated with the $(Z_i,T_i)$. The result is obviously the same. It is the most judicious representation of the copula $C$ that one could hope for.

Upon rescaling of the axes by a factor of $1/(n+1)$, one gets a set of points in the unit square $[0,1]^2$, which form the domain of the so-called empirical copula (Deheuvels 1979), formally defined by

$$C_n(u,v) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left(\frac{R_i}{n+1} \le u, \frac{S_i}{n+1} \le v\right)$$

with $\mathbf{1}(A)$ denoting the indicator function of set $A$. For any given pair $(u,v)$, it may be shown that $C_n(u,v)$ is a rank-based estimator of the unknown quantity $C(u,v)$ whose large-sample distribution is centered at $C(u,v)$ and normal.

Measuring Dependence

It was argued above that the empirical copula $C_n$ is the best sample-based representation of the copula $C$, which is itself a characterization of the dependence in a pair $(X,Y)$.
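The rank construction and the empirical copula described above can be sketched in a few lines of Python. This is only an illustrative sketch: the six pairs below are hypothetical values chosen for the example (they are not the entries of Table 1), and the helper names `ranks` and `empirical_copula` are introduced here for convenience.

```python
import math

def ranks(values):
    """Return the ranks 1..n of a list of distinct values (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def empirical_copula(R, S, u, v):
    """C_n(u, v): proportion of points (R_i/(n+1), S_i/(n+1)) below (u, v)."""
    n = len(R)
    hits = sum(1 for r, s in zip(R, S) if r / (n + 1) <= u and s / (n + 1) <= v)
    return hits / n

# Hypothetical sample of size 6 (NOT the paper's Table 1).
x = [-1.2, -0.4, 0.1, 0.6, 0.9, 1.7]
y = [0.3, -0.8, 1.1, -0.2, 0.5, -1.5]
R, S = ranks(x), ranks(y)

# Increasing transformations Z = exp(X), T = exp(3Y) leave the ranks unchanged.
z = [math.exp(a) for a in x]
t = [math.exp(3 * b) for b in y]
R_star, S_star = ranks(z), ranks(t)
```

Because the ranks of the transformed pairs coincide with those of the original pairs, any statistic computed from `R` and `S`, including `empirical_copula`, is unaffected by the change of marginal scale.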
It would make sense, therefore, to measure dependence, both empirically and theoretically, using $C_n$ and $C$, respectively. It will now be explained how this leads to two well-known nonparametric measures of dependence, namely Spearman's rho and Kendall's tau.

Fig. 2. Displayed in (a) is a scatter plot of the pairs $(R_i,S_i)$ of ranks derived from the learning data set $(X_i,Y_i)$, $1 \le i \le 6$. As for (b), it shows a scatter plot of the pairs $(R_i^*,S_i^*)$ of ranks derived from the transformed data $(Z_i,T_i) = (\exp(X_i), \exp(3Y_i))$, $1 \le i \le 6$. For obvious reasons, the two graphs are actually identical.

Spearman's Rho

Mimicking the familiar approach of Pearson to the measurement of dependence, a natural idea is to compute the correlation between the pairs $(R_i,S_i)$ of ranks, or equivalently between the points $(R_i/(n+1), S_i/(n+1))$ forming the support of $C_n$. This leads directly to Spearman's rho, viz.

$$\rho_n = \frac{\sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}{\sqrt{\sum_{i=1}^{n} (R_i - \bar{R})^2 \sum_{i=1}^{n} (S_i - \bar{S})^2}} \in [-1,1]$$

where

$$\bar{R} = \frac{1}{n}\sum_{i=1}^{n} R_i = \frac{n+1}{2} = \frac{1}{n}\sum_{i=1}^{n} S_i = \bar{S}$$

This coefficient, which may be expressed more conveniently in the form
$$\rho_n = \frac{12}{n(n+1)(n-1)} \sum_{i=1}^{n} R_i S_i - 3\,\frac{n+1}{n-1}$$

shares with Pearson's classical correlation coefficient, $r_n$, the property that its expectation vanishes when the variables are independent. However, $\rho_n$ is theoretically far superior to $r_n$, in that
1. $E(\rho_n) = \pm 1$ occurs if and only if $X$ and $Y$ are functionally dependent, i.e., whenever their underlying copula is one of the two Fréchet-Hoeffding bounds, $M$ or $W$;
2. In contrast, $E(r_n) = \pm 1$ if and only if $X$ and $Y$ are linear functions of one another, which is much more restrictive; and
3. $\rho_n$ estimates a population parameter that is always well defined, whereas there are heavy-tailed distributions (such as the Cauchy, for example) for which a theoretical value of Pearson's correlation does not exist.

For additional discussion on these points, see, e.g., Embrechts et al.

As it turns out, $\rho_n$ is an asymptotically unbiased estimator of

$$\rho = 12 \int_{[0,1]^2} uv \, dC(u,v) - 3 = 12 \int_{[0,1]^2} C(u,v) \, dv \, du - 3$$

where the second equality is an identity originally proven by Hoeffding (1940) and extended by Quesada-Molina (1992). To show this, one may use the fact that

$$12 \int_{[0,1]^2} uv \, dC_n(u,v) - 3 = \frac{12}{n} \sum_{i=1}^{n} \frac{R_i}{n+1}\,\frac{S_i}{n+1} - 3 = \frac{n-1}{n+1}\,\rho_n$$

and that $C_n \to C$ as $n \to \infty$. For more precise conditions under which this result holds, see, e.g., Hoeffding (1948).

Note in passing that under the null hypothesis $H_0: C = \Pi$ of independence between $X$ and $Y$, the distribution of $\rho_n$ is close to normal with zero mean and variance $1/(n-1)$, so that one may reject $H_0$ at approximate level $\alpha = 5\%$, for instance, if $\sqrt{n-1}\,|\rho_n| > z_{\alpha/2} = 1.96$.

Example

For the observations from the learning data set, a simple calculation yields $\rho_n = 1/35 \approx 0.0286$, while $r_n$ = [...]. Here, there is no reason to reject the null hypothesis of independence. For, if $Z$ is a standard normal random variable, the P-value of the test based on $\rho_n$ is $2\Pr(Z > \sqrt{5}/35) = 94.9\%$.

Given a family $(C_\theta)$ of copulas indexed by a real parameter, the theoretical value of $\rho$ is, typically, a monotone increasing function of $\theta$. A sufficient condition for this is that the copulas be ordered by positive quadrant dependence (PQD), which means that the implication

$$\theta < \theta' \Rightarrow C_\theta(u,v) \le C_{\theta'}(u,v)$$

is true for all $u,v \in [0,1]$.
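As a minimal sketch of the computations just described, the following Python code evaluates Spearman's rho from the convenient rank formula and the approximate two-sided P-value based on the $N(0, 1/(n-1))$ null distribution. The ranks `R` and `S` are hypothetical (they are not those of Table 2), and the function names are illustrative assumptions; the last line reproduces the P-value of the paper's example from the reported value $\rho_n = 1/35$ with $n = 6$.

```python
import math

def spearman_rho(R, S):
    """Spearman's rho from two lists of ranks (no ties assumed)."""
    n = len(R)
    cross = sum(r * s for r, s in zip(R, S))
    return 12.0 * cross / (n * (n + 1) * (n - 1)) - 3.0 * (n + 1) / (n - 1)

def independence_p_value(rho, n):
    """Two-sided P-value under H0, using the N(0, 1/(n-1)) approximation."""
    z = math.sqrt(n - 1) * abs(rho)
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * Pr(Z > z)

# Hypothetical ranks for illustration (NOT the paper's Table 2).
R = [1, 2, 3, 4, 5, 6]
S = [4, 2, 6, 3, 5, 1]
rho_n = spearman_rho(R, S)

# P-value for the value reported in the example: rho_n = 1/35, n = 6.
p_paper = independence_p_value(1.0 / 35.0, 6)
```

With such a small sample the normal approximation is rough at best, so the P-value should be read as indicative only.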
The original definition of PQD as a concept of dependence goes back to Lehmann (1966); the same ordering, rediscovered by Dhaene and Goovaerts (1996) in an actuarial context, is often referred to as the correlation or concordance ordering in that field.

In the Farlie-Gumbel-Morgenstern model, for example, one has

$$\int_{[0,1]^2} uv \, dC_\theta(u,v) = \int_0^1 \int_0^1 uv \, c_\theta(u,v) \, dv \, du$$

where

$$c_\theta(u,v) = \frac{\partial^2}{\partial u \, \partial v} C_\theta(u,v) = 1 + \theta(1-2u)(1-2v)$$

since $C_\theta$ is absolutely continuous in this case. A simple calculation then yields

$$\int_0^1 \int_0^1 uv \, c_\theta(u,v) \, dv \, du = \frac{1}{4} + \frac{\theta}{36}$$

and, hence, $\rho = \theta/3$, as initially shown by Schucany et al.

As a second example, if $(X,Y)$ follows a bivariate normal distribution with correlation $r$, a somewhat intricate calculation to be found, e.g., in Kruskal (1958), shows that

$$\rho = 12 \int \int F(x) G(y) \, dH(x,y) - 3 = \frac{6}{\pi} \arcsin\left(\frac{r}{2}\right)$$

For those people accustomed to thinking in terms of $r$, the above formula may suggest that a serious effort would be required to think of correlation in terms of Spearman's rho in the traditional bivariate normal model. As shown in Fig. 3(a), however, the difference between $\rho$ and $r$ is minimal in this context.

Fig. 3. Spearman's rho (a) and Kendall's tau (b) as a function of Pearson's correlation in the bivariate normal model
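To see how small the gap between $\rho$ and $r$ is in the bivariate normal model, the relation $\rho = (6/\pi)\arcsin(r/2)$ can be evaluated on a grid of correlations. The sketch below is purely illustrative; the function name and the grid are assumptions made for this example.

```python
import math

def spearman_from_pearson(r):
    """Population Spearman's rho of a bivariate normal pair with correlation r."""
    return 6.0 / math.pi * math.asin(r / 2.0)

# Largest absolute difference between rho and r over a grid on [-1, 1].
grid = [k / 100.0 for k in range(-100, 101)]
max_gap = max(abs(spearman_from_pearson(r) - r) for r in grid)
```

On this grid the largest absolute gap stays below 0.02, which is consistent with the near-identity relationship described for Fig. 3(a).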
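Kendall's Tau

A second, well-known measure of dependence based on ranks is Kendall's tau, whose empirical version is given by

$$\tau_n = \frac{P_n - Q_n}{\binom{n}{2}} = \frac{4}{n(n-1)}\,P_n - 1$$

where $P_n$ and $Q_n$ = number of concordant and discordant pairs, respectively. Here, two pairs $(X_i,Y_i)$, $(X_j,Y_j)$ are said to be concordant when $(X_i - X_j)(Y_i - Y_j) > 0$, and discordant when $(X_i - X_j)(Y_i - Y_j) < 0$. One need not worry about ties, since the borderline case $(X_i - X_j)(Y_i - Y_j) = 0$ occurs with probability zero under the assumption that $X$ and $Y$ are continuous. The characteristic patterns of concordant and discordant pairs are displayed in Fig. 4.

Fig. 4. Two pairs of concordant (a) and discordant (b) observations

It is obvious that $\tau_n$ is a function of the ranks of the observations only, since $(X_i - X_j)(Y_i - Y_j) > 0$ if and only if $(R_i - R_j)(S_i - S_j) > 0$. Accordingly, $\tau_n$ is also a function of $C_n$. To make the connection, introduce

$$I_{ij} = \begin{cases} 1 & \text{if } X_j < X_i, \; Y_j < Y_i \\ 0 & \text{otherwise} \end{cases}$$

for arbitrary $i \ne j$, and let $I_{ii} = 1$ for all $i \in \{1,\ldots,n\}$. Observe that

$$P_n = \frac{1}{2} \sum_{i=1}^{n} \sum_{j \ne i} (I_{ij} + I_{ji}) = \sum_{i=1}^{n} \sum_{j \ne i} I_{ij} = -n + \sum_{i=1}^{n} \sum_{j=1}^{n} I_{ij}$$

since $I_{ij} + I_{ji} = 1$ if and only if the pairs $(X_i,Y_i)$ and $(X_j,Y_j)$ are concordant. Now write

$$W_i = \frac{1}{n} \sum_{j=1}^{n} I_{ij} = \frac{1}{n}\,\#\{j : X_j \le X_i, \; Y_j \le Y_i\}$$

so that if $\bar{W} = (W_1 + \cdots + W_n)/n$, then $P_n = -n + n^2 \bar{W}$ and

$$\tau_n = 4\,\frac{n}{n-1}\,\bar{W} - \frac{n+3}{n-1} \tag{5}$$

The connection with $C_n$ then comes from the fact that by definition

$$W_i = C_n\left(\frac{R_i}{n+1}, \frac{S_i}{n+1}\right)$$

hence

$$\bar{W} = \int_{[0,1]^2} C_n(u,v) \, dC_n(u,v) \tag{6}$$

Using Eq. (6) and the fact that under suitable regularity conditions, $C_n \to C$ as $n \to \infty$, one can conclude with Hoeffding (1948) that $\tau_n$ is an asymptotically unbiased estimator of the population version of Kendall's tau, given by

$$\tau = 4 \int_{[0,1]^2} C(u,v) \, dC(u,v) - 1$$

An alternative test of independence can be based on $\tau_n$, since under $H_0$, this statistic is close to normal with zero mean and variance $2(2n+5)/\{9n(n-1)\}$. Thus, $H_0$ would be rejected at approximate level $\alpha = 5\%$ if

$$\sqrt{\frac{9n(n-1)}{2(2n+5)}}\;|\tau_n| > 1.96$$

Example (Continued)

For the observations from the learning data set, a simple calculation yields $\tau_n = 1/15 \approx 0.067$. Here, there is no reason to reject the null hypothesis of independence. For, if $Z$ is a standard normal random variable, the P-value of the test based on $\tau_n$ is $2\Pr(Z > 0.188) = 85.1\%$.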
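The two equivalent routes to Kendall's tau described above (counting concordant and discordant pairs, or averaging the $W_i$ as in Eq. (5)) can be sketched as follows. The sample is hypothetical and the function names are assumptions introduced for this illustration; the test statistic uses the null variance $2(2n+5)/\{9n(n-1)\}$ quoted in the text.

```python
import math
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau from counts of concordant and discordant pairs."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def kendall_tau_from_w(x, y):
    """Kendall's tau from the average of W_i, as in Eq. (5)."""
    n = len(x)
    w = [sum(1 for j in range(n) if x[j] <= x[i] and y[j] <= y[i]) / n
         for i in range(n)]
    w_bar = sum(w) / n
    return 4.0 * n / (n - 1) * w_bar - (n + 3.0) / (n - 1)

def tau_test_statistic(tau, n):
    """Standardized |tau_n| under the null hypothesis of independence."""
    return math.sqrt(9.0 * n * (n - 1) / (2.0 * (2 * n + 5))) * abs(tau)

# Hypothetical sample of size 6 (NOT the paper's Table 1).
x = [-1.2, -0.4, 0.1, 0.6, 0.9, 1.7]
y = [0.3, -0.8, 1.1, -0.2, 0.5, -1.5]
tau_pairs = kendall_tau(x, y)
tau_w = kendall_tau_from_w(x, y)

# Standardized statistic for the value reported in the example (tau_n = 1/15, n = 6).
z_paper = tau_test_statistic(1.0 / 15.0, 6)
```

Both routes give the same value on this sample, and the standardized statistic for $\tau_n = 1/15$ matches the 0.188 appearing in the P-value of the example.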
As for Spearman's rho, the theoretical value of Kendall's tau is a monotone increasing function of the real parameter $\theta$ whenever a family $(C_\theta)$ of copulas is ordered by positive quadrant dependence. In the Farlie-Gumbel-Morgenstern model, for example, one has

$$\int_{[0,1]^2} C_\theta(u,v) \, dC_\theta(u,v) = \int_0^1 \int_0^1 C_\theta(u,v) \, c_\theta(u,v) \, dv \, du$$

which reduces to $\theta/18 + 1/4$, hence $\tau = 2\theta/9$, as per Schucany et al. For the bivariate normal model with correlation $r$, Kruskal (1958) has shown that

$$\tau = 4 \int \int H(x,y) \, dH(x,y) - 1 = \frac{2}{\pi} \arcsin(r)$$

As shown in Fig. 3(b), $\tau$ is also nearly a linear function of $r$ in this special case.

Other Measures and Tests of Dependence

Although Spearman's rho and Kendall's tau are the two most common statistics with which dependence is measured and tested, many alternative rank-based procedures have been proposed in the statistical literature. Most of them are based on expressions of the form

$$\int_{[0,1]^2} J(u,v) \, dC_n(u,v)$$

where $J$ is some suitably regular score function. Thus, while $J(u,v) = uv$ is the basis of Spearman's statistic, as seen earlier, the choice $J(u,v) = \Phi^{-1}(u)\,\Phi^{-1}(v)$, e.g., yields the van der Waerden statistic. Genest and Verret (2005), who review this literature, explain how each $J$ should be chosen so as to yield the most powerful testing procedure against a specific class of copula alternatives.
In the absence of privileged information about the suspected departure from independence, however, omnibus procedures such as those based on $\rho_n$ and $\tau_n$ usually perform well. See Deheuvels (1981) or Genest and Rémillard (2004) for other general tests based on the empirical copula process $\mathbb{C}_n = \sqrt{n}\,(C_n - C)$.

Additional Graphical Tools for Detecting Dependence

Besides the scatter plot of ranks, two graphical tools for detecting dependence have recently been proposed in the literature, namely, chi-plots and K-plots. These will be briefly described in turn.

Chi-Plots

Chi-plots were originally proposed by Fisher and Switzer (1985) and more fully illustrated in Fisher and Switzer (2001). Their construction is inspired from control charts and based on the chi-square statistic for independence in a two-way table. Specifically, introduce

$$H_i = \frac{1}{n-1}\,\#\{j \ne i : X_j \le X_i, \; Y_j \le Y_i\} = \frac{n W_i - 1}{n-1}$$

$$F_i = \frac{1}{n-1}\,\#\{j \ne i : X_j \le X_i\}$$

and

$$G_i = \frac{1}{n-1}\,\#\{j \ne i : Y_j \le Y_i\}$$

Noting that these quantities depend exclusively on the ranks of the observations, Fisher and Switzer propose to plot the pairs $(\lambda_i, \chi_i)$, where

$$\chi_i = \frac{H_i - F_i G_i}{\sqrt{F_i(1-F_i)\,G_i(1-G_i)}}$$

and

$$\lambda_i = 4\,\mathrm{sign}(\tilde{F}_i \tilde{G}_i)\,\max(\tilde{F}_i^2, \tilde{G}_i^2)$$

where $\tilde{F}_i = F_i - 1/2$, $\tilde{G}_i = G_i - 1/2$ for $i \in \{1,\ldots,n\}$. To avoid outliers, they recommend that what should be plotted are only the pairs for which

$$|\lambda_i| \le 4\left(\frac{1}{n-1} - \frac{1}{2}\right)^2$$

Fig. 5 shows the resulting graph for the learning data set of Tables 1 and 2. The coordinates of the points and the intermediate calculations that lead to them are summarized in Table 3. Note that, in general, between two and four points may be lost due to division by zero; such is the case here for three points. Given that the original data set consisted of six observations only, this leaves only $6 - 3 = 3$ points on the graph, which is obviously not particularly revealing. However, the real-life applications considered in the Application section and by Fisher and Switzer (1985, 2001) provide more convincing evidence of the usefulness of this tool.

Fisher and Switzer (1985, 2001) argue that $\lambda_i, \chi_i \in [-1,1]$.
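The coordinates $(\lambda_i, \chi_i)$ of a chi-plot can be computed directly from the definitions of $H_i$, $F_i$, and $G_i$. The following is a minimal sketch on a hypothetical sample (not the learning data set); the function name `chi_plot_points` is an assumption, and points for which the denominator of $\chi_i$ vanishes are simply dropped, mirroring the loss of points due to division by zero. The optional outlier filter on $|\lambda_i|$ is not applied here.

```python
import math

def chi_plot_points(x, y):
    """Return the list of (lambda_i, chi_i) pairs that are well defined."""
    n = len(x)
    points = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        H = sum(1 for j in others if x[j] <= x[i] and y[j] <= y[i]) / (n - 1)
        F = sum(1 for j in others if x[j] <= x[i]) / (n - 1)
        G = sum(1 for j in others if y[j] <= y[i]) / (n - 1)
        denom = F * (1.0 - F) * G * (1.0 - G)
        if denom <= 0.0:
            continue  # chi_i undefined: the point is lost
        chi = (H - F * G) / math.sqrt(denom)
        f_c, g_c = F - 0.5, G - 0.5
        sign = (f_c * g_c > 0) - (f_c * g_c < 0)
        lam = 4.0 * sign * max(f_c * f_c, g_c * g_c)
        points.append((lam, chi))
    return points

# Hypothetical sample of size 6 (NOT the paper's Table 1).
x = [-1.2, -0.4, 0.1, 0.6, 0.9, 1.7]
y = [0.3, -0.8, 1.1, -0.2, 0.5, -1.5]
points = chi_plot_points(x, y)
```

On this hypothetical sample, three of the six points are lost to division by zero, which happens to match the count reported for the learning data set.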
While $\lambda_i$ = measure of distance between the pair $(X_i,Y_i)$ and the center of the scatter plot, $\chi_i$ = signed square root of the traditional chi-square test statistic for independence in the two-way table generated by counting points in the four regions delineated by the lines $x = X_i$ and $y = Y_i$.

Since one would expect $H_i \approx F_i \times G_i$ for all $i$ under independence, values of $\chi_i$ that fall too far from zero are indicative of departures from that hypothesis. To help identify such departures, Fisher and Switzer (1985, 2001) suggest that "control limits" be drawn at $\pm c_p/\sqrt{n}$, where $c_p$ is selected so that approximately $100p\%$ of the pairs $(\lambda_i, \chi_i)$ lie between the lines. Through simulations, they found that the $c_p$ values 1.54, 1.78, and 2.18 correspond to $p = 0.9$, 0.95, and 0.99, respectively.

Fig. 5. Chi-plot for the learning data set

Table 3. Computations Required for Drawing the Chi-Plot Associated with the Learning Data Set of Table 1 (columns: $i$, $H_i$, $F_i$, $G_i$, $\chi_i$, $\lambda_i$)

K-Plots

Another rank-based graphical tool for visualizing dependence was recently proposed by Genest and Boies (2003). It is inspired by the familiar notion of QQ-plot. Specifically, their technique consists in plotting the pairs $(W_{i:n}, H_{(i)})$ for $i \in \{1,\ldots,n\}$, where $H_{(1)} < \cdots < H_{(n)}$ are the order statistics associated with the quantities $H_1,\ldots,H_n$ introduced in the Chi-Plots subsection.

As for $W_{i:n}$, it is the expected value of the $i$th order statistic from a random sample of size $n$ from the random variable $W = C(U,V) = H(X,Y)$ under the null hypothesis of independence between $U$ and $V$ (or between $X$ and $Y$, which is the same). The latter is given by

$$W_{i:n} = n \binom{n-1}{i-1} \int_0^1 w\,k_0(w)\,\{K_0(w)\}^{i-1}\,\{1 - K_0(w)\}^{n-i}\,dw$$

where $K_0$ is the distribution function of $W$ under independence and $k_0$ is the corresponding density.
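The abscissas $W_{i:n}$ of a K-plot can be approximated numerically instead of with a symbolic calculator. The sketch below assumes the independence case, in which $K_0(w) = w - w\log(w)$ and hence $k_0(w) = -\log(w)$; the midpoint rule and the number of steps are choices made for this illustration only, not part of the original methodology.

```python
import math

def K0(w):
    """Distribution function of W = UV for independent uniform U and V."""
    return w - w * math.log(w)

def k0(w):
    """Density corresponding to K0."""
    return -math.log(w)

def expected_order_statistic(i, n, steps=20000):
    """Approximate W_{i:n} with a midpoint rule on (0, 1)."""
    coef = n * math.comb(n - 1, i - 1)
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        w = (k + 0.5) * h
        Kw = K0(w)
        total += w * k0(w) * Kw ** (i - 1) * (1.0 - Kw) ** (n - i)
    return coef * total * h

# Abscissas W_{1:6}, ..., W_{6:6} for a sample of size n = 6.
n = 6
w_expected = [expected_order_statistic(i, n) for i in range(1, n + 1)]
```

A quick sanity check is that the $W_{i:n}$ sum to $n\,E(W) = n/4$, since $E(UV) = 1/4$ under independence.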
The function $K_0$ is given by

$$K_0(w) = P(UV \le w) = \int_0^1 P\left(U \le \frac{w}{v}\right) dv = \int_0^w 1 \, dv + \int_w^1 \frac{w}{v} \, dv = w - w \log(w)$$

and $k_0$ = corresponding density. The values of $W_{1:6},\ldots,W_{6:6}$ required to produce Fig. 6 can be readily computed using any symbolic calculator, such as Maple. They are given in Table 4.

Table 4. Coordinates of Points Displayed on the K-Plot Associated with the Learning Data Set of Table 1 (columns: $i$, $W_{i:n}$, $H_{(i)}$)

Fig. 6. K-plot for the learning data set. Superimposed on the graph are a straight line corresponding to the case of independence and a smooth curve $K_0(w)$ associated with perfect positive dependence.

The interpretation of K-plots is similar to that of QQ-plots: just as curvature is problematic, e.g., in a normal QQ-plot, any deviation from the main diagonal is a sign of dependence in K-plots. Positive or negative dependence may be suspected in the data, depending on whether the curve is located above or below the line $y = x$. Roughly speaking, the further the distance, the greater the dependence. In this construction, perfect negative dependence (i.e., $C = W$) would translate into a string of data points aligned on the x-axis. As for perfect positive dependence (i.e., $C = M$), it would materialize into data aligned on the curve $K_0(w)$ shown on the graph.

As for the chi-plot, the linearity (or lack thereof) in the K-plot displayed in Fig. 6 is hard to detect, given the extremely small size of the learning data set. However, see the Application section and Genest and Boies (2003) for more compelling illustrations of K-plots.

Estimation

Now suppose that a parametric family $(C_\theta)$ of copulas is being considered as a model for the dependence between two random variables $X$ and $Y$. Given a random sample $(X_1,Y_1),\ldots,(X_n,Y_n)$ from $(X,Y)$, how should $\theta$ be estimated? This section reviews different nonparametric strategies for tackling this problem, depending on whether $\theta$ is real or multidimensional.

Only rank-based estimators are considered in the sequel. This methodological choice is justified by the fact, highlighted earlier, that the dependence structure captured by a copula has nothing to do with the individual behavior of the variables. A fortiori, any inference about the parameter indexing a family of copulas should thus rely only on the ranks of the observations, which are the best summary of the joint behavior of the random pairs.

Estimate Based on Kendall's Tau

To fix ideas, suppose that the underlying dependence structure of a random pair $(X,Y)$ is appropriately modeled by the Farlie-Gumbel-Morgenstern family $(C_\theta)$ defined in Eq. (2). In this case, $\theta$ is real and as seen in the Kendall's Tau subsection there exists an immediate relation in this model between the parameter and the population value of Kendall's tau, namely

$$\tau = \frac{2}{9}\,\theta$$

Given a sample value $\tau_n$ of $\tau$ computed from Eq. (5) or (6), a simple and intuitive approach to estimating $\theta$ would then consist of taking

$$\tilde{\theta}_n = \frac{9}{2}\,\tau_n$$

Since $\tau_n$ is rank-based, this estimation strategy may be construed as a nonparametric adaptation of the celebrated method of moments.

More generally, if $\theta = g(\tau)$ for some smooth function $g$, then $\tilde{\theta}_n = g(\tau_n)$ may be referred to as the "Kendall-based estimator" of $\theta$. A small adaptation of Proposition 3.1 of Genest and Rivest (1993) implies that

$$\sqrt{n}\;\frac{\tau_n - \tau}{4S} \approx N(0,1)$$

where

$$S^2 = \frac{1}{n} \sum_{i=1}^{n} (W_i + \tilde{W}_i - 2\bar{W})^2$$

and

$$\tilde{W}_i = \frac{1}{n} \sum_{j=1}^{n} I_{ji} = \frac{1}{n}\,\#\{j : X_i \le X_j, \; Y_i \le Y_j\}$$

Therefore, an application of Slutsky's theorem, also known as the Delta method, implies that as $n \to \infty$

$$\tilde{\theta}_n \approx N\left[\theta, \frac{1}{n}\,\{4S\,g'(\tau_n)\}^2\right]$$

Accordingly, an approximate $100(1-\alpha)\%$ confidence interval for $\theta$ is given by

$$\tilde{\theta}_n \pm z_{\alpha/2}\,\frac{1}{\sqrt{n}}\,4S\,|g'(\tau_n)|$$

For an alternative consistent estimator of the asymptotic variance of $\tau_n$, see for instance, Samara and Randles (1988).
Example (Continued)

For the learning data set of Table 1, it was seen earlier that $\tau_n = 1/15$, hence $\tilde{\theta}_n = 0.3$. Using the intermediate quantities summarized in Table 5, one finds $S^2 = 0.0431$, hence an approximate 95% confidence interval for this estimation is $[-1,1]$, since $g'(\tau_n) \equiv 9/2$, and hence, $1.96 \times 4S\,g'(\tau_n)/\sqrt{n} = 2.99$. While the size of the standard error may appear exceedingly conservative, this result is not surprising, considering that the sample size is $n = 6$.

Table 5. Intermediate Values Required for the Computation of the Standard Error Associated with Kendall's Tau (columns: $i$, $W_i$, $\tilde{W}_i$)

The popularity of $\tilde{\theta}_n$ as an estimator of the dependence parameter stems in part from the fact that closed-form expressions for the population value of Kendall's tau are available for many common parametric copula models. Such is the case, in particular, for several Archimedean families of copulas, e.g., those of Ali et al. (1978), Clayton (1978), Frank (1979), Gumbel-Hougaard (Gumbel 1960), etc. Specifically, a copula $C$ is said to be Archimedean if there exists a convex, decreasing function $\varphi:(0,1] \to [0,\infty)$ such that $\varphi(1) = 0$ and

$$C(u,v) = \varphi^{-1}\{\varphi(u) + \varphi(v)\}$$

is valid for all $u,v \in (0,1)$. As shown by Genest and MacKay (1986)

$$\tau = 1 + 4 \int_0^1 \frac{\varphi(t)}{\varphi'(t)} \, dt \tag{7}$$

Table 6 gives the generator and an expression for $\tau$ for the three most common Archimedean models. Algebraically closed formulas are available for various other dependence models, e.g., extreme-value or Archimax copulas. See, for example, Ghoudi et al. (1998) or Capéraà et al.

Estimate Based on Spearman's Rho

When the dependence parameter $\theta$ is real, an alternative rank-based estimator that remains in the spirit of the method of moments consists of taking

$$\check{\theta}_n = h(\rho_n)$$

where $\theta = h(\rho)$ represents the relationship between the parameter and the population value of Spearman's rho. In the context of the Farlie-Gumbel-Morgenstern family of copulas, for example, it was seen earlier that $\rho = \theta/3$, so that $\check{\theta}_n = 3\rho_n$ would be an alternative nonparametric estimator to $\tilde{\theta}_n = 9\tau_n/2$.
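Eq. (7) can be checked numerically against the closed-form values of Kendall's tau for the Clayton generator $(t^{-\theta}-1)/\theta$ and the Gumbel-Hougaard generator $|\log t|^{\theta}$, namely $\theta/(\theta+2)$ and $1 - 1/\theta$. The following is a rough sketch: the midpoint rule, the step count, and the function names are assumptions made here, and the two inversion helpers simply solve those closed forms for $\theta$.

```python
import math

def tau_from_generator(phi, dphi, steps=100000):
    """Approximate tau = 1 + 4 * integral_0^1 phi(t)/phi'(t) dt (midpoint rule)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += phi(t) / dphi(t)
    return 1.0 + 4.0 * total * h

theta = 2.0

# Clayton generator and its derivative.
tau_clayton = tau_from_generator(
    lambda t: (t ** (-theta) - 1.0) / theta,
    lambda t: -t ** (-theta - 1.0),
)

# Gumbel-Hougaard generator and its derivative.
tau_gumbel = tau_from_generator(
    lambda t: (-math.log(t)) ** theta,
    lambda t: -theta * (-math.log(t)) ** (theta - 1.0) / t,
)

def clayton_theta_from_tau(tau):
    """Invert tau = theta / (theta + 2)."""
    return 2.0 * tau / (1.0 - tau)

def gumbel_theta_from_tau(tau):
    """Invert tau = 1 - 1 / theta."""
    return 1.0 / (1.0 - tau)
```

For $\theta = 2$ both families give $\tau = 1/2$, so the numerical values can be compared directly with the closed forms, and the inversion helpers return $\theta = 2$ when fed $\tau = 1/2$.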
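Now it follows from standard convergence results about empirical processes to be found, e.g., in Chapter 5 of Gaenssler and Stute (1987), that

$$\sqrt{n}\,(\rho_n - \rho) \approx N(0, \sigma_\rho^2)$$

where the asymptotic variance $\sigma_\rho^2$ depends on the underlying copula $C$ in a way that has been described in detail by Borkowf (2002). Arguing along the same lines as in the Estimate Based on Kendall's Tau subsection, it can then be seen that under suitable regularity conditions on $h$

$$\check{\theta}_n \approx N\left[\theta, \frac{1}{n}\,\{\sigma_n\,h'(\rho_n)\}^2\right]$$

where $\sigma_n^2$ = suitable estimator of $\sigma_\rho^2$. An approximate $100(1-\alpha)\%$ confidence interval for $\theta$ is then given by

$$\check{\theta}_n \pm z_{\alpha/2}\,\frac{\sigma_n\,|h'(\rho_n)|}{\sqrt{n}}$$

Substituting $C_n$ for $C$ in the expressions reported by Borkowf (2002), a very natural, consistent estimate for $\sigma_\rho^2$ is given by

$$\sigma_n^2 = 144\,(-9A_n^2 + B_n + 2C_n + 2D_n + 2E_n)$$

where

$$A_n = \frac{1}{n} \sum_{i=1}^{n} \frac{R_i}{n+1}\,\frac{S_i}{n+1}$$

$$B_n = \frac{1}{n} \sum_{i=1}^{n} \left(\frac{R_i}{n+1}\right)^2 \left(\frac{S_i}{n+1}\right)^2$$

$$C_n = \frac{1}{n^3} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \frac{R_i}{n+1}\,\frac{S_i}{n+1}\,\mathbf{1}(R_k \le R_i, S_k \le S_j) + \frac{1}{4} - A_n$$

$$D_n = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{S_i}{n+1}\,\frac{S_j}{n+1}\,\max\left(\frac{R_i}{n+1}, \frac{R_j}{n+1}\right)$$

and

$$E_n = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{R_i}{n+1}\,\frac{R_j}{n+1}\,\max\left(\frac{S_i}{n+1}, \frac{S_j}{n+1}\right)$$

Example (Continued)

For the learning data set of Table 1, it was seen earlier that $\rho_n = 1/35$, hence $\check{\theta}_n = 3/35 \approx 0.086$. Burdensome but simple calculations yield $\sigma_n = 7.77$, hence an approximate 95% confidence interval for this estimation is $[-1,1]$, since $h'(\rho_n) \equiv 3$, and hence, $1.96\,\sigma_n\,h'(\rho_n)/\sqrt{n} = 18.66$. Here again, the size of the standard error is quite large, as might be expected given that $n = 6$.

Table 6. Three Common Families of Archimedean Copulas, Their Generator, Their Parameter Space, and an Expression for the Population Value of Kendall's Tau

Family | Generator | Parameter | Kendall's tau
Clayton | $(t^{-\theta} - 1)/\theta$ | $\theta \ge -1$ | $\theta/(\theta+2)$
Frank | $-\log\{(e^{-\theta t} - 1)/(e^{-\theta} - 1)\}$ | $\theta \in \mathbb{R}$ | $1 - 4/\theta + 4D_1(\theta)/\theta$
Gumbel-Hougaard | $|\log t|^{\theta}$ | $\theta \ge 1$ | $1 - 1/\theta$

Note: Here, $D_1(\theta) = \int_0^\theta (x/\theta)/(e^x - 1)\,dx$ is the first Debye function.

Maximum Pseudolikelihood Estimator

In classical statistics, maximum likelihood estimation is a well-known alternative to the method of moments that is usually more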
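efficient, particularly when $\theta$ is multidimensional. In the present context, an adaptation of this approach to estimation is required if inference concerning dependence parameters is to be based exclusively on ranks. Such an adaptation was described in broad terms by Oakes (1994) and was later formalized and studied by Genest et al. (1995) and by Shih and Louis (1995).

The method of maximum pseudolikelihood, which requires that $C_\theta$ be absolutely continuous with density $c_\theta$, simply involves maximizing a rank-based, log-likelihood of the form

$$\ell(\theta) = \sum_{i=1}^{n} \log\left\{c_\theta\left(\frac{R_i}{n+1}, \frac{S_i}{n+1}\right)\right\} \tag{8}$$

The latter is exactly the expression one gets when the unknown marginal distributions $F$ and $G$ in the classical log-likelihood

$$\ell(\theta) = \sum_{i=1}^{n} \log[c_\theta\{F(X_i), G(Y_i)\}]$$

are replaced by rescaled versions of their empirical counterparts, i.e.

$$F_n(x) = \frac{1}{n+1} \sum_{i=1}^{n} \mathbf{1}(X_i \le x)$$

and

$$G_n(y) = \frac{1}{n+1} \sum_{i=1}^{n} \mathbf{1}(Y_i \le y)$$

That this substitution yields formula (8) is immediate, once it is realized that $F_n(X_i) = R_i/(n+1)$ and $G_n(Y_i) = S_i/(n+1)$ for all $i \in \{1,\ldots,n\}$.

This method may seem superficially less attractive than the inversion of Kendall's tau or Spearman's rho, both because it involves numerical work and requires the existence of a density $c_\theta$. At the same time, however, it is much more generally applicable than the other methods, since it does not require the dependence parameter to be real. The procedure for estimating a multivariate $\theta$ and computing associated approximate confidence regions is described by Genest et al. (1995). For simplicity, it is only presented here in the case where $\theta$ is real; however, see the Application section for the bivariate case.

Letting $\dot{c}_\theta(u,v) = \partial c_\theta(u,v)/\partial\theta$, Genest et al. (1995) show under mild regularity conditions that the root $\hat{\theta}_n$ of the equation

$$\dot{\ell}(\theta) = \frac{\partial}{\partial\theta}\,\ell(\theta) = \sum_{i=1}^{n} \frac{\dot{c}_\theta\left(\frac{R_i}{n+1}, \frac{S_i}{n+1}\right)}{c_\theta\left(\frac{R_i}{n+1}, \frac{S_i}{n+1}\right)} = 0$$

is unique. Furthermore

$$\hat{\theta}_n \approx N\left(\theta, \frac{\nu^2}{n}\right)$$

where $\nu^2$ depends exclusively on the true underlying copula $C_\theta$ as per Proposition 2.1 of Genest et al. (1995). As mentioned by these authors, a consistent estimate of $\nu^2$ is given by

$$\hat{\nu}_n^2 = \hat{\sigma}_n^2 / \hat{\beta}_n^2$$

where

$$\hat{\sigma}_n^2 = \frac{1}{n} \sum_{i=1}^{n} (M_i - \bar{M})^2 \quad \text{and} \quad \hat{\beta}_n^2 = \frac{1}{n} \sum_{i=1}^{n} (N_i - \bar{N})^2$$

are sample variances computed from two sets of pseudo-observations with means $\bar{M} = (M_1 + \cdots + M_n)/n$ and $\bar{N} = (N_1 + \cdots + N_n)/n$, respectively.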
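For a one-parameter family such as Farlie-Gumbel-Morgenstern, whose density is $c_\theta(u,v) = 1 + \theta(1-2u)(1-2v)$, the maximization of Eq. (8) can be sketched with a simple bisection on the pseudo-score. This is an illustrative sketch only: the ranks below are hypothetical, the function names are assumptions, and bisection is one convenient choice among many root finders. Because the pseudo-score is decreasing in $\theta$, the maximum is placed on the boundary of $[-1,1]$ when no interior root exists.

```python
import math

def fgm_terms(R, S):
    """a_i = (1 - 2 R_i/(n+1)) (1 - 2 S_i/(n+1)) for each pair of ranks."""
    n = len(R)
    return [(1.0 - 2.0 * r / (n + 1)) * (1.0 - 2.0 * s / (n + 1))
            for r, s in zip(R, S)]

def log_pseudolikelihood(theta, a):
    """Rank-based log-likelihood of Eq. (8) for the FGM density."""
    return sum(math.log(1.0 + theta * ai) for ai in a)

def pseudo_score(theta, a):
    """Derivative of the log-pseudolikelihood with respect to theta."""
    return sum(ai / (1.0 + theta * ai) for ai in a)

def fgm_mple(R, S, tol=1e-12):
    """Maximum pseudolikelihood estimate of theta on [-1, 1] by bisection."""
    a = fgm_terms(R, S)
    lo, hi = -1.0, 1.0
    if pseudo_score(lo, a) <= 0.0:
        return lo  # score already non-positive: maximum at the lower bound
    if pseudo_score(hi, a) >= 0.0:
        return hi  # score still non-negative: maximum at the upper bound
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pseudo_score(mid, a) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical ranks for illustration (NOT the paper's Table 2).
R = [1, 2, 3, 4, 5, 6]
S = [3, 5, 1, 6, 2, 4]
theta_hat = fgm_mple(R, S)
```

For these particular ranks the pseudo-score reduces to $20/(49+5\theta) - 18/(49-9\theta)$, whose root is $98/270 \approx 0.363$, which gives a hand check on the numerical result.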
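To compute the pseudo-observations $M_i$ and $N_i$, one should proceed as follows:

Step 1: Relabel the original data $(X_1,Y_1),\ldots,(X_n,Y_n)$ in such a way that $X_1 < \cdots < X_n$; as a consequence one then has $R_1 = 1,\ldots,R_n = n$.

Step 2: Write $L(\theta,u,v) = \log c_\theta(u,v)$ and compute $L_\theta$, $L_u$, and $L_v$, which are the derivatives of $L$ with respect to $\theta$, $u$, and $v$, respectively.

Step 3: For $i \in \{1,\ldots,n\}$, set

$$N_i = L_\theta\left(\hat{\theta}_n, \frac{i}{n+1}, \frac{S_i}{n+1}\right)$$

Step 4: For $i \in \{1,\ldots,n\}$, let also

$$M_i = N_i - \frac{1}{n} \sum_{j=i}^{n} L_\theta\left(\hat{\theta}_n, \frac{j}{n+1}, \frac{S_j}{n+1}\right) L_u\left(\hat{\theta}_n, \frac{j}{n+1}, \frac{S_j}{n+1}\right) - \frac{1}{n} \sum_{S_j \ge S_i} L_\theta\left(\hat{\theta}_n, \frac{j}{n+1}, \frac{S_j}{n+1}\right) L_v\left(\hat{\theta}_n, \frac{j}{n+1}, \frac{S_j}{n+1}\right)$$

Example (Continued)

Suppose that a Farlie-Gumbel-Morgenstern copula model is being considered for the learning data set of Table 1. In this case

$$c_\theta(u,v) = 1 + \theta(1-2u)(1-2v)$$

and

$$\frac{\dot{c}_\theta(u,v)}{c_\theta(u,v)} = \frac{(1-2u)(1-2v)}{1 + \theta(1-2u)(1-2v)}$$

Accordingly, the log-pseudolikelihood associated with this model is given by

$$\ell(\theta) = \sum_{i=1}^{n} \log\left\{1 + \theta\left(1 - \frac{2R_i}{n+1}\right)\left(1 - \frac{2S_i}{n+1}\right)\right\}$$

and the corresponding pseudoscore function is

$$\dot{\ell}(\theta) = \sum_{i=1}^{n} \frac{\left(1 - \frac{2R_i}{n+1}\right)\left(1 - \frac{2S_i}{n+1}\right)}{1 + \theta\left(1 - \frac{2R_i}{n+1}\right)\left(1 - \frac{2S_i}{n+1}\right)} = \sum_{i=1}^{n} \frac{(n+1-2R_i)(n+1-2S_i)}{(n+1)^2 + \theta\,(n+1-2R_i)(n+1-2S_i)}$$

These two functions are plotted in Fig. 7 with $n = 6$ and the values of $R_i$ and $S_i$ given in Table 2. Upon substitution, one gets $\hat{\theta}_n$ = [...] as the unique root of the equation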
$$\dot{\ell}(\theta) = \sum_{i=1}^{6} \frac{(7-2R_i)(7-2S_i)}{49 + \theta\,(7-2R_i)(7-2S_i)} = 0$$

Fig. 7. Graphs of (a) $\ell(\theta)$ and (b) $\dot{\ell}(\theta)$ for the learning data set of Table 1 when the assumed model is the Farlie-Gumbel-Morgenstern family of copulas

In the present case

$$L_\theta(\theta,u,v) = \frac{(1-2u)(1-2v)}{1 + \theta(1-2u)(1-2v)}$$

$$L_u(\theta,u,v) = \frac{-2\theta(1-2v)}{1 + \theta(1-2u)(1-2v)}$$

$$L_v(\theta,u,v) = \frac{-2\theta(1-2u)}{1 + \theta(1-2u)(1-2v)}$$

Using the intermediate calculations summarized in Table 7, one gets $\hat{\nu}_n^2 = 0.0677/0.0707 = 0.958$ and $1.96\,\hat{\nu}_n/\sqrt{n} = 0.783$. The confidence interval for the maximum pseudolikelihood estimator is thus $\hat{\theta}_n \pm 0.783$, with lower endpoint $-0.684$.

Table 7. Values of the Constants $N_i$ and $M_i$ Required to Compute an Approximate Confidence Interval for the Maximum Pseudolikelihood Estimator $\hat{\theta}_n$ (columns: $i$, $N_i$, $M_i$)

Other Estimation Methods

Although they are the most common, estimators based on the maximization of the pseudolikelihood and on the inversion of either Kendall's tau or Spearman's rho are not the only rank-based procedures available for selecting appropriate values of dependence parameters in a copula-based model. Tsukahara (2005), for example, recently investigated the behavior and performance of two new classes of estimators derived from minimum-distance criteria and an estimating-equation approach. In his simulations, however, the maximum pseudolikelihood estimator turned out to have the smallest mean-squared error. Circumstances under which the latter approach is asymptotically semiparametrically efficient were delineated by Klaassen and Wellner (1997) and by Genest and Werker. See Biau and Wegkamp (2005) for another rank-based, minimum-distance method for dependence parameter estimation.

In all fairness, it should be mentioned that the exclusive reliance on ranks for copula parameter estimation advocated here is not universally accepted in the statistical community. In his book, Joe (1997, Chap. 10) recommends a parametric two-step procedure often referred to as the "inference from margins" or IFM method.
As in the pseudolikelihood approach described above, the estimate of $\theta$ is obtained through the maximization of a function of the form
$$\ell(\theta) = \sum_{i=1}^{n} \log\left[c_\theta\{\hat F(X_i), \hat G(Y_i)\}\right].$$
However, while the rank-based method takes $\hat F = F_n$ and $\hat G = G_n$, Joe (1997) substitutes $\hat F = F_{\hat\delta_n}$ and $\hat G = G_{\hat\eta_n}$, where $F_\delta$ and $G_\eta$ = suitable parametric families for the margins, and $\hat\delta_n$ and $\hat\eta_n$ = standard maximum likelihood estimates of their parameters, derived from the observed values of $X$ and $Y$, respectively. Cherubini et al. (2004, Section 5.3) point out that the IFM method may be viewed as a special case of the generalized method of moments with an identity weight matrix. Joe (2005) quantifies the asymptotic efficiency of the approach in different circumstances.

Although they usually perform well, the estimates of the association parameters derived by the IFM technique clearly depend on the choice of $F_\delta$ and $G_\eta$, and thus always run the risk of being unduly affected if the models selected for the margins turn out to be inappropriate (see, e.g., Kim et al.).

For completeness, it may be worth mentioning that another developing body of literature proposes the use of kernel methods to derive a smooth estimate of a copula or its density, without assuming any specific parametric form for it. See, e.g., Gijbels and Mielniczuk (1990) or Fermanian and Scaillet.
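The difference between the two choices of $\hat F$ and $\hat G$ can be made concrete with a small sketch. The data values, the normal family used for the margins, and all function names below are assumptions made purely for illustration; the Farlie-Gumbel-Morgenstern density is reused as the copula model because its pseudo-score is easy to solve by bisection.

```python
import statistics

# Illustrative data (an assumption made for this sketch).
x = [-2.2, -1.5, -0.8, 0.0, 0.1, 1.3]
y = [0.4, 1.0, 0.6, 1.5, 1.1, -0.8]

def rank_pseudo_obs(values):
    """Rank-based choice: rescaled empirical distribution, R_i / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

def normal_margin_pseudo_obs(values):
    """IFM-style choice: a normal margin fitted by maximum likelihood.

    The normal family is a hypothetical choice; pstdev is the ML estimate
    of the standard deviation (divisor n).
    """
    fitted = statistics.NormalDist(statistics.mean(values), statistics.pstdev(values))
    return [fitted.cdf(v) for v in values]

def fgm_estimate(u, v, tol=1e-12):
    """Maximizer over [-1, 1] of sum(log(1 + theta (1-2u_i)(1-2v_i))), by bisection on the score."""
    a = [(1 - 2 * ui) * (1 - 2 * vi) for ui, vi in zip(u, v)]

    def score(theta):
        return sum(ai / (1 + theta * ai) for ai in a)

    lo, hi = -1.0, 1.0
    if score(lo) <= 0:
        return lo
    if score(hi) >= 0:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta_rank = fgm_estimate(rank_pseudo_obs(x), rank_pseudo_obs(y))
theta_ifm = fgm_estimate(normal_margin_pseudo_obs(x), normal_margin_pseudo_obs(y))
print(theta_rank, theta_ifm)
```

The two estimates generally differ, and only the second one changes when a different parametric family is assumed for the margins.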
Goodness-of-Fit Tests

In typical modeling exercises, the user has a choice between several different dependence structures for the data at hand. To keep things simple, suppose that two copulas $C$ and $D$ were fitted by some arbitrary method. It is then natural to ask which of the two models provides the best fit to the observations. Both informal and formal ways of tackling this question will be discussed in turn.

Graphical Diagnostics

When dealing with bivariate data, possibly the most natural way of checking the adequacy of a copula model would be to compare a scatter plot of the pairs $(R_i/(n+1), S_i/(n+1))$ (i.e., the support of the empirical copula $C_n$) with an artificial data set of the same size generated from $C_{\theta_n}$. To avoid arbitrariness induced by sampling variability, however, a better strategy consists of generating a large sample from $C_{\theta_n}$, which effectively amounts to portraying the associated copula density in two dimensions.

Simple simulation algorithms are available for most copula models; see, e.g., Devroye (1986, Chap. 11), or Whelan (2004) for Archimedean copulas. In the bivariate case, a good strategy for generating a pair $(U,V)$ from a copula $C$ proceeds as follows:

Step 1: Generate $U$ from a uniform distribution on the interval $(0,1)$.

Step 2: Given $U = u$, generate $V$ from the conditional distribution
$$Q_u(v) = P(V \le v \mid U = u) = \frac{\partial}{\partial u} C(u,v)$$
by setting $V = Q_u^{-1}(U^*)$, where $U^*$ = another observation from the uniform distribution on the interval $(0,1)$.

When an explicit formula does not exist for $Q_u^{-1}$, the value $v = Q_u^{-1}(u^*)$ can be determined by trial and error or more effectively using the bisection method; see Devroye (1986, Chap. 2). Thus, for the Farlie-Gumbel-Morgenstern family of copulas, one finds
$$Q_u(v) = v + \theta v(1-v)(1-2u)$$
for all $u, v \in [0,1]$, and hence
$$Q_u^{-1}(u^*) = \begin{cases} u^* & \text{if } b = \theta(1-2u) = 0, \\ \dfrac{(b+1) - \sqrt{(b+1)^2 - 4bu^*}}{2b} & \text{if } b = \theta(1-2u) \ne 0. \end{cases}$$

Fig. 8a displays 100 pairs $(U_i,V_i)$ generated with this algorithm, taking $\theta = \hat\theta_n = [\ldots]$ as deduced from the method of maximum pseudolikelihood. The six points of the learning data set, represented by crosses, are superimposed.
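The two-step algorithm above can be sketched as follows for the Farlie-Gumbel-Morgenstern family. The function names are hypothetical. The quantile is written in the algebraically equivalent form $2u^*/\{(b+1) + \sqrt{(b+1)^2 - 4bu^*}\}$, obtained by multiplying the closed-form expression by its conjugate; this avoids a separate branch for $b = 0$ and loses less precision when $b$ is small.

```python
import math
import random

def fgm_conditional_cdf(v, u, theta):
    """Q_u(v) = P(V <= v | U = u) for the Farlie-Gumbel-Morgenstern copula."""
    return v + theta * v * (1 - v) * (1 - 2 * u)

def fgm_conditional_quantile(u_star, u, theta):
    """Inverse of Q_u at u_star, using the conjugate form of the quadratic root."""
    b = theta * (1 - 2 * u)
    denom = (b + 1) + math.sqrt(max((b + 1) ** 2 - 4 * b * u_star, 0.0))
    if denom == 0.0:
        # Only reachable when b = -1 and u_star = 0, where the quantile is 0.
        return 0.0
    return 2 * u_star / denom

def simulate_fgm(n, theta, seed=None):
    """Generate n pairs (U, V) from the FGM copula with parameter theta in [-1, 1]."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = rng.random()        # Step 1: U uniform on (0, 1)
        u_star = rng.random()   # Step 2: V = Q_u^{-1}(U*)
        pairs.append((u, fgm_conditional_quantile(u_star, u, theta)))
    return pairs

sample = simulate_fgm(100, theta=0.5, seed=1)
print(sample[:3])
```

For a copula without a closed-form conditional quantile, `fgm_conditional_quantile` would be replaced by a bisection search on $Q_u(v) - u^*$ over $[0,1]$.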
Given the small size of the data set, it is hard to tell from this graph whether the selected model accurately reproduces the dependence structure revealed by the six observations. To show the effectiveness of the procedure, the same exercise was repeated in Fig. 8b, using a Clayton copula with $\theta = 10$. Here, the inappropriateness of the model is apparent, as might have been expected from the fact that $\tau = 5/6$ for this copula, while $\tau_n = 1/15$.

Fig. 8. (a) Scatter plot of 100 pairs $(U_i,V_i)$ simulated from the Farlie-Gumbel-Morgenstern with parameter $\theta = [\ldots]$. (b) Similar plot, generated from the Clayton copula with $\tau = 5/6$. On both graphs, the six points of the learning data set are indicated with a cross.

Another option, which is related to K-plots, consists of comparing the empirical distribution $K_n$ of the variables $W_1,\ldots,W_n$ introduced previously with $K_{\theta_n}$, i.e., the theoretical distribution of $W = C_{\theta_n}(U,V)$, where the pair $(U,V)$ is drawn from $C_{\theta_n}$. One possibility is to plot $K_n$ and $K_{\theta_n}$ on the same graph to see how well they agree. Alternatively, a QQ-plot can be derived from the order statistics $W_{(1)} \le \cdots \le W_{(n)}$ by plotting the pairs $(W_{i:n}, W_{(i)})$ for $i \in \{1,\ldots,n\}$. In this case, however, $W_{i:n}$ is the expected value of the $i$th order statistic from a random sample of size $n$ from $K_{\theta_n}$, rather than from $K_0$, as was the case in the K-plot. In other words,
$$W_{i:n} = n\binom{n-1}{i-1}\int_0^1 w\,k_{\theta_n}(w)\,\{K_{\theta_n}(w)\}^{i-1}\{1 - K_{\theta_n}(w)\}^{n-i}\,dw \qquad (9)$$
where $K_\theta(w) = P\{C_\theta(U,V) \le w\}$ and $k_\theta = dK_\theta(w)/dw$.

These two graphs are presented in Fig. 9 for the learning data set and Clayton's copula with parameter $\theta = \hat\theta_n = 0.449$, obtained by the method of maximum pseudolikelihood. As implied by the data in Table 5, $K_n$ is a scale function with steps of
height 1/3 at $w = 1/6$, $2/6$, and $4/6$. This is portrayed in dotted lines in Fig. 9a. The solid line which is superimposed is
$$K_{\theta_n}(w) = w + \frac{w(1 - w^{\theta_n})}{\theta_n}, \quad w \in [0,1]. \qquad (10)$$
Since $K_n \to K$ and $K_{\theta_n} \to K_\theta$ as $n \to \infty$, as shown by Genest and Rivest (1993), the two curves should look very similar when the data are sufficiently abundant and the model is good, i.e., when $K = K_\theta$. More generally, see Barbe et al. (1996) for a study of the large-sample behavior of the empirical process $\sqrt{n}(K_n - K)$.

In the present case, the formula for $K_\theta$ is easily deduced from the fact, established by Genest and Rivest (1993), that if $C$ is an Archimedean copula with generator $\varphi$, the distribution function of $W = C(U,V) = H(X,Y)$, called the bivariate probability integral transform (BIPIT), is given by
$$K(w) = w - \frac{\varphi(w)}{\varphi'(w)}, \quad w \in (0,1].$$
It may be observed in passing that identity (7) is a straightforward consequence of this result and the fact that $E(W) = (\tau+1)/4$. For additional information about the BIPIT and its properties and applications, refer to Genest and Rivest (2001) and Nelsen et al.

Fig. 9. (a) Graphs of $K_n$ and $K_{\theta_n}$ for the learning data set and Clayton's copula with $\theta = \hat\theta_n = [\ldots]$. (b) Generalized K-plot providing a visual check of the goodness-of-fit of the same model on these data.

Table 8. Coordinates of the QQ-Plot Displayed in Fig. 9b (columns: $i$, $W_{i:n}$, $W_{(i)}$). The ordered values $W_{(i)}$ are 1/6, 1/6, 2/6, 2/6, 4/6, 4/6.

Example (Continued)

Fig. 9b shows a QQ-plot for visual assessment of the adequacy of the Clayton model for the learning data set. The coordinates of the points on the graph are given in Table 8. The y-coordinates were obtained by numerical integration, upon substitution of the specific choice of $K_{\theta_n}$ given in Eq. (10) into the general formula (9). By construction, this generalized K-plot is designed to yield an approximate straight line when the model is adequate and the data sufficiently numerous to make a visual assessment. The effectiveness of the two diagnostic tools described above will be demonstrated more convincingly in the Application section.

Formal Tests of Goodness-of-Fit

Formal methodology for testing the goodness-of-fit of copula models is just emerging. To the writers' knowledge, the first serious effort to develop such a procedure was made by Wang and Wells (2000) in the context of Archimedean models. Inspired by Genest and Rivest (1993), these authors proposed to compute a Cramér-von Mises statistic of the form
$$S_{\xi n} = n\int_\xi^1 \{K_n(w) - K_{\theta_n}(w)\}^2\,dw$$
where $\xi \in (0,1)$ is an arbitrary cutoff point. While Theorem 3 in their paper identifies the limiting distribution of $S_{\xi n}$, the latter is analytically unwieldy. Furthermore, the bootstrap procedure they propose in replacement is, by their own admission, ineffective. As a result, P-values for the statistic cannot be computed. When faced with a choice between several copulas, Wang and Wells (2000) thus end up recommending that the model yielding the smallest value of $S_{\xi n}$ be selected.

Recently, Genest et al. introduced two variants of the $S_{\xi n}$ statistic and of the bootstrap procedure of Wang and Wells (2000) that allow overcoming these limitations. In addition to being much simpler to compute than $S_{\xi n}$ and independent of the choice of $\xi$, the statistics proposed by Genest et al. can be used to test the adequacy of any copula model, whether Archimedean or not. More importantly still, P-values associated with these statistics are relatively easy to obtain by bootstrapping. Specifically, the statistics considered by Genest et al. are of the form
$$S_n = \int_0^1 |\mathbb{K}_n(w)|^2\,k_{\theta_n}(w)\,dw$$
and
$$T_n = \sup_{0 \le w \le 1} |\mathbb{K}_n(w)|$$
where $\mathbb{K}_n(w) = \sqrt{n}\{K_n(w) - K_{\theta_n}(w)\}$. Although prima facie these expressions seem just as complicated as $S_{\xi n}$, Genest et al. show that in fact, they can be easily computed as follows:
$$S_n = \frac{n}{3} + n\sum_{j=1}^{n-1} K_n^2\left(\frac{j}{n}\right)\left\{K_{\theta_n}\left(\frac{j+1}{n}\right) - K_{\theta_n}\left(\frac{j}{n}\right)\right\} - n\sum_{j=1}^{n-1} K_n\left(\frac{j}{n}\right)\left\{K_{\theta_n}^2\left(\frac{j+1}{n}\right) - K_{\theta_n}^2\left(\frac{j}{n}\right)\right\}$$
and
$$T_n = \sqrt{n}\max_{i=0,1;\;0 \le j \le n-1}\left|K_n\left(\frac{j}{n}\right) - K_{\theta_n}\left(\frac{j+i}{n}\right)\right|.$$

The bootstrap methodology required to compute associated P-values proceeds as follows, say in the case of $S_n$:

Step 1: Estimate $\theta$ by a consistent estimator $\theta_n$.

Step 2: Generate $N$ random samples of size $n$ from $C_{\theta_n}$ and, for each of these samples, estimate $\theta$ by the same method as before and determine the value of the test statistic.

Step 3: If $S^*_{1:N} \le \cdots \le S^*_{N:N}$ denote the ordered values of the test statistics calculated in Step 2, an estimate of the critical value of the test at level $\alpha$ based on $S_n$ is given by
$$S^*_{\lfloor (1-\alpha)N \rfloor : N}$$
and
$$\frac{1}{N}\,\#\{j : S^*_j \ge S_n\}$$
yields an estimate of the P-value associated with the observed value $S_n$ of the statistic. Here, $\lfloor x \rfloor$ simply refers to the integer part of $x \in \mathbb{R}$.

Obviously, the larger $N$, the better. In practice, $N = 10{,}000$ seems perfectly adequate, although one could certainly get by with less, if limited in time or computing power.

An additional complication occurs when $K_\theta$ cannot be written in algebraic form. In that case, a double bootstrap procedure must be called upon, for which the reader is referred to Genest and Rémillard.

Table 9. P-Values Estimated by Parametric Bootstrap for Testing the Goodness-of-Fit of the Clayton Copula Model on the Learning Data Set Using the Cramér-von Mises and the Kolmogorov-Smirnov Statistics $S_n$ and $T_n$ (columns: statistic; P-value based on a run of $N = 100{,}000$, $N = 10{,}000$, $N = 100$, $N = 100$; rows: $S_n$, $T_n$)

Example (Continued)

Suppose that Clayton's copula model has been fitted to the learning data set using some consistent estimator $\theta_n$. To test the adequacy of this dependence structure, one could then compute the distance between $K_n$ and
$$K_{\theta_n}(w) = w + \frac{w(1 - w^{\theta_n})}{\theta_n}$$
using either $S_n$ or $T_n$. The corresponding P-values could then be found via the parametric bootstrap procedure described above. In order to get valid results, however, note that the same estimation method must be used at every iteration of this numerical algorithm.
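The three bootstrap steps can be sketched as follows for Clayton's model, with $\theta$ estimated by inversion of Kendall's tau at every iteration. Several choices below are assumptions made for illustration: the data are illustrative ranks, $W_i$ is taken to be the proportion of points $(X_j, Y_j)$ with $X_j \le X_i$ and $Y_j \le Y_i$, the sample value of tau is truncated to $[0.001, 0.9]$ so that the fitted $\theta$ stays positive and finite, and all function names are hypothetical.

```python
import math
import random

def kendall_tau(x, y):
    """Sample Kendall's tau for data without ties."""
    n, s = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            s += 1 if (x[i] - x[j]) * (y[i] - y[j]) > 0 else -1
    return 2.0 * s / (n * (n - 1))

def clayton_theta(x, y):
    """Inversion of Kendall's tau; tau is truncated (an assumption) to keep theta in (0, 18]."""
    tau = min(max(kendall_tau(x, y), 1e-3), 0.9)
    return 2.0 * tau / (1.0 - tau)

def clayton_K(w, theta):
    """K_theta(w) = w + w (1 - w**theta) / theta, written with expm1 for small theta."""
    if w <= 0.0:
        return 0.0
    return w - w * math.expm1(theta * math.log(w)) / theta

def cvm_statistic(x, y, theta):
    """Cramer-von Mises statistic S_n computed from its closed-form expression."""
    n = len(x)
    W = [sum(1 for j in range(n) if x[j] <= x[i] and y[j] <= y[i]) / n for i in range(n)]
    Kn = [sum(1 for w in W if w <= j / n + 1e-12) / n for j in range(n + 1)]
    Kt = [clayton_K(j / n, theta) for j in range(n + 1)]
    s1 = sum(Kn[j] ** 2 * (Kt[j + 1] - Kt[j]) for j in range(1, n))
    s2 = sum(Kn[j] * (Kt[j + 1] ** 2 - Kt[j] ** 2) for j in range(1, n))
    return n / 3.0 + n * s1 - n * s2

def sample_clayton(n, theta, rng):
    """Conditional-distribution method for Clayton's copula with theta > 0."""
    xs, ys = [], []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1]
        t = 1.0 - rng.random()
        v = (1.0 + u ** (-theta) * (t ** (-theta / (1.0 + theta)) - 1.0)) ** (-1.0 / theta)
        xs.append(u)
        ys.append(v)
    return xs, ys

def bootstrap_test(x, y, N=1000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    theta = clayton_theta(x, y)                      # Step 1
    s_obs = cvm_statistic(x, y, theta)
    stats = []
    for _ in range(N):                               # Step 2
        xs, ys = sample_clayton(len(x), theta, rng)
        stats.append(cvm_statistic(xs, ys, clayton_theta(xs, ys)))
    stats.sort()                                     # Step 3
    critical = stats[max(int(math.floor((1 - alpha) * N)) - 1, 0)]
    p_value = sum(1 for s in stats if s >= s_obs) / N
    return s_obs, critical, p_value

# Illustrative data (ranks only matter for the statistic).
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 3, 6, 5, 1]
print(bootstrap_test(x, y, N=200))
```

Note that the estimator is recomputed on every bootstrap sample inside the loop, as required for the procedure to be valid. The run length `N=200` is kept small only to make the sketch quick to execute.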
To reduce the intensity of the computing effort, the estimator obtained through the inversion of Kendall's tau is often the most convenient choice, particularly for Archimedean models. When the dependence parameter of Clayton's model is estimated in this fashion, one gets $\theta_n = 2\tau_n/(1-\tau_n) = 0.143$. The observed values of these statistics are then easily found to be
$$S_n = 0.272, \quad T_n = 1.053.$$
Table 9 reports the simulated P-values obtained via parametric bootstrapping for one run of $N = 100{,}000$, one run of $N = 10{,}000$, and two runs of $N = 100$. The discrepancy observed between P-values derived from the two runs at $N = 100$ illustrates the importance of taking $N$ large enough to ensure reliable conclusions. As can be seen from Table 9, taking $N = 100{,}000$ instead of $N = 10{,}000$ did not change the estimated P-values much, which is reassuring. Notwithstanding these differences, neither of the two tests leads to the rejection of Clayton's model. Given the sample size, this is of course unsurprising.

One drawback of this general strategy to goodness-of-fit testing is that as the number of variables increases, the univariate summary represented by the probability integral transformation $W = C(U_1,\ldots,U_d) = H(X_1,\ldots,X_d)$ and its distribution function $K(w)$ is less and less representative of the multivariate dependence structure embodied in $C$. For bivariate or trivariate applications such as are common in hydrology, there is, however, another more serious difficulty associated with a test based on $S_{\xi n}$, $S_n$, or $T_n$. This arises from the fact that a given theoretical distribution $K$ can sometimes correspond to two different copulas. In other words, it may happen that $K$ is not only the distribution function of $W = C(U,V)$ but also that of $W^* = C^*(U^*,V^*)$, where $(U^*,V^*)$ is distributed as $C^*$. In fact, Nelsen et al. show that unless $C$ belongs to the Bertino family of copulas (Bertino 1977; Fredricks and Nelsen 2002), there always exists $C^*$ in that class such that $K = K^*$ and $C \ne C^*$.
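The observed values $S_n = 0.272$ and $T_n = 1.053$ can be checked directly from the closed-form expressions of the two statistics. The sketch below assumes that the six values $W_i$ of the learning data set are 1/6, 1/6, 2/6, 2/6, 4/6, 4/6 (the step locations of $K_n$ quoted earlier) and that $\theta_n = 1/7$, the value obtained by inverting $\tau_n = 1/15$; the function names are hypothetical.

```python
import math

def clayton_K(w, theta):
    """Theoretical distribution K_theta of W under Clayton's copula."""
    return w + w * (1.0 - w ** theta) / theta

def empirical_K(w, W):
    """Empirical distribution function K_n of the values W_1, ..., W_n."""
    return sum(1 for x in W if x <= w + 1e-12) / len(W)

def gof_statistics(W, theta):
    """Cramer-von Mises statistic S_n and Kolmogorov-Smirnov statistic T_n."""
    n = len(W)
    Kn = [empirical_K(j / n, W) for j in range(n + 1)]
    Kt = [clayton_K(j / n, theta) for j in range(n + 1)]
    s1 = sum(Kn[j] ** 2 * (Kt[j + 1] - Kt[j]) for j in range(1, n))
    s2 = sum(Kn[j] * (Kt[j + 1] ** 2 - Kt[j] ** 2) for j in range(1, n))
    S = n / 3.0 + n * s1 - n * s2
    T = math.sqrt(n) * max(abs(Kn[j] - Kt[j + i]) for i in (0, 1) for j in range(n))
    return S, T

W = [1 / 6, 1 / 6, 2 / 6, 2 / 6, 4 / 6, 4 / 6]
S_n, T_n = gof_statistics(W, theta=1 / 7)
print(round(S_n, 3), round(T_n, 3))
```

The maximum defining $T_n$ is attained at $j = 0$, $i = 1$, i.e., it equals $\sqrt{6}\,K_{\theta_n}(1/6)$, which reflects the fact that $K_n$ is still zero just before its first jump.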
To illustrate the difficulties associated with the lack of uniqueness of $K$, consider the class of bivariate extreme-value copulas, which are of the form
$$C(u,v) = \exp\left[\log(uv)\,A\left\{\frac{\log(u)}{\log(uv)}\right\}\right] \qquad (11)$$
where $A: [0,1] \to [1/2,1]$ = some convex mapping such that $A(t) \ge \max(t, 1-t)$ for all $t \in [0,1]$. See, e.g., Geoffroy (1958), Sibuya (1960), or Ghoudi et al. The population value of Spearman's rho for this class of copulas can be written as
$$\rho_A = 12\int_0^1 \frac{1}{\{A(w) + 1\}^2}\,dw - 3.$$
Also, as shown by Ghoudi et al. (1998), the distribution function of $W = C(U,V)$ for $C$ in this class is given by
$$K_A(w) = w - (1 - \tau_A)\,w\log(w)$$
where
$$\tau_A = \int_0^1 \frac{w(1-w)}{A(w)}\,A''(w)\,dw$$
whenever the second derivative of $A$ is continuous. In particular, note that $K_A$ does not depend on the whole function $A$, but only on the population value $\tau_A$ of Kendall's tau induced by $A$. For this reason, formal and informal goodness-of-fit procedures based on a comparison of $K_n$ and $K_{\theta_n}$ could not possibly distinguish, e.g., between two extreme-value copulas whose real-valued parameters would be estimated through inversion of Kendall's tau. In statistical parlance, the above-mentioned tests are not consistent.

As already mentioned by Fermanian (2005) and by Genest et al. (2006), an obvious way to circumvent the consistency issue would be to base a goodness-of-fit test directly on the distance between $C_n$ and $C_{\theta_n}$. Since the limiting distribution of the process $\sqrt{n}(C_n - C_{\theta_n})$ is very complex, however, this strategy could only be implemented through an intensive use of the parametric bootstrap. For additional information in this regard, refer to the Application section and to Genest and Rémillard.

The only other general solution available to date involves kernel estimation of the copula density, as developed in Fermanian. An advantage of his statistic is that it has a standard chi-square distribution in the limit. The implementation of the procedure, however, involves arbitrary choices of a kernel, its window, and a weight function. As a result, some objectivity is lost.

Finally, since extreme-value copula models are likely to be useful in frequency analysis, diagnostic and selection tools specifically suited to that case will be discussed in the context of the hydrological application to be considered next.

Application

The Harricana watershed is located in the northwest region of the province of Québec. The Harricana River originates from several lakes near Val d'Or and empties into James Bay about 553 km north. The name of the river takes its origin from the Algonquin word Nanikana, meaning "the main way."
The daily discharges of the Harricana River at Amos (measured at Environment Canada Station Number 04NA001) have been used several times in the hydrology literature since the data are available from 1914 to present; see, e.g., Bobée and Ashkar (1991) and Bâ et al. The main characteristics of the watershed are the following: drainage area of 3,680 km² at the gauging station, mean altitude 380 m, 23% of lakes and swamp, and 72% of forest. Spring represents the high flow season due to the contribution of seasonal snowmelt to river runoff. Generally, a combination of snowmelt and rainfall events generates the annual floods.

Fig. 10. QQ-plots showing the fit of marginal models for peak (a) and volume (b) for the Harricana River data

Data

The data considered for the application consist of the maximum annual flow $X$ (in m³/s) and the corresponding volume $Y$ (in hm³) for $n = 85$ consecutive years, starting in 1915 and ending in 1999. Using standard univariate modeling techniques, the present writers came to the conclusion that the annual flow $X$ could be appropriately modeled by a Gumbel extreme-value distribution $\hat F$ with mean 89 m³/s and standard error 5.5 m³/s. As for volume $Y$, it is faithfully described by a gamma distribution $\hat G$ with mean [...] hm³ and standard error [...] hm³. Fig. 10a and b show QQ-plots attesting to the good fit of these marginal distributions to the observed values of $X$ and $Y$, respectively.

Since the focus of the present study is on modeling the dependence between the two variables, nothing further will be said about the fit of their marginal distributions. As previously emphasized, the choice of margins is immaterial anyway, at least insofar as inference on the dependence structure of the data is based on ranks.

Assessment of Dependence

Before a copula model for the pair $(X,Y)$ was sought, visual tools were used to check for the presence of dependence. The scatter plot of normalized ranks shown in Fig. 11 suggests the presence of positive association between peak flow and volume, as might be expected. This is confirmed by the chi-plot and the K-plot, reproduced in Fig. 12a and 12b, respectively.
As can be seen, most of the points fall outside the confidence band of the chi-plot. An obvious curvature is also apparent in the K-plot. Both graphs point to the existence of a positive relationship between the two variables. To quantify the degree of dependence in the pair $(X,Y)$, sample values of Spearman's rho and Kendall's tau were
computed, along with the P-values of the associated tests of independence. For $\rho$, one finds $\rho_n = [\ldots]$ and $\sqrt{n-1}\,|\rho_n| = 6.38$, so that the P-value of the test is $2\Pr(Z > \sqrt{n-1}\,|\rho_n|) = 2\Pr(Z > 6.38) \approx 0\%$, where $Z$ continues to denote a normal random variate with zero mean and unit variance. For $\tau$, one gets $\tau_n = [\ldots]$ and
$$\sqrt{\frac{9n(n-1)}{2(2n+5)}}\,|\tau_n| = [\ldots]$$
so that the P-value of this test is even smaller: $2\Pr(Z > [\ldots]) \approx 0\%$.

Fig. 11. Scatter plot of the ranks for the Harricana River data

Fig. 12. Chi-plot (a) and K-plot (b) for the annual peaks and corresponding volumes of the Harricana River

Choice of Models

In order to model the dependence between the annual peak $X$ and the volume $Y$ of the Harricana River, some 20 families of copulas were considered, which could be classified into four broad categories:

1. One-, two-, and three-parameter Archimedean models, including the traditional Ali-Mikhail-Haq, Clayton, Frank (Nelsen 1986; Genest 1987), and Gumbel-Hougaard families of copulas and their extension described by Genest et al. (1998), but also the system of Kimeldorf and Sampson (1975), the class of Joe (1993), and the BB1-BB3 and BB6-BB7 classes described in the book of Joe (1997, pp. [...]);
2. Extreme-value copulas, including (besides the Gumbel-Hougaard system mentioned just above) Joe's BB5 family and the classes of copulas introduced by Galambos (1975), Hüsler and Reiss (1989), and Tawn (1988);
3. Meta-elliptical copulas described, e.g., in Fang et al. or Abdous et al. (2005), most notably the normal, the Student, and the Cauchy copulas; and
4. Other miscellaneous families of copulas, such as those of Farlie-Gumbel-Morgenstern and Plackett (1965).

Some of these families of copulas could be eliminated offhand, given that the degrees of dependence they span were insufficient to account for the association observed in the data set. This was the case, e.g., for the Ali-Mikhail-Haq and Farlie-Gumbel-Morgenstern systems.

To help sieve through the remaining models, use was made of the tools described in the Graphical Diagnostics subsection. Given a family $(C_\theta)$ of copulas, an estimate $\theta_n$ of its parameter was first obtained by the method of maximum pseudolikelihood, and then 10,000 pairs of points were generated from $C_{\theta_n}$. Fig.
13 shows the five best contenders along with the traditional bivariate normal model.

As a further graphical check, the margins of the 10,000 random pairs $(U_i,V_i)$ from each of the six estimated copula models $C_{\theta_n}$ were transformed back into the original units using the marginal distributions $\hat F$ and $\hat G$ identified in the Data subsection for volume and peak. The resulting scatter plots of pairs $(X_i,Y_i) = (\hat F^{-1}(U_i), \hat G^{-1}(V_i))$ are displayed in Fig. 14, along with the actual observations.

While Fig. 13 provides a graphical test of the goodness-of-fit of the dependence structure taken in isolation, Fig. 14 makes it possible to judge globally the viability of the complete model for frequency analysis. Keeping in mind the predictive ability that the final model must have, it was decided to discard the bivariate normal copula structure, due to the obvious lack of fit of the resulting model in the upper part of the distribution.

Hence, of the five dependence structures retained for the finer analysis, four were extreme-value copulas, i.e., the BB5, Galambos, Gumbel-Hougaard, and Hüsler-Reiss families. The fifth was the two-parameter BB1 Archimedean model. As can be seen from Table 10, the Gumbel-Hougaard family obtains as a
special case of the BB1 system when $\theta_1 \to 0$, while setting $\theta_2 = 1$ actually yields the Kimeldorf-Sampson family. Likewise, the Galambos and Gumbel-Hougaard distributions are special cases of Family BB5 corresponding, respectively, to $\theta_1 = 1$ and $\theta_2 \to 0$.

Fig. 13. Simulated random sample of size 10,000 from six chosen families $(C_\theta)$ of copulas with parameter estimated by the method of maximum pseudolikelihood using the peak-volume Harricana River data, whose pairs of ranks are indicated by an "X"

Fig. 14. Same data as in Fig. 13, upon transformation of the marginal distributions as per the selected models for the peak and the volume of the Harricana River data, whose pairs of observations are indicated by an "X"

Estimation

Table 11 gives parameter values for each of the five models, based on maximum pseudolikelihood. For one-parameter models, 95% confidence intervals were computed as explained above. For
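$q$-parameter models with $q \ge 2$, the determination of the confidence regions relies on an estimation of the limiting variance-covariance matrix $B^{-1}\Sigma B^{-1}$ of the estimator $\hat\theta = (\hat\theta_1,\ldots,\hat\theta_q)$ of $\theta = (\theta_1,\ldots,\theta_q)$.

Table 10. Definition of the Five Chosen Families of Copulas with Their Parameter Space

Gumbel-Hougaard: $C_\theta(u,v) = \exp\{-(\tilde u^\theta + \tilde v^\theta)^{1/\theta}\}$; parameter $\theta \ge 1$
Galambos: $C_\theta(u,v) = uv\exp\{(\tilde u^{-\theta} + \tilde v^{-\theta})^{-1/\theta}\}$; parameter $\theta \ge 0$
Hüsler-Reiss: $C_\theta(u,v) = \exp\left[-\tilde u\,\Phi\left\{\frac{1}{\theta} + \frac{\theta}{2}\log\left(\frac{\tilde u}{\tilde v}\right)\right\} - \tilde v\,\Phi\left\{\frac{1}{\theta} + \frac{\theta}{2}\log\left(\frac{\tilde v}{\tilde u}\right)\right\}\right]$; parameter $\theta \ge 0$
BB1: $C_\theta(u,v) = \left[1 + \{(u^{-\theta_1} - 1)^{\theta_2} + (v^{-\theta_1} - 1)^{\theta_2}\}^{1/\theta_2}\right]^{-1/\theta_1}$; parameters $\theta_1 > 0$, $\theta_2 \ge 1$
BB5: $C_\theta(u,v) = \exp\left[-\{\tilde u^{\theta_1} + \tilde v^{\theta_1} - (\tilde u^{-\theta_1\theta_2} + \tilde v^{-\theta_1\theta_2})^{-1/\theta_2}\}^{1/\theta_1}\right]$; parameters $\theta_1 \ge 1$, $\theta_2 > 0$

Note: With $\tilde u = -\log(u)$, $\tilde v = -\log(v)$ and $\Phi$ standing for the cumulative distribution function of the standard normal.

Following Genest et al. (1995), the estimate of $B$ is simply the empirical $q \times q$ variance-covariance matrix of the variables $N_1,\ldots,N_q$, for which a set of pseudo-observations is available, namely
$$N_{pi} = L_{\theta_p}\left(\hat\theta_n, \frac{i}{n+1}, \frac{S_i}{n+1}\right), \quad i \in \{1,\ldots,n\},$$
where $L_{\theta_p}$ denotes the derivative of $L(\theta,u,v) = \log c_\theta(u,v)$ with respect to $\theta_p$. Here, it is assumed that the original data have been relabeled so that $X_1 < \cdots < X_n$. Likewise, $\hat\Sigma$ = $q \times q$ variance-covariance matrix of the variables $M_1,\ldots,M_q$, for which the pseudo-observations are
$$M_{pi} = N_{pi} - \frac{1}{n}\sum_{j=i}^{n} L_{\theta_p}\left(\hat\theta_n, \frac{j}{n+1}, \frac{S_j}{n+1}\right) L_u\left(\hat\theta_n, \frac{j}{n+1}, \frac{S_j}{n+1}\right) - \frac{1}{n}\sum_{S_j \ge S_i} L_{\theta_p}\left(\hat\theta_n, \frac{j}{n+1}, \frac{S_j}{n+1}\right) L_v\left(\hat\theta_n, \frac{j}{n+1}, \frac{S_j}{n+1}\right)$$
for $i \in \{1,\ldots,n\}$.

An alternative, possibly more efficient way of estimating the information matrix $B$ is given by the Hessian matrix associated with $L(\theta,u,v)$ at $\hat\theta_n$, namely, the $q \times q$ matrix whose $(p,r)$ entry is given by
$$-\frac{1}{n}\sum_{i=1}^{n} L_{\theta_p\theta_r}\left(\hat\theta_n, \frac{i}{n+1}, \frac{S_i}{n+1}\right)$$
where $L_{\theta_p\theta_r}$ stands for the cross derivative of $L(\theta,u,v)$ with respect to both $\theta_p$ and $\theta_r$. In Table 11, the confidence intervals for Models BB1 and BB5 were derived using the latter approach, as it produced somewhat narrower intervals.

Goodness-of-Fit Testing

As a second step towards model selection, one should look at the generalized K-plot corresponding to the five families under consideration. The graphs corresponding to the BB1 are displayed in Fig. 15. For reasons given in the Graphical Diagnostics subsection, the graphs corresponding to the BB5, Gumbel-Hougaard, Galambos, and Hüsler-Reiss copulas are identical, since they are all extreme-value dependence structures. The graphs appear in Fig. 16. The plots displayed in Figs.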
15 and 16 suggest that both the BB1 and extreme-value copula structures are adequate for the data at hand. A similar conclusion is drawn from the formal goodness-of-fit tests based on $S_n$ and $T_n$, as indicated in Table 12. Again, the generalized K-plot and the formal goodness-of-fit tests corresponding to the Galambos, Hüsler-Reiss, and BB5 extreme-value copula models yield exactly the same results, as evidenced in Table 12.

In an attempt to distinguish between the extreme-value copula structures, a consistent goodness-of-fit test could be constructed from the process $\sqrt{n}(C_n - C_{\theta_n})$, as evoked but dismissed by Fermanian (2005), on account of the unwieldy nature of its limit. However, this difficulty can be overcome easily with the use of a parametric or double parametric bootstrap, whose validity in this context has recently been established by Genest and Rémillard. The bootstrap procedure is exactly the same as described in the Formal Tests of Goodness-of-Fit section, but with $S_n$ replaced by the Cramér-von Mises statistic
$$CM_n = n\sum_{i=1}^{n}\left\{C_n\left(\frac{R_i}{n+1}, \frac{S_i}{n+1}\right) - C_{\theta_n}\left(\frac{R_i}{n+1}, \frac{S_i}{n+1}\right)\right\}^2 = n\sum_{i=1}^{n}\left\{W_i - C_{\theta_n}\left(\frac{R_i}{n+1}, \frac{S_i}{n+1}\right)\right\}^2.$$

Table 11. Maximum Pseudolikelihood Parameter Estimates and Corresponding 95% Confidence Interval for Five Families of Copulas, Based on the Harricana River Data

Gumbel-Hougaard: $\hat\theta_n = 2.161$; CI = [1.867, 2.455]
Galambos: $\hat\theta_n = 1.464$; CI = [1.162, 1.766]
Hüsler-Reiss: $\hat\theta_n = 2.027$; CI = [1.778, 2.275]
BB1: $\hat\theta_{1n} = 0.48$, $\hat\theta_{2n} = 1.835$; CI = [0.022, [...]] × [[...], 2.25]
BB5: $\hat\theta_{1n} = 1.034$, $\hat\theta_{2n} = 1.244$; CI = [1.000, [...]] × [[...], 1.294]

This bootstrap-based goodness-of-fit test was applied for each
of the five families of copulas still under consideration. The results are summarized in Table 13. As it turned out, none of the models could be rejected on this basis either.

Fig. 15. (a) Graphs of $K_n$ and $K_{\theta_n}$ for the BB1 copula with $\hat\theta_{1n} = 0.48$, $\hat\theta_{2n} = 1.835$, based on the Harricana River data. (b) Generalized K-plot providing a visual check of the goodness-of-fit of the same model for these data.

Fig. 16. (a) Graphs of $K_n$ and $K_{\theta_n}$ for the Gumbel-Hougaard copula with $\theta = \hat\theta_n = 2.161$, based on the Harricana River data. (b) Generalized K-plot providing a visual check of the goodness-of-fit of the same model for these data.

The Gumbel-Hougaard and Galambos copulas being embedded in two-parameter models BB1 and BB5, yet another option for choosing between them would be to call on a pseudolikelihood ratio test procedure recently introduced by Chen and Fan (2005). Their approach, inspired by a semiparametric adaptation of the Akaike Information Criterion, makes it possible to measure the trade-off between goodness-of-fit and model parsimony.

Suppose it is desired to compare two nested copula models, say $\mathcal{C} = (C_{\theta,\lambda})$ and $\mathcal{D} = (C_{\theta,\lambda_0})$. Let $(\hat\theta_n,\hat\lambda_n)$ represent the maximum pseudolikelihood estimator of $(\theta,\lambda) \in \mathbb{R}^2$ under model $\mathcal{C}$, and write $\tilde\theta_n$ for the maximum pseudolikelihood estimator of $\theta \in \mathbb{R}$ under the submodel $\mathcal{D}$. The test statistic proposed by Chen and Fan (2005) then rejects the null hypothesis $H_0: \lambda = \lambda_0$ that model $\mathcal{D}$ is preferable to model $\mathcal{C}$ whenever
$$CF_n = 2\sum_{i=1}^{n}\log\left\{\frac{c_{\tilde\theta_n,\lambda_0}\left(\frac{R_i}{n+1}, \frac{S_i}{n+1}\right)}{c_{\hat\theta_n,\hat\lambda_n}\left(\frac{R_i}{n+1}, \frac{S_i}{n+1}\right)}\right\}$$
is sufficiently small. To determine a P-value for this test, one must resort to a nonparametric bootstrap procedure, which proceeds as follows.

For some large integer $N$ and each $k \in \{1,\ldots,N\}$, do the following:

Step 1: Draw a bootstrap random sample $(X_1^*,Y_1^*),\ldots,(X_n^*,Y_n^*)$ with replacement from $(X_1,Y_1),\ldots,(X_n,Y_n)$.

Step 2: Use the method of maximum pseudolikelihood to determine estimators $(\hat\theta_n^*,\hat\lambda_n^*)$ and $\tilde\theta_n^*$ of $(\theta,\lambda)$ and $(\theta,\lambda_0)$ under models $\mathcal{C}$ and $\mathcal{D}$, respectively.

Step 3: Compute the Hessian matrices $B^*$ and $B_2^*$ associated with $\log c_{\theta,\lambda}(u,v)$ and $\log c_{\theta,\lambda_0}(u,v)$ at $(\hat\theta_n^*,\hat\lambda_n^*)$ and $\tilde\theta_n^*$, respectively.

Step 4: Determine the value of
$$CF^*_{n,k} = B_2^*\,(\tilde\theta_n^* - \tilde\theta_n)^2 - (\hat\theta_n^* - \hat\theta_n,\ \hat\lambda_n^* - \hat\lambda_n)\,B^*\,(\hat\theta_n^* - \hat\theta_n,\ \hat\lambda_n^* - \hat\lambda_n)^\top.$$

The P-value associated with the test of Chen and Fan (2005) is then given by
$$\frac{1}{N}\sum_{k=1}^{N}\mathbf{1}(CF^*_{n,k} \le CF_n).$$

The conclusions drawn from this analysis (not reported here) are consistent with Table 11, which indicates that the interval estimates for the BB5 family are compatible with the Galambos model (since $\theta_1 = 1$ is a possible value) but not with the Gumbel-Hougaard (because $\theta_2 = 0$ is excluded from its 95% confidence interval). Likewise, the parameter intervals for Family BB1 suggest that neither the Gumbel-Hougaard nor the Kimeldorf-Sampson families are adequate for the data at hand. Additional tools that may help to distinguish between bivariate extreme-value models will be presented in the next section.

Table 12. Results of the Bootstrap Based on the Cramér-von Mises Statistic $S_n$ and Kolmogorov-Smirnov Statistic $T_n$: Observed Statistic, Critical Value $q(95\%)$ Corresponding to $\alpha = 5\%$ and Approximate P-Value, Based on $N = 10{,}000$ Parametric Bootstrap Samples (columns: copula, $S_n$, $q(95\%)$, P-value, $T_n$, $q(95\%)$, P-value; rows: Gumbel-Hougaard, Galambos, Hüsler-Reiss, BB1, BB5)

Table 13. Results of the Bootstrap Based on the Cramér-von Mises Statistic $CM_n$: Observed Statistic, Critical Value $q(95\%)$ Corresponding to $\alpha = 5\%$ and Approximate P-Value, Based on $N = 10{,}000$ Parametric Bootstrap Samples (columns: copula, $CM_n$, $q(95\%)$, P-value; rows: Gumbel-Hougaard, Galambos, Hüsler-Reiss, BB1, BB5)

Graphical Diagnostics for Bivariate Extreme-Value Copulas

In the bivariate case, extreme-value copulas are characterized by the dependence function $A$, as in Eq. (11). When the marginal distributions $F$ and $G$ of $H$ are known, a consistent estimator $A_n$ of $A$ has been proposed by Capéraà et al. It is given by
$$A_n(t) = \exp\left\{\int_0^t \frac{H_n(z) - z}{z(1-z)}\,dz\right\}, \quad t \in [0,1],$$
where
$$H_n(t) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}(Z_i \le t)$$
is the empirical distribution function of the random sample $Z_1,\ldots,Z_n$ with $Z_i = \log\{F(X_i)\}/\log\{F(X_i)G(Y_i)\}$ for $i \in \{1,\ldots,n\}$.

These authors showed that if $Z_{(1)} \le \cdots \le Z_{(n)}$ are the associated order statistics, then $A_n$ can be written in closed form as
$$A_n(t) = \begin{cases} (1-t)\,Q_n^{1-p(t)} & \text{if } 0 \le t \le Z_{(1)}, \\ t^{i/n}(1-t)^{1-i/n}\,Q_n^{1-p(t)}\,Q_i^{-1} & \text{if } Z_{(i)} \le t \le Z_{(i+1)}, \\ t\,Q_n^{-p(t)} & \text{if } Z_{(n)} \le t \le 1, \end{cases}$$
in terms of a weight function $p$ such that $p(0) = 1 - p(1) = 1$ and quantities
$$Q_i = \left\{\prod_{k=1}^{i}\frac{Z_{(k)}}{1 - Z_{(k)}}\right\}^{1/n}, \quad i \in \{1,\ldots,n\}.$$
The asymptotic behavior of the process $\sqrt{n}(\log A_n - \log A)$ is given by Capéraà et al. (1997) under mild regularity conditions, and could be used to perform a goodness-of-fit test, say, using the Cramér-von Mises statistic
$$n\int_0^1 \left[\log\{A_n(t)/A_{\theta_n}(t)\}\right]^2 dt.$$
When the margins are unknown, however, as is most often the case in practice, it would seem reasonable to use a variant $\hat A_n$ of the same estimator, with $Z_i$ replaced by the pseudo-observation
$$\hat Z_i = \log\left(\frac{R_i}{n+1}\right)\Big/\log\left(\frac{R_i}{n+1}\times\frac{S_i}{n+1}\right), \quad i \in \{1,\ldots,n\}.$$
Before a proper test can be developed, it will be necessary to examine the asymptotic behavior of the process $\sqrt{n}\{\log\hat A_n(t) - \log A(t)\}$. This may be the object of future work. For additional discussion on this general theme, refer to Abdous and Ghoudi (2005).

For the time being, a useful graphical diagnostic tool for extreme-value copulas may still consist of drawing $\hat A_n$ and $A_{\theta_n}$ on the same plot. Fig. 17 shows such a plot for the four families of extreme-value copulas retained for this study. Here, the weight function used was $p(t) = 1 - t$. The reason for which no goodness-of-fit test could distinguish between these models is obvious from the graph: the generators of the four families are not only fairly close to $\hat A_n$, they are practically identical.

Conclusion

Using both a learning data set and 85 annual records of volume and peak from the Harricana watershed, this paper has illustrated the various issues involved in characterizing, measuring, and modeling dependence through copulas. The main emphasis was
put on inference and testing procedures, many of which have just been developed. Although the presentation was limited to the case of two variables, most of the tools described here extend to the multidimensional case. As the number of variables increases, of course, the intricacies of modeling become more complex, and even the construction of appropriate copula models still poses serious difficulties.

From the analysis presented here, it would appear that several copula families provide acceptable models of the dependence in the Harricana River data. Not surprisingly, several of them are of the extreme-value type, and it is unlikely that choosing between them would make any serious difference for prediction purposes. If forced to express a preference, an analyst would probably want to call on additional criteria, such as model parsimony, etc.

As evidenced by the material in the previous section, the theory surrounding goodness-of-fit testing for this particular class of copulas is still incomplete. For the data at hand, Fig. 17 suggests that an asymmetric extreme-value copula model might possibly provide a somewhat better fit. Examples of such models mentioned by Tawn (1988) are the asymmetric mixed and logistic models. In the former,
$$A(t) = \varphi t^3 + \theta t^2 - (\theta + \varphi)t + 1$$
with $0 \le \min(\theta, \theta + 3\varphi)$ and $\max(\theta + \varphi, \theta + 2\varphi) \le 1$; in the latter,
$$A(t) = \{\theta^r(1-t)^r + \varphi^r t^r\}^{1/r} + (\theta - \varphi)t + 1 - \theta$$
where $0 \le \theta, \varphi \le 1$ and $r \ge 1$. Khoudraji's device, described among others in Genest et al. (1998), could be used to generate other asymmetric copula models, whether extreme-value or not.

Fig. 17. Plot of $A_{\theta_n}$ (solid lines) and $\hat A_n$ (dashed line) for the following families of extreme-value copulas: Gumbel-Hougaard, Galambos, Hüsler-Reiss, and BB5

The problem of fitting an asymmetric copula to the Harricana River data is left to the reader as a knowledge integration activity, and the data set is available from the writers for that purpose. Users should keep in mind, however, that in copula matters as in any other statistical modeling exercise, the pursuit of perfection is illusory and a balance should always be struck between fit and parsimony.

The statistical literature on copula modeling is still growing rapidly. In recent years, numerous successful applications of this evolving methodology have been made, most notably in survival analysis, actuarial science, and finance, but also quite recently in hydrology; see, e.g., De Michele and Salvadori (2002), Favre et al. (2004), Salvadori and De Michele (2004), De Michele et al. (2005), and some of the papers in the current issue of the Journal of Hydrologic Engineering. It is hoped that this special issue, and the present paper, in particular, can help foster the use of copula methodology in this field of science.

Acknowledgments

Partial funding in support of this work was provided by the Natural Sciences and Engineering Research Council of Canada, the Fonds québécois de la recherche sur la nature et les technologies, the Institut de finance mathématique de Montréal, and Hydro-Québec.

References

Abdous, B., Genest, C., and Rémillard, B. Dependence properties of meta-elliptical distributions. Statistical modeling and analysis for complex data problems, Springer, New York, 5.
Abdous, B., and Ghoudi, K. Non-parametric estimators of multivariate extreme dependence functions. J. Nonparam. Stat., 78.
Ali, M. M., Mikhail, N. N., and Haq, M. S. A class of bivariate distributions including the bivariate logistic. J. Multivariate Anal., 83.
Bâ, K. M., Díaz-Delgado, C., and Cârsteanu, A. Confidence intervals of quantile in hydrology computed by an analytical method. Natural Hazards, 24, 2.
Barbe, P., Genest, C., Ghoudi, K., and Rémillard, B. On Kendall's process. J. Multivariate Anal., 582.
Bertino, S. Sulla dissomiglianza tra mutabili cicliche. Metron, 35 2.
Biau, G., and Wegkamp, M. H. A note on minimum distance estimation of copula densities. Stat. Probab. Lett., 732.
Bobée, B., and Ashkar, F. (1991). The gamma family and derived distributions applied in hydrology, Water Resources, Littleton, Colo.
Borkowf, C. B. Computing the nonnull asymptotic variance and the asymptotic relative efficiency of Spearman's rank correlation. Comput. Stat.
Data Aal., 393, Capéraà, P., Fougères, A.L., ad Geest, C A oparametric estimatio procedure for bivariate extreme value copulas. Biometrika, 843, Capéraà, P., Fougères, A.L., ad Geest, C Bivariate distributios with give extreme value attractor. J. Multivariate Aal., 72, Che, X., ad Fa, Y Pseudolikelihood ratio tests for semiparametric multivariate copula model selectio. Ca. J. Stat., 333, Cherubii, U., Luciao, E., ad Vecchiato, W Copula methods i fiace, Wiley, New York. Clayto, D. G A model for associatio i bivariate life tables ad its applicatio i epidemiological studies of familial tedecy i chroic disease icidece. Biometrika, 65, 4 5. De Michele, C., ad Salvadori, G A geeralized Pareto itesity duratio model of storm raifall exploitig 2copulas. J. Geophys. Res., 08D2,. De Michele, C., Salvadori, G., Caossi, M., Petaccia, A., ad Rosso, R Bivariate statistical approach to check adequacy of dam spillway. J. Hydrol. Eg., 0, Deheuvels, P La foctio de dépedace empirique et ses propriétés: U test o paramétrique d idépedace. Bull. Cl. Sci., Acad. R. Belg., 656, / JOURNAL OF HYDROLOGIC ENGINEERING ASCE / JULY/AUGUST 2007
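As a starting point for the exercise of fitting an asymmetric extreme-value copula, the two dependence functions attributed to Tawn in the closing discussion can be evaluated numerically. The sketch below is illustrative only: the function names, the symbols `theta`, `phi`, and `r`, and the grid-based validity check are assumptions made here, and the formulas follow the usual statement of the asymmetric mixed and asymmetric logistic models rather than any code supplied by the writers.

```python
def asymmetric_mixed(t, theta, phi):
    """Asymmetric mixed model: A(t) = phi*t^3 + theta*t^2 - (theta + phi)*t + 1."""
    # Parameter constraints that keep A convex and above max(t, 1 - t).
    if min(theta, theta + 3 * phi) < 0 or max(theta + phi, theta + 2 * phi) > 1:
        raise ValueError("need 0 <= min(theta, theta+3*phi) and max(theta+phi, theta+2*phi) <= 1")
    return phi * t**3 + theta * t**2 - (theta + phi) * t + 1


def asymmetric_logistic(t, theta, phi, r):
    """Asymmetric logistic model:
    A(t) = {theta^r (1-t)^r + phi^r t^r}^(1/r) + (theta - phi)*t + 1 - theta."""
    if not (0 <= theta <= 1 and 0 <= phi <= 1 and r >= 1):
        raise ValueError("need 0 <= theta, phi <= 1 and r >= 1")
    core = (theta**r * (1 - t) ** r + phi**r * t**r) ** (1 / r)
    return core + (theta - phi) * t + 1 - theta


def is_valid_dependence_function(A, grid_size=200, tol=1e-9):
    """Grid check (hypothetical helper) of the defining properties of a
    dependence function: A(0) = A(1) = 1, max(t, 1-t) <= A(t) <= 1, convexity."""
    ts = [i / grid_size for i in range(grid_size + 1)]
    vals = [A(t) for t in ts]
    ends_ok = abs(vals[0] - 1) <= tol and abs(vals[-1] - 1) <= tol
    bounds_ok = all(max(t, 1 - t) - tol <= a <= 1 + tol for t, a in zip(ts, vals))
    # Convexity via nonnegative second differences on the grid.
    convex_ok = all(
        vals[i - 1] - 2 * vals[i] + vals[i + 1] >= -tol for i in range(1, grid_size)
    )
    return ends_ok and bounds_ok and convex_ok


if __name__ == "__main__":
    mixed = lambda t: asymmetric_mixed(t, theta=0.5, phi=0.2)
    logistic = lambda t: asymmetric_logistic(t, theta=0.8, phi=0.4, r=3.0)
    for name, A in [("mixed", mixed), ("logistic", logistic)]:
        print(name, is_valid_dependence_function(A), A(0.25), A(0.75))
```

Unequal values of A(t) and A(1 - t) signal asymmetry; setting `theta = phi = 1` in the logistic form gives the symmetric Gumbel–Hougaard dependence function. Fitting either model to data would still require an estimation step, for instance a rank-based method, which is not sketched here.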