7. Concepts in Probability, Statistics and Stochastic Modelling

Contents

1. Introduction
2. Probability Concepts and Methods
   2.1. Random Variables and Distributions
   2.2. Expectation
   2.3. Quantiles, Moments and Their Estimators
   2.4. L-Moments and Their Estimators
3. Distributions of Random Events
   3.1. Parameter Estimation
   3.2. Model Adequacy
   3.3. Normal and Lognormal Distributions
   3.4. Gamma Distributions
   3.5. Log-Pearson Type 3 Distribution
   3.6. Gumbel and GEV Distributions
   3.7. L-Moment Diagrams
4. Analysis of Censored Data
5. Regionalization and Index-Flood Method
6. Partial Duration Series
7. Stochastic Processes and Time Series
   7.1. Describing Stochastic Processes
   7.2. Markov Processes and Markov Chains
   7.3. Properties of Time-Series Statistics
8. Synthetic Streamflow Generation
   8.1. Introduction
   8.2. Streamflow Generation Models
   8.3. A Simple Autoregressive Model
   8.4. Reproducing the Marginal Distribution
   8.5. Multivariate Models
   8.6. Multi-Season, Multi-Site Models
   8.7. Disaggregation Models
   8.8. Aggregation Models
9. Stochastic Simulation
   9.1. Generating Random Variables
   9.2. River Basin Simulation
   9.3. The Simulation Model
   9.4. Simulation of the Basin
   9.5. Interpreting Simulation Output
10. Conclusions
11. References

Events that cannot be predicted precisely are often called random. Many if not most of the inputs to, and processes that occur in, water resources systems are to some extent random. Hence, so too are the outputs or predicted impacts, and even people's reactions to those outputs or impacts. To ignore this randomness or uncertainty is to ignore reality. This chapter introduces some of the commonly used tools for dealing with uncertainty in water resources planning and management. Subsequent chapters illustrate how these tools are used in various types of optimization, simulation and statistical models for impact prediction and evaluation.

1. Introduction

Uncertainty is always present when planning, developing, managing and operating water resources systems. It arises because many factors that affect the performance of water resources systems are not and cannot be known with certainty when a system is planned, designed, built, managed and operated. The success and performance of each component of a system often depends on future meteorological, demographic, economic, social, technical, and political conditions, all of which may influence future benefits, costs, environmental impacts, and social acceptability. Uncertainty also arises due to the stochastic nature of meteorological processes such as evaporation, rainfall and temperature. Similarly, future populations of towns and cities, per capita water-usage rates, irrigation patterns and priorities for water uses, all of which affect water demand, are never known with certainty.

There are many ways to deal with uncertainty. One, and perhaps the simplest, approach is to replace each uncertain quantity either by its average (i.e., its mean or expected value), its median, or by some critical (e.g., "worst-case") value, and then proceed with a deterministic approach. Use of expected or median values of uncertain quantities may be adequate if the uncertainty or variation in a quantity is reasonably small and does not critically affect the performance of the system.
If expected or median values of uncertain parameters or variables are used in a deterministic model, the planner can then assess the importance of uncertainty by means of sensitivity analysis, as is discussed later in this and the two subsequent chapters.

Replacement of uncertain quantities by either expected, median or worst-case values can grossly affect the evaluation of project performance when important parameters are highly variable. To illustrate these issues, consider the evaluation of the recreation potential of a reservoir. Table 7.1 shows that the elevation of the water surface varies over time depending on the inflow and demand for water. The table indicates the pool levels and their associated probabilities as well as the expected use of the recreation facility with different pool levels. The average pool level $\bar{L}$ is simply the sum of each possible pool level times its probability, or

$$\bar{L} = 10(0.10) + 20(0.25) + 30(0.30) + 40(0.25) + 50(0.10) = 30 \tag{7.1}$$

This pool level corresponds to 100 visitor-days per day:

$$VD(\bar{L}) = 100 \text{ visitor-days per day} \tag{7.2}$$

A worst-case analysis might select a pool level of ten as a critical value, yielding an estimate of system performance equal to 25 visitor-days per day:

$$VD(L_{low}) = VD(10) = 25 \text{ visitor-days per day} \tag{7.3}$$
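The arithmetic of this reservoir example is easy to check numerically. The sketch below is only an illustration: it uses the pool levels, probabilities and visitor-day figures of Table 7.1, while the dictionary layout and the function names are assumptions made for this example rather than anything defined in the chapter.

```python
# Table 7.1: pool level -> (probability of that level, visitor-days per day at that level)
TABLE_7_1 = {
    10: (0.10, 25),
    20: (0.25, 75),
    30: (0.30, 100),
    40: (0.25, 80),
    50: (0.10, 70),
}


def expected_pool_level(table):
    """Probability-weighted average pool level (Equation 7.1)."""
    return sum(level * prob for level, (prob, _) in table.items())


def expected_visitation(table):
    """Probability-weighted average visitation rate over all pool levels."""
    return sum(prob * visitor_days for prob, visitor_days in table.values())


mean_level = expected_pool_level(TABLE_7_1)
# Visitation evaluated at the average pool level (Equation 7.2)
vd_at_mean_level = TABLE_7_1[round(mean_level)][1]
# Worst-case visitation, taken at the lowest pool level (Equation 7.3)
vd_worst_case = TABLE_7_1[min(TABLE_7_1)][1]
# Average visitation rate, which differs from both values above
mean_vd = expected_visitation(TABLE_7_1)

print(mean_level, vd_at_mean_level, vd_worst_case, mean_vd)
```

The point of the comparison is that evaluating a performance function at the average input is not the same as averaging the performance function over the input distribution.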

Water Resources Systems Planning and Management

Table 7.1. Data for determining reservoir recreation potential.

  possible pool levels    probability of each level    recreation potential (visitor-days per day)
  10                      0.10                         25
  20                      0.25                         75
  30                      0.30                         100
  40                      0.25                         80
  50                      0.10                         70

Neither of these values is a good approximation of the average visitation rate, that is

$$\overline{VD} = 0.10\,VD(10) + 0.25\,VD(20) + 0.30\,VD(30) + 0.25\,VD(40) + 0.10\,VD(50)$$
$$= 0.10(25) + 0.25(75) + 0.30(100) + 0.25(80) + 0.10(70) = 78.25 \text{ visitor-days per day} \tag{7.4}$$

Clearly, the average visitation rate, $\overline{VD} = 78.25$, the visitation rate corresponding to the average pool level, $VD(\bar{L}) = 100$, and the worst-case assessment, $VD(L_{low}) = 25$, are very different.

Using only average values in a complex model can produce a poor representation of both the average performance and the possible performance range. When important quantities are uncertain, a comprehensive analysis requires an evaluation of both the expected performance of a project and the risk and possible magnitude of project failures in a physical, economic, ecological and/or social sense.

This chapter reviews many of the methods of probability and statistics that are useful in water resources planning and management. Section 2 is a condensed summary of the important concepts and methods of probability and statistics. These concepts are applied in this and subsequent chapters of this book. Section 3 presents several probability distributions that are often used to model or describe the distribution of uncertain quantities. The section also discusses methods for fitting these distributions using historical information, and methods of assessing whether the distributions are adequate representations of the data. Sections 4, 5 and 6 expand upon the use of these mathematical models, and discuss alternative parameter estimation methods. Section 7 presents the basic ideas and concepts of stochastic processes or time series. These are used to model streamflows, rainfall, temperature or other phenomena whose values change with time.
The section contains a description of Markov chains, a special type of stochastic process used in many stochastic optimization and simulation models. Section 8 illustrates how synthetic flows and other time-series inputs can be generated for stochastic simulations. Stochastic simulation is introduced with an example in Section 9.

Many topics receive only brief treatment in this introductory chapter. Additional information can be found in applied statistical texts or book chapters such as Benjamin and Cornell (1970), Haan (1977), Kite (1988), Stedinger et al. (1993), Kottegoda and Rosso (1997), and Ayyub and McCuen (2002).

2. Probability Concepts and Methods

This section introduces the basic concepts and definitions used in analyses involving probability and statistics. These concepts are used throughout this chapter and later chapters in the book.

2.1. Random Variables and Distributions

The basic concept in probability theory is that of the random variable. By definition, the value of a random variable cannot be predicted with certainty. It depends, at least in part, on the outcome of a chance event. Examples are: (1) the number of years until the flood stage of a river washes away a small bridge; (2) the number of times during a reservoir's life that the level of the pool will drop below a specified level; (3) the rainfall depth next month; and (4) next year's maximum flow at a gauge site on an unregulated stream. The values of all of these random events or variables are not knowable before the event has occurred. Probability can be used to describe the likelihood that these random variables will equal specific values or be within a given range of specific values.

The first two examples illustrate discrete random variables, random variables that take on values that are discrete (such as positive integers). The last two examples illustrate continuous random variables. Continuous random variables take on any values within a specified range of values. A property of all continuous random variables is that the probability that the value of any of those random variables will equal some specific number (any specific number) is always zero. For example, the probability that the total rainfall depth in a month will be exactly 5.0 cm is zero, while the probability that the total rainfall will lie between 4 and 6 cm could be nonzero. Some random variables are combinations of continuous and discrete random variables.

Let $X$ denote a random variable and $x$ a possible value of that random variable $X$. Random variables are generally denoted by capital letters, and particular values they take on by lowercase letters. For any real-valued random variable $X$, its cumulative distribution function $F_X(x)$, often denoted as just the cdf, equals the probability that the value of $X$ is less than or equal to a specific value or threshold $x$:

$$F_X(x) = \Pr[X \le x] \tag{7.5}$$

This cumulative distribution function $F_X(x)$ is a non-decreasing function of $x$ because

$$\Pr[X \le x] \le \Pr[X \le x + \delta] \quad \text{for } \delta > 0 \tag{7.6}$$

In addition,

$$\lim_{x \to +\infty} F_X(x) = 1 \tag{7.7}$$

and

$$\lim_{x \to -\infty} F_X(x) = 0 \tag{7.8}$$

The first limit equals 1 because the probability that $X$ takes on some value less than infinity must be unity; the second limit is zero because the probability that $X$ takes on no value must be zero.

The left half of Figure 7.1 illustrates the cumulative distribution function (upper) and its derivative, the probability density function, $f_X(x)$, (lower) of a continuous random variable $X$.

If $X$ is a real-valued discrete random variable that takes on specific values $x_1, x_2, \ldots$, then the probability mass function $p_X(x_i)$ is the probability $X$ takes on the value $x_i$.
$$p_X(x_i) = \Pr[X = x_i] \tag{7.9}$$

The value of the cumulative distribution function $F_X(x)$ for a discrete random variable is the sum of the probabilities of all $x_i$ that are less than or equal to $x$.

$$F_X(x) = \sum_{x_i \le x} p_X(x_i) \tag{7.10}$$

The right half of Figure 7.1 illustrates the cumulative distribution function (upper) and the probability mass function $p_X(x_i)$ (lower) of a discrete random variable.

The probability density function $f_X(x)$ (lower left plot in Figure 7.1) for a continuous random variable $X$ is the analogue of the probability mass function (lower right plot in Figure 7.1) of a discrete random variable $X$. The probability density function, often called the pdf, is the derivative of the cumulative distribution function so that:

$$f_X(x) = \frac{dF_X(x)}{dx} \ge 0 \tag{7.11}$$

It is necessary to have

$$\int_{-\infty}^{+\infty} f_X(x)\,dx = 1 \tag{7.12}$$

Equation 7.12 indicates that the area under the probability density function is 1. If $a$ and $b$ are any two constants, the cumulative distribution function or the density function may be used to determine the probability that $X$ is greater than $a$ and less than or equal to $b$ where

$$\Pr[a < X \le b] = F_X(b) - F_X(a) = \int_a^b f_X(x)\,dx \tag{7.13}$$

The area under a probability density function specifies the relative frequency with which the value of a continuous random variable falls within any specified range of values, that is, any interval along the horizontal axis.

Life is seldom so simple that only a single quantity is uncertain. Thus, the joint probability distribution of two or more random variables can also be defined. If $X$ and $Y$ are two continuous real-valued random variables, their joint cumulative distribution function is:

$$F_{XY}(x, y) = \Pr[X \le x \text{ and } Y \le y] = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(u, v)\,dv\,du \tag{7.14}$$

If two random variables are discrete, then

$$F_{XY}(x, y) = \sum_{x_i \le x} \sum_{y_i \le y} p_{XY}(x_i, y_i) \tag{7.15}$$

Figure 7.1. Cumulative distribution and probability density or mass functions of random variables: (a) continuous distributions; (b) discrete distributions. (Each panel is plotted against the possible values $x$ of a random variable $X$.)

In Equation 7.15, the joint probability mass function is:

$$p_{XY}(x_i, y_i) = \Pr[X = x_i \text{ and } Y = y_i] \tag{7.16}$$

If $X$ and $Y$ are two random variables, and the distribution of $X$ is not influenced by the value taken by $Y$, and vice versa, then the two random variables are said to be independent. For two independent random variables $X$ and $Y$, the joint probability that the random variable $X$ will be between values $a$ and $b$ and that the random variable $Y$ will be between values $c$ and $d$ is simply the product of those separate probabilities.

$$\Pr[a \le X \le b \text{ and } c \le Y \le d] = \Pr[a \le X \le b] \cdot \Pr[c \le Y \le d] \tag{7.17}$$

This applies for any values $a$, $b$, $c$, and $d$. As a result,

$$F_{XY}(x, y) = F_X(x)\,F_Y(y) \tag{7.18}$$

which implies for continuous random variables that

$$f_{XY}(x, y) = f_X(x)\,f_Y(y) \tag{7.19}$$

and for discrete random variables that

$$p_{XY}(x, y) = p_X(x)\,p_Y(y) \tag{7.20}$$

Other useful concepts are those of the marginal and conditional distributions. If $X$ and $Y$ are two random variables whose joint cumulative distribution function $F_{XY}(x, y)$ has been specified, then $F_X(x)$, the marginal cumulative distribution of $X$, is just the cumulative distribution of $X$ ignoring $Y$. The marginal cumulative distribution function of $X$ equals

$$F_X(x) = \Pr[X \le x] = \lim_{y \to \infty} F_{XY}(x, y) \tag{7.21}$$

where the limit is equivalent to letting $Y$ take on any value. If $X$ and $Y$ are continuous random variables, the marginal density of $X$ can be computed from

$$f_X(x) = \int_{-\infty}^{+\infty} f_{XY}(x, y)\,dy \tag{7.22}$$

The conditional cumulative distribution function is the cumulative distribution function for $X$ given that $Y$ has taken a particular value $y$. Thus the value of $Y$ may have been observed and one is interested in the resulting conditional distribution for the so far unobserved value of $X$.
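For discrete random variables, the marginal distribution is obtained by summing the joint probability mass function over the other variable (the discrete analogue of Equation 7.22), and independence can be checked directly against Equation 7.20. The following sketch illustrates both steps. The two joint distributions and the function names are hypothetical, chosen only for this example.

```python
from itertools import product


def marginals(joint):
    """Marginal pmfs of X and Y from a joint pmf given as {(x, y): probability}."""
    p_x, p_y = {}, {}
    for (x, y), prob in joint.items():
        p_x[x] = p_x.get(x, 0.0) + prob   # sum over y for each x
        p_y[y] = p_y.get(y, 0.0) + prob   # sum over x for each y
    return p_x, p_y


def is_independent(joint, tol=1e-12):
    """True if p_XY(x, y) = p_X(x) * p_Y(y) for every pair (Equation 7.20)."""
    p_x, p_y = marginals(joint)
    return all(
        abs(joint.get((x, y), 0.0) - p_x[x] * p_y[y]) <= tol
        for x, y in product(p_x, p_y)
    )


# Hypothetical joint pmfs with identical marginals (0.7, 0.3) and (0.6, 0.4)
independent_joint = {(0, 0): 0.42, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.12}
dependent_joint = {(0, 0): 0.50, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.20}

print(marginals(independent_joint))
print(is_independent(independent_joint), is_independent(dependent_joint))
```

Both example distributions share the same marginals, which shows that the marginals alone do not determine whether two variables are independent.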
The conditional cumulative distribution function for continuous random variables is given by

$$F_{X|Y}(x \mid y) = \Pr[X \le x \mid Y = y] = \frac{\int_{-\infty}^{x} f_{XY}(s, y)\,ds}{f_Y(y)} \tag{7.23}$$

where the conditional density function is

$$f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)} \tag{7.24}$$

For discrete random variables, the probability of observing $X = x$, given that $Y = y$, equals

$$p_{X|Y}(x \mid y) = \frac{p_{XY}(x, y)}{p_Y(y)} \tag{7.25}$$

These results can be extended to more than two random variables. Kottegoda and Rosso (1997) provide more detail.

2.2. Expectation

Knowledge of the probability density function of a continuous random variable, or of the probability mass function of a discrete random variable, allows one to calculate the expected value of any function of the random variable. Such an expectation may represent the average rainfall depth, average temperature, average demand shortfall or expected economic benefits from system operation. If $g$ is a real-valued function of a continuous random variable $X$, the expected value of $g(X)$ is:

$$E[g(X)] = \int_{-\infty}^{+\infty} g(x)\,f_X(x)\,dx \tag{7.26}$$

whereas for a discrete random variable

$$E[g(X)] = \sum_{i} g(x_i)\,p_X(x_i) \tag{7.27}$$

The expectation operator, $E[\,\cdot\,]$, has several important properties. In particular, the expectation of a linear function of $X$ is a linear function of the expectation of $X$. Thus, if $a$ and $b$ are two non-random constants,

$$E[a + bX] = a + b\,E[X] \tag{7.28}$$

The expectation of a function of two random variables is given by

$$E[g(X, Y)] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} g(x, y)\,f_{XY}(x, y)\,dx\,dy$$

or

$$E[g(X, Y)] = \sum_{i} \sum_{j} g(x_i, y_j)\,p_{XY}(x_i, y_j) \tag{7.29}$$

If $X$ and $Y$ are independent, the expectation of the product of a function $g(\cdot)$ of $X$ and a function $h(\cdot)$ of $Y$ is the product of the expectations:

$$E[g(X)\,h(Y)] = E[g(X)]\,E[h(Y)] \tag{7.30}$$

This follows from substitution of Equations 7.19 and 7.20 into Equation 7.29.

2.3. Quantiles, Moments and Their Estimators

While the cumulative distribution function provides a complete specification of the properties of a random variable, it is useful to use simpler and more easily understood measures of the central tendency and range of values that a random variable may assume.
Perhaps the simplest approach to describing the distribution of a random variable is to report the value of several quantiles. The $p$th quantile of a random variable $X$ is the smallest value $x_p$ such that $X$ has a probability $p$ of assuming a value equal to or less than $x_p$:

$$\Pr[X < x_p] \le p \le \Pr[X \le x_p] \tag{7.31}$$

Equation 7.31 is written so that if, at some value $x_p$, the cumulative probability function jumps from less than $p$ to more than $p$, then that value $x_p$ is defined as the $p$th quantile even though $F_X(x_p) \ne p$. If $X$ is a continuous random variable, then in the region where $f_X(x) > 0$, the quantiles are uniquely defined and are obtained by solution of

$$F_X(x_p) = p \tag{7.32}$$

Frequently reported quantiles are the median $x_{0.50}$ and the lower and upper quartiles $x_{0.25}$ and $x_{0.75}$. The median describes the location or central tendency of the distribution of $X$ because the random variable is, in the continuous case, equally likely to be above as below that value. The interquartile range $[x_{0.25}, x_{0.75}]$ provides an easily understood description of the range of values that the random variable might assume. The $p$th quantile is also the $100p$ percentile.

In a given application, particularly when safety is of concern, it may be appropriate to use other quantiles. In floodplain management and the design of flood control structures, the 100-year flood $x_{0.99}$ is a commonly selected design value. In water quality management, a river's minimum seven-day-average low flow expected once in ten years is commonly used in the United States as the critical planning value. Here the one-in-ten year value is the 10th percentile of the distribution of the annual minima of the seven-day average flows.

The natural sample estimate of the median $x_{0.50}$ is the median of the sample. In a sample of size $n$ where $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ are the observations ordered by magnitude, and for a non-negative integer $k$ such that $n = 2k$ (even) or $n = 2k + 1$ (odd), the sample estimate of the median is

$$\hat{x}_{0.50} = \begin{cases} x_{(k+1)} & \text{for } n = 2k + 1 \\ \tfrac{1}{2}\left[x_{(k)} + x_{(k+1)}\right] & \text{for } n = 2k \end{cases} \tag{7.33}$$

Sample estimates of other quantiles may be obtained by using $x_{(i)}$ as an estimate of $x_q$ for $q = i/(n + 1)$ and then interpolating between observations to obtain $\hat{x}_p$ for the desired $p$. This only works for $1/(n + 1) \le p \le n/(n + 1)$ and can yield rather poor estimates of $x_p$ when $(n + 1)p$ is near either 1 or $n$. An alternative approach is to fit a reasonable distribution function to the observations, as discussed in Section 3, and then estimate $x_p$ using Equation 7.32, where $F_X(x)$ is the fitted distribution.

Another simple and common approach to describing a distribution's centre, spread and shape is by reporting the moments of a distribution. The first moment about the origin is the mean of $X$ and is given by

$$\mu_X = E[X] = \int_{-\infty}^{+\infty} x\,f_X(x)\,dx \tag{7.34}$$

Moments other than the first are normally measured about the mean. The second moment measured about the mean is the variance, denoted $\mathrm{Var}(X)$ or $\sigma_X^2$, where:

$$\sigma_X^2 = \mathrm{Var}(X) = E[(X - \mu_X)^2] \tag{7.35}$$

The standard deviation $\sigma_X$ is the square root of the variance. While the mean $\mu_X$ is a measure of the central value of $X$, the standard deviation $\sigma_X$ is a measure of the spread of the distribution of $X$ about $\mu_X$. Another measure of the variability in $X$ is the coefficient of variation,

$$CV_X = \frac{\sigma_X}{\mu_X} \tag{7.36}$$

The coefficient of variation expresses the standard deviation as a proportion of the mean. It is useful for comparing the relative variability of the flow in rivers of different sizes, or of rainfall variability in different regions when the random variable is strictly positive.
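The sample median of Equation 7.33 and the interpolated quantile estimator described above (treating $x_{(i)}$ as the $i/(n+1)$ quantile) can be sketched as follows. This is one possible reading of the procedure under stated assumptions: linear interpolation is assumed between adjacent ordered observations, and the function names and the nine flow values are hypothetical.

```python
def sample_median(values):
    """Sample median as in Equation 7.33."""
    x = sorted(values)
    n = len(x)
    k = n // 2
    if n % 2 == 1:
        return x[k]                    # n = 2k + 1: the middle value x_(k+1)
    return 0.5 * (x[k - 1] + x[k])     # n = 2k: average of x_(k) and x_(k+1)


def sample_quantile(values, p):
    """Estimate x_p by treating x_(i) as the i/(n+1) quantile and interpolating."""
    x = sorted(values)
    n = len(x)
    if not 1.0 / (n + 1) <= p <= n / (n + 1):
        raise ValueError("p must lie between 1/(n+1) and n/(n+1)")
    # Fractional rank, clamped to [1, n] to guard against rounding error
    position = min(max((n + 1) * p, 1.0), float(n))
    i = int(position)                  # 1-based index of the lower order statistic
    if i >= n:
        return x[n - 1]
    fraction = position - i
    return x[i - 1] + fraction * (x[i] - x[i - 1])


# Hypothetical sample of nine annual flows
flows = [12, 7, 3, 9, 15, 5, 10, 8, 11]
print(sample_median(flows))
print(sample_quantile(flows, 0.25), sample_quantile(flows, 0.75))
```

With only nine observations the estimator cannot return quantiles below $p = 0.1$ or above $p = 0.9$, which is the restriction noted in the text and the reason a fitted distribution is preferred for extreme quantiles.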
The third moment about the mean, denoted $\lambda_X$, measures the asymmetry, or skewness, of the distribution:

$$\lambda_X = E[(X - \mu_X)^3] \tag{7.37}$$

Typically, the dimensionless coefficient of skewness $\gamma_X$ is reported rather than the third moment $\lambda_X$. The coefficient of skewness is the third moment rescaled by the cube of the standard deviation so as to be dimensionless and hence unaffected by the scale of the random variable:

$$\gamma_X = \frac{\lambda_X}{\sigma_X^3} \tag{7.38}$$

Streamflows and other natural phenomena that are necessarily non-negative often have distributions with positive skew coefficients, reflecting the asymmetric shape of their distributions.

When the distribution of a random variable is not known, but a set of observations $\{x_1, \ldots, x_n\}$ is available, the moments of the unknown distribution of $X$ can be estimated based on the sample values using the following equations.

The sample estimate of the mean:

$$\bar{X} = \sum_{i=1}^{n} X_i / n \tag{7.39a}$$

The sample estimate of the variance:

$$\hat{\sigma}_X^2 = S_X^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \tag{7.39b}$$

The sample estimate of skewness:

$$\hat{\lambda}_X = \frac{n}{(n - 1)(n - 2)} \sum_{i=1}^{n} (X_i - \bar{X})^3 \tag{7.39c}$$

The sample estimate of the coefficient of variation:

$$\widehat{CV}_X = S_X / \bar{X} \tag{7.39d}$$

The sample estimate of the coefficient of skewness:

$$\hat{\gamma}_X = \hat{\lambda}_X / S_X^3 \tag{7.39e}$$

The sample estimates of the mean and variance are often denoted as $\bar{x}$ and $s_x^2$, where the lower case letters are used when referring to a specific sample. All of these sample estimators provide only estimates of actual or true values. Unless the sample size $n$ is very large, the difference between the estimators and the true values of $\mu_X$, $\sigma_X^2$, $\lambda_X$, $CV_X$, and $\gamma_X$ may be large. In many ways, the field of statistics is about the precision of estimators of different quantities. One wants to know how well the mean of twenty annual rainfall depths describes the true expected annual rainfall depth, or how large the difference between the estimated 100-year flood and the true 100-year flood is likely to be.

As an example of the calculation of moments, consider the flood data in Table 7.2. These data have the following sample moments:

$$\bar{x} = 1{,}549.2 \qquad s_X \approx 813 \;(= CV_X\,\bar{x}) \qquad CV_X = 0.525 \qquad \hat{\gamma}_X = 0.71$$

As one can see, the data are positively skewed and have a relatively large coefficient of variation.

When discussing the accuracy of sample estimates, two quantities are often considered, bias and variance. An estimator $\hat{\theta}$ of a known or unknown quantity $\theta$ is a function of the observed values of the random variable $X$, say in $n$ different time periods, $X_1, \ldots, X_n$, that will be available to estimate the value of $\theta$; $\hat{\theta}$ may be written $\hat{\theta}[X_1, X_2, \ldots, X_n]$ to emphasize that $\hat{\theta}$ itself is a random variable. Its value depends on the sample values of the random variable that will be observed. An estimator $\hat{\theta}$ of a quantity $\theta$ is biased if $E[\hat{\theta}] \ne \theta$ and unbiased if $E[\hat{\theta}] = \theta$. The quantity $\{E[\hat{\theta}] - \theta\}$ is generally called the bias of the estimator.

An unbiased estimator has the property that its expected value equals the value of the quantity to be estimated.
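The sample moment estimators of Equations 7.39a to 7.39e translate directly into code. The sketch below is illustrative only: the function name, the returned dictionary keys and the five data values are assumptions made for this example, and the coefficient of variation is only meaningful when the sample mean is positive.

```python
import math


def sample_moments(values):
    """Sample mean, variance, third moment, CV and skew (Equations 7.39a to 7.39e)."""
    x = list(values)
    n = len(x)
    if n < 3:
        raise ValueError("at least three observations are needed for the skew estimate")
    mean = sum(x) / n                                                  # 7.39a
    variance = sum((xi - mean) ** 2 for xi in x) / (n - 1)             # 7.39b
    third = n * sum((xi - mean) ** 3 for xi in x) / ((n - 1) * (n - 2))  # 7.39c
    std = math.sqrt(variance)
    return {
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean,              # 7.39d, assumes a positive mean
        "third_moment": third,
        "skew": third / std ** 3,      # 7.39e
    }


# Hypothetical sample, chosen so that the arithmetic is easy to follow by hand
moments = sample_moments([2, 4, 4, 5, 10])
print(moments)
```

For this small sample the deviations from the mean of 5 are -3, -1, -1, 0 and 5, so the sum of squares is 36 and the sum of cubes is 96, which gives a variance of 9 and a third-moment estimate of 40.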
The sample mean is an unbiased estimate of the population mean $\mu_X$ because

$$E[\bar{X}] = E\left[\frac{1}{n} \sum_{i=1}^{n} X_i\right] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \mu_X \tag{7.40}$$

The estimator $S_X^2$ of the variance of $X$ is an unbiased estimator of the true variance $\sigma_X^2$ for independent observations (Benjamin and Cornell, 1970):

$$E[S_X^2] = \sigma_X^2 \tag{7.41}$$

However, the corresponding estimator of the standard deviation, $S_X$, is in general a biased estimator of $\sigma_X$ because

$$E[S_X] \ne \sigma_X \tag{7.42}$$

The second important statistic often used to assess the accuracy of an estimator $\hat{\theta}$ is the variance of the estimator $\mathrm{Var}(\hat{\theta})$, which equals $E\{(\hat{\theta} - E[\hat{\theta}])^2\}$. For the mean of a set of independent observations, the variance of the sample mean is:

$$\mathrm{Var}(\bar{X}) = \frac{\sigma_X^2}{n} \tag{7.43}$$

It is common to call $\sigma_X / \sqrt{n}$ the standard error of $\bar{x}$ rather than its standard deviation. The standard error of an average is the most commonly reported measure of its precision.

Table 7.2. Annual maximum discharges on Magra River, Italy, at Calamazza.* (Columns: date; discharge in m³/s. The tabulated values are not reproduced here.)
* Value for 1945 is missing.

The bias measures the difference between the average value of an estimator and the quantity to be estimated.

The variance measures the spread or width of the estimator's distribution. Both contribute to the amount by which an estimator deviates from the quantity to be estimated. These two errors are often combined into the mean square error. Understanding that $\theta$ is fixed and the estimator $\hat{\theta}$ is a random variable, the mean squared error is the expected value of the squared distance (error) between $\theta$ and its estimator $\hat{\theta}$:

$$\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = E\{([\hat{\theta} - E(\hat{\theta})] + [E(\hat{\theta}) - \theta])^2\} = [\text{Bias}]^2 + \mathrm{Var}(\hat{\theta}) \tag{7.44}$$

where [Bias] is $E(\hat{\theta}) - \theta$. Equation 7.44 shows that the MSE, equal to the expected average squared deviation of the estimator $\hat{\theta}$ from the true value of the parameter $\theta$, can be computed as the bias squared plus the variance of the estimator. MSE is a convenient measure of how closely $\hat{\theta}$ approximates $\theta$ because it combines both bias and variance in a logical way.

Estimation of the coefficient of skewness $\gamma_X$ provides a good example of the use of the MSE for evaluating the total deviation of an estimate from the true population value. The sample estimate $\hat{\gamma}_X$ of $\gamma_X$ is often biased, has a large variance, and its absolute value was shown by Kirby (1974) to be bounded by the square root of the sample size $n$:

$$|\hat{\gamma}_X| \le \sqrt{n} \tag{7.45}$$

The bounds do not depend on the true skew, $\gamma_X$. However, the bias and variance of $\hat{\gamma}_X$ do depend on the sample size and the actual distribution of $X$. Table 7.3 contains the expected value and standard deviation of the estimated coefficient of skewness $\hat{\gamma}_X$ when $X$ has either a normal distribution, for which $\gamma_X = 0$, or a gamma distribution with $\gamma_X = 0.25$, 0.50, 1.00, 2.00, or 3.00. These values are adapted from Wallis et al. (1974 a,b), who employed moment estimators slightly different than those in Equation 7.39.

For the normal distribution, $E[\hat{\gamma}_X] = 0$ and $\mathrm{Var}[\hat{\gamma}_X] \cong 5/n$. In this case, the skewness estimator is unbiased but highly variable. In all the other cases in Table 7.3, the skewness estimator is biased.
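The sampling behaviour of the skew estimator can be explored by simulation. The sketch below draws repeated gamma samples, applies the estimator of Equation 7.39e, and reports the bias, variance and mean square error that enter Equation 7.44. It is a rough Monte Carlo illustration, not a reproduction of Table 7.3: the function names, the seed, the number of replicates and the unit scale parameter are all assumptions, and the gamma shape is set from the standard relation that a gamma distribution with shape $\alpha$ has skew $2/\sqrt{\alpha}$.

```python
import math
import random


def skew_coefficient(x):
    """Sample coefficient of skewness as in Equation 7.39e."""
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / (n - 1)
    third = n * sum((v - mean) ** 3 for v in x) / ((n - 1) * (n - 2))
    return third / variance ** 1.5


def simulate_skew_estimator(true_skew, n, replicates, seed=1):
    """Monte Carlo bias, variance and MSE of the skew estimator for gamma samples."""
    rng = random.Random(seed)
    shape = 4.0 / true_skew ** 2       # gamma skew = 2 / sqrt(shape)
    estimates = [
        skew_coefficient([rng.gammavariate(shape, 1.0) for _ in range(n)])
        for _ in range(replicates)
    ]
    mean_estimate = sum(estimates) / replicates
    bias = mean_estimate - true_skew
    variance = sum((g - mean_estimate) ** 2 for g in estimates) / replicates
    mse = sum((g - true_skew) ** 2 for g in estimates) / replicates
    return {
        "bias": bias,
        "variance": variance,
        "mse": mse,
        "max_abs": max(abs(g) for g in estimates),
    }


result = simulate_skew_estimator(true_skew=0.50, n=50, replicates=2000)
print(result)
```

Because the variance here is computed about the simulated mean with divisor equal to the number of replicates, the reported MSE equals the squared bias plus the variance up to rounding error, mirroring Equation 7.44. The largest absolute estimate can also be compared with the bound $\sqrt{n}$ of Equation 7.45.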
To illustrate the magnitude of these errors, consider the mean square error of the skew estimator $\hat{\gamma}_X$ calculated from a sample of size 50 when $X$ has a gamma distribution with $\gamma_X = 0.50$, a reasonable value for annual streamflows. The expected value of $\hat{\gamma}_X$ is 0.45; its variance equals $(0.37)^2$, its standard deviation squared. Using Equation 7.44, the mean square error of $\hat{\gamma}_X$ is:

$$\mathrm{MSE}(\hat{\gamma}_X) = (0.45 - 0.50)^2 + (0.37)^2 = 0.0025 + 0.1369 = 0.139 \cong 0.14 \tag{7.46}$$

An unbiased estimate of $\gamma_X$ is simply $(0.50/0.45)\,\hat{\gamma}_X$. Here the estimator provided by Equation 7.39e has been scaled to eliminate bias. This unbiased estimator has a mean squared error of:

$$\mathrm{MSE}\left(\frac{0.50\,\hat{\gamma}_X}{0.45}\right) = (0.50 - 0.50)^2 + \left[\left(\frac{0.50}{0.45}\right)(0.37)\right]^2 = 0.169 \cong 0.17 \tag{7.47}$$

The mean square error of the unbiased estimator of $\gamma_X$ is larger than the mean square error of the biased estimate. Unbiasing $\hat{\gamma}_X$ results in a larger mean square error for all the cases listed in Table 7.3 except for the normal distribution, for which $\gamma_X = 0$, and the gamma distribution with $\gamma_X = 3.00$.

As shown here for the skew coefficient, biased estimators often have smaller mean square errors than unbiased estimators. Because the mean square error measures the total average deviation of an estimator from the quantity being estimated, this result demonstrates that the strict or unquestioning use of unbiased estimators is not advisable. Additional information on the sampling distribution of quantiles and moments is contained in Stedinger et al. (1993).

2.4. L-Moments and Their Estimators

L-moments are another way to summarize the statistical properties of hydrological data based on linear combinations of the original observations (Hosking, 1990). Recently, hydrologists have found that regionalization methods (to be discussed in Section 5) using L-moments are superior to methods using traditional moments (Hosking and Wallis, 1997; Stedinger and Lu, 1995). L-moments have also proved useful for construction of goodness-of-fit tests (Hosking et al., 1985; Chowdhury et al., 1991; Fill and Stedinger, 1995), measures of regional homogeneity and distribution selection methods (Vogel and Fennessey, 1993; Hosking and Wallis, 1997).

Table 7.3. Sampling properties of coefficient of skewness estimator. Source: Wallis et al. (1974b), who only divided by $n$ in the estimators of the moments, whereas in Equations 7.39b and 7.39c we use the generally adopted coefficients of $1/(n - 1)$ and $n/[(n - 1)(n - 2)]$ for the variance and skew. (The table lists, by sample size and by distribution of $X$ (normal with $\gamma_X = 0$; gamma with $\gamma_X = 0.25$, 0.50, 1.00, 2.00, 3.00), the expected value of $\hat{\gamma}_X$, the standard deviation of $\hat{\gamma}_X$, and the upper bound on skew. The tabulated values are not reproduced here.)

The first L-moment, designated as $\lambda_1$, is simply the arithmetic mean:

$$\lambda_1 = E[X] \tag{7.48}$$

Now let $X_{(i|n)}$ be the $i$th smallest observation in a sample of size $n$ ($i = n$ corresponds to the largest). Then, for any distribution, the second L-moment, $\lambda_2$, is a description of scale based upon the expected difference between two randomly selected observations:

$$\lambda_2 = (1/2)\,E[X_{(2|2)} - X_{(1|2)}] \tag{7.49}$$

Similarly, L-moment measures of skewness and kurtosis use three and four randomly selected observations, respectively.

$$\lambda_3 = (1/3)\,E[X_{(3|3)} - 2X_{(2|3)} + X_{(1|3)}] \tag{7.50}$$

$$\lambda_4 = (1/4)\,E[X_{(4|4)} - 3X_{(3|4)} + 3X_{(2|4)} - X_{(1|4)}] \tag{7.51}$$

Sample L-moment estimates are often computed using intermediate statistics called probability weighted moments (PWMs). The $r$th probability weighted moment is defined as:

$$\beta_r = E\{X[F(X)]^r\} \tag{7.52}$$

where $F(X)$ is the cumulative distribution function of $X$. Recommended (Landwehr et al., 1979; Hosking and Wallis, 1995) unbiased PWM estimators, $b_r$, of $\beta_r$ are computed from the ordered observations $X_{(1)} \le \cdots \le X_{(n)}$ as:

$$b_0 = \bar{X}$$
$$b_1 = \frac{1}{n(n - 1)} \sum_{j=2}^{n} (j - 1)\,X_{(j)}$$
$$b_2 = \frac{1}{n(n - 1)(n - 2)} \sum_{j=3}^{n} (j - 1)(j - 2)\,X_{(j)} \tag{7.53}$$

These are examples of the general formula for computing estimators $b_r$ of $\beta_r$.

$$b_r = \frac{1}{n} \sum_{j=r+1}^{n} \binom{j-1}{r} X_{(j)} \Big/ \binom{n-1}{r} = \frac{1}{r + 1} \sum_{j=r+1}^{n} \binom{j-1}{r} X_{(j)} \Big/ \binom{n}{r+1} \tag{7.54}$$

for $r = 1, \ldots, n - 1$.

L-moments are easily calculated in terms of PWMs using:

$$\lambda_1 = \beta_0$$
$$\lambda_2 = 2\beta_1 - \beta_0$$
$$\lambda_3 = 6\beta_2 - 6\beta_1 + \beta_0$$
$$\lambda_4 = 20\beta_3 - 30\beta_2 + 12\beta_1 - \beta_0 \tag{7.55}$$

Wang (1997) provides formulas for directly calculating L-moment estimators of $\lambda_r$. Measures of the coefficient of variation, skewness and kurtosis of a distribution can be computed with L-moments, as they can with traditional product moments. Where skew primarily measures the asymmetry of a distribution, the kurtosis is an additional measure of the thickness of the extreme tails. Kurtosis is particularly useful for comparing symmetric distributions that have a skewness coefficient of zero. Table 7.4 provides definitions of the traditional coefficient of variation, coefficient of skewness and coefficient of kurtosis, as well as the L-moment, L-coefficient of variation, L-coefficient of skewness and L-coefficient of kurtosis.

The flood data in Table 7.2 can be used to provide an example of L-moments. Equation 7.53 yields estimates of the first three probability weighted moments:

$$b_0 = 1{,}549.20 \qquad b_1 = \text{[value missing]} \qquad b_2 = \text{[value missing]} \tag{7.56}$$

Recall that $b_0$ is just the sample average $\bar{x}$. The sample L-moments are easily calculated using the probability weighted moments. One obtains:

$$\hat{\lambda}_1 = b_0 = 1{,}549$$
$$\hat{\lambda}_2 = 2b_1 - b_0 = \text{[value missing]}$$
$$\hat{\lambda}_3 = 6b_2 - 6b_1 + b_0 = 80 \tag{7.57}$$

Thus, the sample estimates of the L-coefficient of variation, $t_2$, and L-coefficient of skewness, $t_3$, are:

$$t_2 = 0.295 \qquad t_3 = \text{[value missing]} \tag{7.58}$$

Table 7.4. Definitions of dimensionless product-moment and L-moment ratios.

  name                            common symbol            definition
  Product-moment ratios
  coefficient of variation        $CV_X$                   $\sigma_X / \mu_X$
  coefficient of skewness         $\gamma_X$               $E[(X - \mu_X)^3] / \sigma_X^3$
  coefficient of kurtosis         $\kappa_X$               $E[(X - \mu_X)^4] / \sigma_X^4$
  L-moment ratios
  L-coefficient of variation*     L-CV, $\tau_2$           $\lambda_2 / \lambda_1$
  L-coefficient of skewness       L-skewness, $\tau_3$     $\lambda_3 / \lambda_2$
  L-coefficient of kurtosis       L-kurtosis, $\tau_4$     $\lambda_4 / \lambda_2$

* Hosking and Wallis (1997) use $\tau$ instead of $\tau_2$ to represent the L-CV ratio.
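The chain from ordered observations to probability weighted moments (Equation 7.54) to L-moments (Equation 7.55) and L-moment ratios (Table 7.4) can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: the function names and the two small samples are hypothetical, and Python's `math.comb` is used for the binomial coefficients.

```python
from math import comb


def pwm(values, r):
    """Unbiased estimator b_r of beta_r (Equation 7.54) from ascending-ordered data."""
    x = sorted(values)
    n = len(x)
    if not 0 <= r <= n - 1:
        raise ValueError("r must lie between 0 and n - 1")
    # x[j - 1] is the order statistic x_(j); the sum starts at j = r + 1
    weighted_sum = sum(comb(j - 1, r) * x[j - 1] for j in range(r + 1, n + 1))
    return weighted_sum / (n * comb(n - 1, r))


def l_moments(values):
    """First four sample L-moments from PWMs (Equation 7.55); needs n >= 4."""
    b0, b1, b2, b3 = (pwm(values, r) for r in range(4))
    return (
        b0,
        2 * b1 - b0,
        6 * b2 - 6 * b1 + b0,
        20 * b3 - 30 * b2 + 12 * b1 - b0,
    )


def l_moment_ratios(values):
    """L-CV, L-skewness and L-kurtosis as defined in Table 7.4."""
    l1, l2, l3, l4 = l_moments(values)
    return {"L-CV": l2 / l1, "L-skewness": l3 / l2, "L-kurtosis": l4 / l2}


# Hypothetical samples: one symmetric, one with a single large value
symmetric = [1, 2, 3, 4, 5]
skewed = [1, 2, 3, 4, 20]
print(l_moments(symmetric))
print(l_moment_ratios(skewed))
```

For the symmetric sample the third L-moment is zero, as expected, and the second L-moment equals half the average absolute difference between all pairs of observations, which is the sample counterpart of Equation 7.49.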

3. Distributions of Random Events

A frequent task in water resources planning is the development of a model of some probabilistic or stochastic phenomenon such as streamflows, flood flows, rainfall, temperatures, evaporation, sediment or nutrient loads, nitrate or organic compound concentrations, or water demands. This often requires one to fit a probability distribution function to a set of observed values of the random variable. Sometimes, one's immediate objective is to estimate a particular quantile of the distribution, such as the 100-year flood, 50-year six-hour-rainfall depth, or the minimum seven-day-average expected once-in-ten-year flow. Then the fitted distribution can supply an estimate of that quantity. In a stochastic simulation, fitted distributions are used to generate possible values of the random variable in question.

Rather than fitting a reasonable and smooth mathematical distribution, one could use the empirical distribution represented by the data to describe the possible values that a random variable may assume in the future and their frequency. In practice, the true mathematical form for the distribution that describes the events is not known. Moreover, even if it was, its functional form may have too many parameters to be of much practical use. Thus, using the empirical distribution represented by the data itself has substantial appeal.

Generally, the free parameters of the theoretical distribution are selected (estimated) so as to make the fitted distribution consistent with the available data. The goal is to select a physically reasonable and simple distribution to describe the frequency of the events of interest, to estimate that distribution's parameters, and ultimately to obtain quantiles, performance indices and risk estimates of satisfactory accuracy for the problem at hand. Use of a theoretical distribution has several advantages over use of the empirical distribution:

• It presents a smooth interpretation of the empirical distribution. As a result quantiles, performance indices and other statistics computed using the fitted distribution should be more accurate than those computed with the empirical distribution.
• It provides a compact and easy-to-use representation of the data.
• It is likely to provide a more realistic description of the range of values that the random variable may assume and their likelihood. For example, by using the empirical distribution, one implicitly assumes that no values larger or smaller than the sample maximum or minimum can occur. For many situations, this is unreasonable. Often one needs to estimate the likelihood of extreme events that lie outside the range of the sample (either in terms of x values or in terms of frequency). Such extrapolation makes little sense with the empirical distribution.
• In many cases, one is not interested in the values of a random variable X, but instead in derived values of variables Y that are functions of X. This could be a performance function for some system. If Y is the performance function, interest might be primarily in its mean value E[Y], or the probability some standard is exceeded, Pr{Y > standard}. For some theoretical X-distributions, the resulting Y-distribution may be available in closed form, thus making the analysis rather simple. (The normal distribution works with linear models, the lognormal distribution with product models, and the gamma distribution with queuing systems.)

This section provides a brief introduction to some useful techniques for estimating the parameters of probability distribution functions and for determining if a fitted distribution provides a reasonable or acceptable model of the data. Sub-sections are also included on families of distributions based on the normal, gamma and generalized-extreme-value distributions. These three families have found frequent use in water resources planning (Kottegoda and Rosso, 1997).

3.1. Parameter Estimation

Given a set of observations to which a distribution is to be fit, one first selects a distribution function to serve as a model of the distribution of the data.
The choice of a distribution may be based on experience with data of that type, some understanding of the mechanisms giving rise to the data, and/or examination of the observations themselves. One can then estimate the parameters of the chosen distribution and determine if the fitted distribution provides an acceptable model of the data. A model is generally judged to be unacceptable if it is unlikely that one could have observed the available data were they actually drawn from the fitted distribution.

In many cases, good estimates of a distribution's parameters are obtained by the maximum-likelihood-estimation procedure. Given a set of n independent observations $\{x_1, \ldots, x_n\}$ of a continuous random variable X, the joint probability density function for the observations is:

$$f_{X_1, X_2, \ldots, X_n}(x_1, \ldots, x_n \mid \theta) = f_X(x_1 \mid \theta)\cdot f_X(x_2 \mid \theta)\cdots f_X(x_n \mid \theta) \tag{7.59}$$

where $\theta$ is the vector of the distribution's parameters. The maximum likelihood estimator of $\theta$ is that vector $\theta$ which maximizes Equation 7.59 and thereby makes it as likely as possible to have observed the values $\{x_1, \ldots, x_n\}$.

Considerable work has gone into studying the properties of maximum likelihood parameter estimates. Under rather general conditions, asymptotically the estimated parameters are normally distributed, unbiased and have the smallest possible variance of any asymptotically unbiased estimator (Bickel and Doksum, 1977). These, of course, are asymptotic properties, valid for large sample sizes n. Better estimation procedures, perhaps yielding biased parameter estimates, may exist for small sample sizes. Stedinger (1980) provides such an example. Still, maximum likelihood procedures are recommended with moderate and large samples, even though the iterative solution of nonlinear equations is often required.

An example of the maximum likelihood procedure for which closed-form expressions for the parameter estimates are obtained is provided by the lognormal distribution. The probability density function of a lognormally distributed random variable X is:

$$f_X(x) = \frac{1}{x\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{1}{2\sigma^2}\left[\ln(x) - \mu\right]^2\right\} \tag{7.60}$$

Here, the parameters $\mu$ and $\sigma^2$ are the mean and variance of the logarithm of X, and not of X itself. Maximizing the logarithm of the joint density for $\{x_1, \ldots, x_n\}$ is more convenient than maximizing the joint probability density itself.
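Hence, the problem can be expressed as the maximization of the log-likelihood function

$$L = \ln\left[\prod_{i=1}^{n} f(x_i \mid \mu, \sigma)\right] = \sum_{i=1}^{n}\ln f(x_i \mid \mu, \sigma) = -\sum_{i=1}^{n}\ln\left(x_i\sqrt{2\pi}\right) - n\ln(\sigma) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[\ln(x_i) - \mu\right]^2 \tag{7.61}$$

The maximum can be obtained by equating to zero the partial derivatives $\partial L/\partial\mu$ and $\partial L/\partial\sigma$, whereby one obtains:

$$\begin{aligned}
0 &= \frac{\partial L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}\left[\ln(x_i) - \mu\right]\\
0 &= \frac{\partial L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}\left[\ln(x_i) - \mu\right]^2
\end{aligned} \tag{7.62}$$

These equations yield the estimators

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}\ln(x_i), \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left[\ln(x_i) - \hat{\mu}\right]^2 \tag{7.63}$$

The second-order conditions for a maximum are met and these values maximize the likelihood function. It is useful to note that if one defines a new random variable $Y = \ln(X)$, then the maximum likelihood estimators of the parameters $\mu$ and $\sigma^2$, which are the mean and variance of the Y distribution, are the sample estimators of the mean and variance of Y:

$$\hat{\mu} = \bar{y}, \qquad \hat{\sigma}^2 = \left[(n-1)/n\right]S_Y^2 \tag{7.64}$$

The correction $[(n-1)/n]$ in this last equation is often neglected.

The second commonly used parameter estimation procedure is the method of moments. The method of moments is often a quick and simple method for obtaining parameter estimates for many distributions. For a distribution with m = 1, 2 or 3 parameters, the first m moments of the postulated distribution in Equations 7.34, 7.35 and 7.37 are equated to the corresponding sample estimates of those moments. The resulting nonlinear equations are solved for the unknown parameters.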
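The closed-form maximum likelihood estimators of Equation 7.63 and the log-likelihood of Equation 7.61 can be sketched as follows. The function names and the flow values are hypothetical and serve only to show the mechanics; they are not the Table 7.2 data.

```python
import math


def lognormal_mle(flows):
    """Closed-form estimates of (mu, sigma^2) from Equation 7.63."""
    if any(x <= 0 for x in flows):
        raise ValueError("lognormal observations must be positive")
    logs = [math.log(x) for x in flows]
    n = len(logs)
    mu_hat = sum(logs) / n
    # Divisor n (not n - 1): this is the [(n - 1)/n] S_Y^2 form of Equation 7.64.
    sigma2_hat = sum((y - mu_hat) ** 2 for y in logs) / n
    return mu_hat, sigma2_hat


def log_likelihood(flows, mu, sigma2):
    """Log-likelihood of Equation 7.61 for given mu and sigma^2."""
    n = len(flows)
    sigma = math.sqrt(sigma2)
    return (
        -sum(math.log(x * math.sqrt(2.0 * math.pi)) for x in flows)
        - n * math.log(sigma)
        - sum((math.log(x) - mu) ** 2 for x in flows) / (2.0 * sigma2)
    )


# Hypothetical annual maximum flows (m3/sec), for illustration only.
flows = [420.0, 980.0, 1310.0, 1750.0, 2600.0, 3400.0]
mu_hat, sigma2_hat = lognormal_mle(flows)
print(mu_hat, sigma2_hat, log_likelihood(flows, mu_hat, sigma2_hat))
```

Evaluating `log_likelihood` at nearby parameter values is a quick way to confirm numerically that the closed-form estimates sit at the maximum.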

For the lognormal distribution, the mean and variance of X as a function of the parameters $\mu$ and $\sigma$ are given by

$$\mu_X = \exp\left(\mu + \tfrac{1}{2}\sigma^2\right), \qquad \sigma_X^2 = \exp\left(2\mu + \sigma^2\right)\left[\exp\left(\sigma^2\right) - 1\right] \tag{7.65}$$

Substituting $\bar{x}$ for $\mu_X$ and $s_X^2$ for $\sigma_X^2$ and solving for $\mu$ and $\sigma^2$ one obtains

$$\hat{\sigma}^2 = \ln\left(1 + s_X^2/\bar{x}^2\right), \qquad \hat{\mu} = \ln\left(\frac{\bar{x}}{\sqrt{1 + s_X^2/\bar{x}^2}}\right) = \ln\bar{x} - \tfrac{1}{2}\hat{\sigma}^2 \tag{7.66}$$

The data in Table 7.2 provide an illustration of both fitting methods. One can easily compute the sample mean and variance of the logarithms of the flows to obtain

$$\hat{\mu} = 7.202, \qquad \hat{\sigma}^2 = 0.3164 = (0.5625)^2 \tag{7.67}$$

Alternatively, the sample mean and variance of the flows themselves are

$$\bar{x} = 1{,}549, \qquad s_X^2 = 661{,}800 = (813.5)^2 \tag{7.68}$$

Substituting those two values in Equation 7.66 yields

$$\hat{\mu} = 7.224, \qquad \hat{\sigma}^2 = 0.2435 = (0.4935)^2 \tag{7.69}$$

Method of moments and maximum likelihood are just two of many possible estimation methods. Just as method of moments equates sample estimators of moments to population values and solves for a distribution's parameters, one can simply equate L-moment estimators to population values and solve for the parameters of a distribution. The resulting method of L-moments has received considerable attention in the hydrological literature (Landwehr et al., 1978; Hosking et al., 1985; Hosking and Wallis, 1987; Hosking, 1990; Wang, 1997). It has been shown to have significant advantages when used as a basis for regionalization procedures that will be discussed in Section 5 (Lettenmaier et al., 1987; Stedinger and Lu, 1995; Hosking and Wallis, 1997).

Bayesian procedures provide another approach that is related to maximum likelihood estimation. Bayesian inference employs the likelihood function to represent the information in the data. That information is augmented with a prior distribution that describes what is known about constraints on the parameters and their likely values beyond the information provided by the recorded data available at a site.
The likelihood function and the prior probability density function are combined to obtain the probability density function that describes the posterior distribution of the parameters:

$$f_\theta(\theta \mid x_1, x_2, \ldots, x_n) \propto f_X(x_1, x_2, \ldots, x_n \mid \theta)\,\xi(\theta) \tag{7.70}$$

The symbol $\propto$ means "proportional to" and $\xi(\theta)$ is the probability density function for the prior distribution for $\theta$ (Kottegoda and Rosso, 1997). Thus, except for a constant of proportionality, the probability density function describing the posterior distribution of the parameter vector $\theta$ is equal to the product of the likelihood function $f_X(x_1, x_2, \ldots, x_n \mid \theta)$ and the probability density function for the prior distribution $\xi(\theta)$ for $\theta$.

Advantages of the Bayesian approach are that it allows the explicit modelling of uncertainty in parameters (Stedinger, 1997; Kuczera, 1999) and provides a theoretically consistent framework for integrating systematic flow records with regional and other hydrological information (Vicens et al., 1975; Stedinger, 1983; Kuczera, 1983). Martins and Stedinger (2000) illustrate how a prior distribution can be used to enforce realistic constraints upon a parameter as well as providing a description of its likely values. In their case, use of a prior on the shape parameter $\kappa$ of a generalized extreme value (GEV) distribution (discussed in Section 3.6) allowed definition of generalized maximum likelihood estimators that, over the $\kappa$-range of interest, performed substantially better than maximum likelihood, moment, and L-moment estimators.

While Bayesian methods have been available for decades, the computational challenge posed by the solution of Equation 7.70 has been an obstacle to their use. Solutions to Equation 7.70 have been available for special cases such as normal data, and binomial and Poisson samples (Raiffa and Schlaifer, 1961; Benjamin and Cornell, 1970; Zellner, 1971). However, a new and very general set of Markov Chain Monte Carlo (MCMC) procedures (discussed in Section 7.2) allows numerical computation of the posterior distributions of parameters for a very broad class of models (Gilks et al., 1996). As a result, Bayesian methods are now becoming much more popular and are the standard approach for many difficult problems that are not easily addressed by traditional methods (Gelman et al., 1995; Carlin and Louis, 2000). The use of Monte Carlo Bayesian methods in flood frequency analysis, rainfall-runoff modelling, and evaluation of environmental pathogen concentrations is illustrated by Wang (2001), Bates and Campbell (2001) and Crainiceanu et al. (2002), respectively.

Finally, a simple method of fitting flood frequency curves is to plot the ordered flood values on special probability paper and then to draw a line through the data (Gumbel, 1958). Even today, that simple method is still attractive when some of the smallest values are zero or unusually small, or have been censored as will be discussed in Section 4 (Kroll and Stedinger, 1996). Plotting the ranked annual maximum series against a probability scale is always an excellent and recommended way to see what the data look like and for determining whether or not a fitted curve is consistent with the data (Stedinger et al., 1993).

Statisticians and hydrologists have investigated which of these methods most accurately estimates the parameters themselves or the quantiles of the distribution. One also needs to determine how accuracy should be measured. Some studies have used average squared deviations, some have used average absolute weighted deviations with different weights on under- and over-estimation, and some have used the squared deviations of the log-quantile estimator (Slack et al., 1975; Kroll and Stedinger, 1996). In almost all cases, one is also interested in the bias of an estimator, which is the average value of the estimator minus the true value of the parameter or quantile being estimated.
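The definition of bias just given lends itself to a simple Monte Carlo check. The sketch below is an illustration under stated assumptions, not a procedure from the cited studies: the helper names, the sample size, the number of replicates and the seed are all arbitrary choices. It estimates the bias of the maximum likelihood variance estimator of Equation 7.63 applied to normally distributed log-flows, for which the $[(n-1)/n]$ factor in Equation 7.64 implies a bias of $-\sigma^2/n$.

```python
import random


def mle_variance(sample):
    """Maximum likelihood variance estimator (divisor n), as in Equation 7.63."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((v - mean) ** 2 for v in sample) / n


def estimator_bias(estimator, true_value, draw, n, replicates, seed=0):
    """Average of the estimator over many synthetic samples, minus the true value."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        sample = [draw(rng) for _ in range(n)]
        total += estimator(sample)
    return total / replicates - true_value


# Log-flows assumed standard normal (true variance 1.0), samples of size 10.
bias = estimator_bias(
    mle_variance,
    true_value=1.0,
    draw=lambda rng: rng.gauss(0.0, 1.0),
    n=10,
    replicates=20000,
    seed=1,
)
print(bias)  # should be close to -1/10, the theoretical bias for n = 10
```

The same `estimator_bias` helper could be pointed at a quantile estimator instead of a parameter estimator, with the squared or absolute deviations mentioned above accumulated in the same loop.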
Special estimators have been developed to compute design events that on average are exceeded with the specified probability and have the anticipated risk of being exceeded (Beard, 1960, 1997; Rasmussen and Rosbjerg, 1989, 1991a,b; Stedinger, 1997; Rosbjerg and Madsen, 1998).

3.2. Model Adequacy

After estimating the parameters of a distribution, some check of model adequacy should be made. Such checks vary from simple comparisons of the observations with the fitted model (using graphs or tables) to rigorous statistical tests. Some of the early and simplest methods of parameter estimation were graphical techniques. Although quantitative techniques are generally more accurate and precise for parameter estimation, graphical presentations are invaluable for comparing the fitted distribution with the observations for the detection of systematic or unexplained deviations between the two. The observed data will plot as a straight line on probability graph paper if the postulated distribution is the true distribution of the observation. If probability graph paper does not exist for the particular distribution of interest, more general techniques can be used.

Let $x_{(i)}$ be the ith smallest value in a set of observed values $\{x_i\}$ so that $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$. The random variable $X_{(i)}$ provides a reasonable estimate of the pth quantile $x_p$ of the true distribution of X for $p = i/(n+1)$. In fact, when one considers the cumulative probability $U_i$ associated with the random variable $X_{(i)}$, $U_i = F_X(X_{(i)})$, and if the observations $X_{(i)}$ are independent, then the $U_i$ have a beta distribution (Gumbel, 1958) with probability density function:

$$f_{U_i}(u) = \frac{n!}{(i-1)!\,(n-i)!}\,u^{i-1}(1-u)^{n-i}, \qquad 0 \le u \le 1 \tag{7.71}$$

This beta distribution has mean

$$E[U_i] = \frac{i}{n+1} \tag{7.72a}$$

and variance

$$\operatorname{Var}(U_i) = \frac{i\,(n-i+1)}{(n+1)^2\,(n+2)} \tag{7.72b}$$

A good graphical check of the adequacy of a fitted distribution G(x) is obtained by plotting the observations $x_{(i)}$ versus $G^{-1}[i/(n+1)]$ (Wilk and Gnanadesikan, 1968).
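The moments in Equations 7.72a and 7.72b, and quantiles of the beta distribution in Equation 7.71, can be computed without special software. The sketch below is illustrative only and its function names are hypothetical. It relies on a standard identity that the text does not state: $U_i \le u$ exactly when at least i of the n independent uniform probabilities fall at or below u, so the cumulative distribution is a binomial tail sum.

```python
from math import comb


def u_moments(i, n):
    """Mean and variance of U_i from Equations 7.72a and 7.72b."""
    mean = i / (n + 1)
    variance = i * (n - i + 1) / ((n + 1) ** 2 * (n + 2))
    return mean, variance


def u_cdf(u, i, n):
    """Pr[U_i <= u]: at least i of n uniform values are <= u (binomial tail)."""
    return sum(comb(n, k) * u**k * (1.0 - u) ** (n - k) for k in range(i, n + 1))


def u_quantile(q, i, n, tol=1e-12):
    """Invert u_cdf by bisection; the CDF is increasing in u on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if u_cdf(mid, i, n) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Spread of the plotting probability for the smallest, middle and largest
# of n = 40 observations (n chosen only for illustration).
n = 40
for i in (1, 20, 40):
    mean, variance = u_moments(i, n)
    print(i, mean, variance**0.5, u_quantile(0.25, i, n), u_quantile(0.75, i, n))
```

Relative to its mean, the spread is largest for the extreme order statistics, which is why the plotting positions of the smallest and largest observations are the least certain.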
Even if G(x) equalled to an exact degree the true X-distribution $F_X[x]$, the plotted points would not fall exactly on a 45° line through the origin of the graph. This would only occur if $F_X[x_{(i)}]$ exactly equalled $i/(n+1)$, and therefore each $x_{(i)}$ exactly equalled $F_X^{-1}[i/(n+1)]$.

An appreciation for how far an individual observation $x_{(i)}$ can be expected to deviate from $G^{-1}[i/(n+1)]$ can be obtained by plotting $G^{-1}[u_i^{(0.75)}]$ and $G^{-1}[u_i^{(0.25)}]$, where $u_i^{(0.75)}$ and $u_i^{(0.25)}$ are the upper and lower quartiles of the distribution of $U_i$ obtained from integrating the probability density function in Equation 7.71. The required incomplete beta function is also available in many software packages, including Microsoft Excel. Stedinger et al. (1993) show that $u_{(1)}$ and $(1 - u_{(n)})$ fall between $0.052/n$ and $3/(n+1)$ with a probability of 90%, thus illustrating the great uncertainty associated with the cumulative probability of the smallest value and the exceedance probability of the largest value in a sample.

Figures 7.2a and 7.2b illustrate the use of this quantile-quantile plotting technique by displaying the results of fitting a normal and a lognormal distribution to the annual maximum flows in Table 7.2 for the Magra River, Italy, at Calamazza. The observations of $X_{(i)}$, given in Table 7.2, are plotted on the vertical axis against the quantiles $G^{-1}[i/(n+1)]$ on the horizontal axis.

A probability plot is essentially a scatter plot of the sorted observations $X_{(i)}$ versus some approximation of their expected or anticipated value, represented by $G^{-1}(p_i)$, where, as suggested, $p_i = i/(n+1)$. The $p_i$ values are called plotting positions. A common alternative to $i/(n+1)$ is $(i - 0.5)/n$, which results from a probabilistic interpretation of the empirical distribution of the data. Many reasonable plotting position formulas have been proposed based upon the sense in which $G^{-1}(p_i)$ should approximate $X_{(i)}$. The Weibull formula $i/(n+1)$ and the Hazen formula $(i - 0.5)/n$ bracket most of the reasonable choices. Popular formulas are summarized by Stedinger et al. (1993), who also discuss the generation of probability plots for many distributions commonly employed in hydrology.

Rigorous statistical tests are available for trying to determine whether or not it is reasonable to assume that a given set of observations could have been drawn from a particular family of distributions. Although not the most powerful of such tests, the Kolmogorov–Smirnov test provides bounds within which every observation should lie if the sample is actually drawn from the assumed distribution.
In particular, for $G = F_X$, the test specifies that

$$\Pr\left[\,G^{-1}\!\left(\frac{i}{n} - C_\alpha\right) \le X_{(i)} \le G^{-1}\!\left(\frac{i-1}{n} + C_\alpha\right) \text{ for every } i\,\right] = 1 - \alpha \tag{7.73}$$

where $C_\alpha$ is the critical value of the test at significance level $\alpha$.

[Figure 7.2. Plots of annual maximum discharges of Magra River, Italy, versus quantiles of fitted (a) normal and (b) lognormal distributions. Vertical axes: observed values $X_{(i)}$ and Kolmogorov–Smirnov bounds (m³/sec). Horizontal axes: quantiles of the fitted distribution, $G^{-1}[i/(n+1)]$ (m³/sec). Each panel shows the upper and lower 90% confidence intervals for all points.]

Formulas for $C_\alpha$ as a function of n are contained in Table 7.5 for three cases: (1) when G is completely specified independent of the sample's values; (2) when G is the normal distribution and the mean and variance are estimated from the sample with $\bar{x}$ and $s_X^2$; and (3) when G is the exponential distribution and the scale parameter is estimated as $1/\bar{x}$. Chowdhury et al. (1991) provide critical values for the Gumbel and generalized extreme value (GEV) distributions (Section 3.6) with known shape parameter $\kappa$. For other distributions, the values obtained from Table 7.5 may be used to construct approximate simultaneous confidence intervals for every $X_{(i)}$.

Figures 7.2a and b contain 90% confidence intervals for the plotted points constructed in this manner. For the normal distribution, the critical value $C_\alpha$ follows from the second case in Table 7.5, evaluated for $\alpha = 0.10$ and $n = 40$. As can be seen in Figure 7.2a, the annual maximum flows are not consistent with the hypothesis that they were drawn from a normal distribution; three of the observations lie outside the simultaneous 90% confidence intervals for all the points. This demonstrates a statistically significant lack of fit. The fitted normal distribution underestimates the quantiles corresponding to small and large probabilities while overestimating the quantiles in an intermediate range. In Figure 7.2b, deviations between the fitted lognormal distribution and the observations can be attributed to the differences between $F_X(x_{(i)})$ and $i/(n+1)$. Generally, the points are all near the 45° line through the origin, and no major systematic deviations are apparent.

The Kolmogorov–Smirnov test conveniently provides bounds within which every observation on a probability plot should lie if the sample is actually drawn from the assumed distribution, and thus is useful for visually evaluating the adequacy of a fitted distribution. However, it is not the most powerful test available for estimating which distribution a set of observations is likely to have been drawn from.
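The simultaneous bounds of Equation 7.73 are straightforward to compute once a critical value $C_\alpha$ has been taken from Table 7.5. The sketch below is a hedged illustration: the function names are hypothetical, the fitted distribution is assumed to be normal (via the standard library's `statistics.NormalDist`), and the value of `c_alpha` in the usage lines is an arbitrary placeholder rather than a value from Table 7.5.

```python
from statistics import NormalDist


def ks_plot_bounds(n, c_alpha, inverse_cdf):
    """Lower and upper bounds for X_(1) <= ... <= X_(n) from Equation 7.73."""

    def quantile(p):
        # A probability outside (0, 1) means the bound is not binding.
        if p <= 0.0:
            return float("-inf")
        if p >= 1.0:
            return float("inf")
        return inverse_cdf(p)

    lower = [quantile(i / n - c_alpha) for i in range(1, n + 1)]
    upper = [quantile((i - 1) / n + c_alpha) for i in range(1, n + 1)]
    return lower, upper


def outside_bounds(sample, c_alpha, inverse_cdf):
    """Observations that fall outside their Kolmogorov-Smirnov bounds."""
    xs = sorted(sample)
    lower, upper = ks_plot_bounds(len(xs), c_alpha, inverse_cdf)
    return [x for x, lo, hi in zip(xs, lower, upper) if not lo <= x <= hi]


# Placeholder inputs: a standard normal G and an arbitrary critical value.
g_inverse = NormalDist(mu=0.0, sigma=1.0).inv_cdf
c_alpha = 0.3  # illustrative only; take the real value from Table 7.5
print(outside_bounds([-1.0, -0.2, 0.2, 1.0], c_alpha, g_inverse))  # no violations
print(outside_bounds([0.0, 0.1, 0.2, 0.3], c_alpha, g_inverse))    # two violations
```

For a lognormal fit, one option is to apply the same functions to the logarithms of the flows with the fitted log-space mean and standard deviation.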
For that purpose, several other more analytical tests are available (Filliben, 1975; Hosking, 1990; Chowdhury et al., 1991; Kottegoda and Rosso, 1997).

The Probability Plot Correlation test is a popular and powerful test of whether a sample has been drawn from a postulated distribution, though it is often weaker than alternative tests at rejecting thin-tailed alternatives (Filliben, 1975; Fill and Stedinger, 1995). A test with greater power has a greater probability of correctly determining that a sample is not from the postulated distribution. The Probability Plot Correlation Coefficient test employs the correlation r between the ordered observations $x_{(i)}$ and the corresponding fitted quantiles $w_i = G^{-1}(p_i)$, determined by plotting positions $p_i$ for each $x_{(i)}$. Values of r near 1.0 suggest that the observations could have been drawn from the fitted distribution: r measures the linearity of the probability plot, providing a quantitative assessment of fit. If $\bar{x}$ denotes the average value of the observations and $\bar{w}$ denotes the average value of the fitted quantiles, then

$$r = \frac{\sum_{i}\left(x_{(i)} - \bar{x}\right)\left(w_i - \bar{w}\right)}{\left[\sum_{i}\left(x_{(i)} - \bar{x}\right)^2\,\sum_{i}\left(w_i - \bar{w}\right)^2\right]^{0.5}} \tag{7.74}$$

Table 7.5. Critical values of Kolmogorov–Smirnov statistic as a function of sample size n (after Stephens, 1974). The table gives, for each significance level $\alpha$, the formula for $C_\alpha$ in three cases: (1) $F_X$ completely specified; (2) $F_X$ normal with mean and variance estimated as $\bar{x}$ and $s_X^2$; (3) $F_X$ exponential with scale parameter b estimated as $1/\bar{x}$.
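Equation 7.74 translates directly into code. The following sketch is illustrative and its names are hypothetical; it defaults to the Weibull plotting position $i/(n+1)$ introduced earlier and lets the caller pass other plotting positions. The resulting r still has to be compared with tabulated critical values, which are not reproduced here.

```python
from statistics import NormalDist, mean


def ppcc(sample, inverse_cdf, positions=None):
    """Probability plot correlation coefficient r of Equation 7.74."""
    xs = sorted(sample)
    n = len(xs)
    if positions is None:
        positions = [i / (n + 1) for i in range(1, n + 1)]  # Weibull formula
    ws = [inverse_cdf(p) for p in positions]  # fitted quantiles w_i = G^-1(p_i)
    x_bar, w_bar = mean(xs), mean(ws)
    numerator = sum((x - x_bar) * (w - w_bar) for x, w in zip(xs, ws))
    denominator = (
        sum((x - x_bar) ** 2 for x in xs) * sum((w - w_bar) ** 2 for w in ws)
    ) ** 0.5
    return numerator / denominator


standard_normal = NormalDist().inv_cdf
# A sample lying exactly on a straight line in the probability plot gives r = 1;
# a sample with one very large value (hypothetical numbers) gives a smaller r.
on_line = [10.0 + 3.0 * standard_normal(i / 6) for i in range(1, 6)]
print(ppcc(on_line, standard_normal))
print(ppcc([1.0, 2.0, 3.0, 4.0, 100.0], standard_normal))
```

Because r is invariant to the location and scale of the fitted quantiles, the standard normal inverse can be used for any normal fit, and for a lognormal fit when the function is applied to the logarithms of the data.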

Table 7.6 provides critical values for r for the normal distribution, or the logarithms of lognormal variates, based upon the Blom plotting position that has $p_i = (i - 3/8)/(n + 1/4)$. Values for the Gumbel distribution are reproduced in Table 7.7 for use with the Gringorten plotting position $p_i = (i - 0.44)/(n + 0.12)$. The table also applies to logarithms of Weibull variates (Stedinger et al., 1993). Other tables are available for the GEV (Chowdhury et al., 1991), the Pearson type 3 (Vogel and McMartin, 1991), and exponential and other distributions (D'Agostino and Stephens, 1986).

Table 7.6. Lower critical values of the probability plot correlation test statistic for the normal distribution using $p_i = (i - 3/8)/(n + 1/4)$ (Vogel, 1987).

Recently developed L-moment ratios appear to provide goodness-of-fit tests that are superior to both the Kolmogorov–Smirnov and the Probability Plot Correlation test (Hosking, 1990; Chowdhury et al., 1991; Fill and Stedinger, 1995). For normal data, the L-skewness estimator $\hat{\tau}_3$ (or $t_3$) would have mean zero and $\operatorname{Var}[\hat{\tau}_3] = (\ldots + \ldots/n)/n$, allowing construction of a powerful test of normality against skewed alternatives using the normally distributed statistic

$$Z = \frac{t_3}{\sqrt{(\ldots + \ldots/n)/n}} \tag{7.75}$$

with a reject region $|Z| > z_{\alpha/2}$.

Chowdhury et al. (1991) derive the sampling variance of the L-CV and L-skewness estimators $\hat{\tau}_2$ and $\hat{\tau}_3$ as a function of $\kappa$ for the GEV distribution. These allow construction of a test of whether a particular data set is consistent with a GEV distribution with a regionally estimated value of $\kappa$, or a regional $\kappa$ and a regional coefficient of variation, CV. Fill and Stedinger (1995) show that the $\hat{\tau}_3$ L-skewness estimator provides a test for the Gumbel versus a general GEV distribution using the normally distributed statistic

$$Z = \frac{\hat{\tau}_3 - 0.17}{\sqrt{(\ldots + \ldots/n)/n}} \tag{7.76}$$

with a reject region $|Z| > z_{\alpha/2}$.

The literature is full of goodness-of-fit tests.
Experience indicates that among the better tests there is often not a great deal of difference (D'Agostino and Stephens, 1986). Generation of a probability plot is most often a good idea because it allows the modeller to see what the data look like and where problems occur. The Kolmogorov–Smirnov test helps the eye interpret a probability plot by adding bounds to a graph, illustrating the magnitude of deviations from a straight line that are consistent with expected variability. One can also use quantiles of a beta distribution to illustrate the possible error in individual plotting positions, particularly at the extremes where that uncertainty is largest. The probability plot correlation test is a popular and powerful goodness-of-fit statistic. Goodness-of-fit tests based upon sample estimators of the L-skewness $\hat{\tau}_3$ for the normal and Gumbel distribution provide simple and useful tests that are not based on a probability plot.

Table 7.7. Lower critical values of the probability plot correlation test statistic for the Gumbel distribution using $p_i = (i - 0.44)/(n + 0.12)$ (Vogel, 1987).

3.3. Normal and Lognormal Distributions

The normal distribution and its logarithmic transformation, the lognormal distribution, are arguably the most widely used distributions in science and engineering. The probability density function of a normal random variable is

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{1}{2\sigma^2}(x - \mu)^2\right] \qquad \text{for } -\infty < x < +\infty \tag{7.77}$$

where $\mu$ and $\sigma^2$ are equivalent to $\mu_X$ and $\sigma_X^2$, the mean and variance of X. Interestingly, the maximum likelihood estimators of $\mu$ and $\sigma^2$ are almost identical to the moment estimates $\bar{x}$ and $s_X^2$.

The normal distribution is symmetric about its mean $\mu_X$ and admits values from $-\infty$ to $+\infty$. Thus, it is not always satisfactory for modelling physical phenomena such as streamflows or pollutant concentrations, which are necessarily non-negative and have skewed distributions. A frequently used model for skewed distributions is the lognormal distribution. A random variable X has a lognormal distribution if the natural logarithm of X, ln(X), has a normal distribution. If X is lognormally distributed, then by definition ln(X) is normally distributed, so that the density function of X is

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{1}{2\sigma^2}\left[\ln(x) - \mu\right]^2\right\}\frac{d(\ln x)}{dx} = \frac{1}{x\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{1}{2\sigma^2}\left[\ln(x/\eta)\right]^2\right\} \tag{7.78}$$

for $x > 0$ and $\mu = \ln(\eta)$. Here $\eta$ is the median of the X-distribution.

A lognormal random variable takes on values in the range $[0, +\infty)$. The parameter $\mu$ determines the scale of the X-distribution whereas $\sigma^2$ determines the shape of the distribution. The mean and variance of the lognormal distribution are given in Equation 7.65. Figure 7.3 illustrates the various shapes that the lognormal probability density function can assume. It is highly skewed with a thick right-hand tail for $\sigma > 1$, and approaches a symmetric normal distribution as $\sigma \to 0$. The density function always has a value of zero at $x = 0$. The coefficient of variation and skew are:

$$CV_X = \left[\exp\left(\sigma^2\right) - 1\right]^{1/2} \tag{7.79}$$

$$\gamma_X = 3\,CV_X + CV_X^3 \tag{7.80}$$

[Figure 7.3. Lognormal probability density functions with various standard deviations $\sigma$. Vertical axis: f(x); horizontal axis: x.]

The maximum likelihood estimates of $\mu$ and $\sigma$ are given in Equation 7.63 and the moment estimates in Equation 7.66. For reasonable-sized samples, the maximum likelihood estimates generally perform as well or better than the moment estimates (Stedinger, 1980).

The data in Table 7.2 were used to calculate the parameters of the lognormal distribution that would describe these flood flows. The results are reported in Equations 7.67 and 7.69. The two-parameter maximum likelihood and method of moments estimators identify parameter estimates for which the distribution skewness coefficients are 2.06 and 1.72, both substantially greater than the sample skew.

A useful generalization of the two-parameter lognormal distribution is the shifted lognormal or three-parameter lognormal distribution obtained when $\ln(X - \tau)$ is described by a normal distribution, and $X \ge \tau$. Theoretically, $\tau$ should be positive if, for physical reasons, X must be positive; practically, negative values of $\tau$ can be allowed when the resulting probability of negative values of X is sufficiently small.

Unfortunately, maximum likelihood estimates of the parameters $\mu$, $\sigma^2$, and $\tau$ are poorly behaved because of irregularities in the likelihood function (Giesbrecht and Kempthorne, 1976). The method of moments does fairly well when the skew of the fitted distribution is reasonably small. A method that does almost as well as the moment method for low-skew distributions, and much better for highly skewed distributions, estimates $\tau$ by:

$$\hat{\tau} = \frac{x_{(1)}\,x_{(n)} - \hat{x}_{0.50}^2}{x_{(1)} + x_{(n)} - 2\hat{x}_{0.50}} \tag{7.81}$$

provided that $x_{(1)} + x_{(n)} - 2\hat{x}_{0.50} > 0$, where $x_{(1)}$ and $x_{(n)}$ are the smallest and largest observations and $\hat{x}_{0.50}$ is the sample median (Stedinger, 1980; Hoshi et al., 1984). If $x_{(1)} + x_{(n)} - 2\hat{x}_{0.50} < 0$, then the sample tends to be negatively skewed and a three-parameter lognormal distribution with a lower bound cannot be fit with this method. Good estimates of $\mu$ and $\sigma^2$ to go with $\hat{\tau}$ in Equation 7.81 are (Stedinger, 1980):

$$\hat{\mu} = \ln\left[\frac{\bar{x} - \hat{\tau}}{\sqrt{1 + s_X^2/(\bar{x} - \hat{\tau})^2}}\right], \qquad \hat{\sigma}^2 = \ln\left[1 + \frac{s_X^2}{(\bar{x} - \hat{\tau})^2}\right] \tag{7.82}$$

For the data in Table 7.2, Equations 7.81 and 7.82 yield the hybrid method-of-moments estimates $\hat{\mu} = 7.606$ and $\hat{\sigma}^2 = 0.1339 = (0.3659)^2$, together with the lower bound $\hat{\tau}$ from Equation 7.81, for the three-parameter lognormal distribution. This distribution has a coefficient of skewness of 1.19, which is more consistent with the sample skewness estimator than were the values obtained when a two-parameter lognormal distribution was fit to the data.
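The moment-based fits discussed here can be sketched compactly. The code below is an illustration with hypothetical function names, not the authors' implementation. It implements the lower-bound estimator of Equation 7.81, the moment estimators of Equation 7.82 (which reduce to Equation 7.66 when the shift is zero), and the skew formula of Equations 7.79 and 7.80. It uses the fact that shifting X by a constant does not change its skewness, so the skew of the three-parameter distribution depends on $\sigma^2$ alone.

```python
import math
from statistics import median


def lower_bound_estimate(sample):
    """Quantile lower-bound estimator of tau (Equation 7.81).

    Returns None when x_(1) + x_(n) - 2 * median <= 0, the case in which
    the text says the method cannot be used.
    """
    x_min, x_max, x_med = min(sample), max(sample), median(sample)
    denominator = x_min + x_max - 2.0 * x_med
    if denominator <= 0.0:
        return None
    return (x_min * x_max - x_med**2) / denominator


def moment_fit(x_bar, s2, tau=0.0):
    """(mu, sigma^2) from Equation 7.82; with tau = 0 this is Equation 7.66."""
    shifted_mean = x_bar - tau
    sigma2 = math.log(1.0 + s2 / shifted_mean**2)
    mu = math.log(shifted_mean) - 0.5 * sigma2
    return mu, sigma2


def lognormal_skew(sigma2):
    """Coefficient of skewness from Equations 7.79 and 7.80."""
    cv = math.sqrt(math.exp(sigma2) - 1.0)
    return 3.0 * cv + cv**3


# Two-parameter moment fit using the sample moments quoted in Equation 7.68.
mu_hat, sigma2_hat = moment_fit(1549.0, 661800.0)
print(mu_hat, sigma2_hat, lognormal_skew(sigma2_hat))

# Lower bound for a small hypothetical sample (not the Table 7.2 data).
print(lower_bound_estimate([1.0, 2.0, 10.0]))
```

With the sample moments of Equation 7.68 the first call gives values close to those in Equation 7.69 and a skew near 1.72, while `lognormal_skew(0.5625**2)` gives roughly 2.06 for the maximum likelihood fit.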
Alternatively, one can estimate $\mu$ and $\sigma^2$ by the sample mean and variance of $\ln(x - \hat{\tau})$, which yields the hybrid maximum likelihood estimates $\hat{\mu} = 7.605$ and $\hat{\sigma}^2 = 0.1407 = (0.3751)^2$, with the same $\hat{\tau}$ as before. The two sets of estimates are surprisingly close in this instance. In this second case, the fitted distribution has a coefficient of skewness of 1.22.

Natural logarithms have been used here. One could just as well have used base 10 common logarithms to estimate the parameters; however, in that case the relationships between the log-space parameters and the real-space moments change slightly (Stedinger et al., 1993).

3.4. Gamma Distributions

The gamma distribution has long been used to model many natural phenomena, including daily, monthly and annual streamflows as well as flood flows (Bobée and Ashkar, 1991). For a gamma random variable X,

$$\begin{aligned}
f_X(x) &= \frac{|\beta|}{\Gamma(\alpha)}\,(\beta x)^{\alpha - 1}\,e^{-\beta x}, \qquad \beta x \ge 0\\
\mu_X &= \frac{\alpha}{\beta}\\
\sigma_X^2 &= \frac{\alpha}{\beta^2}\\
\gamma_X &= \frac{2}{\sqrt{\alpha}} = 2\,CV_X \qquad \text{for } \beta > 0
\end{aligned} \tag{7.83}$$

The gamma function, $\Gamma(\alpha)$, for integer $\alpha$ is $(\alpha - 1)!$. The parameter $\alpha > 0$ determines the shape of the distribution; $\beta$ is the scale parameter. Figure 7.4 illustrates the different shapes that the probability density function for a gamma variable can assume. As $\alpha \to \infty$, the gamma distribution approaches the symmetric normal distribution, whereas for $0 < \alpha < 1$, the distribution has a highly asymmetric J-shaped probability density function whose value goes to infinity as x approaches zero.

The gamma distribution arises naturally in many problems in statistics and hydrology. It also has a very reasonable shape for such non-negative random variables as rainfall and streamflow. Unfortunately, its cumulative distribution function is not available in closed form, except for integer $\alpha$, though it is available in many software packages including Microsoft Excel. The gamma

NPTEL STRUCTURAL RELIABILITY

NPTEL STRUCTURAL RELIABILITY NPTEL Course O STRUCTURAL RELIABILITY Module # 0 Lecture 1 Course Format: Web Istructor: Dr. Aruasis Chakraborty Departmet of Civil Egieerig Idia Istitute of Techology Guwahati 1. Lecture 01: Basic Statistics

More information

Chapter 7 Methods of Finding Estimators

Chapter 7 Methods of Finding Estimators Chapter 7 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 011 Chapter 7 Methods of Fidig Estimators Sectio 7.1 Itroductio Defiitio 7.1.1 A poit estimator is ay fuctio W( X) W( X1, X,, X ) of

More information

PSYCHOLOGICAL STATISTICS

PSYCHOLOGICAL STATISTICS UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION B Sc. Cousellig Psychology (0 Adm.) IV SEMESTER COMPLEMENTARY COURSE PSYCHOLOGICAL STATISTICS QUESTION BANK. Iferetial statistics is the brach of statistics

More information

Properties of MLE: consistency, asymptotic normality. Fisher information.

Properties of MLE: consistency, asymptotic normality. Fisher information. Lecture 3 Properties of MLE: cosistecy, asymptotic ormality. Fisher iformatio. I this sectio we will try to uderstad why MLEs are good. Let us recall two facts from probability that we be used ofte throughout

More information

Case Study. Normal and t Distributions. Density Plot. Normal Distributions









