7. Concepts in Probability, Statistics and Stochastic Modelling
- Hilda Sutton
1. Introduction
2. Probability Concepts and Methods
   2.1. Random Variables and Distributions
   2.2. Expectation
   2.3. Quantiles, Moments and Their Estimators
   2.4. L-Moments and Their Estimators
3. Distributions of Random Events
   3.1. Parameter Estimation
   3.2. Model Adequacy
   3.3. Normal and Lognormal Distributions
   3.4. Gamma Distributions
   3.5. Log-Pearson Type 3 Distribution
   3.6. Gumbel and GEV Distributions
   3.7. L-Moment Diagrams
4. Analysis of Censored Data
5. Regionalization and Index-Flood Method
6. Partial Duration Series
7. Stochastic Processes and Time Series
   7.1. Describing Stochastic Processes
   7.2. Markov Processes and Markov Chains
   7.3. Properties of Time-Series Statistics
8. Synthetic Streamflow Generation
   8.1. Introduction
   8.2. Streamflow Generation Models
   8.3. A Simple Autoregressive Model
   8.4. Reproducing the Marginal Distribution
   8.5. Multivariate Models
   8.6. Multi-Season, Multi-Site Models
        Disaggregation Models
        Aggregation Models
9. Stochastic Simulation
   9.1. Generating Random Variables
   9.2. River Basin Simulation
   9.3. The Simulation Model
   9.4. Simulation of the Basin
   9.5. Interpreting Simulation Output
10. Conclusions
11. References
Events that cannot be predicted precisely are often called random. Many if not most of the inputs to, and processes that occur in, water resources systems are to some extent random. Hence, so too are the outputs or predicted impacts, and even people's reactions to those outputs or impacts. To ignore this randomness or uncertainty is to ignore reality. This chapter introduces some of the commonly used tools for dealing with uncertainty in water resources planning and management. Subsequent chapters illustrate how these tools are used in various types of optimization, simulation and statistical models for impact prediction and evaluation.

1. Introduction

Uncertainty is always present when planning, developing, managing and operating water resources systems. It arises because many factors that affect the performance of water resources systems are not and cannot be known with certainty when a system is planned, designed, built, managed and operated. The success and performance of each component of a system often depend on future meteorological, demographic, economic, social, technical and political conditions, all of which may influence future benefits, costs, environmental impacts and social acceptability. Uncertainty also arises from the stochastic nature of meteorological processes such as evaporation, rainfall and temperature. Similarly, future populations of towns and cities, per capita water-usage rates, irrigation patterns and priorities for water uses, all of which affect water demand, are never known with certainty.

There are many ways to deal with uncertainty. One approach, and perhaps the simplest, is to replace each uncertain quantity by its average (i.e., its mean or expected value), its median, or some critical (e.g., "worst-case") value, and then proceed with a deterministic approach. Use of expected or median values of uncertain quantities may be adequate if the uncertainty or variation in a quantity is reasonably small and does not critically affect the performance of the system.
If expected or median values of uncertain parameters or variables are used in a deterministic model, the planner can then assess the importance of uncertainty by means of sensitivity analysis, as is discussed later in this and the two subsequent chapters.

Replacement of uncertain quantities by either expected, median or worst-case values can grossly affect the evaluation of project performance when important parameters are highly variable. To illustrate these issues, consider the evaluation of the recreation potential of a reservoir. Table 7.1 shows that the elevation of the water surface varies over time depending on the inflow and demand for water. The table indicates the pool levels and their associated probabilities as well as the expected use of the recreation facility with different pool levels.

The average pool level $\bar{L}$ is simply the sum of each possible pool level times its probability, or

$$ \bar{L} = 10(0.10) + 20(0.25) + 30(0.30) + 40(0.25) + 50(0.10) = 30 \tag{7.1} $$

This pool level corresponds to 100 visitor-days per day:

$$ VD(\bar{L}) = 100 \text{ visitor-days per day} \tag{7.2} $$

A worst-case analysis might select a pool level of ten as a critical value, yielding an estimate of system performance equal to 25 visitor-days per day:

$$ VD(L_{low}) = VD(10) = 25 \text{ visitor-days per day} \tag{7.3} $$
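The quantities compared in this reservoir example can be checked with a few lines of Python. This is only a sketch: the pool levels, probabilities and visitation rates are the entries of Table 7.1 as they appear in Equations 7.1 and 7.4, while the names `expected_value`, `visitor_days` and the other variables are illustrative choices, not part of the original text.

```python
# Pool levels, probabilities and visitation rates as they appear in
# Equations 7.1 and 7.4 (the entries of Table 7.1).
pool_levels = [10, 20, 30, 40, 50]
probabilities = [0.10, 0.25, 0.30, 0.25, 0.10]
visitor_days = {10: 25, 20: 75, 30: 100, 40: 80, 50: 70}


def expected_value(values, probs):
    """Probability-weighted average of a discrete random variable."""
    return sum(v * p for v, p in zip(values, probs))


# Visitation at the average pool level: evaluate VD at E[L].
mean_level = expected_value(pool_levels, probabilities)
vd_at_mean_level = visitor_days[round(mean_level)]

# Worst-case visitation: evaluate VD at the lowest pool level.
vd_worst_case = visitor_days[min(pool_levels)]

# Average visitation rate: take the expectation of VD(L) itself.
mean_vd = expected_value([visitor_days[level] for level in pool_levels],
                         probabilities)

print(mean_level, vd_at_mean_level, vd_worst_case, mean_vd)
```

The point of the sketch is that `vd_at_mean_level` applies the function to the mean, whereas `mean_vd` averages the function values; for a nonlinear relationship such as this one the two generally differ.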
Table 7.1. Data for determining reservoir recreation potential.

  possible      probability      recreation potential
  pool level    of each level    (visitor-days per day)
  10            0.10              25
  20            0.25              75
  30            0.30             100
  40            0.25              80
  50            0.10              70

Neither of these values is a good approximation of the average visitation rate, that is

$$ \overline{VD} = 0.10\,VD(10) + 0.25\,VD(20) + 0.30\,VD(30) + 0.25\,VD(40) + 0.10\,VD(50) $$
$$ = 0.10(25) + 0.25(75) + 0.30(100) + 0.25(80) + 0.10(70) = 78.25 \text{ visitor-days per day} \tag{7.4} $$

Clearly, the average visitation rate, $\overline{VD} = 78.25$, the visitation rate corresponding to the average pool level, $VD(\bar{L}) = 100$, and the worst-case assessment, $VD(L_{low}) = 25$, are very different.

Using only average values in a complex model can produce a poor representation of both the average performance and the possible performance range. When important quantities are uncertain, a comprehensive analysis requires an evaluation of both the expected performance of a project and the risk and possible magnitude of project failures in a physical, economic, ecological and/or social sense.

This chapter reviews many of the methods of probability and statistics that are useful in water resources planning and management. Section 2 is a condensed summary of the important concepts and methods of probability and statistics. These concepts are applied in this and subsequent chapters of this book. Section 3 presents several probability distributions that are often used to model or describe the distribution of uncertain quantities. The section also discusses methods for fitting these distributions using historical information, and methods of assessing whether the distributions are adequate representations of the data. Sections 4, 5 and 6 expand upon the use of these mathematical models, and discuss alternative parameter estimation methods. Section 7 presents the basic ideas and concepts of stochastic processes or time series. These are used to model streamflows, rainfall, temperature or other phenomena whose values change with time.
The section contains a description of Markov chains, a special type of stochastic process used in many stochastic optimization and simulation models. Section 8 illustrates how synthetic flows and other time-series inputs can be generated for stochastic simulations. Stochastic simulation is introduced with an example in Section 9.

Many topics receive only brief treatment in this introductory chapter. Additional information can be found in applied statistical texts or book chapters such as Benjamin and Cornell (1970), Haan (1977), Kite (1988), Stedinger et al. (1993), Kottegoda and Rosso (1997), and Ayyub and McCuen (2002).

2. Probability Concepts and Methods

This section introduces the basic concepts and definitions used in analyses involving probability and statistics. These concepts are used throughout this chapter and later chapters in the book.

2.1. Random Variables and Distributions

The basic concept in probability theory is that of the random variable. By definition, the value of a random variable cannot be predicted with certainty. It depends, at least in part, on the outcome of a chance event. Examples are: (1) the number of years until the flood stage of a river washes away a small bridge; (2) the number of times during a reservoir's life that the level of the pool will drop below a specified level; (3) the rainfall depth next month; and (4) next year's maximum flow at a gauge site on an unregulated stream. The values of all of these random events or variables are not knowable before the event has occurred. Probability can be used to describe the likelihood that these random variables will equal specific values or be within a given range of specific values.
The first two examples illustrate discrete random variables, random variables that take on values that are discrete (such as positive integers). The second two examples illustrate continuous random variables. Continuous random variables take on any values within a specified range of values. A property of all continuous random variables is that the probability that the value of any of those random variables will equal some specific number (any specific number) is always zero. For example, the probability that the total rainfall depth in a month will be exactly 5.0 cm is zero, while the probability that the total rainfall will lie between 4 and 6 cm could be nonzero. Some random variables are combinations of continuous and discrete random variables.

Let X denote a random variable and x a possible value of that random variable X. Random variables are generally denoted by capital letters, and particular values they take on by lowercase letters. For any real-valued random variable X, its cumulative distribution function $F_X(x)$, often denoted as just the cdf, equals the probability that the value of X is less than or equal to a specific value or threshold x:

$$ F_X(x) = \Pr[X \le x] \tag{7.5} $$

This cumulative distribution function $F_X(x)$ is a non-decreasing function of x because

$$ \Pr[X \le x] \le \Pr[X \le x + \delta] \quad \text{for } \delta > 0 \tag{7.6} $$

In addition,

$$ \lim_{x \to +\infty} F_X(x) = 1 \tag{7.7} $$

and

$$ \lim_{x \to -\infty} F_X(x) = 0 \tag{7.8} $$

The first limit equals 1 because the probability that X takes on some value less than infinity must be unity; the second limit is zero because the probability that X takes on no value must be zero.

The left half of Figure 7.1 illustrates the cumulative distribution function (upper) and its derivative, the probability density function $f_X(x)$ (lower), of a continuous random variable X.

If X is a real-valued discrete random variable that takes on specific values $x_1, x_2, \ldots$, then the probability mass function $p_X(x_i)$ is the probability X takes on the value $x_i$:
$$ p_X(x_i) = \Pr[X = x_i] \tag{7.9} $$

The value of the cumulative distribution function $F_X(x)$ for a discrete random variable is the sum of the probabilities of all $x_i$ that are less than or equal to x:

$$ F_X(x) = \sum_{x_i \le x} p_X(x_i) \tag{7.10} $$

The right half of Figure 7.1 illustrates the cumulative distribution function (upper) and the probability mass function $p_X(x_i)$ (lower) of a discrete random variable.

The probability density function $f_X(x)$ (lower left plot in Figure 7.1) for a continuous random variable X is the analogue of the probability mass function (lower right plot in Figure 7.1) of a discrete random variable X. The probability density function, often called the pdf, is the derivative of the cumulative distribution function so that:

$$ f_X(x) = \frac{dF_X(x)}{dx} \ge 0 \tag{7.11} $$

It is necessary to have

$$ \int_{-\infty}^{+\infty} f_X(x)\,dx = 1 \tag{7.12} $$

Equation 7.12 indicates that the area under the probability density function is 1. If a and b are any two constants, the cumulative distribution function or the density function may be used to determine the probability that X is greater than a and less than or equal to b, where

$$ \Pr[a < X \le b] = F_X(b) - F_X(a) = \int_a^b f_X(x)\,dx \tag{7.13} $$

The area under a probability density function specifies the relative frequency with which the value of a continuous random variable falls within any specified range of values, that is, any interval along the horizontal axis.

Life is seldom so simple that only a single quantity is uncertain. Thus, the joint probability distribution of two or more random variables can also be defined. If X and Y are two continuous real-valued random variables, their joint cumulative distribution function is:

$$ F_{XY}(x, y) = \Pr[X \le x \text{ and } Y \le y] = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(u, v)\,dv\,du \tag{7.14} $$

If two random variables are discrete, then

$$ F_{XY}(x, y) = \sum_{x_i \le x} \sum_{y_i \le y} p_{XY}(x_i, y_i) \tag{7.15} $$
where the joint probability mass function is:

$$ p_{XY}(x_i, y_i) = \Pr[X = x_i \text{ and } Y = y_i] \tag{7.16} $$

Figure 7.1. Cumulative distribution and probability density or mass functions of random variables: (a) continuous distributions; (b) discrete distributions. In each panel the horizontal axis shows the possible values x of a random variable X; the upper plots show $F_X(x)$ and the lower plots show $f_X(x)$ or $p_X(x_i)$.

If X and Y are two random variables, and the distribution of X is not influenced by the value taken by Y, and vice versa, then the two random variables are said to be independent. For two independent random variables X and Y, the joint probability that the random variable X will be between values a and b and that the random variable Y will be between values c and d is simply the product of those separate probabilities:

$$ \Pr[a \le X \le b \text{ and } c \le Y \le d] = \Pr[a \le X \le b] \cdot \Pr[c \le Y \le d] \tag{7.17} $$

This applies for any values a, b, c and d. As a result,

$$ F_{XY}(x, y) = F_X(x)\,F_Y(y) \tag{7.18} $$

which implies for continuous random variables that

$$ f_{XY}(x, y) = f_X(x)\,f_Y(y) \tag{7.19} $$

and for discrete random variables that

$$ p_{XY}(x, y) = p_X(x)\,p_Y(y) \tag{7.20} $$

Other useful concepts are those of the marginal and conditional distributions. If X and Y are two random variables whose joint cumulative distribution function $F_{XY}(x, y)$ has been specified, then $F_X(x)$, the marginal cumulative distribution of X, is just the cumulative distribution of X ignoring Y. The marginal cumulative distribution function of X equals

$$ F_X(x) = \Pr[X \le x] = \lim_{y \to \infty} F_{XY}(x, y) \tag{7.21} $$

where the limit is equivalent to letting Y take on any value. If X and Y are continuous random variables, the marginal density of X can be computed from

$$ f_X(x) = \int_{-\infty}^{+\infty} f_{XY}(x, y)\,dy \tag{7.22} $$

The conditional cumulative distribution function is the cumulative distribution function for X given that Y has taken a particular value y. Thus the value of Y may have been observed and one is interested in the resulting conditional distribution for the so far unobserved value of X.
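For discrete random variables, the marginal distributions and the independence condition of Equation 7.20 can be sketched in a few lines of Python. The two joint probability tables below are hypothetical values chosen only for illustration, and the function names `marginals` and `is_independent` are likewise assumptions rather than anything defined in the text.

```python
def marginals(p_xy):
    """Marginal pmfs of X (rows) and Y (columns) from a joint pmf table.

    This is the discrete analogue of Equation 7.22: sum the joint
    probabilities over all values of the other variable.
    """
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(column) for column in zip(*p_xy)]
    return p_x, p_y


def is_independent(p_xy, tol=1e-12):
    """Check p_XY(x, y) = p_X(x) p_Y(y) for every pair (Equation 7.20)."""
    p_x, p_y = marginals(p_xy)
    return all(
        abs(p_xy[i][j] - p_x[i] * p_y[j]) <= tol
        for i in range(len(p_x))
        for j in range(len(p_y))
    )


# Hypothetical joint pmfs; rows index the values of X, columns those of Y.
joint_independent = [[0.06, 0.14],
                     [0.24, 0.56]]
joint_dependent = [[0.30, 0.10],
                   [0.10, 0.50]]

print(marginals(joint_independent))
print(is_independent(joint_independent), is_independent(joint_dependent))
```

A small tolerance is used in the comparison because the products of floating-point marginals rarely match the table entries exactly.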
The conditional cumulative distribution function for continuous random variables is given by
$$ F_{X|Y}(x \mid y) = \Pr[X \le x \mid Y = y] = \frac{\int_{-\infty}^{x} f_{XY}(s, y)\,ds}{f_Y(y)} \tag{7.23} $$

where the conditional density function is

$$ f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)} \tag{7.24} $$

For discrete random variables, the probability of observing X = x, given that Y = y, equals

$$ p_{X|Y}(x \mid y) = \frac{p_{XY}(x, y)}{p_Y(y)} \tag{7.25} $$

These results can be extended to more than two random variables. Kottegoda and Rosso (1997) provide more detail.

2.2. Expectation

Knowledge of the probability density function of a continuous random variable, or of the probability mass function of a discrete random variable, allows one to calculate the expected value of any function of the random variable. Such an expectation may represent the average rainfall depth, average temperature, average demand shortfall or expected economic benefits from system operation. If g is a real-valued function of a continuous random variable X, the expected value of g(X) is:

$$ E[g(X)] = \int_{-\infty}^{+\infty} g(x)\,f_X(x)\,dx \tag{7.26} $$

whereas for a discrete random variable

$$ E[g(X)] = \sum_{x_i} g(x_i)\,p_X(x_i) \tag{7.27} $$

The expectation operator, E[ ], has several important properties. In particular, the expectation of a linear function of X is a linear function of the expectation of X. Thus, if a and b are two non-random constants,

$$ E[a + bX] = a + b\,E[X] \tag{7.28} $$

The expectation of a function of two random variables is given by

$$ E[g(X, Y)] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} g(x, y)\,f_{XY}(x, y)\,dx\,dy $$

or

$$ E[g(X, Y)] = \sum_i \sum_j g(x_i, y_j)\,p_{XY}(x_i, y_j) \tag{7.29} $$

If X and Y are independent, the expectation of the product of a function g( ) of X and a function h( ) of Y is the product of the expectations:

$$ E[g(X)\,h(Y)] = E[g(X)]\,E[h(Y)] \tag{7.30} $$

This follows from substitution of Equations 7.19 and 7.20 into Equation 7.29.

2.3. Quantiles, Moments and Their Estimators

While the cumulative distribution function provides a complete specification of the properties of a random variable, it is useful to use simpler and more easily understood measures of the central tendency and range of values that a random variable may assume.
Perhaps the simplest approach to describing the distribution of a random variable is to report the value of several quantiles. The pth quantile of a random variable X is the smallest value $x_p$ such that X has a probability p of assuming a value equal to or less than $x_p$:

$$ \Pr[X < x_p] \le p \le \Pr[X \le x_p] \tag{7.31} $$

Equation 7.31 is written to ensure that if, at some value $x_p$, the cumulative distribution function jumps from less than p to more than p, then that value $x_p$ is defined as the pth quantile even though $F_X(x_p) \ne p$. If X is a continuous random variable, then in the region where $f_X(x) > 0$, the quantiles are uniquely defined and are obtained by solution of

$$ F_X(x_p) = p \tag{7.32} $$

Frequently reported quantiles are the median $x_{0.50}$ and the lower and upper quartiles $x_{0.25}$ and $x_{0.75}$. The median describes the location or central tendency of the distribution of X because the random variable is, in the continuous case, equally likely to be above as below that value. The interquartile range $[x_{0.25}, x_{0.75}]$ provides an easily understood description of the range of values that the random variable might assume. The pth quantile is also the 100p percentile.

In a given application, particularly when safety is of concern, it may be appropriate to use other quantiles. In floodplain management and the design of flood control structures, the 100-year flood $x_{0.99}$ is a commonly selected design value. In water quality management, a river's minimum seven-day-average low flow expected once in ten years is commonly used in the United States as the
critical planning value. Here the one-in-ten-year value is the 10th percentile of the distribution of the annual minima of the seven-day average flows.

The natural sample estimate of the median $x_{0.50}$ is the median of the sample. In a sample of size n where $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ are the observations ordered by magnitude, and for a non-negative integer k such that n = 2k (even) or n = 2k + 1 (odd), the sample estimate of the median is

$$ \hat{x}_{0.50} = \begin{cases} x_{(k+1)} & \text{for } n = 2k + 1 \\ \tfrac{1}{2}\left[x_{(k)} + x_{(k+1)}\right] & \text{for } n = 2k \end{cases} \tag{7.33} $$

Sample estimates of other quantiles may be obtained by using $x_{(i)}$ as an estimate of $x_q$ for q = i/(n + 1) and then interpolating between observations to obtain $\hat{x}_p$ for the desired p. This only works for 1/(n + 1) ≤ p ≤ n/(n + 1) and can yield rather poor estimates of $x_p$ when (n + 1)p is near either 1 or n. An alternative approach is to fit a reasonable distribution function to the observations, as discussed in Section 3, and then estimate $x_p$ using Equation 7.32, where $F_X(x)$ is the fitted distribution.

Another simple and common approach to describing a distribution's centre, spread and shape is by reporting the moments of a distribution. The first moment about the origin is the mean of X and is given by

$$ \mu_X = E[X] = \int_{-\infty}^{+\infty} x\,f_X(x)\,dx \tag{7.34} $$

Moments other than the first are normally measured about the mean. The second moment measured about the mean is the variance, denoted Var(X) or $\sigma_X^2$, where:

$$ \sigma_X^2 = \mathrm{Var}(X) = E[(X - \mu_X)^2] \tag{7.35} $$

The standard deviation $\sigma_X$ is the square root of the variance. While the mean $\mu_X$ is a measure of the central value of X, the standard deviation $\sigma_X$ is a measure of the spread of the distribution of X about $\mu_X$.

Another measure of the variability in X is the coefficient of variation,

$$ CV_X = \frac{\sigma_X}{\mu_X} \tag{7.36} $$

The coefficient of variation expresses the standard deviation as a proportion of the mean. It is useful for comparing the relative variability of the flow in rivers of different sizes, or of rainfall variability in different regions when the random variable is strictly positive.
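The sample median of Equation 7.33 and the interpolated quantile estimate based on q = i/(n + 1) can be sketched as follows. The function names and the small data set are hypothetical; the linear interpolation between adjacent order statistics is one straightforward reading of "interpolating between observations", not necessarily the only one.

```python
def sample_median(observations):
    """Sample median following Equation 7.33."""
    x = sorted(observations)
    n = len(x)
    k = n // 2
    if n % 2 == 1:                    # n = 2k + 1: the middle value x_(k+1)
        return x[k]                   # 0-based index k is x_(k+1)
    return 0.5 * (x[k - 1] + x[k])    # n = 2k: average of x_(k) and x_(k+1)


def sample_quantile(observations, p):
    """Estimate x_p by treating x_(i) as the quantile for q = i / (n + 1)
    and interpolating linearly between adjacent ordered observations."""
    x = sorted(observations)
    n = len(x)
    position = (n + 1) * p            # fractional rank, 1-based
    if position < 1 or position > n:
        raise ValueError("p must lie between 1/(n+1) and n/(n+1)")
    i = int(position)                 # rank of the observation just below
    if i == n:
        return x[n - 1]
    fraction = position - i
    return x[i - 1] + fraction * (x[i] - x[i - 1])


# Hypothetical observations, for illustration only.
data = [3, 1, 4, 1, 5, 9, 2]
print(sample_median(data), sample_quantile(data, 0.5), sample_quantile(data, 0.6))
```

The explicit range check mirrors the restriction stated in the text: outside 1/(n + 1) ≤ p ≤ n/(n + 1) there are no observations to interpolate between.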
The third moment about the mean, denoted $\lambda_X$, measures the asymmetry, or skewness, of the distribution:

$$ \lambda_X = E[(X - \mu_X)^3] \tag{7.37} $$

Typically, the dimensionless coefficient of skewness $\gamma_X$ is reported rather than the third moment $\lambda_X$. The coefficient of skewness is the third moment rescaled by the cube of the standard deviation so as to be dimensionless and hence unaffected by the scale of the random variable:

$$ \gamma_X = \frac{\lambda_X}{\sigma_X^3} \tag{7.38} $$

Streamflows and other natural phenomena that are necessarily non-negative often have distributions with positive skew coefficients, reflecting the asymmetric shape of their distributions.

When the distribution of a random variable is not known, but a set of observations $\{x_1, \ldots, x_n\}$ is available, the moments of the unknown distribution of X can be estimated based on the sample values using the following equations.

The sample estimate of the mean:

$$ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{7.39a} $$

The sample estimate of the variance:

$$ \hat{\sigma}_X^2 = S_X^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \tag{7.39b} $$

The sample estimate of skewness:

$$ \hat{\lambda}_X = \frac{n}{(n - 1)(n - 2)} \sum_{i=1}^{n} (X_i - \bar{X})^3 \tag{7.39c} $$

The sample estimate of the coefficient of variation:

$$ \widehat{CV}_X = S_X / \bar{X} \tag{7.39d} $$

The sample estimate of the coefficient of skewness:

$$ \hat{\gamma}_X = \hat{\lambda}_X / S_X^3 \tag{7.39e} $$

The sample estimates of the mean and variance are often denoted as $\bar{x}$ and $s_x^2$, where the lower case letters are used when referring to a specific sample. All of these
sample estimators provide only estimates of actual or true values. Unless the sample size n is very large, the difference between the estimators and the true values of $\mu_X$, $\sigma_X^2$, $\lambda_X$, $CV_X$ and $\gamma_X$ may be large. In many ways, the field of statistics is about the precision of estimators of different quantities. One wants to know how well the mean of twenty annual rainfall depths describes the true expected annual rainfall depth, or how large the difference between the estimated 100-year flood and the true 100-year flood is likely to be.

As an example of the calculation of moments, consider the flood data in Table 7.2. These data have the following sample moments:

$$ \bar{x} = 1{,}549.2 \qquad s_X = \text{[value missing]} \qquad \widehat{CV}_X = 0.525 \qquad \hat{\gamma}_X = 0.712 $$

As one can see, the data are positively skewed and have a relatively large coefficient of variation.

When discussing the accuracy of sample estimates, two quantities are often considered, bias and variance. An estimator $\hat{\theta}$ of a known or unknown quantity θ is a function of the observed values of the random variable X, say in n different time periods, $X_1, \ldots, X_n$, that will be available to estimate the value of θ; $\hat{\theta}$ may be written $\hat{\theta}[X_1, X_2, \ldots, X_n]$ to emphasize that $\hat{\theta}$ itself is a random variable. Its value depends on the sample values of the random variable that will be observed. An estimator $\hat{\theta}$ of a quantity θ is biased if $E[\hat{\theta}] \ne \theta$ and unbiased if $E[\hat{\theta}] = \theta$. The quantity $\{E[\hat{\theta}] - \theta\}$ is generally called the bias of the estimator.

An unbiased estimator has the property that its expected value equals the value of the quantity to be estimated.
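The sample estimators of Equations 7.39a to 7.39e translate directly into code. The sketch below uses a small hypothetical data set (the Table 7.2 flood record is not reproduced here), and the function name `sample_moments` and its dictionary keys are illustrative assumptions.

```python
import math


def sample_moments(observations):
    """Sample mean, variance, third moment, CV and skew coefficient
    following Equations 7.39a to 7.39e (requires at least 3 values)."""
    n = len(observations)
    mean = sum(observations) / n                                  # (7.39a)
    variance = sum((x - mean) ** 2 for x in observations) / (n - 1)  # (7.39b)
    third = (n * sum((x - mean) ** 3 for x in observations)
             / ((n - 1) * (n - 2)))                               # (7.39c)
    std_dev = math.sqrt(variance)
    return {
        "mean": mean,
        "variance": variance,
        "third_moment": third,
        "cv": std_dev / mean,                                     # (7.39d)
        "skew": third / std_dev ** 3,                             # (7.39e)
    }


# Hypothetical observations, for illustration only.
moments = sample_moments([2, 4, 4, 4, 5, 5, 7, 9])
print(moments)
```

Note that the coefficient of variation is only meaningful when the mean is nonzero, which is why the text restricts its use to strictly positive variables.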
The sample mean is an unbiased estimate of the population mean $\mu_X$ because

$$ E[\bar{X}] = E\left[\frac{1}{n} \sum_{i=1}^{n} X_i\right] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \mu_X \tag{7.40} $$

The estimator $S_X^2$ of the variance of X is an unbiased estimator of the true variance $\sigma_X^2$ for independent observations (Benjamin and Cornell, 1970):

$$ E[S_X^2] = \sigma_X^2 \tag{7.41} $$

However, the corresponding estimator of the standard deviation, $S_X$, is in general a biased estimator of $\sigma_X$ because

$$ E[S_X] \ne \sigma_X \tag{7.42} $$

The second important statistic often used to assess the accuracy of an estimator $\hat{\theta}$ is the variance of the estimator, $\mathrm{Var}(\hat{\theta})$, which equals $E\{(\hat{\theta} - E[\hat{\theta}])^2\}$. For the mean of a set of independent observations, the variance of the sample mean is:

$$ \mathrm{Var}(\bar{X}) = \frac{\sigma_X^2}{n} \tag{7.43} $$

Table 7.2. Annual maximum discharges on Magra River, Italy, at Calamazza.* Columns: date; discharge (m³/s). [Table entries are missing.]
* Value for 1945 is missing.

It is common to call $\sigma_X/\sqrt{n}$ the standard error of $\bar{x}$ rather than its standard deviation. The standard error of an average is the most commonly reported measure of its precision.

The bias measures the difference between the average value of an estimator and the quantity to be estimated.
The variance measures the spread or width of the estimator's distribution. Both contribute to the amount by which an estimator deviates from the quantity to be estimated. These two errors are often combined into the mean square error. Understanding that θ is fixed and the estimator $\hat{\theta}$ is a random variable, the mean squared error is the expected value of the squared distance (error) between θ and its estimator $\hat{\theta}$:

$$ \mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = E\{([\hat{\theta} - E(\hat{\theta})] + [E(\hat{\theta}) - \theta])^2\} = [\mathrm{Bias}]^2 + \mathrm{Var}(\hat{\theta}) \tag{7.44} $$

where [Bias] is $E(\hat{\theta}) - \theta$. Equation 7.44 shows that the MSE, equal to the expected average squared deviation of the estimator $\hat{\theta}$ from the true value of the parameter θ, can be computed as the bias squared plus the variance of the estimator. MSE is a convenient measure of how closely $\hat{\theta}$ approximates θ because it combines both bias and variance in a logical way.

Estimation of the coefficient of skewness $\gamma_X$ provides a good example of the use of the MSE for evaluating the total deviation of an estimate from the true population value. The sample estimate $\hat{\gamma}_X$ of $\gamma_X$ is often biased, has a large variance, and its absolute value was shown by Kirby (1974) to be bounded by the square root of the sample size n:

$$ |\hat{\gamma}_X| \le \sqrt{n} \tag{7.45} $$

The bounds do not depend on the true skew, $\gamma_X$. However, the bias and variance of $\hat{\gamma}_X$ do depend on the sample size and the actual distribution of X. Table 7.3 contains the expected value and standard deviation of the estimated coefficient of skewness $\hat{\gamma}_X$ when X has either a normal distribution, for which $\gamma_X = 0$, or a gamma distribution with $\gamma_X$ = 0.25, 0.50, 1.00, 2.00, or [value missing]. These values are adapted from Wallis et al. (1974a,b), who employed moment estimators slightly different from those in Equation 7.39.

For the normal distribution, $E[\hat{\gamma}_X] = 0$ and $\mathrm{Var}[\hat{\gamma}_X] \cong 5/n$. In this case, the skewness estimator is unbiased but highly variable. In all the other cases in Table 7.3, the skewness estimator is biased.
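The decomposition in Equation 7.44 can be checked numerically with a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions: it uses the sample standard deviation $S_X$ of normal samples as the estimator (Equation 7.42 says it is biased), and the sample size, number of replicates, seed and parameter values are arbitrary choices, not values from the text.

```python
import math
import random


def sample_std(values):
    """Sample standard deviation S_X (square root of Equation 7.39b)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def mse_decomposition(estimates, true_value):
    """Empirical bias, variance and mean square error of an estimator,
    computed over a collection of replicate estimates."""
    count = len(estimates)
    mean_estimate = sum(estimates) / count
    bias = mean_estimate - true_value
    variance = sum((e - mean_estimate) ** 2 for e in estimates) / count
    mse = sum((e - true_value) ** 2 for e in estimates) / count
    return bias, variance, mse


# Hypothetical experiment: 5000 samples of size 10 from a normal
# distribution with mean 10 and standard deviation sigma = 2.
rng = random.Random(1)
sigma = 2.0
estimates = [
    sample_std([rng.gauss(10.0, sigma) for _ in range(10)])
    for _ in range(5000)
]
bias, variance, mse = mse_decomposition(estimates, sigma)
print(bias, variance, mse, bias ** 2 + variance)
```

Because the variance here is computed about the empirical mean with divisor equal to the number of replicates, the identity MSE = Bias² + Variance holds for the simulated values up to rounding error, not just in expectation.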
To illustrate the magnitude of these errors, consider the mean square error of the skew estimator $\hat{\gamma}_X$ calculated from a sample of size 50 when X has a gamma distribution with $\gamma_X = 0.50$, a reasonable value for annual streamflows. The expected value of $\hat{\gamma}_X$ is 0.45; its variance equals $(0.37)^2$, its standard deviation squared. Using Equation 7.44, the mean square error of $\hat{\gamma}_X$ is:

$$ \mathrm{MSE}(\hat{\gamma}_X) = (0.45 - 0.50)^2 + (0.37)^2 = 0.0025 + 0.1369 = 0.139 \tag{7.46} $$

An unbiased estimate of $\gamma_X$ is simply $(0.50/0.45)\,\hat{\gamma}_X$. Here the estimator provided by Equation 7.39e has been scaled to eliminate bias. This unbiased estimator has a mean squared error of:

$$ \mathrm{MSE}\left(\frac{0.50}{0.45}\,\hat{\gamma}_X\right) = (0.50 - 0.50)^2 + \left[\left(\frac{0.50}{0.45}\right)(0.37)\right]^2 = 0.169 \tag{7.47} $$

The mean square error of the unbiased estimator of $\gamma_X$ is larger than the mean square error of the biased estimate. Unbiasing $\hat{\gamma}_X$ results in a larger mean square error for all the cases listed in Table 7.3 except for the normal distribution, for which $\gamma_X = 0$, and the gamma distribution with $\gamma_X$ = [value missing].

As shown here for the skew coefficient, biased estimators often have smaller mean square errors than unbiased estimators. Because the mean square error measures the total average deviation of an estimator from the quantity being estimated, this result demonstrates that the strict or unquestioning use of unbiased estimators is not advisable. Additional information on the sampling distribution of quantiles and moments is contained in Stedinger et al. (1993).

2.4. L-Moments and Their Estimators

L-moments are another way to summarize the statistical properties of hydrological data based on linear combinations of the original observations (Hosking, 1990). Recently, hydrologists have found that regionalization methods (to be discussed in Section 5) using L-moments are superior to methods using traditional moments (Hosking and Wallis, 1997; Stedinger and Lu, 1995). L-moments have also proved useful for construction of goodness-of-fit tests (Hosking et al., 1985; Chowdhury et al., 1991; Fill and Stedinger, 1995), measures of regional homogeneity and distribution selection methods (Vogel and Fennessey, 1993; Hosking and Wallis, 1997).
Table 7.3. Sampling properties of coefficient of skewness estimator: the expected value of $\hat{\gamma}_X$, the standard deviation of $\hat{\gamma}_X$ and the upper bound on skew, tabulated by sample size for X normal ($\gamma_X = 0$) and for X gamma distributed with several values of $\gamma_X$. [Table entries are missing.] Source: Wallis et al. (1974b), who divided only by n in the estimators of the moments, whereas in Equations 7.39b and 7.39c we use the generally adopted coefficients of 1/(n − 1) and n/[(n − 1)(n − 2)] for the variance and skew.

The first L-moment, designated as $\lambda_1$, is simply the arithmetic mean:

$$ \lambda_1 = E[X] \tag{7.48} $$

Now let $X_{(i|n)}$ be the ith smallest observation in a sample of size n (so that i = n corresponds to the largest). Then, for any distribution, the second L-moment, $\lambda_2$, is a description of scale based upon the expected difference between two randomly selected observations:

$$ \lambda_2 = \tfrac{1}{2}\,E[X_{(2|2)} - X_{(1|2)}] \tag{7.49} $$

Similarly, L-moment measures of skewness and kurtosis use three and four randomly selected observations, respectively:

$$ \lambda_3 = \tfrac{1}{3}\,E[X_{(3|3)} - 2X_{(2|3)} + X_{(1|3)}] \tag{7.50} $$

$$ \lambda_4 = \tfrac{1}{4}\,E[X_{(4|4)} - 3X_{(3|4)} + 3X_{(2|4)} - X_{(1|4)}] \tag{7.51} $$

Sample L-moment estimates are often computed using intermediate statistics called probability weighted moments (PWMs). The rth probability weighted moment is defined as:

$$ \beta_r = E\{X\,[F(X)]^r\} \tag{7.52} $$

where F(X) is the cumulative distribution function of X. Recommended (Landwehr et al., 1979; Hosking and Wallis, 1995) unbiased PWM estimators, $b_r$, of $\beta_r$ are computed as:

$$ b_0 = \bar{X} $$
$$ b_1 = \frac{1}{n(n - 1)} \sum_{j=2}^{n} (j - 1)\,X_{(j)} $$
$$ b_2 = \frac{1}{n(n - 1)(n - 2)} \sum_{j=3}^{n} (j - 1)(j - 2)\,X_{(j)} \tag{7.53} $$
These are examples of the general formula for computing estimators $b_r$ of $\beta_r$:

$$ b_r = \frac{1}{n} \sum_{j=r+1}^{n} \binom{j-1}{r} X_{(j)} \Big/ \binom{n-1}{r} = \frac{1}{r+1} \sum_{j=r+1}^{n} \binom{j-1}{r} X_{(j)} \Big/ \binom{n}{r+1} \tag{7.54} $$

for r = 1, …, n − 1.

L-moments are easily calculated in terms of PWMs using:

$$ \lambda_1 = \beta_0 $$
$$ \lambda_2 = 2\beta_1 - \beta_0 $$
$$ \lambda_3 = 6\beta_2 - 6\beta_1 + \beta_0 $$
$$ \lambda_4 = 20\beta_3 - 30\beta_2 + 12\beta_1 - \beta_0 \tag{7.55} $$

Wang (1997) provides formulas for directly calculating L-moment estimators of $\lambda_r$. Measures of the coefficient of variation, skewness and kurtosis of a distribution can be computed with L-moments, as they can with traditional product moments. Where skew primarily measures the asymmetry of a distribution, the kurtosis is an additional measure of the thickness of the extreme tails. Kurtosis is particularly useful for comparing symmetric distributions that have a skewness coefficient of zero. Table 7.4 provides definitions of the traditional coefficient of variation, coefficient of skewness and coefficient of kurtosis, as well as the corresponding L-moment ratios: the L-coefficient of variation, L-coefficient of skewness and L-coefficient of kurtosis.

The flood data in Table 7.2 can be used to provide an example of L-moments. Equation 7.53 yields estimates of the first three probability weighted moments:

$$ b_0 = 1{,}549.2 \qquad b_1 = \text{[value missing]} \qquad b_2 = \text{[value missing]} \tag{7.56} $$

Recall that $b_0$ is just the sample average $\bar{x}$. The sample L-moments are easily calculated using the probability weighted moments. One obtains:

$$ \hat{\lambda}_1 = b_0 = 1{,}549 $$
$$ \hat{\lambda}_2 = 2b_1 - b_0 = \text{[value missing]} $$
$$ \hat{\lambda}_3 = 6b_2 - 6b_1 + b_0 = 80 \tag{7.57} $$

Thus, the sample estimates of the L-coefficient of variation, $t_2$, and L-coefficient of skewness, $t_3$, are:

$$ t_2 = 0.295 \qquad t_3 = \text{[value missing]} \tag{7.58} $$

Table 7.4. Definitions of dimensionless product-moment and L-moment ratios.

  name                            common symbol        definition
  product-moment ratios
    coefficient of variation      $CV_X$               $\sigma_X / \mu_X$
    coefficient of skewness       $\gamma_X$           $E[(X - \mu_X)^3] / \sigma_X^3$
    coefficient of kurtosis       $\kappa_X$           $E[(X - \mu_X)^4] / \sigma_X^4$
  L-moment ratios
    L-coefficient of variation*   L-CV, $\tau_2$       $\lambda_2 / \lambda_1$
    L-coefficient of skewness     L-skewness, $\tau_3$ $\lambda_3 / \lambda_2$
    L-coefficient of kurtosis     L-kurtosis, $\tau_4$ $\lambda_4 / \lambda_2$

  * Hosking and Wallis (1997) use $\tau$ instead of $\tau_2$ to represent the L-CV ratio.
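The probability weighted moment estimators of Equation 7.54 and the conversion to L-moments in Equation 7.55 can be sketched as below. The function names `pwm` and `sample_l_moments` and the example data are hypothetical; the observations are sorted in ascending order so that index j matches $X_{(j)}$ in Equation 7.53.

```python
from math import comb


def pwm(observations, r):
    """Unbiased estimator b_r of the r-th probability weighted moment
    (Equation 7.54), with X_(1) <= ... <= X_(n) in ascending order."""
    x = sorted(observations)
    n = len(x)
    total = sum(comb(j - 1, r) * x[j - 1] for j in range(r + 1, n + 1))
    return total / (n * comb(n - 1, r))


def sample_l_moments(observations):
    """First four sample L-moments from the PWMs (Equation 7.55).
    Requires at least four observations."""
    b0, b1, b2, b3 = (pwm(observations, r) for r in range(4))
    lambda_1 = b0
    lambda_2 = 2 * b1 - b0
    lambda_3 = 6 * b2 - 6 * b1 + b0
    lambda_4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return lambda_1, lambda_2, lambda_3, lambda_4


# Hypothetical observations, for illustration only.
l1, l2, l3, l4 = sample_l_moments([1, 2, 3, 4, 5])
l_cv = l2 / l1          # sample L-coefficient of variation, t_2
l_skew = l3 / l2        # sample L-coefficient of skewness, t_3
print(l1, l2, l3, l4, l_cv, l_skew)
```

For r = 0 the formula reduces to the sample mean, and for r = 1 and r = 2 it reproduces the expressions for $b_1$ and $b_2$ in Equation 7.53, which is a convenient check on the binomial-coefficient form.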
3. Distributions of Random Events

A frequent task in water resources planning is the development of a model of some probabilistic or stochastic phenomenon such as streamflows, flood flows, rainfall, temperatures, evaporation, sediment or nutrient loads, nitrate or organic compound concentrations, or water demands. This often requires one to fit a probability distribution function to a set of observed values of the random variable. Sometimes, one's immediate objective is to estimate a particular quantile of the distribution, such as the 100-year flood, 50-year six-hour-rainfall depth, or the minimum seven-day-average expected once-in-ten-year flow. Then the fitted distribution can supply an estimate of that quantity. In a stochastic simulation, fitted distributions are used to generate possible values of the random variable in question.

Rather than fitting a reasonable and smooth mathematical distribution, one could use the empirical distribution represented by the data to describe the possible values that a random variable may assume in the future and their frequency. In practice, the true mathematical form for the distribution that describes the events is not known. Moreover, even if it were, its functional form may have too many parameters to be of much practical use. Thus, using the empirical distribution represented by the data itself has substantial appeal.

Generally, the free parameters of the theoretical distribution are selected (estimated) so as to make the fitted distribution consistent with the available data. The goal is to select a physically reasonable and simple distribution to describe the frequency of the events of interest, to estimate that distribution's parameters, and ultimately to obtain quantiles, performance indices and risk estimates of satisfactory accuracy for the problem at hand. Use of a theoretical distribution has several advantages over use of the empirical distribution:

It presents a smooth interpretation of the empirical distribution.
As a result, quantiles, performance indices and other statistics computed using the fitted distribution should be more accurate than those computed with the empirical distribution.

It provides a compact and easy-to-use representation of the data.

It is likely to provide a more realistic description of the range of values that the random variable may assume and their likelihood. For example, by using the empirical distribution, one implicitly assumes that no values larger or smaller than the sample maximum or minimum can occur. For many situations, this is unreasonable.

Often one needs to estimate the likelihood of extreme events that lie outside the range of the sample (either in terms of x values or in terms of frequency). Such extrapolation makes little sense with the empirical distribution.

In many cases, one is not interested in the values of a random variable X, but instead in derived values of variables Y that are functions of X. This could be a performance function for some system. If Y is the performance function, interest might be primarily in its mean value E[Y], or the probability some standard is exceeded, Pr{Y > standard}. For some theoretical X-distributions, the resulting Y-distribution may be available in closed form, thus making the analysis rather simple. (The normal distribution works with linear models, the lognormal distribution with product models, and the gamma distribution with queuing systems.)

This section provides a brief introduction to some useful techniques for estimating the parameters of probability distribution functions and for determining if a fitted distribution provides a reasonable or acceptable model of the data. Sub-sections are also included on families of distributions based on the normal, gamma and generalized-extreme-value distributions. These three families have found frequent use in water resources planning (Kottegoda and Rosso, 1997).

3.1. Parameter Estimation

Given a set of observations to which a distribution is to be fit, one first selects a distribution function to serve as a model of the distribution of the data.
The choice of a distribution may be based on experience with data of that type, some understanding of the mechanisms giving rise to the data, and/or examination of the observations themselves. One can then estimate the parameters of the chosen distribution and determine if the fitted distribution provides an acceptable model of the data. A model is generally judged to be unacceptable if it is unlikely that
one could have observed the available data were they actually drawn from the fitted distribution.

In many cases, good estimates of a distribution's parameters are obtained by the maximum-likelihood-estimation procedure. Given a set of n independent observations $\{x_1, \ldots, x_n\}$ of a continuous random variable X, the joint probability density function for the observations is:

$$f_{X_1, X_2, \ldots, X_n}(x_1, \ldots, x_n \mid \theta) = f_X(x_1 \mid \theta)\, f_X(x_2 \mid \theta) \cdots f_X(x_n \mid \theta) \tag{7.59}$$

where $\theta$ is the vector of the distribution's parameters. The maximum likelihood estimator of $\theta$ is that vector $\hat\theta$ which maximizes Equation 7.59 and thereby makes it as likely as possible to have observed the values $\{x_1, \ldots, x_n\}$.

Considerable work has gone into studying the properties of maximum likelihood parameter estimates. Under rather general conditions, asymptotically the estimated parameters are normally distributed, unbiased and have the smallest possible variance of any asymptotically unbiased estimator (Bickel and Doksum, 1977). These, of course, are asymptotic properties, valid for large sample sizes n. Better estimation procedures, perhaps yielding biased parameter estimates, may exist for small sample sizes. Stedinger (1980) provides such an example. Still, maximum likelihood procedures are recommended with moderate and large samples, even though the iterative solution of nonlinear equations is often required.

An example of the maximum likelihood procedure for which closed-form expressions for the parameter estimates are obtained is provided by the lognormal distribution. The probability density function of a lognormally distributed random variable X is:

$$f_X(x) = \frac{1}{x\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\left[\ln(x) - \mu\right]^2\right\} \tag{7.60}$$

Here, the parameters $\mu$ and $\sigma^2$ are the mean and variance of the logarithm of X, and not of X itself. Maximizing the logarithm of the joint density for $\{x_1, \ldots, x_n\}$ is more convenient than maximizing the joint probability density itself.
Hence, the problem can be expressed as the maximization of the log-likelihood function

$$L = \ln \prod_{i=1}^{n} f(x_i \mid \mu, \sigma) = \sum_{i=1}^{n} \ln f(x_i \mid \mu, \sigma) = -\sum_{i=1}^{n} \ln\left(x_i\sqrt{2\pi}\right) - n\ln(\sigma) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[\ln(x_i) - \mu\right]^2 \tag{7.61}$$

The maximum can be obtained by equating to zero the partial derivatives $\partial L/\partial\mu$ and $\partial L/\partial\sigma$, whereby one obtains:

$$0 = \frac{\partial L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}\left[\ln(x_i) - \mu\right], \qquad 0 = \frac{\partial L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}\left[\ln(x_i) - \mu\right]^2 \tag{7.62}$$

These equations yield the estimators

$$\hat\mu = \frac{1}{n}\sum_{i=1}^{n}\ln(x_i), \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left[\ln(x_i) - \hat\mu\right]^2 \tag{7.63}$$

The second-order conditions for a maximum are met and these values maximize Equation 7.59. It is useful to note that if one defines a new random variable $Y = \ln(X)$, then the maximum likelihood estimators of the parameters $\mu$ and $\sigma^2$, which are the mean and variance of the Y distribution, are the sample estimators of the mean and variance of Y:

$$\hat\mu = \bar{y}, \qquad \hat\sigma^2 = \left[(n-1)/n\right] S_Y^2 \tag{7.64}$$

The correction $[(n-1)/n]$ in this last equation is often neglected.

The second commonly used parameter estimation procedure is the method of moments. The method of moments is often a quick and simple method for obtaining parameter estimates for many distributions. For a distribution with m = 1, 2 or 3 parameters, the first m moments of the postulated distribution in Equations 7.34, 7.35 and 7.37 are equated to the estimates of those moments calculated using Equations 7.39. The resulting nonlinear equations are solved for the unknown parameters.
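The closed-form maximum likelihood estimators of Equations 7.63 and 7.64 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `lognormal_mle` and the flow values are hypothetical, and the sample is not the Table 7.2 record.

```python
import math
from statistics import fmean, variance


def lognormal_mle(x):
    """Return (mu_hat, sigma2_hat), the ML estimates of the mean and variance of ln(X)."""
    if len(x) < 2 or min(x) <= 0:
        raise ValueError("need at least two strictly positive observations")
    y = [math.log(v) for v in x]          # Y = ln(X)
    n = len(y)
    mu_hat = fmean(y)                     # Equation 7.63: mean of the logarithms
    # Equation 7.63: the ML variance estimate divides by n, not n - 1
    sigma2_hat = sum((v - mu_hat) ** 2 for v in y) / n
    return mu_hat, sigma2_hat


# Hypothetical annual maximum flows (illustrative values only)
flows = [820.0, 1430.0, 960.0, 2750.0, 1210.0, 640.0, 1890.0]
mu_hat, sigma2_hat = lognormal_mle(flows)

# Equation 7.64: the same estimate written as [(n - 1)/n] * S_Y^2,
# where S_Y^2 is the usual (n - 1)-denominator sample variance of the logs
logs = [math.log(v) for v in flows]
n = len(flows)
sigma2_via_sy = (n - 1) / n * variance(logs)

print(mu_hat, sigma2_hat, sigma2_via_sy)
```

The two variance estimates agree to rounding error, which is the content of Equation 7.64. Neglecting the factor (n - 1)/n simply replaces the ML estimate by the unbiased sample variance.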
For the lognormal distribution, the mean and variance of X as a function of the parameters $\mu$ and $\sigma$ are given by

$$\mu_X = \exp\left(\mu + \tfrac{1}{2}\sigma^2\right), \qquad \sigma_X^2 = \exp\left(2\mu + \sigma^2\right)\left[\exp(\sigma^2) - 1\right] \tag{7.65}$$

Substituting $\bar{x}$ for $\mu_X$ and $s_X^2$ for $\sigma_X^2$ and solving for $\mu$ and $\sigma^2$ one obtains

$$\hat\sigma^2 = \ln\left(1 + s_X^2/\bar{x}^2\right), \qquad \hat\mu = \ln\left(\frac{\bar{x}}{\sqrt{1 + s_X^2/\bar{x}^2}}\right) = \ln\bar{x} - \tfrac{1}{2}\hat\sigma^2 \tag{7.66}$$

The data in Table 7.2 provide an illustration of both fitting methods. One can easily compute the sample mean and variance of the logarithms of the flows to obtain

$$\hat\mu = 7.202, \qquad \hat\sigma^2 = 0.3164 = (0.5625)^2 \tag{7.67}$$

Alternatively, the sample mean and variance of the flows themselves are

$$\bar{x} = 1549.2, \qquad s_X^2 = 661{,}800 = (813.5)^2 \tag{7.68}$$

Substituting those two values in Equation 7.66 yields

$$\hat\mu = 7.224, \qquad \hat\sigma^2 = 0.2435 = (0.4935)^2 \tag{7.69}$$

Method of moments and maximum likelihood are just two of many possible estimation methods. Just as method of moments equates sample estimators of moments to population values and solves for a distribution's parameters, one can simply equate L-moment estimators to population values and solve for the parameters of a distribution. The resulting method of L-moments has received considerable attention in the hydrological literature (Landwehr et al., 1978; Hosking et al., 1985; Hosking and Wallis, 1987; Hosking, 1990; Wang, 1997). It has been shown to have significant advantages when used as a basis for regionalization procedures that will be discussed in Section 5 (Lettenmaier et al., 1987; Stedinger and Lu, 1995; Hosking and Wallis, 1997).

Bayesian procedures provide another approach that is related to maximum likelihood estimation. Bayesian inference employs the likelihood function to represent the information in the data. That information is augmented with a prior distribution that describes what is known about constraints on the parameters and their likely values beyond the information provided by the recorded data available at a site.
The likelihood function and the prior probability density function are combined to obtain the probability density function that describes the posterior distribution of the parameters:

$$f_\theta(\theta \mid x_1, x_2, \ldots, x_n) \propto f_X(x_1, x_2, \ldots, x_n \mid \theta)\,\xi(\theta) \tag{7.70}$$

The symbol $\propto$ means "proportional to" and $\xi(\theta)$ is the probability density function for the prior distribution for $\theta$ (Kottegoda and Rosso, 1997). Thus, except for a constant of proportionality, the probability density function describing the posterior distribution of the parameter vector $\theta$ is equal to the product of the likelihood function $f_X(x_1, x_2, \ldots, x_n \mid \theta)$ and the probability density function for the prior distribution $\xi(\theta)$ for $\theta$.

Advantages of the Bayesian approach are that it allows the explicit modelling of uncertainty in parameters (Stedinger, 1997; Kuczera, 1999) and provides a theoretically consistent framework for integrating systematic flow records with regional and other hydrological information (Vicens et al., 1975; Stedinger, 1983; Kuczera, 1983). Martins and Stedinger (2000) illustrate how a prior distribution can be used to enforce realistic constraints upon a parameter as well as providing a description of its likely values. In their case, use of a prior on the shape parameter $\kappa$ of a generalized extreme value (GEV) distribution (discussed in Section 3.6) allowed definition of generalized maximum likelihood estimators that, over the $\kappa$-range of interest, performed substantially better than maximum likelihood, moment, and L-moment estimators.

While Bayesian methods have been available for decades, the computational challenge posed by the solution of Equation 7.70 has been an obstacle to their use. Solutions to Equation 7.70 have been available for special cases such as normal data, and binomial and Poisson samples (Raiffa and Schlaifer, 1961; Benjamin and Cornell, 1970; Zellner, 1971). However, a new and very general set of Markov Chain Monte Carlo (MCMC) procedures (discussed in Section 7.2) allows numerical computation of the posterior distributions of parameters
for a very broad class of models (Gilks et al., 1996). As a result, Bayesian methods are now becoming much more popular and are the standard approach for many difficult problems that are not easily addressed by traditional methods (Gelman et al., 1995; Carlin and Louis, 2000). The use of Monte Carlo Bayesian methods in flood frequency analysis, rainfall-runoff modelling, and evaluation of environmental pathogen concentrations is illustrated by Wang (2001), Bates and Campbell (2001) and Crainiceanu et al. (2002), respectively.

Finally, a simple method of fitting flood frequency curves is to plot the ordered flood values on special probability paper and then to draw a line through the data (Gumbel, 1958). Even today, that simple method is still attractive when some of the smallest values are zero or unusually small, or have been censored as will be discussed in Section 4 (Kroll and Stedinger, 1996). Plotting the ranked annual maximum series against a probability scale is always an excellent and recommended way to see what the data look like and for determining whether or not a fitted curve is consistent with the data (Stedinger et al., 1993).

Statisticians and hydrologists have investigated which of these methods most accurately estimates the parameters themselves or the quantiles of the distribution. One also needs to determine how accuracy should be measured. Some studies have used average squared deviations, some have used average absolute weighted deviations with different weights on under- and over-estimation, and some have used the squared deviations of the log-quantile estimator (Slack et al., 1975; Kroll and Stedinger, 1996). In almost all cases, one is also interested in the bias of an estimator, which is the average value of the estimator minus the true value of the parameter or quantile being estimated.
Special estimators have been developed to compute design events that on average are exceeded with the specified probability and have the anticipated risk of being exceeded (Beard, 1960, 1997; Rasmussen and Rosbjerg, 1989, 1991a,b; Stedinger, 1997; Rosbjerg and Madsen, 1998).

3.2. Model Adequacy

After estimating the parameters of a distribution, some check of model adequacy should be made. Such checks vary from simple comparisons of the observations with the fitted model (using graphs or tables) to rigorous statistical tests. Some of the early and simplest methods of parameter estimation were graphical techniques. Although quantitative techniques are generally more accurate and precise for parameter estimation, graphical presentations are invaluable for comparing the fitted distribution with the observations for the detection of systematic or unexplained deviations between the two. The observed data will plot as a straight line on probability graph paper if the postulated distribution is the true distribution of the observations. If probability graph paper does not exist for the particular distribution of interest, more general techniques can be used.

Let $x_{(i)}$ be the ith smallest value in a set of observed values $\{x_i\}$ so that $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$. The random variable $X_{(i)}$ provides a reasonable estimate of the pth quantile $x_p$ of the true distribution of X for $p = i/(n+1)$. In fact, when one considers the cumulative probability $U_i$ associated with the random variable $X_{(i)}$, $U_i = F_X(X_{(i)})$, and if the observations are independent, then the $U_i$ have a beta distribution (Gumbel, 1958) with probability density function:

$$f_{U_i}(u) = \frac{n!}{(i-1)!\,(n-i)!}\, u^{i-1}(1-u)^{n-i}, \qquad 0 \le u \le 1 \tag{7.71}$$

This beta distribution has mean

$$E[U_i] = \frac{i}{n+1} \tag{7.72a}$$

and variance

$$\mathrm{Var}(U_i) = \frac{i\,(n-i+1)}{(n+1)^2 (n+2)} \tag{7.72b}$$

A good graphical check of the adequacy of a fitted distribution G(x) is obtained by plotting the observations $x_{(i)}$ versus $G^{-1}[i/(n+1)]$ (Wilk and Gnanadesikan, 1968).
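The graphical check just described, together with the mean and variance in Equations 7.72a and 7.72b, can be sketched as follows. The sketch assumes a normal distribution fitted by the sample mean and standard deviation as G; the function names and the sample values are hypothetical, and only the Python standard library is used.

```python
from statistics import NormalDist, mean, stdev


def normal_probability_plot_points(x):
    """Return (G^{-1}[i/(n+1)], x_(i)) pairs for a normal G fitted by moments."""
    xs = sorted(x)                              # x_(1) <= x_(2) <= ... <= x_(n)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))    # assumed fitted distribution G
    return [(fitted.inv_cdf(i / (n + 1)), xi) for i, xi in enumerate(xs, start=1)]


def beta_mean_var(i, n):
    """Mean and variance of U_i = F_X(X_(i)), Equations 7.72a and 7.72b."""
    m = i / (n + 1)
    v = i * (n - i + 1) / ((n + 1) ** 2 * (n + 2))
    return m, v


# Hypothetical sample (illustrative values only)
sample = [3.0, 1.0, 2.0, 5.0, 4.0]
points = normal_probability_plot_points(sample)
for fitted_quantile, observed in points:
    print(round(fitted_quantile, 3), observed)
```

Plotting the second element of each pair against the first gives the probability plot. For a single observation (i = n = 1) `beta_mean_var` returns the mean 1/2 and variance 1/12 of a uniform variate, which is a quick sanity check on Equation 7.72b.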
Even if G(x) equalled to an exact degree the true X-distribution $F_X[x]$, the plotted points would not fall exactly on a 45° line through the origin of the graph. This would only occur if $F_X[x_{(i)}]$ exactly equalled $i/(n+1)$, and therefore each $x_{(i)}$ exactly equalled $F_X^{-1}[i/(n+1)]$. An appreciation for how far an individual observation $x_{(i)}$ can be expected to deviate from $G^{-1}[i/(n+1)]$ can be obtained by plotting $G^{-1}[u_i^{(0.75)}]$ and $G^{-1}[u_i^{(0.25)}]$, where $u_i^{(0.75)}$ and $u_i^{(0.25)}$ are the upper and lower quartiles of the distribution of $U_i$ obtained from integrating the probability
density function in Equation 7.71. The required incomplete beta function is also available in many software packages, including Microsoft Excel. Stedinger et al. (1993) show that $u_{(1)}$ and $(1 - u_{(n)})$ fall between about $0.05/n$ and $3/(n+1)$ with a probability of 90%, thus illustrating the great uncertainty associated with the cumulative probability of the smallest value and the exceedance probability of the largest value in a sample.

Figures 7.2a and 7.2b illustrate the use of this quantile-quantile plotting technique by displaying the results of fitting a normal and a lognormal distribution to the annual maximum flows in Table 7.2 for the Magra River, Italy, at Calamazza. The observations of $X_{(i)}$, given in Table 7.2, are plotted on the vertical axis against the quantiles $G^{-1}[i/(n+1)]$ on the horizontal axis.

A probability plot is essentially a scatter plot of the sorted observations $X_{(i)}$ versus some approximation of their expected or anticipated value, represented by $G^{-1}(p_i)$, where, as suggested, $p_i = i/(n+1)$. The $p_i$ values are called plotting positions. A common alternative to $i/(n+1)$ is $(i - 0.5)/n$, which results from a probabilistic interpretation of the empirical distribution of the data. Many reasonable plotting position formulas have been proposed based upon the sense in which $G^{-1}(p_i)$ should approximate $X_{(i)}$. The Weibull formula $i/(n+1)$ and the Hazen formula $(i - 0.5)/n$ bracket most of the reasonable choices. Popular formulas are summarized by Stedinger et al. (1993), who also discuss the generation of probability plots for many distributions commonly employed in hydrology.

Rigorous statistical tests are available for trying to determine whether or not it is reasonable to assume that a given set of observations could have been drawn from a particular family of distributions. Although not the most powerful of such tests, the Kolmogorov–Smirnov test provides bounds within which every observation should lie if the sample is actually drawn from the assumed distribution.
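The Weibull and Hazen plotting positions, and the roughly 90% interval quoted for the cumulative probability of the smallest observation, can be checked numerically. The sketch below relies on the fact that the smallest of n independent uniform variates has cumulative distribution function 1 - (1 - u)^n (Equation 7.71 with i = 1); the function names are hypothetical and the lower limit 0.05/n is taken as an approximate value.

```python
def weibull_pp(i, n):
    """Weibull plotting position i/(n + 1)."""
    return i / (n + 1)


def hazen_pp(i, n):
    """Hazen plotting position (i - 0.5)/n."""
    return (i - 0.5) / n


def prob_smallest_in_interval(n, lo, hi):
    """Pr{lo < U_(1) < hi} for the smallest of n independent U(0, 1) variates."""
    def cdf(u):
        return 1.0 - (1.0 - u) ** n     # beta(1, n) distribution function
    return cdf(hi) - cdf(lo)


for n in (10, 40, 100):
    p = prob_smallest_in_interval(n, 0.05 / n, 3.0 / (n + 1))
    print(n, round(weibull_pp(1, n), 4), round(hazen_pp(1, n), 4), round(p, 3))
```

For these sample sizes the computed probability stays close to 0.90, while the two plotting positions for the smallest observation differ by almost a factor of two, which illustrates how uncertain the probability attached to an extreme observation is.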
In particular, for $G = F_X$, the Kolmogorov–Smirnov test specifies that

$$\Pr\left\{ G^{-1}\left(\frac{i}{n} - C_\alpha\right) \le X_{(i)} \le G^{-1}\left(\frac{i-1}{n} + C_\alpha\right) \ \ \forall\, i \right\} = 1 - \alpha \tag{7.73}$$

where $C_\alpha$ is the critical value of the test at significance level $\alpha$.

Figure 7.2. Plots of annual maximum discharges of Magra River, Italy, versus quantiles of fitted (a) normal and (b) lognormal distributions. In each panel the vertical axis shows the observed values $X_{(i)}$ and the Kolmogorov–Smirnov bounds (m³/sec), the horizontal axis shows the quantiles of the fitted distribution $G^{-1}[i/(n+1)]$ (m³/sec), and the curves mark the upper and lower 90% confidence intervals for all points.

Formulas for $C_\alpha$ as a function of n are contained in Table 7.5 for three cases: (1) when G is completely
specified independent of the sample's values; (2) when G is the normal distribution and the mean and variance are estimated from the sample with $\bar{x}$ and $s_X^2$; and (3) when G is the exponential distribution and the scale parameter is estimated as $1/\bar{x}$. Chowdhury et al. (1991) provide critical values for the Gumbel and generalized extreme value (GEV) distributions (Section 3.6) with known shape parameter $\kappa$. For other distributions, the values obtained from Table 7.5 may be used to construct approximate simultaneous confidence intervals for every $X_{(i)}$.

Figures 7.2a and b contain 90% confidence intervals for the plotted points constructed in this manner. For the normal distribution, the critical value $C_\alpha$ is computed from the second formula in Table 7.5, using the constant that corresponds to $\alpha = 0.10$ and the sample size n = 40. As can be seen in Figure 7.2a, the annual maximum flows are not consistent with the hypothesis that they were drawn from a normal distribution; three of the observations lie outside the simultaneous 90% confidence intervals for all the points. This demonstrates a statistically significant lack of fit. The fitted normal distribution underestimates the quantiles corresponding to small and large probabilities while overestimating the quantiles in an intermediate range. In Figure 7.2b, deviations between the fitted lognormal distribution and the observations can be attributed to the differences between $F_X(x_{(i)})$ and $i/(n+1)$. Generally, the points are all near the 45° line through the origin, and no major systematic deviations are apparent.

The Kolmogorov–Smirnov test conveniently provides bounds within which every observation on a probability plot should lie if the sample is actually drawn from the assumed distribution, and thus is useful for visually evaluating the adequacy of a fitted distribution. However, it is not the most powerful test available for estimating which distribution a set of observations is likely to have been drawn from.
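The simultaneous bounds of Equation 7.73 can be sketched for a fitted normal distribution as shown below. The constants in `ks_critical_normal_10pct` are an assumption: they are the values commonly quoted from Stephens (1974) for a normal distribution with estimated mean and variance at the 10% level, and they should be checked against Table 7.5 before use. The function names and the sample are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_critical_normal_10pct(n):
    """Approximate C_alpha for alpha = 0.10, normal G with estimated parameters.

    ASSUMPTION: the constants 0.819, -0.01 and 0.85 are the commonly quoted
    Stephens (1974) values; verify them against Table 7.5.
    """
    root_n = math.sqrt(n)
    return 0.819 / (root_n - 0.01 + 0.85 / root_n)


def ks_bounds(x, c_alpha):
    """Return (lower, x_(i), upper) triples following Equation 7.73."""
    xs = sorted(x)
    n = len(xs)
    g = NormalDist(mean(xs), stdev(xs))     # fitted normal distribution G

    def g_inverse(p):
        # G^{-1} is unbounded outside (0, 1), so the bound is infinite there
        if p <= 0.0:
            return -math.inf
        if p >= 1.0:
            return math.inf
        return g.inv_cdf(p)

    return [
        (g_inverse(i / n - c_alpha), xi, g_inverse((i - 1) / n + c_alpha))
        for i, xi in enumerate(xs, start=1)
    ]


# Hypothetical sample (illustrative values only)
sample = [820.0, 1430.0, 960.0, 2750.0, 1210.0, 640.0, 1890.0, 1105.0, 1520.0, 990.0]
c_alpha = ks_critical_normal_10pct(len(sample))
bounds = ks_bounds(sample, c_alpha)
outside = [xi for lo, xi, hi in bounds if not lo <= xi <= hi]
print(round(c_alpha, 3), outside)
```

Any observation listed in `outside` falls outside the simultaneous band and signals lack of fit at the chosen level. With the assumed constants, a sample of n = 40 gives a critical value of about 0.127.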
For identifying the distribution from which a set of observations was drawn, several other more analytical tests are available (Filliben, 1975; Hosking, 1990; Chowdhury et al., 1991; Kottegoda and Rosso, 1997).

The Probability Plot Correlation test is a popular and powerful test of whether a sample has been drawn from a postulated distribution, though it is often weaker than alternative tests at rejecting thin-tailed alternatives (Filliben, 1975; Fill and Stedinger, 1995). A test with greater power has a greater probability of correctly determining that a sample is not from the postulated distribution. The Probability Plot Correlation Coefficient test employs the correlation r between the ordered observations $x_{(i)}$ and the corresponding fitted quantiles $w_i = G^{-1}(p_i)$, determined by plotting positions $p_i$ for each $x_{(i)}$. Values of r near 1.0 suggest that the observations could have been drawn from the fitted distribution: r measures the linearity of the probability plot, providing a quantitative assessment of fit. If $\bar{x}$ denotes the average value of the observations and $\bar{w}$ denotes the average value of the fitted quantiles, then

$$r = \frac{\sum \left(x_{(i)} - \bar{x}\right)\left(w_i - \bar{w}\right)}{\left[\sum \left(x_{(i)} - \bar{x}\right)^2 \sum \left(w_i - \bar{w}\right)^2\right]^{0.5}} \tag{7.74}$$

Table 7.5. Critical values of Kolmogorov–Smirnov statistic as a function of sample size n (after Stephens, 1974). The table gives $C_\alpha$ by significance level $\alpha$ for three cases: $F_X$ completely specified; $F_X$ normal with mean and variance estimated as $\bar{x}$ and $s_X^2$; and $F_X$ exponential with scale parameter b estimated as $1/\bar{x}$.
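Equation 7.74 is straightforward to compute. The sketch below assumes standard normal fitted quantiles and the Blom plotting position (i - 3/8)/(n + 1/4) as the default choice; because a correlation coefficient is unchanged by a shift or positive rescaling of the $w_i$, standard normal quantiles suffice for any fitted normal distribution. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean


def blom_pp(i, n):
    """Blom plotting position (i - 3/8)/(n + 1/4)."""
    return (i - 0.375) / (n + 0.25)


def ppcc(x, inv_cdf=NormalDist().inv_cdf, plotting_position=blom_pp):
    """Probability plot correlation coefficient r of Equation 7.74."""
    xs = sorted(x)
    n = len(xs)
    w = [inv_cdf(plotting_position(i, n)) for i in range(1, n + 1)]
    x_bar, w_bar = mean(xs), mean(w)
    numerator = sum((a - x_bar) * (b - w_bar) for a, b in zip(xs, w))
    denominator = math.sqrt(
        sum((a - x_bar) ** 2 for a in xs) * sum((b - w_bar) ** 2 for b in w)
    )
    return numerator / denominator


# Synthetic check: data that are exactly linear in the normal quantiles give r = 1,
# while a strongly skewed transformation of the same quantiles gives a lower r.
n = 9
quantiles = [NormalDist().inv_cdf(blom_pp(i, n)) for i in range(1, n + 1)]
r_linear = ppcc([10.0 + 2.0 * q for q in quantiles])
r_skewed = ppcc([math.exp(2.0 * q) for q in quantiles])
print(round(r_linear, 6), round(r_skewed, 3))
```

To test lognormality, the same function can be applied to the logarithms of the observations. The computed r is then compared with a lower critical value for the sample size and significance level of interest.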
Table 7.6 provides critical values for r for the normal distribution, or the logarithms of lognormal variates, based upon the Blom plotting position that has $p_i = (i - 3/8)/(n + 1/4)$. Values for the Gumbel distribution are reproduced in Table 7.7 for use with the Gringorten plotting position $p_i = (i - 0.44)/(n + 0.12)$. The table also applies to logarithms of Weibull variates (Stedinger et al., 1993). Other tables are available for the GEV (Chowdhury et al., 1991), the Pearson type 3 (Vogel and McMartin, 1991), and exponential and other distributions (D'Agostino and Stephens, 1986).

Table 7.6. Lower critical values of the probability plot correlation test statistic for the normal distribution using $p_i = (i - 3/8)/(n + 1/4)$, by sample size n and significance level (Vogel, 1987).

Recently developed L-moment ratios appear to provide goodness-of-fit tests that are superior to both the Kolmogorov–Smirnov and the Probability Plot Correlation test (Hosking, 1990; Chowdhury et al., 1991; Fill and Stedinger, 1995). For normal data, the L-skewness estimator $\hat\tau_3$ (or $t_3$) would have mean zero and a variance $\mathrm{Var}[\hat\tau_3]$ that depends only on the sample size n, allowing construction of a powerful test of normality against skewed alternatives using the normally distributed statistic

$$Z = \frac{t_3}{\sqrt{\mathrm{Var}[\hat\tau_3]}} \tag{7.75}$$

with a reject region $|Z| > z_{\alpha/2}$.

Chowdhury et al. (1991) derive the sampling variance of the L-CV and L-skewness estimators $\hat\tau_2$ and $\hat\tau_3$ as a function of $\kappa$ for the GEV distribution. These allow construction of a test of whether a particular data set is consistent with a GEV distribution with a regionally estimated value of $\kappa$, or a regional $\kappa$ and a regional coefficient of variation, CV. Fill and Stedinger (1995) show that the $\hat\tau_3$ L-skewness estimator provides a test for the Gumbel versus a general GEV distribution using the normally distributed statistic

$$Z = \frac{\hat\tau_3 - 0.17}{\sqrt{\mathrm{Var}[\hat\tau_3]}} \tag{7.76}$$

with a reject region $|Z| > z_{\alpha/2}$, where $\mathrm{Var}[\hat\tau_3]$ is here the sampling variance of the L-skewness estimator for Gumbel data, again a function of n.

The literature is full of goodness-of-fit tests.
Experience indicates that among the better tests there is often not a great deal of difference (D'Agostino and Stephens, 1986). Generation of a probability plot is most often a good idea because it allows the modeller to see what the data look like and where problems occur.

Table 7.7. Lower critical values of the probability plot correlation test statistic for the Gumbel distribution using $p_i = (i - 0.44)/(n + 0.12)$ (Vogel, 1987).

The Kolmogorov–Smirnov test helps the eye
interpret a probability plot by adding bounds to a graph, illustrating the magnitude of deviations from a straight line that are consistent with expected variability. One can also use quantiles of a beta distribution to illustrate the possible error in individual plotting positions, particularly at the extremes where that uncertainty is largest. The probability plot correlation test is a popular and powerful goodness-of-fit statistic. Goodness-of-fit tests based upon sample estimators of the L-skewness $\hat\tau_3$ for the normal and Gumbel distribution provide simple and useful tests that are not based on a probability plot.

3.3. Normal and Lognormal Distributions

The normal distribution and its logarithmic transformation, the lognormal distribution, are arguably the most widely used distributions in science and engineering. The probability density function of a normal random variable is

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}(x - \mu)^2\right] \qquad \text{for } -\infty < x < +\infty \tag{7.77}$$

where $\mu$ and $\sigma^2$ are equivalent to $\mu_X$ and $\sigma_X^2$, the mean and variance of X. Interestingly, the maximum likelihood estimators of $\mu$ and $\sigma^2$ are almost identical to the moment estimates $\bar{x}$ and $s_X^2$.

The normal distribution is symmetric about its mean $\mu_X$ and admits values from $-\infty$ to $+\infty$. Thus, it is not always satisfactory for modelling physical phenomena such as streamflows or pollutant concentrations, which are necessarily non-negative and have skewed distributions. A frequently used model for skewed distributions is the lognormal distribution. A random variable X has a lognormal distribution if the natural logarithm of X, ln(X), has a normal distribution. If X is lognormally distributed, then by definition ln(X) is normally distributed, so that the density function of X is

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\left[\ln(x) - \mu\right]^2\right\} \frac{d(\ln x)}{dx} = \frac{1}{x\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\left[\ln(x/\eta)\right]^2\right\} \tag{7.78}$$

for $x > 0$ and $\mu = \ln(\eta)$. Here $\eta$ is the median of the X-distribution.

A lognormal random variable takes on values in the range $[0, +\infty]$. The parameter $\mu$ determines the scale of the X-distribution whereas $\sigma^2$ determines the shape of the distribution. The mean and variance of the lognormal distribution are given in Equation 7.65. Figure 7.3 illustrates the various shapes that the lognormal probability density function can assume. It is highly skewed with a thick right-hand tail for $\sigma > 1$, and approaches a symmetric normal distribution as $\sigma \to 0$. The density function always has a value of zero at x = 0.

Figure 7.3. Lognormal probability density functions with various standard deviations $\sigma$.

The coefficient of variation and skew are:

$$CV_X = \left[\exp(\sigma^2) - 1\right]^{1/2} \tag{7.79}$$

$$\gamma_X = 3\,CV_X + CV_X^3 \tag{7.80}$$

The maximum likelihood estimates of $\mu$ and $\sigma^2$ are given in Equation 7.63 and the moment estimates in Equation 7.66. For reasonable-sized samples, the maximum likelihood estimates generally perform as well or better than the moment estimates (Stedinger, 1980).

The data in Table 7.2 were used to calculate the parameters of the lognormal distribution that would describe these flood flows. The results were reported in Section 3.1. The two-parameter maximum likelihood and method of moments estimators identify parameter estimates for which the distribution skewness coefficients
are 2.06 and 1.72, which is substantially greater than the sample skew of 0.712.

A useful generalization of the two-parameter lognormal distribution is the shifted lognormal or three-parameter lognormal distribution obtained when $\ln(X - \tau)$ is described by a normal distribution, and $X \ge \tau$. Theoretically, $\tau$ should be positive if, for physical reasons, X must be positive; practically, negative values of $\tau$ can be allowed when the resulting probability of negative values of X is sufficiently small.

Unfortunately, maximum likelihood estimates of the parameters $\mu$, $\sigma^2$, and $\tau$ are poorly behaved because of irregularities in the likelihood function (Giesbrecht and Kempthorne, 1976). The method of moments does fairly well when the skew of the fitted distribution is reasonably small. A method that does almost as well as the moment method for low-skew distributions, and much better for highly skewed distributions, estimates $\tau$ by:

$$\hat\tau = \frac{x_{(1)}\, x_{(n)} - \hat{x}_{0.50}^2}{x_{(1)} + x_{(n)} - 2\hat{x}_{0.50}} \tag{7.81}$$

provided that $x_{(1)} + x_{(n)} - 2\hat{x}_{0.50} > 0$, where $x_{(1)}$ and $x_{(n)}$ are the smallest and largest observations and $\hat{x}_{0.50}$ is the sample median (Stedinger, 1980; Hoshi et al., 1984). If $x_{(1)} + x_{(n)} - 2\hat{x}_{0.50} < 0$, then the sample tends to be negatively skewed and a three-parameter lognormal distribution with a lower bound cannot be fit with this method. Good estimates of $\mu$ and $\sigma^2$ to go with $\hat\tau$ in Equation 7.81 are (Stedinger, 1980):

$$\hat\mu = \ln\left[\frac{\bar{x} - \hat\tau}{\sqrt{1 + s_X^2/(\bar{x} - \hat\tau)^2}}\right], \qquad \hat\sigma^2 = \ln\left[1 + \frac{s_X^2}{(\bar{x} - \hat\tau)^2}\right] \tag{7.82}$$

For the data in Table 7.2, Equations 7.81 and 7.82 yield the hybrid method-of-moments estimates of $\hat\mu = 7.606$, $\hat\sigma^2 = 0.1339 = (0.3659)^2$ and $\hat\tau = -600.1$ for the three-parameter lognormal distribution. This distribution has a coefficient of skewness of 1.19, which is more consistent with the sample skewness estimator than were the values obtained when a two-parameter lognormal distribution was fit to the data.
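The skewness formula of Equations 7.79 and 7.80 and the lower-bound estimator of Equations 7.81 and 7.82 can be sketched as follows. The function names are hypothetical; the numerical inputs in the usage lines are the summary statistics quoted in the text, not the raw Table 7.2 record.

```python
import math
from statistics import median


def lognormal_skew(sigma2):
    """Coefficient of skewness of a lognormal variate (Equations 7.79 and 7.80)."""
    cv = math.sqrt(math.exp(sigma2) - 1.0)
    return 3.0 * cv + cv ** 3


def lower_bound_estimate(x):
    """tau_hat of Equation 7.81, or None when the method does not apply."""
    x_min, x_max, x_med = min(x), max(x), median(x)
    denominator = x_min + x_max - 2.0 * x_med
    if denominator <= 0.0:
        return None     # sample not positively skewed: no lower bound can be fit
    return (x_min * x_max - x_med ** 2) / denominator


def shifted_lognormal_moments(x_bar, s2, tau_hat):
    """(mu_hat, sigma2_hat) of Equation 7.82 for a given lower bound tau_hat."""
    ratio = 1.0 + s2 / (x_bar - tau_hat) ** 2
    return math.log((x_bar - tau_hat) / math.sqrt(ratio)), math.log(ratio)


# Skews implied by the two-parameter fits (sigma = 0.5625 and sigma = 0.4935)
print(round(lognormal_skew(0.5625 ** 2), 2), round(lognormal_skew(0.4935 ** 2), 2))

# Equation 7.82 with the summary statistics and lower bound quoted in the text
mu_hat, sigma2_hat = shifted_lognormal_moments(1549.2, 813.5 ** 2, -600.1)
print(round(mu_hat, 3), round(math.sqrt(sigma2_hat), 4), round(lognormal_skew(sigma2_hat), 2))
```

Equation 7.81 chooses $\hat\tau$ so that the shifted minimum, median and maximum satisfy $(x_{(1)} - \hat\tau)(x_{(n)} - \hat\tau) = (\hat{x}_{0.50} - \hat\tau)^2$, which makes the logarithms of the shifted extremes symmetric about the logarithm of the shifted median.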
Alternatively, one can estimate $\mu$ and $\sigma^2$ by the sample mean and variance of $\ln(x - \hat\tau)$, which yields the hybrid maximum likelihood estimates $\hat\mu = 7.605$, $\hat\sigma^2 = 0.1407 = (0.3751)^2$ and again $\hat\tau = -600.1$. The two sets of estimates are surprisingly close in this instance. In this second case, the fitted distribution has a coefficient of skewness of 1.22.

Natural logarithms have been used here. One could have just as well used base 10 common logarithms to estimate the parameters; however, in that case the relationships between the log-space parameters and the real-space moments change slightly (Stedinger et al., 1993).

3.4. Gamma Distributions

The gamma distribution has long been used to model many natural phenomena, including daily, monthly and annual streamflows as well as flood flows (Bobée and Ashkar, 1991). For a gamma random variable X,

$$f_X(x) = \frac{|\beta|}{\Gamma(\alpha)} (\beta x)^{\alpha - 1} e^{-\beta x}, \qquad \beta x \ge 0$$

$$\mu_X = \frac{\alpha}{\beta}, \qquad \sigma_X^2 = \frac{\alpha}{\beta^2}, \qquad \gamma_X = \frac{2}{\sqrt{\alpha}} = 2\,CV_X \quad \text{for } \beta > 0 \tag{7.83}$$

The gamma function, $\Gamma(\alpha)$, for integer $\alpha$ is $(\alpha - 1)!$. The parameter $\alpha > 0$ determines the shape of the distribution; $\beta$ is the scale parameter. Figure 7.4 illustrates the different shapes that the probability density function for a gamma variable can assume. As $\alpha \to \infty$, the gamma distribution approaches the symmetric normal distribution, whereas for $0 < \alpha < 1$, the distribution has a highly asymmetric J-shaped probability density function whose value goes to infinity as x approaches zero.

The gamma distribution arises naturally in many problems in statistics and hydrology. It also has a very reasonable shape for such non-negative random variables as rainfall and streamflow. Unfortunately, its cumulative distribution function is not available in closed form, except for integer $\alpha$, though it is available in many software packages including Microsoft Excel. The gamma
family includes a very special case: the exponential distribution is obtained when $\alpha = 1$.

Figure 7.4. The gamma distribution function for various values of the shape parameter $\alpha$ ($\alpha$ = 0.50, 1.00, 2.00 and 8.00).

The gamma distribution has several generalizations (Bobée and Ashkar, 1991). If a constant $\tau$ is subtracted from X so that $(X - \tau)$ has a gamma distribution, the distribution of X is a three-parameter gamma. This is also called a Pearson type 3 distribution, because the resulting distribution belongs to the third type of distributions suggested by the statistician Karl Pearson. Another variation is the log Pearson type 3 distribution obtained by fitting the logarithms of X with a Pearson type 3 distribution. The log Pearson distribution is discussed further in the next section.

The method of moments may be used to estimate the parameters of the gamma distribution. For the three-parameter gamma distribution,

$$\hat\tau = \bar{x} - 2\,\frac{s_X}{\hat\gamma_X}, \qquad \hat\alpha = \frac{4}{(\hat\gamma_X)^2}, \qquad \hat\beta = \frac{2}{s_X\,\hat\gamma_X} \tag{7.84}$$

where $\bar{x}$, $s_X^2$, and $\hat\gamma_X$ are estimates of the mean, variance, and coefficient of skewness of the distribution of X (Bobée and Robitaille, 1977). For the two-parameter gamma distribution,

$$\hat\alpha = \frac{\bar{x}^2}{s_X^2}, \qquad \hat\beta = \frac{\bar{x}}{s_X^2} \tag{7.85}$$

Again, the flood record in Table 7.2 can be used to illustrate the different estimation procedures. Using the first three sample moments, one would obtain for the three-parameter gamma distribution the parameter estimates $\hat\tau$, $\hat\alpha$ and $\hat\beta$ of Equation 7.84. Using only the sample mean and variance yields the method of moment estimators of the parameters of the two-parameter gamma distribution ($\tau = 0$), $\hat\alpha = 3.627$ and $\hat\beta = 1/427.2$.
The fitted two-parameter gamma distribution has a coefficient of skewness $\gamma$ of 1.05, whereas the fitted three-parameter gamma reproduces the sample skew of 0.712. As occurred with the three-parameter lognormal distribution, the estimated lower bound for the three-parameter gamma distribution is negative ($\hat\tau = -735.6$), resulting in a three-parameter model that has a smaller skew coefficient than was obtained with the corresponding two-parameter model. The reciprocal of $\hat\beta$ is also reported. While $\hat\beta$ has inverse x-units, $1/\hat\beta$ is a natural scale parameter that has the same units as x and thus can be easier to interpret.

Studies by Thom (1958) and Matalas and Wallis (1973) have shown that maximum likelihood parameter estimates are superior to the moment estimates. For the two-parameter gamma distribution, Greenwood and Durand (1960) give approximate formulas for the maximum likelihood estimates (also Haan, 1977). However, the maximum likelihood estimators are often not used in practice because they are very sensitive to the smallest observations, which sometimes suffer from measurement error and other distortions.
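The method-of-moments estimators in Equations 7.84 and 7.85 translate directly into code. The sketch below is illustrative only: the function names are hypothetical and the inputs are round numbers chosen so that the results can be checked by hand against Equation 7.83.

```python
import math


def gamma_moments_2p(x_bar, s2):
    """(alpha_hat, beta_hat) for the two-parameter gamma, Equation 7.85."""
    return x_bar ** 2 / s2, x_bar / s2


def gamma_moments_3p(x_bar, s2, skew):
    """(tau_hat, alpha_hat, beta_hat) for the three-parameter gamma, Equation 7.84."""
    if skew == 0.0:
        raise ValueError("zero skew corresponds to the normal limit, not a gamma fit")
    s = math.sqrt(s2)
    tau_hat = x_bar - 2.0 * s / skew
    alpha_hat = 4.0 / skew ** 2
    beta_hat = 2.0 / (s * skew)
    return tau_hat, alpha_hat, beta_hat


# Hand-checkable inputs: mean 6 and variance 12 give alpha = 3 and beta = 0.5
alpha2, beta2 = gamma_moments_2p(6.0, 12.0)
print(alpha2, beta2, 2.0 / math.sqrt(alpha2))   # the last value is the implied skew

# Mean 10, variance 4, skew 1 give tau = 6, alpha = 4, beta = 1
tau3, alpha3, beta3 = gamma_moments_3p(10.0, 4.0, 1.0)
print(tau3, alpha3, beta3)
```

In both cases the fitted distribution reproduces the supplied moments: $\hat\tau + \hat\alpha/\hat\beta$ returns the mean, $\hat\alpha/\hat\beta^2$ the variance and $2/\sqrt{\hat\alpha}$ the skew. The two-parameter fit has no freedom left to match the sample skew, which is why its implied skew can differ from the sample value.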
When plotting the observed and fitted quantiles of a gamma distribution, an approximation to the inverse of the distribution function is often useful. For $|\gamma| \le 3$, the Wilson–Hilferty transformation

$$x_G = \mu + \sigma\left[\frac{2}{\gamma}\left(1 + \frac{\gamma\, x_N}{6} - \frac{\gamma^2}{36}\right)^3 - \frac{2}{\gamma}\right] \tag{7.86}$$

gives the quantiles $x_G$ of the gamma distribution in terms of $x_N$, the quantiles of the standard-normal distribution. Here $\mu$, $\sigma$, and $\gamma$ are the mean, standard deviation, and coefficient of skewness of $x_G$. Kirby (1972) and Chowdhury and Stedinger (1991) discuss this and other more complicated but more accurate approximations. Fortunately the availability of excellent approximations of the gamma cumulative distribution function and its inverse in Microsoft Excel and other packages has reduced the need for such simple approximations.

3.5. Log-Pearson Type 3 Distribution

The log-Pearson type 3 distribution (LP3) describes a random variable whose logarithms have a Pearson type 3 distribution. This distribution has found wide use in modelling flood frequencies and has been recommended for that purpose (IACWD, 1982). Bobée (1975) and Bobée and Ashkar (1991) discuss the unusual shapes that this hybrid distribution may take allowing negative values of $\beta$. The LP3 distribution has a probability density function given by

$$f_X(x) = \frac{|\beta|\,\{\beta[\ln(x) - \xi]\}^{\alpha - 1} \exp\{-\beta[\ln(x) - \xi]\}}{x\,\Gamma(\alpha)} \tag{7.87}$$

with $\alpha > 0$, and $\beta$ either positive or negative. For $\beta < 0$, values are restricted to the range $0 < x < \exp(\xi)$. For $\beta > 0$, values have a lower bound so that $\exp(\xi) < X$. Figure 7.5 illustrates the probability density function for the LP3 distribution as a function of the skew $\gamma$ of the P3 distribution describing ln(X), with $\sigma_{\ln X} = 0.3$. The LP3 density function for $|\gamma| \le 2$ can assume a wide range of shapes with both positive and negative skews. For $|\gamma| = 2$, the log-space P3 distribution is equivalent to an exponential distribution function, which decays exponentially as x moves away from the lower bound ($\beta > 0$) or upper bound ($\beta < 0$): as a result the LP3 distribution has a similar shape.
The space with $-1 < \gamma$ may be more realistic for describing variables whose probability density function becomes thinner as x takes on large values. For $\gamma = 0$, the two-parameter lognormal distribution is obtained as a special case.

Figure 7.5. Log-Pearson type 3 probability density functions for different values of coefficient of skewness $\gamma$.

The LP3 distribution has mean and variance

$$\mu_X = e^{\xi}\left(\frac{\beta}{\beta - 1}\right)^{\alpha}, \qquad \sigma_X^2 = e^{2\xi}\left\{\left(\frac{\beta}{\beta - 2}\right)^{\alpha} - \left(\frac{\beta}{\beta - 1}\right)^{2\alpha}\right\} \qquad \text{for } \beta > 2, \text{ or } \beta < 0 \tag{7.88}$$

For $0 < \beta < 2$, the variance is infinite. These expressions are seldom used, but they do reveal the character of the distribution.

Figures 7.6 and 7.7 provide plots of the real-space coefficient of skewness and coefficient of variation of a log-Pearson type 3 variate X as a function of the standard deviation $\sigma_Y$ and coefficient of skew $\gamma_Y$ of the log-transformation $Y = \ln(X)$. Thus the standard deviation $\sigma_Y$ and skew $\gamma_Y$ of Y are in log space. For $\gamma_Y = 0$, the log-Pearson type 3 distribution reduces to the two-parameter lognormal distribution discussed above, because in this case Y has a normal distribution. For the lognormal distribution, the standard deviation $\sigma_Y$ serves as the sole shape parameter, and the coefficient of variation of X for small $\sigma_Y$ is just $\sigma_Y$. Figure 7.7 shows that the situation is more
complicated for the LP3 distribution. However, for small $\sigma_Y$, the coefficient of variation of $X$ is approximately $\sigma_Y$.

Figure 7.6. Real-space coefficient of skewness $\gamma_X$ for LP3 distributed $X$ as a function of log-space standard deviation $\sigma_Y$ and coefficient of skewness $\gamma_Y$, where $Y = \ln(X)$.

Figure 7.7. Real-space coefficient of variation $CV_X$ for LP3 distributed $X$ as a function of log-space standard deviation $\sigma_Y$ and coefficient of skewness $\gamma_Y$, where $Y = \ln(X)$.

Again, the flood flow data in Table 7.2 can be used to illustrate parameter estimation. Using natural logarithms, one can estimate the log-space moments $\hat{\mu}$, $\hat{\sigma}$ and $\hat{\gamma}$ with the standard estimators in Equations 7.39.

For the LP3 distribution, analysis generally focuses on the distribution of the logarithms $Y = \ln(X)$ of the flows, which would have a Pearson type 3 distribution with moments $\mu_Y$, $\sigma_Y$ and $\gamma_Y$ (IACWD, 1982; Bobée and Ashkar, 1991). As a result, flood quantiles are calculated as

$$x_p = \exp\{\mu_Y + \sigma_Y K_p[\gamma_Y]\} \tag{7.89}$$

where $K_p[\gamma_Y]$ is a frequency factor corresponding to cumulative probability $p$ for skewness coefficient $\gamma_Y$. ($K_p[\gamma_Y]$ corresponds to the quantiles of a three-parameter gamma distribution with zero mean, unit variance, and skewness coefficient $\gamma_Y$.)

Since 1967 the recommended procedure for flood frequency analysis by federal agencies in the United States has used this distribution. Current guidelines in Bulletin 17B (IACWD, 1982) suggest that the skew $\gamma_Y$ be estimated by a weighted average of the at-site sample skewness coefficient and a regional estimate of the skewness coefficient.
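Equation 7.89 can be sketched in a few lines if the frequency factor is approximated with the Wilson-Hilferty transformation of Equation 7.86 (taking $\mu = 0$ and $\sigma = 1$). This is only an illustrative sketch: the function names are hypothetical, the approximation is the simple one the text describes for $|\gamma| \le 3$ rather than an exact gamma inverse, and the moments in the usage note are made-up values, not the Table 7.2 estimates.

```python
import math
from statistics import NormalDist


def frequency_factor(p, skew):
    """Approximate K_p[skew] with the Wilson-Hilferty transformation (Eq. 7.86)."""
    z = NormalDist().inv_cdf(p)  # standard-normal quantile x_N
    if abs(skew) < 1e-8:
        return z  # zero skew: the P3 distribution reduces to the normal
    return (2.0 / skew) * (1.0 + skew * z / 6.0 - skew ** 2 / 36.0) ** 3 - 2.0 / skew


def lp3_quantile(p, mu_y, sigma_y, skew_y):
    """Eq. 7.89: x_p = exp(mu_Y + sigma_Y * K_p[gamma_Y]), moments in log space."""
    return math.exp(mu_y + sigma_y * frequency_factor(p, skew_y))
```

For example, `lp3_quantile(0.99, 7.0, 0.5, -0.3)` would give an approximate 100-year flood for the assumed log-space moments 7.0, 0.5 and -0.3.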
Bulletin 17B also includes tables of frequency factors, a map of regional skewness estimators, checks for low outliers, confidence interval formula, a discussion of expected probability and a weighted-moments estimator for historical data.

3.6. Gumbel and GEV Distributions

The annual maximum flood is the largest flood flow during a year. One might expect that the distribution of annual maximum flood flows would belong to the set of extreme value distributions (Gumbel, 1958; Kottegoda and Rosso, 1997). These are the distributions obtained in the limit, as the sample size $n$ becomes large, by taking the largest of $n$ independent random variables. The Extreme Value (EV) type I distribution, or Gumbel distribution, has often been used to describe flood flows. It has the cumulative distribution function:

$$F_X(x) = \exp\{-\exp[-(x - \xi)/\alpha]\} \tag{7.90}$$

where $\xi$ is the location parameter. It has a mean and variance of

$$\mu_X = \xi + 0.5772\,\alpha, \qquad \sigma_X^2 = \pi^2 \alpha^2 / 6 \approx 1.645\,\alpha^2 \tag{7.91}$$
Its skewness coefficient has a fixed value equal to $\gamma_X = 1.1396$.

The generalized extreme value (GEV) distribution is a general mathematical expression that incorporates the type I, II, and III extreme value (EV) distributions for maxima (Gumbel, 1958; Hosking et al., 1985). In recent years it has been used as a general model of extreme events including flood flows, particularly in the context of regionalization procedures (NERC, 1975; Stedinger and Lu, 1995; Hosking and Wallis, 1997). The GEV distribution has the cumulative distribution function:

$$F_X(x) = \exp\{-[1 - \kappa (x - \xi)/\alpha]^{1/\kappa}\} \quad \text{for } \kappa \ne 0 \tag{7.92}$$

From Equation 7.92, it is clear that for $\kappa < 0$ (the typical case for floods), $x$ must exceed $\xi + \alpha/\kappa$, whereas for $\kappa > 0$, $x$ must be no greater than $\xi + \alpha/\kappa$ (Hosking and Wallis, 1987). The mean, variance, and skewness coefficient are (for $\kappa > -1/3$):

$$\mu_X = \xi + (\alpha/\kappa)\,[1 - \Gamma(1 + \kappa)]$$
$$\sigma_X^2 = (\alpha/\kappa)^2 \{\Gamma(1 + 2\kappa) - [\Gamma(1 + \kappa)]^2\} \tag{7.93}$$
$$\gamma_X = (\text{Sign}\,\kappa)\, \frac{-\Gamma(1 + 3\kappa) + 3\,\Gamma(1 + \kappa)\,\Gamma(1 + 2\kappa) - 2\,[\Gamma(1 + \kappa)]^3}{\{\Gamma(1 + 2\kappa) - [\Gamma(1 + \kappa)]^2\}^{3/2}}$$

where $\Gamma(1 + \kappa)$ is the classical gamma function. The Gumbel distribution is obtained when $\kappa = 0$. For $|\kappa| < 0.3$, the general shape of the GEV distribution is similar to the Gumbel distribution, though the right-hand tail is thicker for $\kappa < 0$, and thinner for $\kappa > 0$, as shown in Figures 7.8 and 7.9.

Figure 7.8. GEV density distributions for selected shape parameter $\kappa$ values ($\kappa = -0.3$, $-0.1$ and $0.1$).

Figure 7.9. Right-hand tails of GEV distributions shown in Figure 7.8.

The parameters of the GEV distribution are easily computed using L-moments and the relationships (Hosking et al., 1985):

$$\kappa = 7.8590\,c + 2.9554\,c^2$$
$$\alpha = \kappa \lambda_2 / [\Gamma(1 + \kappa)(1 - 2^{-\kappa})] \tag{7.94}$$
$$\xi = \lambda_1 + (\alpha/\kappa)[\Gamma(1 + \kappa) - 1]$$

where

$$c = \frac{2\lambda_2}{\lambda_3 + 3\lambda_2} - \frac{\ln(2)}{\ln(3)} = \frac{2}{\tau_3 + 3} - \frac{\ln(2)}{\ln(3)}$$

As one can see, the estimator of the shape parameter $\kappa$ will depend only upon the L-skewness estimator $\hat{\tau}_3$. The estimator of the scale parameter $\alpha$ will then depend on the estimate of $\kappa$ and of $\lambda_2$. Finally, one must also use the sample mean $\lambda_1$ (Equation 7.48) to determine the estimate of the location parameter $\xi$.
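The three relationships in Equation 7.94 translate directly into code. The sketch below is illustrative only: the function names are hypothetical, the inputs are assumed to be sample L-moments already computed elsewhere, and the small-$\kappa$ branch (the Gumbel limit, with $\lambda_2 = \alpha \ln 2$) is an added guard against division by zero rather than part of Equation 7.94.

```python
import math


def gev_from_lmoments(lam1, lam2, tau3):
    """Return (xi, alpha, kappa) from L-moments using Eq. 7.94."""
    c = 2.0 / (tau3 + 3.0) - math.log(2.0) / math.log(3.0)
    kappa = 7.8590 * c + 2.9554 * c ** 2
    if abs(kappa) < 1e-6:
        # Assumed guard: treat a vanishing kappa as the Gumbel case.
        alpha = lam2 / math.log(2.0)
        return lam1 - 0.5772156649 * alpha, alpha, 0.0
    g = math.gamma(1.0 + kappa)
    alpha = kappa * lam2 / (g * (1.0 - 2.0 ** (-kappa)))
    xi = lam1 + (alpha / kappa) * (g - 1.0)
    return xi, alpha, kappa


def gev_quantile(p, xi, alpha, kappa):
    """Invert Eq. 7.92 (or Eq. 7.90 when kappa is zero) for the p-th quantile."""
    if kappa == 0.0:
        return xi - alpha * math.log(-math.log(p))
    return xi + (alpha / kappa) * (1.0 - (-math.log(p)) ** kappa)
```

Given fitted parameters, `gev_quantile(0.99, xi, alpha, kappa)` would then estimate the flood with a 1% annual exceedance probability.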
Using the flood data in Table 7.2 and the sample L-moments computed earlier, one first obtains $c$ and then the estimates $\hat{\kappa}$, $\hat{\xi} \approx 1{,}165$ and $\hat{\alpha}$. The small value of the fitted $\kappa$ parameter means that the fitted distribution is essentially a Gumbel distribution. Again, $\xi$ is a location parameter, not a lower bound, so its value resembles a reasonable $x$ value. Madsen et al. (1997a) show that moment estimators can provide more precise quantile estimators. Martins and
Stedinger (2001b) found that with occasional uninformative samples, the MLE estimator of $\kappa$ could be entirely unrealistic, resulting in absurd quantile estimators. However the use of a realistic prior distribution on $\kappa$ yielded generalized maximum likelihood estimators (GMLE) that performed better than moment and L-moment estimators over the range of $\kappa$ of interest.

The generalized maximum likelihood estimators are obtained by maximizing the log-likelihood function, augmented by a prior density function on $\kappa$. A prior distribution that reflects general world-wide geophysical experience and physical realism is in the form of a beta distribution:

$$\pi(\kappa) = \frac{\Gamma(p + q)}{\Gamma(p)\,\Gamma(q)} \, (0.5 + \kappa)^{p - 1} (0.5 - \kappa)^{q - 1} \tag{7.95}$$

for $-0.5 < \kappa < 0.5$ with $p = 6$ and $q = 9$. Moreover, this prior assigns reasonable probabilities to the values of $\kappa$ within that range. For $\kappa$ outside the range $-0.4$ to $+0.2$ the resulting GEV distributions do not have density functions consistent with flood flows and rainfall (Martins and Stedinger, 2000). Other estimators implicitly have similar constraints. For example, L-moments restricts $\kappa$ to the range $\kappa > -1$, and the method of moments estimator employs the sample standard deviation so that $\kappa > -0.5$. Use of the sample skew introduces the constraint that $\kappa > -0.3$.

Then, given a set of independent observations $\{x_1, \ldots, x_n\}$ drawn from a GEV distribution, the generalized likelihood function is:

$$\ln\{L(\xi, \alpha, \kappa \mid x_1, \ldots, x_n)\} = -n \ln(\alpha) + \sum_{i=1}^{n} \left[ \left( \frac{1}{\kappa} - 1 \right) \ln(y_i) - (y_i)^{1/\kappa} \right] + \ln[\pi(\kappa)]$$
$$\text{with } y_i = [1 - (\kappa/\alpha)(x_i - \xi)] \tag{7.96}$$

For feasible values of the parameters, $y_i$ is greater than 0 (Hosking et al., 1985). Numerical optimization of the generalized likelihood function is often aided by the additional constraint that $\min\{y_1, \ldots, y_n\} \ge \varepsilon$ for some small $\varepsilon > 0$, so as to prohibit the search from generating infeasible values of the parameters for which the likelihood function is undefined. The constraint should not be binding at the final solution.

The data in Table 7.2 again provide a convenient data set for illustrating parameter estimators. The L-moment estimators were used to generate an initial solution.
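The objective function in Equation 7.96 is straightforward to evaluate, which is all a numerical optimizer needs. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the beta prior is written with the usual normalizing constant $\Gamma(p+q)/[\Gamma(p)\Gamma(q)]$, infeasible parameter values return negative infinity instead of applying the $\varepsilon$ constraint, and the near-zero $\kappa$ branch uses the Gumbel log-density as a limiting case.

```python
import math


def log_beta_prior(kappa, p=6.0, q=9.0):
    """ln pi(kappa) for the beta prior of Eq. 7.95 on (-0.5, 0.5)."""
    if not -0.5 < kappa < 0.5:
        return -math.inf
    log_norm = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
    return (log_norm + (p - 1.0) * math.log(0.5 + kappa)
            + (q - 1.0) * math.log(0.5 - kappa))


def gev_generalized_loglik(xi, alpha, kappa, data, use_prior=True):
    """Eq. 7.96; use_prior=False gives the ordinary GEV log-likelihood."""
    if alpha <= 0.0:
        return -math.inf
    total = -len(data) * math.log(alpha)
    for x in data:
        if abs(kappa) < 1e-8:
            # Assumed limiting case: Gumbel log-density terms.
            z = (x - xi) / alpha
            total += -z - math.exp(-z)
            continue
        y = 1.0 - (kappa / alpha) * (x - xi)
        if y <= 0.0:
            return -math.inf  # infeasible: likelihood undefined
        total += (1.0 / kappa - 1.0) * math.log(y) - y ** (1.0 / kappa)
    if use_prior:
        total += log_beta_prior(kappa)
    return total
```

A general-purpose optimizer could maximize this function over $(\xi, \alpha, \kappa)$, starting from the L-moment estimates as the text suggests.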
Numerical optimization of the likelihood function in Equation 7.96 yielded the maximum likelihood estimators $\hat{\kappa}$, $\hat{\xi}$ and $\hat{\alpha}$ of the GEV parameters. Similarly, use of the geophysical prior (Equation 7.95) yielded the corresponding generalized maximum likelihood estimators. Here the record length of forty years is too short to define reliably the shape parameter $\kappa$, so the effect of using the prior is to pull $\kappa$ slightly toward the mean of the prior. The other two parameters adjust accordingly.

3.7. L-Moment Diagrams

Section 3 presented several families of distributions. The L-moment diagram in Figure 7.10 illustrates the relationships between the L-kurtosis ($\tau_4$) and L-skewness ($\tau_3$) for a number of distributions often used in hydrology. It shows that distributions with the same coefficient of skewness still differ in the thickness of their tails. This thickness is described by their kurtosis. Tail shapes are important if an analysis is sensitive to the likelihood of extreme events.

The normal and Gumbel distributions have a fixed shape and thus are presented by single points that fall on the Pearson type 3 (P3) curve for $\gamma = 0$, and the generalized extreme value (GEV) curve for $\kappa = 0$, respectively. The L-kurtosis/L-skewness relationships for the two-parameter and three-parameter gamma or P3 distributions are identical, as they are for the two-parameter and three-parameter lognormal distributions. This is because the addition of a location parameter does not change the range of fundamental shapes that can be generated. However, for the same skewness coefficient, the lognormal distribution has a larger kurtosis than the gamma or P3 distribution, and thus assigns larger probabilities to the largest events.

As the skewness of the lognormal and gamma distributions approaches zero, both distributions become normal and their kurtosis/skewness relationships merge. For the same L-skewness, the L-kurtosis of the GEV distribution is generally larger than that of the lognormal distribution.
For positive $\kappa$ yielding almost symmetric or even negatively skewed GEV distributions, the GEV has a smaller kurtosis than the three-parameter lognormal distribution.
The latter can be negatively skewed when the lognormal location parameter $\tau$ is used as an upper bound.

Figure 7.10 also includes the three-parameter generalized Pareto distribution, whose cdf is:

$$F_X(x) = 1 - [1 - \kappa(x - \xi)/\alpha]^{1/\kappa} \tag{7.97}$$

(Hosking and Wallis, 1997). For $\kappa = 0$ it corresponds to the exponential distribution (gamma with $\alpha = 1$). This point is where the Pareto and P3 distribution L-kurtosis/L-skewness lines cross. The Pareto distribution becomes increasingly more skewed for $\kappa < 0$, which is the range of interest in hydrology. The generalized Pareto distribution with $\kappa < 0$ is often used to describe peaks-over-a-threshold and other variables whose probability density function has its maximum at their lower bound. In that range for a given L-skewness, the Pareto distribution always has a larger kurtosis than the gamma distribution. In these cases the $\alpha$ parameter for the gamma distribution would need to be in the range $0 < \alpha < 1$, so that both distributions would be J-shaped.

As shown in Figure 7.10, the GEV distribution has a thicker right-hand tail than either the gamma/Pearson type 3 distribution or the lognormal distribution.

Figure 7.10. Relationships between L-skewness and L-kurtosis for various distributions (Normal, P3, LN, Gumbel, GEV, Pareto).

4. Analysis of Censored Data

There are many instances in water resources planning where one encounters censored data. A data set is censored if the values of observations that are outside a specified range of values are not specifically reported (David, 1981). For example, in water quality investigations many constituents have concentrations that are reported as $<T$, where $T$ is a reliable detection threshold (MacBerthouex and Brown, 2002). Thus the concentration of the water quality variable of interest was too small to be reliably measured. Likewise, low-flow observations and rainfall depths can be rounded to or reported as zero.
Several approaches are available for analysis of censored data sets, including probability plots and probability-plot regression, conditional probability models and maximum likelihood estimators (Haas and Scheff, 1990; Helsel, 1990; Kroll and Stedinger, 1996; MacBerthouex and Brown, 2002).

Historical and physical paleoflood data provide another example of censored data. Before the beginning of a continuous measurement program on a stream or river, the stages of unusually large floods can be estimated on the basis of the memories of people who have experienced these events and/or physical markings in the watershed (Stedinger and Baker, 1987). Annual maximum floods that were not unusual were not recorded, nor were they remembered. These missing data are censored data. They cover periods between occasional large floods that have been recorded or that have left some evidence of their occurrence (Stedinger and Cohn, 1986).

The discussion below addresses probability-plot methods for use with censored data. Probability-plot methods have a long history of use for this purpose because they are relatively simple to use and to understand. Moreover, recent research has shown that they are relatively efficient when the majority of values are observed, and unobserved values are known only to be below (or above) some detection limit or perception threshold that serves as a lower (or upper) bound. In such cases, probability-plot regression estimators of moments and quantiles are as accurate as maximum likelihood estimators. They are almost as good as estimators computed with complete samples (Helsel and Cohn, 1988; Kroll and Stedinger, 1996).

Perhaps the simplest method for dealing with censored data is adoption of a conditional probability model. Such models implicitly assume that the data are drawn from one of two classes of observations: those below a single threshold, and those above the threshold. This model is
appropriate for simple cases where censoring occurs because small observations are recorded as zero, as often happens with low-flow, low pollutant concentration, and some flood records. The conditional probability model introduces an extra parameter $P_0$ to describe the probability that an observation is zero. If $r$ of a total of $n$ observations were observed because they exceeded the threshold, then $P_0$ is estimated as $(n - r)/n$. A continuous distribution $G_X(x)$ is derived for the strictly positive non-zero values of $X$. Then the parameters of the $G$ distribution can be estimated using any procedure appropriate for complete uncensored samples. The unconditional cdf for any value $x > 0$ is then

$$F_X(x) = P_0 + (1 - P_0)\,G(x) \tag{7.98}$$

This model completely decouples the value of $P_0$ from the parameters that describe the $G$ distribution.

Probability plots and plotting positions, discussed elsewhere in this chapter, are useful for graphical displays of data that allow a visual examination of the empirical frequency curve. Suppose that among $n$ samples a detection limit is exceeded by the observations $r$ times. The natural estimator of the probability $P_0$ that an observation falls below the perception threshold is again $(n - r)/n$. If the $r$ values that exceeded the threshold are indexed by $i = 1, \ldots, r$, wherein $x_{(r)}$ is the largest observation, reasonable plotting positions within the interval $[P_0, 1]$ are:

$$p_i = P_0 + (1 - P_0)\,[(i - a)/(r + 1 - 2a)] \tag{7.99}$$

where $a$ defines the plotting position that is used; $a = 0$ is reasonable (Hirsch and Stedinger, 1987). Helsel and Cohn (1988) show that reasonable choices for $a$ generally make little difference. Both papers discuss development of plotting positions when there are different thresholds, as occurs when the analytical precision of instrumentation changes over time. If there are many exceedances of the threshold so that $r \gg (1 - 2a)$, $p_i$ is indistinguishable from

$$p'_i = [i + (n - r) - a]/(n + 1 - 2a) \tag{7.100}$$

where again, $i = 1, \ldots, r$. These values correspond to the plotting positions that would be assigned to the largest $r$ observations in a complete sample of $n$ values.
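Equations 7.99 and 7.100 can be compared numerically with a short sketch. The function names below are hypothetical, and the sample sizes in the usage note are illustrative only.

```python
def censored_plotting_positions(n, r, a=0.0):
    """Eq. 7.99: positions for the r above-threshold values in a sample of n."""
    p0 = (n - r) / n  # estimated probability of falling below the threshold
    return [p0 + (1.0 - p0) * (i - a) / (r + 1.0 - 2.0 * a)
            for i in range(1, r + 1)]


def complete_sample_positions(n, r, a=0.0):
    """Eq. 7.100: positions of the r largest values in a complete sample of n."""
    return [(i + (n - r) - a) / (n + 1.0 - 2.0 * a) for i in range(1, r + 1)]
```

With `n = 40`, `r = 30` and `a = 0`, the two sets of positions differ by less than 0.01 at every rank, which illustrates why the distinction rarely matters when most values exceed the threshold.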
The idea behind the probability-plot regression estimators is to use the probability plot for the observed data to define the parameters of the whole distribution. And if a sample mean, sample variance or quantiles are needed, then the distribution defined by the probability plot is used to fill in the missing (censored) observations so that standard estimators of the mean, standard deviation and of quantiles can be employed. Such fill-in procedures are efficient and relatively robust for fitting a distribution and estimating various statistics with censored water quality data when a modest number of the smallest observations are censored (Helsel, 1990; Kroll and Stedinger, 1996).

Unlike the conditional probability approach, here the below-threshold probability $P_0$ is linked with the selected probability distribution for the above-threshold observations. The observations below the threshold are censored but are in all other respects envisioned as coming from the same distribution that is used to describe the observed above-threshold values.

When water quality data are well described by a lognormal distribution, available values $\ln[X_{(1)}] \le \ldots \le \ln[X_{(r)}]$ can be regressed upon $F^{-1}[p_i] = \mu + \sigma \Phi^{-1}[p_i]$ for $i = 1, \ldots, r$, where the $r$ largest observations in a sample of size $n$ are available. If regression yields constant $m$ and slope $s$ corresponding to the first and second population moments $\mu$ and $\sigma$, a good estimator of the $p$th quantile is

$$x_p = \exp[m + s z_p] \tag{7.101}$$

wherein $z_p$ is the $p$th quantile of the standard normal distribution. To estimate sample means and other statistics one can fill in the missing observations with

$$x(j) = \exp\{y(j)\} \quad \text{for } j = 1, \ldots, (n - r) \tag{7.102}$$

where

$$y(j) = m + s\,\Phi^{-1}\{P_0\,[(j - a)/(n - r + 1 - 2a)]\} \tag{7.103}$$

Once a complete sample is constructed, standard estimators of the sample mean and variance can be calculated, as can medians and ranges. By filling in the missing small observations, and then using complete-sample estimators of statistics of interest, the procedure is relatively insensitive to the assumption that the observations actually have a lognormal distribution.
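The fill-in procedure of Equations 7.101 to 7.103 might be sketched as follows for the lognormal case. Several choices here are assumptions rather than part of the text: the function name is hypothetical, ordinary least squares is used for the regression, the plotting positions of Equation 7.99 supply the $p_i$, and $a$ must be small enough that every probability stays strictly between 0 and 1.

```python
import math
from statistics import NormalDist, mean


def ppr_lognormal(observed, n, a=0.0):
    """Probability-plot regression for the r above-threshold values of n.

    Returns (m, s, completed_sample), where the first n - r entries of
    completed_sample are the filled-in values of Eqs. 7.102 and 7.103.
    """
    r = len(observed)
    p0 = (n - r) / n
    inv = NormalDist().inv_cdf
    ys = sorted(math.log(x) for x in observed)
    # Standard-normal scores at the plotting positions of Eq. 7.99
    zs = [inv(p0 + (1.0 - p0) * (i - a) / (r + 1.0 - 2.0 * a))
          for i in range(1, r + 1)]
    zbar, ybar = mean(zs), mean(ys)
    s = (sum((z - zbar) * (y - ybar) for z, y in zip(zs, ys))
         / sum((z - zbar) ** 2 for z in zs))  # least-squares slope
    m = ybar - s * zbar                       # least-squares intercept
    filled = [math.exp(m + s * inv(p0 * (j - a) / (n - r + 1.0 - 2.0 * a)))
              for j in range(1, n - r + 1)]
    return m, s, filled + [math.exp(y) for y in ys]
```

The completed sample can then be passed to ordinary estimators of the mean, variance or median, while `math.exp(m + s * z_p)` gives the quantile estimator of Equation 7.101.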
Maximum likelihood estimators are quite flexible, and are more efficient than plotting-position methods when the values of the observations are not recorded because they are below or above the perception threshold (Kroll and Stedinger, 1996). Maximum likelihood methods
allow the observations to be represented by exact values, ranges and various thresholds that either were or were not exceeded at various times. This can be particularly important with historical flood data sets because the magnitudes of many historical floods are not recorded precisely, and it may be known that a threshold was never crossed or was crossed at most once or twice in a long period (Stedinger and Cohn, 1986; Stedinger, 2000; O'Connell et al., 2002). Unfortunately, maximum likelihood estimators for the LP3 distribution have proven to be problematic. However, recently developed expected moment estimators seem to do as well as maximum likelihood estimators with the LP3 distribution (Cohn et al., 1997, 2001; Griffis et al., 2004).

While often a computational challenge, maximum likelihood estimators for complete samples, and samples with some observations censored, pose no conceptual challenge. One need only write the maximum likelihood function for the data and proceed to seek the parameter values that maximize that function. Thus if $F(x \mid \theta)$ and $f(x \mid \theta)$ are the cumulative distribution and probability density functions that should describe the data, and $\theta$ is the vector of parameters of the distribution, then for the case described above wherein $x_1, \ldots, x_r$ are $r$ of $n$ observations that exceeded a threshold $T$, the likelihood function would be (Stedinger and Cohn, 1986):

$$L(\theta \mid r, n, x_1, \ldots, x_r) = F(T \mid \theta)^{(n - r)} \, f(x_1 \mid \theta)\, f(x_2 \mid \theta) \cdots f(x_r \mid \theta) \tag{7.104}$$

Here, $(n - r)$ observations were below the threshold $T$, and the probability an observation is below $T$ is $F(T \mid \theta)$, which then appears in Equation 7.104 to represent that observation. In addition, the specific values of the $r$ observations $x_1, \ldots, x_r$ are available, where the probability an observation is in a small interval of width $\delta$ around $x_i$ is $\delta f(x_i \mid \theta)$. Thus strictly speaking the likelihood function also includes a term $\delta^r$. Here what is known of the magnitude of all of the observations is included in the likelihood function in the appropriate way.
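As an illustration of Equation 7.104, the sketch below evaluates the logarithm of the likelihood for one particular choice of $F$ and $f$, a two-parameter lognormal distribution with $\theta = (\mu, \sigma)$ in log space. The distribution choice and the function name are assumptions made for the example; the constant $\delta^r$ term is dropped because it does not affect the maximizing parameters.

```python
import math
from statistics import NormalDist


def censored_lognormal_loglik(mu, sigma, observed, n, threshold):
    """ln of Eq. 7.104 for a lognormal model; observed holds the r values above T."""
    std = NormalDist()
    r = len(observed)
    loglik = 0.0
    if n > r:
        # Each of the (n - r) censored observations contributes F(T | theta).
        z_t = (math.log(threshold) - mu) / sigma
        loglik += (n - r) * math.log(std.cdf(z_t))
    for x in observed:
        z = (math.log(x) - mu) / sigma
        # Lognormal density: f(x) = phi(z) / (x * sigma)
        loglik += math.log(std.pdf(z)) - math.log(x * sigma)
    return loglik
```

Maximizing this function over `mu` and `sigma` with any numerical optimizer would give the censored-sample maximum likelihood estimates for this assumed model.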
If all that were known of some observation was that it exceeded a threshold $M$, then that value should be represented by a term $[1 - F(M \mid \theta)]$ in the likelihood function. Similarly, if all that were known was that the value was between $L$ and $M$, then a term $[F(M \mid \theta) - F(L \mid \theta)]$ should be included in the likelihood function. Different thresholds can be used to describe different observations, corresponding to changes in the quality of measurement procedures. Numerical methods can be used to identify the parameter vector that maximizes the likelihood function for the data available.

5. Regionalization and Index-Flood Method

Research has demonstrated the potential advantages of index-flood procedures (Lettenmaier et al., 1987; Stedinger and Lu, 1995; Hosking and Wallis, 1997; Madsen and Rosbjerg, 1997a). The idea behind the index-flood approach is to use the data from many hydrologically similar basins to estimate a dimensionless flood distribution (Wallis, 1980). Thus this method substitutes space for time by using regional information to compensate for having relatively short records at each site. The concept underlying the index-flood method is that the distributions of floods at different sites in a region are the same except for a scale or index-flood parameter that reflects the size, rainfall and runoff characteristics of each watershed. Research is revealing when this assumption may be reasonable. Often a more sophisticated multi-scaling model is appropriate (Gupta and Dawdy, 1995a; Robinson and Sivapalan, 1997).

Generally the mean is employed as the index flood. The problem of estimating the $p$th quantile $x_p$ is then reduced to estimation of the mean, $\mu_x$, for a site and the ratio $x_p/\mu_x$ of the $p$th quantile to the mean. The mean can often be estimated adequately with the record available at a site, even if that record is short. The indicated ratio is estimated using regional information. The British Flood Studies Report (NERC, 1975) calls these normalized flood distributions growth curves.
Key to the success of the index-flood approach is identification of sets of basins that have similar coefficients of variation and skew. Basins can be grouped geographically, as well as by physiographic characteristics including drainage area and elevation. Regions need not be geographically contiguous. Each site can potentially be assigned its own unique region consisting of sites with which it is particularly similar (Zrinji and Burn, 1994), or regional regression equations can be derived to compute normalized regional quantiles as a function of a site's physiographic characteristics and other statistics (Fill and Stedinger, 1998).
Clearly the next step for regionalization procedures, such as the index-flood method, is to move away from estimates of regional parameters that do not depend upon basin size and other physiographic parameters. Gupta et al. (1994) argue that the basic premise of the index-flood method, that the coefficient of variation of floods is relatively constant, is inconsistent with the known relationships between the coefficient of variation (CV) and drainage area (see also Robinson and Sivapalan, 1997). Recently, Fill and Stedinger (1998) built such a relationship into an index-flood procedure by using a regression model to explain variations in the normalized quantiles. Tasker and Stedinger (1986) illustrated how one might relate log-space skew to physiographic basin characteristics (see also Gupta and Dawdy, 1995b). Madsen and Rosbjerg (1997b) did the same for a regional model of $\kappa$ for the GEV distribution. In both studies, only a binary variable representing region was found useful in explaining variations in these two shape parameters.

Once a regional model of alternative shape parameters is derived, there may be some advantage to combining such regional estimators with at-site estimators employing an empirical Bayesian framework or some other weighting scheme. For example, Bulletin 17B recommends weighting at-site and regional skewness estimators, but almost certainly places too much weight on the at-site values (Tasker and Stedinger, 1986). Examples of empirical Bayesian procedures are provided by Kuczera (1982), Madsen and Rosbjerg (1997b) and Fill and Stedinger (1998). Madsen and Rosbjerg's (1997b) computation of a $\kappa$-model with a New Zealand data set demonstrates how important it can be to do the regional analysis carefully, taking into account the cross-correlation among concurrent flood records.

When one has relatively few data at a site, the index-flood method is an effective strategy for deriving flood frequency estimates.
However, as the length of the available record increases, it becomes increasingly advantageous to use the at-site data to estimate the coefficient of variation as well. Stedinger and Lu (1995) found that the L-moment/GEV index-flood method did quite well for humid regions (CV $\approx$ 0.5) when $n < 25$, and for semi-arid regions (CV $\approx$ 1.0) for $n < 60$, if reasonable care is taken in selecting the stations to be included in a regional analysis. However, with longer records, it became advantageous to use the at-site mean and L-CV with a regional estimator of the shape parameter for a GEV distribution. In many cases this would be roughly equivalent to fitting a Gumbel distribution corresponding to a shape parameter $\kappa = 0$. Gabriele and Arnell (1991) develop the idea of having regions of different sizes for different parameters. For realistic hydrological regions, these and other studies illustrate the value of regionalizing estimators of the shape, and often the coefficient of variation, of a distribution.

6. Partial Duration Series

Two general approaches are available for modelling flood and precipitation series (Langbein, 1949). An annual maximum series considers only the largest event in each year. A partial duration series (PDS) or peaks-over-threshold (POT) approach includes all independent peaks above a truncation or threshold level. An objection to using annual maximum series is that it employs only the largest event in each year, regardless of whether the second-largest event in a year exceeds the largest events of other years. Moreover, the largest annual flood flow in a dry year in some arid or semi-arid regions may be zero, or so small that calling them floods is misleading. When considering rainfall series or pollutant discharge events, one may be interested in modelling all events that occur within a year that exceed some threshold of interest.

Use of a partial duration series framework avoids such problems by considering all independent peaks that exceed a specified threshold. Furthermore, one can estimate annual exceedance probabilities from the analysis of partial duration series.
Arguments in favour of partial duration series are that relatively long and reliable records are often available, and if the arrival rate for peaks over the threshold is large enough (1.65 events/year for the Poisson-arrival with exponential-exceedance model), partial duration series analyses should yield more accurate estimates of extreme quantiles than the corresponding annual-maximum frequency analyses (NERC, 1975; Rosbjerg, 1985). However, when fitting a three-parameter distribution, there seems to be little advantage from the use of a partial duration series approach over an annual maximum approach. This is true even when the partial duration series includes many more peaks than the maximum series because both contain the same largest events (Martins and Stedinger, 2001a).
A drawback of partial duration series analyses is that one must have criteria to identify only independent peaks (and not multiple peaks corresponding to the same event). Thus, such analysis can be more complicated than analyses using annual maxima. Partial duration models, perhaps with parameters that vary by season, are often used to estimate expected damages from hydrological events when more than one damage-causing event can occur in a season or within a year (North, 1980).

A model of a partial duration series has at least two components: first, one must model the arrival rate of events larger than the threshold level; second, one must model the magnitudes of those events. For example, a Poisson distribution has often been used to model the arrival of events, and an exponential distribution to describe the magnitudes of peaks that exceed the threshold.

There are several general relationships between the probability distribution for annual maximum and the frequency of events in a partial duration series. For a partial duration series model, let $\lambda$ be the average arrival rate of flood peaks greater than the threshold $x_0$ and let $G(x)$ be the probability that flood peaks, when they occur, are less than $x > x_0$, and thus those peaks fall in the range $[x_0, x]$. The annual exceedance probability for a flood, denoted $1/T_a$, corresponding to an annual return period $T_a$, is related to the corresponding exceedance probability $q_e = [1 - G(x)]$ for level $x$ in the partial duration series by

$$1/T_a = 1 - \exp\{-\lambda q_e\} = 1 - \exp\{-1/T_p\} \tag{7.105}$$

where $T_p = 1/(\lambda q_e)$ is the average return period for level $x$ in the partial duration series.

Many different choices for $G(x)$ may be reasonable. In particular, the generalized Pareto distribution (GPD) is a simple distribution useful for describing floods that exceed a specified lower bound.
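Equation 7.105 is easy to explore numerically. The sketch below uses hypothetical function names and simply evaluates the two forms of the relationship.

```python
import math


def annual_exceedance_probability(lam, q_e):
    """Eq. 7.105: 1/T_a = 1 - exp(-lam * q_e)."""
    return -math.expm1(-lam * q_e)


def annual_return_period(t_p):
    """Annual return period T_a for a PDS return period T_p = 1/(lam * q_e)."""
    return 1.0 / -math.expm1(-1.0 / t_p)
```

For instance, `annual_return_period(100.0)` is about 100.5 years, while `annual_return_period(1.0)` is about 1.58 years, so the two return periods differ noticeably only for frequent events.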
The cumulative distribution function for the generalized three-parameter Pareto distribution is:

$$F_X(x) = 1 - [1 - \kappa(x - \xi)/\alpha]^{1/\kappa} \tag{7.106}$$

with mean and variance

$$\mu_X = \xi + \alpha/(1 + \kappa), \qquad \sigma_X^2 = \alpha^2/[(1 + \kappa)^2 (1 + 2\kappa)] \tag{7.107}$$

where for $\kappa < 0$, $\xi \le x < \infty$, whereas for $\kappa > 0$, $\xi \le x \le \xi + \alpha/\kappa$ (Hosking and Wallis, 1987). A special case of the GPD is the two-parameter exponential distribution with $\kappa = 0$. Method of moment estimators work relatively well (Rosbjerg et al., 1992).

Use of a generalized Pareto distribution for $G(x)$ with a Poisson arrival model yields a GEV distribution for the annual maximum series greater than $x_0$ (Smith, 1984; Stedinger et al., 1993; Madsen et al., 1997a). The Poisson-Pareto and Poisson-GPD models provide very reasonable descriptions of flood risk (Rosbjerg et al., 1992). They have the advantage that they focus on the distribution of the larger flood events, and regional estimates of the GEV distribution's shape parameter $\kappa$ from annual maximum and partial duration series analyses can be used interchangeably.

Madsen and Rosbjerg (1997a) use a Poisson-GPD model as the basis of a partial duration series index-flood procedure. Madsen et al. (1997b) show that the estimators are fairly efficient. They pooled information from many sites to estimate the single shape parameter $\kappa$ and the arrival rate where the threshold was a specified percentile of the daily flow duration curve at each site. The at-site information was used to estimate the mean above-threshold flood. Alternatively, one could use the at-site data to estimate the arrival rate as well.

7. Stochastic Processes and Time Series

Many important random variables in water resources are functions whose values change with time. Historical records of rainfall or streamflow at a particular site are a sequence of observations called a time series. In a time series, the observations are ordered by time, and it is generally the case that the observed value of the random variable at one time influences one's assessment of the distribution of the random variable at later times. This means that the observations are not independent.
Time series are conceptualized as being a single observation of a stochastic process, which is a generalization of the concept of a random variable.

This section has three parts. The first presents the concept of stationarity and the basic statistics generally used to describe the properties of a stationary stochastic process. The second presents the definition of a Markov process and the Markov chain model. Markov chains are
a convenient model for describing many phenomena, and are often used in synthetic flow and rainfall generation and optimization models. The third part discusses the sampling properties of statistics used to describe the characteristics of many time series.

7.1. Describing Stochastic Processes

A random variable whose value changes through time according to probabilistic laws is called a stochastic process. An observed time series is considered to be one realization of a stochastic process, just as a single observation of a random variable is one possible value the random variable may assume. In the development here, a stochastic process is a sequence of random variables $\{X(t)\}$ ordered by a discrete time variable $t = 1, 2, 3, \ldots$

The properties of a stochastic process must generally be determined from a single time series or realization. To do this, several assumptions are usually made. First, one generally assumes that the process is stationary. This means that the probability distribution of the process is not changing over time. In addition, if a process is strictly stationary, the joint distribution of the random variables $X(t_1), \ldots, X(t_n)$ is identical to the joint distribution of $X(t_1 + t), \ldots, X(t_n + t)$ for any $t$; the joint distribution depends only on the differences $t_i - t_j$ between the times of occurrence of the events.

For a stationary stochastic process, one can write the mean and variance as

$$\mu_X = E[X(t)] \tag{7.108}$$

and

$$\sigma_X^2 = \text{Var}[X(t)] \tag{7.109}$$

Both are independent of time $t$. The autocorrelations, the correlation of $X$ with itself, are given by

$$\rho_X(k) = \text{Cov}[X(t), X(t + k)]/\sigma_X^2 \tag{7.110}$$

for any positive integer time lag $k$. These are the statistics most often used to describe stationary stochastic processes.

When one has available only a single time series, it is necessary to estimate the values of $\mu_X$, $\sigma_X^2$ and $\rho_X(k)$ from values of the random variable that one has observed.
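As a minimal sketch of how these three statistics might be estimated from one observed series, the functions below use the common sample analogues of Equations 7.108 to 7.110. The function names are hypothetical, and the use of a divisor of $T$ (rather than $T - 1$) for the variance is an assumption.

```python
def sample_mean(x):
    """Sample analogue of Eq. 7.108."""
    return sum(x) / len(x)


def sample_variance(x):
    """Sample analogue of Eq. 7.109, using a 1/T divisor (an assumption)."""
    xbar = sample_mean(x)
    return sum((v - xbar) ** 2 for v in x) / len(x)


def sample_autocorrelation(x, k):
    """Lag-k sample autocorrelation r_k, the analogue of Eq. 7.110."""
    xbar = sample_mean(x)
    num = sum((x[t + k] - xbar) * (x[t] - xbar) for t in range(len(x) - k))
    den = sum((v - xbar) ** 2 for v in x)
    return num / den
```

For a strictly alternating series such as `[1, -1, 1, -1, ...]` of length 10, `sample_autocorrelation(x, 1)` returns -0.9 rather than -1, because the numerator has only $T - k$ terms while the denominator has $T$.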
The mean and variance are generally estimated essentially as they were earlier in this chapter:

$$\hat{\mu}_X = \bar{X} = \frac{1}{T} \sum_{t=1}^{T} X_t \tag{7.111}$$

$$\hat{\sigma}_X^2 = \frac{1}{T} \sum_{t=1}^{T} (X_t - \bar{X})^2 \tag{7.112}$$

while the autocorrelations $\rho_X(k)$ for any time lag $k$ can be estimated as (Jenkins and Watts, 1968)

$$\hat{\rho}_X(k) = r_k = \frac{\sum_{t=1}^{T-k} (x_{t+k} - \bar{x})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2} \tag{7.113}$$

The sampling distribution of these estimators depends on the correlation structure of the stochastic process giving rise to the time series. In particular, when the observations are positively correlated, as is usually the case in natural streamflows or annual benefits in a river basin simulation, the variances of the estimated $\bar{x}$ and $\hat{\sigma}_X^2$ are larger than would be the case if the observations were independent. It is sometimes wise to take this inflation into account. Section 7.3 discusses the sampling distribution of these statistics.

All of this analysis depends on the assumption of stationarity, for only then do the mean, variance and autocorrelations defined above have the intended meaning. Stochastic processes are not always stationary. Agricultural and urban development, deforestation, climatic variability and changes in regional resource management can alter the distribution of rainfall, streamflows, pollutant concentrations, sediment loads and groundwater levels over time. If a stochastic process is not essentially stationary over the time span in question, then statistical techniques that rely on the stationarity assumption do not apply and the problem generally becomes much more difficult.

7.2. Markov Processes and Markov Chains

A common assumption in many stochastic water resources models is that the stochastic process $X(t)$ is a Markov process. A first-order Markov process has the property that the dependence of future values of the process on past values depends only on the current value and not on previous values or observations. In symbols for $k > 0$,
$$F_X[X(t+k) \mid X(t), X(t-1), X(t-2), \ldots] = F_X[X(t+k) \mid X(t)] \tag{7.114}$$

For Markov processes, the current value summarizes the state of the process. As a consequence, the current value of the process is often referred to as the state. This makes physical sense as well when one refers to the state or level of an aquifer or reservoir.

A special kind of Markov process is one whose state $X(t)$ can take on only discrete values. Such a process is called a Markov chain. Often in water resources planning, continuous stochastic processes are approximated by Markov chains. This is done to facilitate the construction of simple stochastic models. This section presents the basic notation and properties of Markov chains.

Consider a stream whose annual flow is to be represented by a discrete random variable. Assume that the distribution of streamflows is stationary. In the following development, the continuous random variable representing the annual streamflows (or some other process) is approximated by a random variable $Q_y$ in year $y$, which takes on only discrete values $q_i$ (each value representing a continuous range or interval of possible streamflows) with unconditional probabilities $p_i$ where

$$\sum_{i=1}^{n} p_i = 1 \tag{7.115}$$

It is frequently the case that the value of $Q_{y+1}$ is not independent of $Q_y$. A Markov chain can model such dependence. This requires specification of the transition probabilities $p_{ij}$,

$$p_{ij} = \Pr[Q_{y+1} = q_j \mid Q_y = q_i] \tag{7.116}$$

A transition probability is the conditional probability that the next state is $q_j$, given that the current state is $q_i$. The transition probabilities must satisfy

$$\sum_{j=1}^{n} p_{ij} = 1 \quad \text{for all } i \tag{7.117}$$

Figure 7.11 shows a possible set of transition probabilities in a matrix. Each element $p_{ij}$ in the matrix is the probability of a transition from streamflow $q_i$ in one year to streamflow $q_j$ in the next. In this example, a low flow tends to be followed by a low flow, rather than a high flow, and vice versa.
Figure 7.11. Matrix (above) and histograms (below) of streamflow transition probabilities showing probability of streamflow $q_j$ (represented by index $j$) in year $y + 1$ given streamflow $q_i$ (represented by index $i$) in year $y$. (Histogram panels: $p_{1j}$, $p_{2j}$, $p_{3j}$ and $p_{4j}$, each plotted against $j$.)
Let $P$ be the transition matrix whose elements are $p_{ij}$. For a Markov chain, the transition matrix contains all the information necessary to describe the behaviour of the process. Let $p_i^y$ be the probability that the streamflow $Q_y$ is $q_i$ (in state $i$) in year $y$. Then the probability that $Q_{y+1} = q_j$ is the sum of the products of the probabilities $p_i^y$ that $Q_y = q_i$ times the probability $p_{ij}$ that the next state $Q_{y+1}$ is $q_j$ given that $Q_y = q_i$. In symbols, this relationship is written:

$$p_j^{y+1} = p_1^y p_{1j} + p_2^y p_{2j} + \cdots + p_n^y p_{nj} = \sum_{i=1}^{n} p_i^y p_{ij} \tag{7.118}$$

Letting $p^{(y)}$ be the row vector of state resident probabilities $(p_1^y, \ldots, p_n^y)$, this relationship may be written

$$p^{(y+1)} = p^{(y)} P \tag{7.119}$$

To calculate the probabilities of each streamflow state in year $y + 2$, one can use $p^{(y+1)}$ in Equation 7.119 to obtain $p^{(y+2)} = p^{(y+1)} P$ or $p^{(y+2)} = p^{(y)} P^2$. Continuing in this manner, it is possible to compute the probabilities of each possible streamflow state for years $y + 1, y + 2, y + 3, \ldots, y + k, \ldots$ as

$$p^{(y+k)} = p^{(y)} P^k \tag{7.120}$$

Returning to the four-state example in Figure 7.11, assume that the flow in year $y$ is in the interval represented by $q_2$. Hence in year $y$ the unconditional streamflow probabilities $p_i^y$ are (0, 1, 0, 0). Knowing each $p_i^y$, the probabilities $p_j^{y+1}$ corresponding to each of the four streamflow states can be determined. From Figure 7.11, the probabilities $p_j^{y+1}$ are 0.2, 0.4, 0.3 and 0.1 for $j = 1, 2, 3$ and 4, respectively. The probability vectors for nine future years are listed in Table 7.8.

Table 7.8. Successive streamflow probabilities based on transition probabilities in Figure 7.11. (Columns: year, $p_1^y$, $p_2^y$, $p_3^y$, $p_4^y$; rows: years $y$ to $y + 9$.)

As time progresses, the probabilities generally reach limiting values. These are the unconditional or steady-state probabilities. The quantity $p_i$ has been defined as the unconditional probability of $q_i$. These are the steady-state probabilities which $p^{(y+k)}$ approaches for large $k$.
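The recursion in Equations 7.119 and 7.120 can be sketched in a few lines of Python. The four-state matrix `P` below is an assumption made for illustration: only its second row (0.2, 0.4, 0.3, 0.1) is quoted in the text, and the other rows are hypothetical values chosen so that each row sums to one. The function names `step` and `propagate` are likewise illustrative.

```python
# Hypothetical 4-state transition matrix (assumption: only row 2 is quoted
# in the text; the remaining rows are illustrative). Each row sums to 1.
P = [
    [0.4, 0.3, 0.2, 0.1],
    [0.2, 0.4, 0.3, 0.1],
    [0.1, 0.3, 0.4, 0.2],
    [0.0, 0.2, 0.3, 0.5],
]


def step(p, P):
    """One application of Equation 7.119: p(y+1) = p(y) P (row vector times matrix)."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]


def propagate(p, P, k):
    """Equation 7.120: p(y+k) = p(y) P^k, computed by k successive steps."""
    for _ in range(k):
        p = step(p, P)
    return p


# Flow in year y is known to be in state 2, so p(y) = (0, 1, 0, 0).
p_y = [0.0, 1.0, 0.0, 0.0]
for k in range(0, 10):
    probs = propagate(p_y, P, k)
    print("y+%d" % k, [round(v, 3) for v in probs])
```

With this assumed matrix, the printed vectors settle towards fixed values after a handful of years, which is the behaviour the text describes for Table 7.8.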
It is clear from Table 7.8 that as $k$ becomes larger, Equation 7.118 becomes

$$p_j = \sum_{i=1}^{n} p_i p_{ij} \tag{7.121}$$

or in vector notation, Equation 7.119 becomes

$$p = pP \tag{7.122}$$

where $p$ is the row vector of unconditional probabilities $(p_1, \ldots, p_n)$. For the example in Table 7.8, the probability vector $p$ equals (0.156, 0.309, 0.316, 0.219).

The steady-state probabilities for any Markov chain can be found by solving the simultaneous equations in Equation 7.122 for all but one of the states $j$ together with the constraint

$$\sum_{i=1}^{n} p_i = 1 \tag{7.123}$$

Annual streamflows are seldom as highly correlated as the flows in this example. However, monthly, weekly and especially daily streamflows generally have high serial correlations. Assuming that the unconditional steady-state probability distributions for monthly streamflows are stationary, a Markov chain can be defined for each month's streamflow. Since there are twelve months in a year, there would be twelve transition matrices, the elements of which could be denoted as $p_{ij}^t$. Each defines the probability of a streamflow $q_j^{t+1}$ in month $t + 1$, given a streamflow $q_i^t$ in month $t$. The steady-state stationary probability vectors for each month can be found by the procedure outlined above, except that now all twelve matrices are used to calculate all twelve steady-state probability vectors. However, once the steady-state vector $p$ is found for one month, the others are easily computed using Equation 7.120 with $t$ replacing $y$.
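As a minimal sketch of the steady-state calculation, the Python function below solves $p = pP$ for all but one state together with the constraint that the probabilities sum to one, using plain Gaussian elimination. The matrix `P` is an assumed four-state example (only its second row appears in the text; the other rows are hypothetical), and `steady_state` is an illustrative name rather than a routine from the source.

```python
# Hypothetical 4-state transition matrix (assumption; only row 2 is quoted in the text).
P = [
    [0.4, 0.3, 0.2, 0.1],
    [0.2, 0.4, 0.3, 0.1],
    [0.1, 0.3, 0.4, 0.2],
    [0.0, 0.2, 0.3, 0.5],
]


def steady_state(P):
    """Solve p = pP (Eq. 7.122) for states 1..n-1 plus sum(p) = 1 (Eq. 7.123)."""
    n = len(P)
    # Row j of A encodes sum_i p_i * p_ij - p_j = 0 for j = 0..n-2.
    A = [[P[i][j] - (1.0 if i == j else 0.0) for i in range(n)]
         for j in range(n - 1)]
    A.append([1.0] * n)          # last row: probabilities sum to one
    b = [0.0] * (n - 1) + [1.0]

    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(A[r][c] * p[c] for c in range(r + 1, n))
        p[r] = (b[r] - known) / A[r][r]
    return p


p_star = steady_state(P)
print([round(v, 3) for v in p_star])
```

One equation of $p = pP$ is dropped because the system is otherwise redundant: the rows of $P$ each sum to one, so any one of the $n$ balance equations follows from the others.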
7.3. Properties of Time-Series Statistics

The statistics most frequently used to describe the distribution of a continuous-state stationary stochastic process are the sample mean, variance and various autocorrelations. Statistical dependence among the observations, as is frequently found in time series, can have a marked effect on the distribution of these statistics. This part of Section 7 reviews the sampling properties of these statistics when the observations are a realization of a stochastic process.

The sample mean

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{7.124}$$

when viewed as a random variable is an unbiased estimate of the mean of the process $\mu_X$, because

$$E[\bar{X}] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \mu_X \tag{7.125}$$

However, correlation among the $X_i$'s, so that $\rho_X(k) \neq 0$ for $k > 0$, affects the variance of the estimated mean $\bar{X}$:

$$\mathrm{Var}(\bar{X}) = E[(\bar{X} - \mu_X)^2] = \frac{1}{n^2} E\left\{ \sum_{t=1}^{n} \sum_{s=1}^{n} (X_t - \mu_X)(X_s - \mu_X) \right\} = \frac{\sigma_X^2}{n} \left\{ 1 + 2 \sum_{k=1}^{n-1} \left( 1 - \frac{k}{n} \right) \rho_X(k) \right\} \tag{7.126}$$

The variance of $\bar{X}$, equal to $\sigma_X^2 / n$ for independent observations, is inflated by the factor within the brackets. For $\rho_X(k) \geq 0$, as is often the case, this factor is a non-decreasing function of $n$, so that the variance of $\bar{X}$ is inflated by a factor whose importance does not decrease with increasing sample size. This is an important observation, because it means that the average of a correlated time series will be less precise than the average of a sequence of independent random variables of the same length with the same variance.

A common model of stochastic series has

$$\rho_X(k) = [\rho_X(1)]^k = \rho^k \tag{7.127}$$

This correlation structure arises from the autoregressive Markov model discussed at length in Section 8. For this correlation structure

$$\mathrm{Var}(\bar{X}) = \frac{\sigma_X^2}{n} \left\{ 1 + \frac{2\rho}{n} \, \frac{[n(1 - \rho) - (1 - \rho^n)]}{(1 - \rho)^2} \right\} \tag{7.128}$$

Substitution of the sample estimates for $\sigma_X^2$ and $\rho_X(1)$ in the equation above often yields a more realistic estimate of the variance of $\bar{X}$ than does the estimate $s_X^2 / n$ if the correlation structure $\rho_X(k) = \rho^k$ is reasonable; otherwise, Equation 7.126 may be employed. Table 7.9 illustrates the effect of correlation among the $X_t$ values on the standard error of their mean, equal to the square root of the variance of $\bar{X}$ given above.

Table 7.9. Standard error of $\bar{X}$ when $\sigma_X = 0.25$ and $\rho_X(k) = \rho^k$. (Rows: sample size $n$; columns: correlation of consecutive observations $\rho$, including 0.3 and 0.6.)

The properties of the estimate of the variance of $X$,

$$\hat{\sigma}_X^2 = v_X = \frac{1}{n} \sum_{t=1}^{n} (X_t - \bar{X})^2 \tag{7.129}$$

are also affected by correlation among the $X_t$'s. Here $v$ rather than $s^2$ is used to denote the variance estimator, because $n$ is employed in the denominator rather than $n - 1$. The expected value of $v_X$ becomes

$$E[v_X] = \sigma_X^2 \left\{ 1 - \frac{1}{n} \left[ 1 + 2 \sum_{k=1}^{n-1} \left( 1 - \frac{k}{n} \right) \rho_X(k) \right] \right\} \tag{7.130}$$

The bias in $v_X$ depends on terms involving $\rho_X(1)$ through $\rho_X(n - 1)$. Fortunately, the bias in $v_X$ decreases with $n$ and is generally unimportant when compared to its variance.

Correlation among the $X_t$'s also affects the variance of $v_X$. Assuming that $X$ has a normal distribution (here the variance of $v_X$ depends on the fourth moment of $X$), the variance of $v_X$ for large $n$ is approximately (Kendall and Stuart, 1966, Sec. 48.1)

$$\mathrm{Var}(v_X) \cong \frac{2 \sigma_X^4}{n} \left\{ 1 + 2 \sum_{k=1}^{\infty} \rho_X^2(k) \right\} \tag{7.131}$$
where for $\rho_X(k) = \rho^k$, Equation 7.131 becomes

$$\mathrm{Var}(v_X) \cong \frac{2 \sigma_X^4}{n} \left( \frac{1 + \rho^2}{1 - \rho^2} \right) \tag{7.132}$$

Like the variance of $\bar{X}$, the variance of $v_X$ is inflated by a factor whose importance does not decrease with $n$. This is illustrated by Table 7.10, which gives the standard deviation of $v_X$ divided by the true variance $\sigma_X^2$ as a function of $n$ and $\rho$ when the observations have a normal distribution and $\rho_X(k) = \rho^k$. This would be the coefficient of variation of $v_X$ were it not biased.

A fundamental problem of time-series analyses is the estimation or description of the relationship among the random variable values at different times. The statistics used to describe this relationship are the autocorrelations. Several estimates of the autocorrelations have been suggested. A simple and satisfactory estimate recommended by Jenkins and Watts (1968) is:

$$\hat{\rho}_X(k) = r_k = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2} \tag{7.133}$$

Here $r_k$ is the ratio of two sums where the numerator contains $n - k$ terms and the denominator contains $n$ terms. The estimate $r_k$ is biased, but unbiased estimates frequently have larger mean square errors (Jenkins and Watts, 1968). A comparison of the bias and variance of $r_1$ is provided by the case when the $X_t$'s are independent normal variates. Then (Kendall and Stuart, 1966)

$$E[r_1] = -\frac{1}{n} \tag{7.134a}$$

and

$$\mathrm{Var}(r_1) = \frac{(n - 2)^2}{n^2 (n - 1)} \tag{7.134b}$$

For $n = 25$, the expected value of $r_1$ is $-0.04$ rather than the true value of zero; its standard deviation is 0.19. This results in a mean square error of $(E[r_1])^2 + \mathrm{Var}(r_1) = 0.0016 + 0.0353 = 0.0369$. Clearly, the variance of $r_1$ is the dominant term.

For $X_t$ values that are not independent, exact expressions for the variance of $r_k$ generally are not available. However, for normally distributed $X_t$ and large $n$ (Kendall and Stuart, 1966),

$$\mathrm{Var}(r_k) \cong \frac{1}{n} \sum_{l=-\infty}^{\infty} \left[ \rho_X^2(l) + \rho_X(l + k) \rho_X(l - k) - 4 \rho_X(k) \rho_X(l) \rho_X(k - l) + 2 \rho_X^2(k) \rho_X^2(l) \right] \tag{7.135}$$

If $\rho_X(k)$ is essentially zero for $k > q$, then the simpler expression (Box et al., 1994)

$$\mathrm{Var}(r_k) \cong \frac{1}{n} \left[ 1 + 2 \sum_{l=1}^{q} \rho_X^2(l) \right] \tag{7.136}$$

is valid for $r_k$ corresponding to $k > q$.
Thus for large $n$, $\mathrm{Var}(r_k) \geq 1/n$ and values of $r_k$ will frequently be outside the range of $\pm 1.65 / \sqrt{n}$, even though $\rho_X(k)$ may be zero.

If $\rho_X(k) = \rho^k$, Equation 7.135 reduces to

$$\mathrm{Var}(r_k) \cong \frac{1}{n} \left[ \frac{(1 + \rho^2)(1 - \rho^{2k})}{1 - \rho^2} - 2k \rho^{2k} \right] \tag{7.137}$$

In particular for $r_1$, this gives

$$\mathrm{Var}(r_1) \cong \frac{1}{n} (1 - \rho^2) \tag{7.138}$$

Approximate values of the standard deviation of $r_1$ for different values of $n$ and $\rho$ are given in Table 7.11.

The estimates of $r_k$ and $r_{k+j}$ are highly correlated for small $j$; this causes plots of $r_k$ versus $k$ to exhibit slowly varying cycles when the true values of $\rho_X(k)$ may be zero. This increases the difficulty of interpreting the sample autocorrelations.

Table 7.10. Standard deviation of $v_X / \sigma_X^2$ when observations have a normal distribution and $\rho_X(k) = \rho^k$. (Rows: sample size $n$; columns: correlation of consecutive observations $\rho$.)
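The estimator in Equation 7.133 and the inflated standard error implied by Equation 7.128 can be sketched together in Python. This is an illustration under stated assumptions, not a routine from the source: the function names are hypothetical, the correlation structure is assumed to be $\rho_X(k) = \rho^k$ with $|\rho| < 1$, and the short series is made-up data.

```python
import math


def sample_autocorrelation(x, k):
    """Lag-k sample autocorrelation r_k of Equation 7.133."""
    n = len(x)
    mean = sum(x) / n
    numerator = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    denominator = sum((value - mean) ** 2 for value in x)
    return numerator / denominator


def std_error_of_mean_ar1(sigma, rho, n):
    """Square root of Equation 7.128, assuming rho_X(k) = rho**k and |rho| < 1."""
    inflation = 1.0 + (2.0 * rho / n) * (
        (n * (1.0 - rho) - (1.0 - rho ** n)) / (1.0 - rho) ** 2
    )
    return sigma * math.sqrt(inflation / n)


def approx_std_of_r1(rho, n):
    """Large-sample standard deviation of r_1 from Equation 7.138."""
    return math.sqrt((1.0 - rho ** 2) / n)


# Made-up series, used only to exercise the estimator.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
print("r_1 =", sample_autocorrelation(series, 1))

# Standard error of the mean for sigma = 0.25, n = 25, as in the Table 7.9 setting.
for rho in (0.0, 0.3, 0.6):
    print(rho, round(std_error_of_mean_ar1(0.25, rho, 25), 3))
```

In practice one would substitute the sample standard deviation and $r_1$ for `sigma` and `rho`, which is the substitution the text recommends when the $\rho^k$ structure is reasonable.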
Table 7.11. Approximate standard deviation of $r_1$ when observations have a normal distribution and $\rho_X(k) = \rho^k$. (Rows: sample size $n$; columns: correlation of consecutive observations $\rho$.)

8. Synthetic Streamflow Generation

8.1. Introduction

This section is concerned primarily with ways of generating sample data such as streamflows, temperatures and rainfall that are used in water resource systems simulation studies (e.g., as introduced in the next section). The models and techniques discussed in this section can be used to generate any number of quantities used as inputs to simulation studies. For example, Wilks (1998, 2002) discusses the generation of wet and dry days, rainfall depths on wet days and associated daily temperatures. The discussion here is directed toward the generation of streamflows because of the historical development and frequent use of these models in that context (Matalas and Wallis, 1976). In addition, they are relatively simple compared to more complete daily weather generators and many other applications. Generated streamflows have been called synthetic to distinguish them from historical observations (Fiering, 1967). The activity has been called stochastic hydrological modelling. More detailed presentations can be found in Marco et al. (1989) and Salas (1993).

River basin simulation studies can use many sets of streamflow, rainfall, evaporation and/or temperature sequences to evaluate the statistical properties of the performance of alternative water resources systems. For this purpose, synthetic flows and other generated quantities should resemble, statistically, those sequences that are likely to be experienced during the planning period. Figure 7.12 illustrates how synthetic streamflow, rainfall and other stochastic sequences are used in conjunction with projections of future demands and other economic data to determine how different system designs and operating policies might perform.

Figure 7.12. Structure of a simulation study, indicating the transformation of a synthetic streamflow sequence, future demands and a system design and operating policy into system performance statistics. (Inputs to the simulation model of the river basin system: synthetic streamflow and other sequences; future demands and economic data; system design and operating policy. Output: system performance.)

Use of only the historical flow or rainfall record in water resource studies does not allow for the testing of alternative designs and policies against the range of sequences that are likely to occur in the future. We can be very confident that the future sequence of flows will not be the historical one, yet there is important information in that historical record. That information is not fully used if only the historical sequence is simulated. Fitting continuous distributions to the set of historical flows and then using those distributions to generate other sequences of flows, all of which are statistically similar and equally weighted, gives one a broader range of inputs to simulation models. Testing designs and policies against that broader range of flow sequences that could occur more clearly identifies the variability and range of possible future performance indicator values. This in turn should lead to the selection of more robust system designs and policies.

The use of synthetic streamflows is particularly useful for water resources systems having large amounts of over-year storage. Use of only the historical hydrological record in system simulation yields only one time history of how the system would operate from year to year. In water resources systems with relatively little storage, so that reservoirs and/or groundwater aquifers refill almost every year, synthetic hydrological sequences may not be needed if historical sequences of a reasonable length are
available. In this second case, a twenty-five-year historical record provides twenty-five descriptions of the possible within-year operation of the system. This may be sufficient for many studies.

Generally, use of stochastic sequences is thought to improve the precision with which water resources system performance indices can be estimated, and some studies have shown this to be the case (Vogel and Shallcross, 1996; Vogel and Stedinger, 1988). In particular, if system operation performance indices have thresholds and sharp breaks, then the coarse descriptions provided by historical series are likely to provide relatively inaccurate estimates of the expected values of such statistics. For example, suppose that shortages only invoke non-linear penalties on average one year in twenty. Then in a sixty-year simulation there is a 19% probability that the penalty will be invoked at most once, and an 18% probability it will be invoked five or more times. Thus the calculation of the annual average value of the penalty would be highly unreliable unless some smoothing of the input distributions is allowed, associated with a long simulation analysis.

On the other hand, if one is only interested in the mean flow, or average benefits that are mostly a linear function of flows, then use of stochastic sequences will probably add little information to what is obtained simply by simulating the historical record. After all, the fitted models are ultimately based on the information provided in the historical record, and their use does not produce new information about the hydrology of the basin.

If in a general sense one has available $n$ years of record, the statistics of that record can be used to build a stochastic model for generating thousands of years of flow. These synthetic data can then be used to estimate more exactly the system performance, assuming, of course, that the flow-generating model accurately represents nature.
But the initial uncertainty in the model parameters resulting from having only $n$ years of record would still remain (Schaake and Vicens, 1980).

An alternative is to run the historical record (if it is sufficiently complete at every site and contains no gaps of missing data) through the simulation model to generate $n$ years of output. That output series can be processed to produce estimates of system performance. So the question is the following: Is it better to generate multiple input series based on uncertain parameter values and use those to determine average system performance with great precision, or is it sufficient just to model the $n$-year output series that results from simulation of the historical series?

The answer seems to depend upon how well behaved the input and output series are. If the simulation model is linear, it does not make much difference. If the simulation model were highly non-linear, then modelling the input series would appear to be advisable. Or if one is developing reservoir operating policies, there is a tendency to make a policy sufficiently complex to deal very well with the few droughts in the historical record, giving a false sense of security and likely misrepresenting the probability of system performance failures.

Another situation where stochastic data generating models are useful is when one wants to understand the impact on system performance estimates of the parameter uncertainty stemming from short historical records. In that case, parameter uncertainty can be incorporated into streamflow generating models so that the generated sequences reflect both the variability that one would expect in flows over time as well as the uncertainty of the parameter values of the models that describe that variability (Valdes et al., 1977; Stedinger and Taylor, 1982a,b; Stedinger et al., 1985; Vogel and Stedinger, 1988).
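The 19% and 18% figures quoted earlier for the one-year-in-twenty penalty can be checked with a short binomial calculation. The sketch below assumes that penalty years occur independently with probability 1/20 in each of the sixty simulated years, which is an assumption of this check rather than something the text states; `binomial_pmf` is an illustrative helper name.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k penalty years in n independent years."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


n_years = 60          # length of the simulation
p_penalty = 1.0 / 20  # penalty invoked on average one year in twenty

# Probability the penalty is invoked at most once (0 or 1 times).
p_at_most_once = sum(binomial_pmf(k, n_years, p_penalty) for k in (0, 1))

# Probability the penalty is invoked five or more times.
p_five_or_more = 1.0 - sum(binomial_pmf(k, n_years, p_penalty) for k in range(5))

print("at most once:", round(p_at_most_once, 2))
print("five or more:", round(p_five_or_more, 2))
```

The expected number of penalty years is three, yet roughly one simulation in five sees at most one and another one in five sees five or more, which is why an average penalty computed from a single sixty-year record is so unreliable.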
If one decides to use a stochastic data generator, the challenge is to use a model that appropriately describes the important relationships, but does not attempt to reproduce more relationships than are justified or can be estimated with available data sets.

Two basic techniques are used for streamflow generation. If the streamflow population can be described by a stationary stochastic process (a process whose parameters do not change over time), and if a long historical streamflow record exists, then a stationary stochastic streamflow model may be fit to the historical flows. This statistical model can then generate synthetic sequences that describe selected characteristics of the historical flows. Several such models are discussed below.

The assumption of stationarity is not always plausible, particularly in river basins that have experienced marked changes in runoff characteristics due to changes in land cover, land use, climate or the use of groundwater during the period of flow record. Similarly, if the physical characteristics of a basin change substantially in the future, the historical streamflow record may not provide reliable estimates of the distribution of future unregulated
flows. In the absence of the stationarity of streamflows or a representative historical record, an alternative scheme is to assume that precipitation is a stationary stochastic process and to route either historical or synthetic precipitation sequences through an appropriate rainfall-runoff model of the river basin.

8.2. Streamflow Generation Models

The first step in the construction of a statistical streamflow generating model is to extract from the historical streamflow record the fundamental information about the joint distribution of flows at different sites and at different times. A streamflow model should ideally capture what is judged to be the fundamental characteristics of the joint distribution of the flows. The specification of what characteristics are fundamental is of primary importance.

One may want to model as closely as possible the true marginal distribution of seasonal flows and/or the marginal distribution of annual flows. These describe both how much water may be available at different times and also how variable that water supply is. Also, modelling the joint distribution of flows at a single site in different months, seasons and years may be appropriate. The persistence of high flows and of low flows, often described by their correlation, affects the reliability with which a reservoir of a given size can provide a given yield (Fiering, 1967; Lettenmaier and Burges, 1977a, 1977b; Thyer and Kuczera, 2000). For multi-component reservoir systems, reproduction of the joint distribution of flows at different sites and at different times will also be important.

Sometimes, a streamflow model is said to resemble statistically the historical flows if it produces flows with the same mean, variance, skew coefficient, autocorrelations and/or cross-correlations as were observed in the historical series. This definition of statistical resemblance is attractive because it is operational and only requires an analyst to find a model that can reproduce the observed statistics.
The drawback of this approach is that it shifts the modelling emphasis away from trying to find a good model of marginal distributions of the observed flows and their joint distribution over time and over space, given the available data, to just reproducing arbitrarily selected statistics. Defining statistical resemblance in terms of moments may also be faulted for specifying that the parameters of the fitted model should be determined using the observed sample moments, or their unbiased counterparts. Other parameter estimation techniques, such as maximum likelihood estimators, are often more efficient.

Definition of resemblance in terms of moments can also lead to confusion over whether the population parameters should equal the sample moments, or whether the fitted model should generate flow sequences whose sample moments equal the historical values. The two concepts are different because of the biases (as discussed in Section 7) in many of the estimators of variances and correlations (Matalas and Wallis, 1976; Stedinger, 1980, 1981; Stedinger and Taylor, 1982a).

For any particular river basin study, one must determine what streamflow characteristics need to be modelled. The decision should depend on what characteristics are important to the operation of the system being studied, the available data, and how much time can be spared to build and test a stochastic model. If time permits, it is good practice to see if the simulation results are in fact sensitive to the generation model and its parameter values by using an alternative model and set of parameter values. If the model's results are sensitive to changes, then, as always, one must exercise judgement in selecting the appropriate model and parameter values to use.

This section presents a range of statistical models for the generation of synthetic data. The necessary sophistication of a data-generating model depends on the intended use of the generated data. Section 8.3 below presents the simple autoregressive Markov model for generating annual flow sequences.
This model alone is too simple for many practical studies, but is useful for illustrating the fundamentals of the more complex models that follow. It seems, therefore, worth some time exploring the properties of this basic model.

Subsequent sections discuss how flows with any marginal distribution can be produced, and present models for generating sequences of flows that can reproduce the persistence of historical flow sequences. Other parts of this section present models for generating concurrent flows at several sites and for generating seasonal or monthly flows that preserve the characteristics of annual flows. More detailed discussions for those wishing to study synthetic streamflow models in greater depth can be found in Marco et al. (1989) and Salas (1993).
8.3. A Simple Autoregressive Model

A simple model of annual streamflows is the autoregressive Markov model. The historical annual flows $q_y$ are thought of as particular values of a stationary stochastic process $Q_y$. The generation of annual streamflows and other variables would be a simple matter if annual flows were independently distributed. In general, this is not the case and a generating model for many phenomena should capture the relationship between values in different years or in other time periods. A common and reasonable assumption is that annual flows are the result of a first-order Markov process (as discussed in Section 7.2).

Assume for now that annual streamflows are normally distributed. In some areas the distribution of annual flows is in fact nearly normal. Streamflow models that produce non-normal streamflows are discussed as an extension of this simple model.

The joint normal density function of two streamflows $Q_y$ and $Q_w$ in years $y$ and $w$ having mean $\mu$, variance $\sigma^2$, and year-to-year correlation $\rho$ between flows is

$$f(q_y, q_w) = \frac{1}{2 \pi \sigma^2 (1 - \rho^2)^{0.5}} \exp\left\{ -\frac{(q_y - \mu)^2 - 2\rho (q_y - \mu)(q_w - \mu) + (q_w - \mu)^2}{2 \sigma^2 (1 - \rho^2)} \right\} \tag{7.139}$$

The joint normal distribution for two random variables with the same mean and variance depends only on their common mean $\mu$, variance $\sigma^2$, and the correlation $\rho$ between the two (or equivalently the covariance $\rho \sigma^2$).

The sequential generation of synthetic streamflows requires the conditional distribution of the flow in one year given the value of the flows in previous years. However, if the streamflows are a first-order (lag 1) Markov process, then the distribution of the flow in year $y + 1$ depends entirely on the value of the flow in year $y$. In addition, if the annual streamflows have a multivariate normal distribution, then the conditional distribution of $Q_{y+1}$ is normal with mean and variance

$$E[Q_{y+1} \mid Q_y = q_y] = \mu + \rho (q_y - \mu)$$
$$\mathrm{Var}(Q_{y+1} \mid Q_y = q_y) = \sigma^2 (1 - \rho^2) \tag{7.140}$$

where $q_y$ is the value of the random variable $Q_y$ in year $y$.
This relationship is illustrated in Figure 7.13. Notice that the larger the absolute value of the correlation $\rho$ between the flows, the smaller the conditional variance of $Q_{y+1}$, which in this case does not depend at all on the value $q_y$.

Synthetic normally distributed streamflows that have mean $\mu$, variance $\sigma^2$, and year-to-year correlation $\rho$, are produced by the model

$$Q_{y+1} = \mu + \rho (Q_y - \mu) + V_y \sigma \sqrt{1 - \rho^2} \tag{7.141}$$

where $V_y$ is a standard normal random variable, meaning that it has zero mean, $E[V_y] = 0$, and unit variance, $E[V_y^2] = 1$. The random variable $V_y$ is added here to provide the variability in $Q_{y+1}$ that remains even after $Q_y$ is known. By construction, each $V_y$ is independent of past flows $Q_w$ where $w \leq y$, and $V_y$ is independent of $V_w$ for $w \neq y$. These restrictions imply that

$$E[V_w V_y] = 0 \quad \text{for } w \neq y \tag{7.142}$$

and

$$E[(Q_w - \mu) V_y] = 0 \quad \text{for } w \leq y \tag{7.143}$$

Clearly, $Q_{y+1}$ will be normally distributed if both $Q_y$ and $V_y$ are normally distributed because sums of independent normally distributed random variables are normally distributed.

Figure 7.13. Conditional distribution of $Q_{y+1}$ given $Q_y = q_y$ for two normal random variables. (Axes: flow in year $y$, $Q_y$; flow in year $y + 1$, $Q_{y+1}$. The plot marks the conditional mean $E[Q_{y+1} \mid Q_y = q_y] = \mu + \rho (q_y - \mu)$ and the conditional standard deviation $\sigma \sqrt{1 - \rho^2}$.)
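A minimal Python sketch of the generator in Equation 7.141 follows. The function name, its arguments and the parameter values are assumptions made for illustration; the starting flow is drawn from the marginal normal distribution unless one is supplied, and nothing in this simple normal model prevents negative generated values.

```python
import math
import random


def generate_ar1_flows(mu, sigma, rho, n_years, q0=None, seed=None):
    """Generate n_years of flows from Q[y+1] = mu + rho*(Q[y] - mu) + V*sigma*sqrt(1 - rho^2)."""
    rng = random.Random(seed)
    # Starting flow: user-supplied, or drawn from the marginal N(mu, sigma^2).
    q = rng.gauss(mu, sigma) if q0 is None else q0
    innovation_scale = sigma * math.sqrt(1.0 - rho ** 2)
    flows = []
    for _ in range(n_years):
        v = rng.gauss(0.0, 1.0)  # independent standard normal V_y
        q = mu + rho * (q - mu) + v * innovation_scale
        flows.append(q)
    return flows


# Illustrative (hypothetical) parameters, not taken from any real record.
flows = generate_ar1_flows(mu=10.0, sigma=2.0, rho=0.5, n_years=20000, seed=1)
mean = sum(flows) / len(flows)
variance = sum((q - mean) ** 2 for q in flows) / len(flows)
lag1 = sum((flows[t] - mean) * (flows[t + 1] - mean)
           for t in range(len(flows) - 1)) / (variance * len(flows))
print(round(mean, 2), round(math.sqrt(variance), 2), round(lag1, 2))
```

For a long generated sequence the sample mean, standard deviation and lag-one correlation should fall close to the specified $\mu$, $\sigma$ and $\rho$, which is the property the derivation that follows establishes analytically.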
It is a straightforward procedure to show that this basic model indeed produces streamflows with the specified moments, as demonstrated below.

Using the fact that $E[V_y] = 0$, the conditional mean of $Q_{y+1}$ given that $Q_y$ equals $q_y$ is

$$E[Q_{y+1} \mid q_y] = E[\mu + \rho (q_y - \mu) + V_y \sigma \sqrt{1 - \rho^2}] = \mu + \rho (q_y - \mu) \tag{7.144}$$

Since $E[V_y^2] = \mathrm{Var}[V_y] = 1$, the conditional variance of $Q_{y+1}$ is

$$\mathrm{Var}[Q_{y+1} \mid q_y] = E[\{Q_{y+1} - E[Q_{y+1} \mid q_y]\}^2 \mid q_y] = E[\{\mu + \rho (q_y - \mu) + V_y \sigma \sqrt{1 - \rho^2} - [\mu + \rho (q_y - \mu)]\}^2] = E[(V_y \sigma \sqrt{1 - \rho^2})^2] = \sigma^2 (1 - \rho^2) \tag{7.145}$$

Thus, this model produces flows with the correct conditional mean and variance.

To compute the unconditional mean of $Q_{y+1}$ one first takes the expectation of both sides of Equation 7.141 to obtain

$$E[Q_{y+1}] = \mu + \rho (E[Q_y] - \mu) + E[V_y] \sigma \sqrt{1 - \rho^2} \tag{7.146}$$

where $E[V_y] = 0$. If the distribution of streamflows is independent of time so that for all $y$, $E[Q_{y+1}] = E[Q_y] = E[Q]$, it is clear that $(1 - \rho) E[Q] = (1 - \rho) \mu$ or

$$E[Q] = \mu \tag{7.147}$$

Alternatively, if $Q_y$ for $y = 1$ has mean $\mu$, then Equation 7.146 indicates that $Q_2$ will have mean $\mu$. Thus repeated application of Equation 7.146 would demonstrate that all $Q_y$ for $y \geq 1$ have mean $\mu$.

The unconditional variance of the annual flows can be derived by squaring both sides of Equation 7.141 to obtain

$$E[(Q_{y+1} - \mu)^2] = E[\{\rho (Q_y - \mu) + V_y \sigma \sqrt{1 - \rho^2}\}^2] = \rho^2 E[(Q_y - \mu)^2] + 2 \rho \sigma \sqrt{1 - \rho^2} \, E[(Q_y - \mu) V_y] + \sigma^2 (1 - \rho^2) E[V_y^2] \tag{7.148}$$

Because $V_y$ is independent of $Q_y$ (Equation 7.143), the second term on the right-hand side of Equation 7.148 vanishes. Hence the unconditional variance of $Q$ satisfies

$$E[(Q_{y+1} - \mu)^2] = \rho^2 E[(Q_y - \mu)^2] + \sigma^2 (1 - \rho^2) \tag{7.149}$$

Assuming that $Q_{y+1}$ and $Q_y$ have the same variance yields

$$E[(Q - \mu)^2] = \sigma^2 \tag{7.150}$$

so that the unconditional variance is $\sigma^2$, as required.

Again, if one does not want to assume that $Q_{y+1}$ and $Q_y$ have the same variance, a recursive argument can be adopted to demonstrate that if $Q_1$ has variance $\sigma^2$, then $Q_y$ for $y \geq 1$ has variance $\sigma^2$.

The covariance of consecutive flows is another important issue. After all, the whole idea of building these time-series models is to describe the year-to-year correlation of the flows.
Using Equation 7.141 one can show that the covariance of consecutive flows must be

$$E[(Q_{y+1} - \mu)(Q_y - \mu)] = E\{[\rho (Q_y - \mu) + V_y \sigma \sqrt{1 - \rho^2}](Q_y - \mu)\} = \rho E[(Q_y - \mu)^2] = \rho \sigma^2 \tag{7.151}$$

where $E[(Q_y - \mu) V_y] = 0$ because $V_y$ and $Q_y$ are independent (Equation 7.143).

Over a longer time scale, another property of this model is that the covariance of flows in year $y$ and $y + k$ is

$$E[(Q_{y+k} - \mu)(Q_y - \mu)] = \rho^k \sigma^2 \tag{7.152}$$

This equality can be proven by induction. It has already been shown for $k = 0$ and 1. If it is true for $k = j - 1$, then

$$E[(Q_{y+j} - \mu)(Q_y - \mu)] = E\{[\rho (Q_{y+j-1} - \mu) + V_{y+j-1} \sigma \sqrt{1 - \rho^2}](Q_y - \mu)\} = \rho E[(Q_y - \mu)(Q_{y+j-1} - \mu)] = \rho [\rho^{j-1} \sigma^2] = \rho^j \sigma^2 \tag{7.153}$$

where $E[(Q_y - \mu) V_{y+j-1}] = 0$ for $j \geq 1$. Hence Equation 7.152 is true for any value of $k$.

It is important to note that the results derived above, through Equation 7.152, do not depend on the assumption that the random variables $Q_y$ and $V_y$ are normally distributed. These relationships apply to all autoregressive Markov processes of the form in Equation 7.141 regardless of the
distributions of $Q_y$ and $V_y$. However, if the flow $Q_y$ in year $y = 1$ is normally distributed with mean $\mu$ and variance $\sigma^2$, and if the $V_y$ are independent normally distributed random variables with mean zero and unit variance, then the generated $Q_y$ for $y \geq 1$ will also be normally distributed with mean $\mu$ and variance $\sigma^2$. The next section considers how this and other models can be used to generate streamflows that have other than a normal distribution.

8.4. Reproducing the Marginal Distribution

Most models for generating stochastic processes deal directly with normally distributed random variables. Unfortunately, flows are not always adequately described by the normal distribution. In fact, streamflows and many other hydrological data cannot really be normally distributed because of the impossibility of negative values. In general, distributions of hydrological data are positively skewed, having a lower bound near zero and, for practical purposes, an unbounded right-hand tail. Thus they look like the gamma or lognormal distribution illustrated in Figures 7.3 and 7.4.

The asymmetry of a distribution is often measured by its coefficient of skewness. In some streamflow models, the skew of the random elements $V_y$ is adjusted so that the models generate flows with the desired mean, variance and skew coefficient. For the autoregressive Markov model for annual flows

$$E[(Q_{y+1} - \mu)^3] = E[\{\rho (Q_y - \mu) + V_y \sigma \sqrt{1 - \rho^2}\}^3] = \rho^3 E[(Q_y - \mu)^3] + \sigma^3 (1 - \rho^2)^{3/2} E[V_y^3] \tag{7.154}$$

so that

$$\gamma_Q = \frac{E[(Q - \mu)^3]}{\sigma^3} = \frac{(1 - \rho^2)^{3/2}}{1 - \rho^3} E[V_y^3] \tag{7.155}$$

By appropriate choice of the skew of $V_y$, $E[V_y^3]$, the desired skew coefficient of the annual flows can be produced. This method has often been used to generate flows that have approximately a gamma distribution by using $V_y$'s with a gamma distribution and the required skew. The resulting approximation is not always adequate (Lettenmaier and Burges, 1977a).
The alternative and generally preferred method is to generate normal random variables and then transform these variates to streamflows with the desired marginal distribution. Common choices for the distribution of streamflows are the two-parameter and three-parameter lognormal distributions or a gamma distribution. If $Q_y$ is a lognormally distributed random variable, then

$$Q_y = \tau + \exp(X_y) \tag{7.156}$$

where $X_y$ is a normal random variable. When the lower bound $\tau$ is zero, $Q_y$ has a two-parameter lognormal distribution. Equation 7.156 transforms the normal variates $X_y$ into lognormally distributed streamflows. The transformation is easily inverted to obtain

$$X_y = \ln(Q_y - \tau) \quad \text{for } Q_y > \tau \tag{7.157}$$

where $Q_y$ must be greater than its lower bound $\tau$.

The mean, variance and skewness of $X_y$ and $Q_y$ are related by the formulas (Matalas, 1967)

$$\mu_Q = \tau + \exp\left(\mu_X + \tfrac{1}{2}\sigma_X^2\right)$$

$$\sigma_Q^2 = \exp(2\mu_X + \sigma_X^2)\,[\exp(\sigma_X^2) - 1]$$

$$\gamma_Q = \frac{\exp(3\sigma_X^2) - 3\exp(\sigma_X^2) + 2}{[\exp(\sigma_X^2) - 1]^{3/2}} = 3\phi + \phi^3 \quad \text{where } \phi = [\exp(\sigma_X^2) - 1]^{1/2} \tag{7.158}$$

If normal variates $X_y^s$ and $X_y^u$ are used to generate lognormally distributed streamflows $Q_y^s$ and $Q_y^u$ at sites $s$ and $u$, then the lag-k correlation of the $Q_y$'s, denoted $\rho_Q(k; s, u)$, is determined by the lag-k correlation of the $X$ variables, denoted $\rho_X(k; s, u)$, and their variances $\sigma_X^2(s)$ and $\sigma_X^2(u)$, where

$$\rho_Q(k; s, u) = \frac{\exp[\rho_X(k; s, u)\,\sigma_X(s)\,\sigma_X(u)] - 1}{\{\exp[\sigma_X^2(s)] - 1\}^{1/2}\,\{\exp[\sigma_X^2(u)] - 1\}^{1/2}} \tag{7.159}$$

The correlations of the $X_y^s$ can be adjusted, at least in theory, to produce the observed correlations among the $Q_y^s$ variates. However, more efficient estimates of the true correlation of the $Q_y^s$ values are generally obtained by transforming the historical flows $q_y^s$ into their normal equivalent $x_y^s = \ln(q_y^s - \tau)$ and using the historical correlations of these $x_y^s$ values as estimators of $\rho_X(k; s, u)$ (Stedinger, 1981).
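The moment relations of Equation 7.158 and the correlation mapping of Equation 7.159 are easy to evaluate numerically. The following is a minimal sketch; the function and argument names are illustrative assumptions rather than notation from the text.

```python
import math


def lognormal_moments(mu_x, sigma_x, tau=0.0):
    """Mean, variance and skew of Q = tau + exp(X), X ~ N(mu_x, sigma_x**2) (Eq. 7.158)."""
    w = math.exp(sigma_x ** 2)
    mean_q = tau + math.exp(mu_x + 0.5 * sigma_x ** 2)
    var_q = math.exp(2.0 * mu_x + sigma_x ** 2) * (w - 1.0)
    phi = math.sqrt(w - 1.0)
    skew_q = 3.0 * phi + phi ** 3
    return mean_q, var_q, skew_q


def flow_correlation(rho_x, sigma_x_s, sigma_x_u):
    """Correlation of the lognormal flows implied by correlation rho_x of the logs (Eq. 7.159)."""
    numerator = math.exp(rho_x * sigma_x_s * sigma_x_u) - 1.0
    denominator = math.sqrt(
        (math.exp(sigma_x_s ** 2) - 1.0) * (math.exp(sigma_x_u ** 2) - 1.0)
    )
    return numerator / denominator
```

For positive correlations below one, the flow-space correlation returned by `flow_correlation` is smaller than the log-space correlation supplied, which is why the two must not be used interchangeably when fitting a model.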
Some insight into the effect of this logarithmic transformation can be gained by considering the resulting model for annual flows at a single site. If the normal variates follow the simple autoregressive Markov model

$$X_{y+1} = \mu_X + \rho_X(X_y - \mu_X) + V_y\,\sigma_X\sqrt{1-\rho_X^2} \tag{7.160}$$

then the corresponding $Q_y$ follow the model (Matalas, 1967)

$$Q_{y+1} = \tau + D_y\,\{\exp[\mu_X(1-\rho_X)]\}\,(Q_y - \tau)^{\rho_X} \tag{7.161}$$

where

$$D_y = \exp[(1-\rho_X^2)^{1/2}\,\sigma_X V_y] \tag{7.162}$$

Figure 7.14. Conditional mean of $Q_{y+1}$ given $Q_y = q_y$ and 90% probability range for the value of $Q_{y+1}$. (Axes: flow in year $y$, horizontal; flow in year $y+1$, vertical.)

The conditional mean and standard deviation of $Q_{y+1}$ given that $Q_y = q_y$ now depend on $(q_y - \tau)^{\rho_X}$. Because the conditional mean of $Q_{y+1}$ is no longer a linear function of $q_y$ (as shown in Figure 7.14), the streamflows are said to exhibit differential persistence: low flows are now more likely to follow low flows than high flows are to follow high flows. This is a property often attributed to real streamflow distributions. Models can be constructed to capture the relative persistence of wet and dry periods (Matalas and Wallis, 1976; Salas, 1993; Thyer and Kuczera, 2000). Many weather generators for precipitation and temperature include such tendencies by employing a Markov chain description of the occurrence of wet and dry days (Wilks, 1998).

8.5. Multivariate Models

If long concurrent streamflow records can be constructed at the several sites at which synthetic streamflows are desired, then ideally a general multi-site streamflow model could be employed. O'Connell (1977), Ledolter (1978), Salas et al. (1980) and Salas (1993) discuss multivariate models and parameter estimation. Unfortunately, identification of the most appropriate model structure is very difficult for general multivariate models.

This section illustrates how the basic univariate annual-flow model in Section 8.3 can be generalized to the multivariate case.
This exercise reveals how multivariate models can be constructed to reproduce specified variances and covariances of the flow vectors of interest, or some transformation of those values. This multi-site generalization of the annual AR(1) or autoregressive Markov model follows the approach taken by Matalas and Wallis (1976). This general approach can be further extended to generate multi-site/multi-season modelling procedures, as is done in Section 8.6, employing what have been called disaggregation models. However, while the size of the model matrices and vectors increases, the model is fundamentally the same from a mathematical viewpoint. Hence this section starts with the simpler case of an annual flow model.

For simplicity of presentation and clarity, vector notation is employed. Let $Z_y = (Z_y^1, \ldots, Z_y^n)^T$ be the column vector of transformed zero-mean annual flows at sites $s = 1, 2, \ldots, n$, so that

$$E[Z_y^s] = 0 \tag{7.163}$$

In addition, let $V_y = (V_y^1, \ldots, V_y^n)^T$ be a column vector of standard-normal random variables, where $V_y^s$ is independent of $V_w^r$ for $(r, w) \ne (s, y)$ and independent of past flows $Z_w^r$ where $y \ge w$. The assumption that the variables have zero mean implies that the mean value has already been subtracted from all the variables. This makes the notation simpler and eliminates the need to include a constant term in the models. With all the random variables having zero mean, one can focus on reproducing the variances and covariances of the vectors included in a model.
A sequence of synthetic flows can be generated by the model

$$Z_{y+1} = AZ_y + BV_y \tag{7.164}$$

where $A$ and $B$ are $(n \times n)$ matrices whose elements are chosen to reproduce the lag 0 and lag 1 cross-covariances of the flows at each site. The lag 0 and lag 1 covariances and cross-covariances can most economically be manipulated by use of the two matrices $S_0$ and $S_1$. The lag-zero covariance matrix, denoted $S_0$, is defined as

$$S_0 = E[Z_y Z_y^T] \tag{7.165}$$

and has elements

$$S_0(i, j) = E[Z_y^i Z_y^j] \tag{7.166}$$

The lag-one covariance matrix, denoted $S_1$, is defined as

$$S_1 = E[Z_{y+1} Z_y^T] \tag{7.167}$$

and has elements

$$S_1(i, j) = E[Z_{y+1}^i Z_y^j] \tag{7.168}$$

The covariances do not depend on $y$ because the streamflows are assumed to be stationary.

Matrix $S_1$ contains the lag 1 covariances and lag 1 cross-covariances. $S_0$ is symmetric because the cross-covariance $S_0(i, j)$ equals $S_0(j, i)$. In general, $S_1$ is not symmetric.

The variance-covariance equations that define the values of $A$ and $B$ in terms of $S_0$ and $S_1$ are obtained by manipulations of Equation 7.164. Multiplying both sides of that equation by $Z_y^T$ and taking expectations yields

$$E[Z_{y+1} Z_y^T] = E[AZ_y Z_y^T] + E[BV_y Z_y^T] \tag{7.169}$$

The second term on the right-hand side vanishes because the components of $Z_y$ and $V_y$ are independent. Now the first term in Equation 7.169, $E[AZ_y Z_y^T]$, is a matrix whose $(i, j)$th element equals

$$E\left[\sum_{k=1}^{n} a_{ik} Z_y^k Z_y^j\right] = \sum_{k=1}^{n} a_{ik}\,E[Z_y^k Z_y^j] \tag{7.170}$$

The matrix with these elements is the same as the matrix $A\,E[Z_y Z_y^T]$. Hence $A$, the matrix of constants, can be pulled through the expectation operator just as is done in the scalar case where $E[aZ_y + b] = aE[Z_y] + b$ for fixed constants $a$ and $b$.
Substituting $S_0$ and $S_1$ for the appropriate expectations in Equation 7.169 yields

$$S_1 = AS_0 \quad \text{or} \quad A = S_1 S_0^{-1} \tag{7.171}$$

A relationship to determine the matrix $B$ is obtained by multiplying both sides of Equation 7.164 by its own transpose (this is equivalent to squaring both sides of the scalar equation $a = b$) and taking expectations to obtain

$$E[Z_{y+1} Z_{y+1}^T] = E[AZ_y Z_y^T A^T] + E[AZ_y V_y^T B^T] + E[BV_y Z_y^T A^T] + E[BV_y V_y^T B^T] \tag{7.172}$$

The second and third terms on the right-hand side of Equation 7.172 vanish because the components of $Z_y$ and $V_y$ are independent and have zero mean. $E[V_y V_y^T]$ equals the identity matrix because the components of $V_y$ are independently distributed with unit variance. Thus

$$S_0 = AS_0A^T + BB^T \tag{7.173}$$

Solving for the $B$ matrix, one finds that it should satisfy the quadratic equation

$$BB^T = S_0 - AS_0A^T = S_0 - S_1 S_0^{-1} S_1^T \tag{7.174}$$

The last equation results from substitution of the relationship for $A$ given in Equation 7.171 and the fact that $S_0$ is symmetric; hence, $S_0^{-1}$ is symmetric.

It should not be too surprising that the elements of $B$ are not uniquely determined by Equation 7.174. The components of the random vector $V_y$ may be combined in many ways to produce the desired covariances as long as $B$ satisfies Equation 7.174. A lower triangular matrix that satisfies Equation 7.174 can be calculated by Cholesky decomposition (Young, 1968; Press et al., 1986).

Matalas and Wallis (1976) call Equation 7.164 the lag-1 model. They do not call it the Markov model because the streamflows at individual sites do not have the covariances of an autoregressive Markov process given in Equation 7.152. They suggest an alternative model for what they call the Markov model. It has the same structure as the lag-1 model except it does not preserve the lag-1 cross-covariances. By relaxing this requirement, they obtain a simpler model with fewer parameters that generates flows that have covariances of an autoregressive Markov process at each site. In their Markov model, the new $A$ matrix is simply a diagonal matrix, whose diagonal elements are the lag-1 correlations of flows at each site:

$$A = \mathrm{diag}[\rho(1; i, i)] \tag{7.175}$$

where $\rho(1; i, i)$ is the lag-one correlation of flows at site $i$.

The corresponding $B$ matrix depends on the new $A$ matrix and $S_0$, where as before

$$BB^T = S_0 - AS_0A^T \tag{7.176}$$

The idea of fitting time-series models to each site separately and then correlating the innovations in those separate models to reproduce the cross-correlation between the series is a very general and useful modelling idea that has seen a number of applications with different time-series models (Matalas and Wallis, 1976; Stedinger et al., 1985; Camacho et al., 1985; Salas, 1993).

8.6. Multi-Season, Multi-Site Models

In most studies of surface water systems it is necessary to consider the variations of flows within each year. Streamflows in most areas have within-year variations, exhibiting wet and dry periods. Similarly, water demands for irrigation, municipal and industrial uses also vary, and the variations in demand are generally out of phase with the variation in within-year flows; more water is usually desired when streamflows are low, and less is desired when flows are high. This increases the stress on water delivery systems and makes it all the more important that time-series models of streamflows, precipitation and other hydrological variables correctly reproduce the seasonality of hydrological processes.

This section discusses two approaches to generating within-year flows. The first approach is based on the disaggregation of annual flows produced by an annual flow generator to seasonal flows. Thus the method allows for reproduction of both the annual and seasonal characteristics of streamflow series. The second approach generates seasonal flows in a sequential manner, as was done for the generation of annual flows.
Thus the models are a direct generalization of the annual flow models already discussed.

8.6.1. Disaggregation Models

The disaggregation model proposed by Valencia and Schaake (1973) and extended by Mejia and Rousselle (1976) and Tao and Delleur (1976) allows for the generation of synthetic flows that reproduce statistics both at the annual and at the seasonal level. Subsequent improvements and variations are described by Stedinger and Vogel (1984), Maheepala and Perera (1996), Koutsoyiannis and Manetas (1996) and Tarboton et al. (1998).

Disaggregation models can be used for either multi-season single-site or multi-site streamflow generation. They represent a very flexible modelling framework for dealing with different time or spatial scales. Annual flows for the several sites in question or the aggregate total annual flow at several sites can be the input to the model (Grygier and Stedinger, 1988). These must be generated by another model, such as those discussed in the previous sections. These annual flows or aggregated annual flows are then disaggregated to seasonal values.

Let $Z_y = (Z_y^1, \ldots, Z_y^N)^T$ be the column vector of $N$ transformed normally distributed annual or aggregate annual flows for $N$ separate sites or basins. Next, let $X_y = (X_{1y}^1, \ldots, X_{Ty}^1, X_{1y}^2, \ldots, X_{Ty}^2, \ldots, X_{1y}^n, \ldots, X_{Ty}^n)^T$ be the column vector of $nT$ transformed normally distributed seasonal flows $X_{ty}^s$ for season $t$, year $y$, and site $s = 1, \ldots, n$.

Assuming that the annual and seasonal series, $Z_y^s$ and $X_{ty}^s$, have zero mean (after the appropriate transformation), the basic disaggregation model is

$$X_y = AZ_y + BV_y \tag{7.177}$$

where $V_y$ is a vector of $nT$ independent standard normal random variables, and $A$ and $B$ are, respectively, $nT \times N$ and $nT \times nT$ matrices. One selects values of the elements of $A$ and $B$ to reproduce the observed correlations among the elements of $X_y$ and between the elements of $X_y$ and $Z_y$.
Alternatively, one could attempt to reproduce the observed correlations of the untransformed flows as opposed to the transformed flows, although this is not always possible (Hoshi et al., 1978) and often produces poorer estimates of the actual correlations of the flows (Stedinger, 1981).

The values of $A$ and $B$ are determined using the matrices $S_{zz} = E[Z_y Z_y^T]$, $S_{xx} = E[X_y X_y^T]$, $S_{xz} = E[X_y Z_y^T]$, and $S_{zx} = E[Z_y X_y^T]$, where $S_{zz}$ was called $S_0$ earlier. Clearly, $S_{xz}^T = S_{zx}$. If $S_{xz}$ is to be reproduced, then by multiplying Equation 7.177 on the right by $Z_y^T$ and taking expectations, one sees that $A$ must satisfy

$$E[X_y Z_y^T] = E[AZ_y Z_y^T] \tag{7.178}$$
or

$$S_{xz} = AS_{zz} \tag{7.179}$$

Solving for the coefficient matrix $A$ one obtains

$$A = S_{xz} S_{zz}^{-1} \tag{7.180}$$

To obtain an equation that determines the required values in the matrix $B$, one can multiply both sides of Equation 7.177 by their transpose and take expectations to obtain

$$S_{xx} = AS_{zz}A^T + BB^T \tag{7.181}$$

Thus, to reproduce the covariance matrix $S_{xx}$, the $B$ matrix must satisfy

$$BB^T = S_{xx} - AS_{zz}A^T \tag{7.182}$$

Equations 7.180 and 7.182 for determining $A$ and $B$ are completely analogous to Equations 7.171 and 7.174 for the $A$ and $B$ matrices of the lag 1 models developed earlier. However, for the disaggregation model as formulated, $BB^T$, and hence the matrix $B$, can actually be singular or nearly so (Valencia and Schaake, 1973). This occurs because the real seasonal flows sum to the observed annual flows. Thus given the annual flow at a site and all but one ($T - 1$) of the seasonal flows, the value of the unspecified seasonal flow can be determined by subtraction.

If the seasonal variables $X_{ty}^s$ correspond to non-linear transformations of the actual flows $Q_{ty}^s$, then $BB^T$ is generally sufficiently non-singular that a $B$ matrix can be obtained by Cholesky decomposition. On the other hand, when the model is used to generate values of $X_{ty}^s$ to be transformed into synthetic flows $Q_{ty}^s$, the constraint that these seasonal flows should sum to the given value of the annual flow is lost. Thus the generated annual flows (equal to the sums of the seasonal flows) will deviate from the values that were to have been the annual flows. Some distortion of the specified distribution of the annual flows results. This small distortion can be ignored, or each year's seasonal flows can be scaled so that their sum equals the specified value of the annual flow (Grygier and Stedinger, 1988). The latter approach eliminates the distortion in the distribution of the generated annual flows by distorting the distribution of the generated seasonal flows.
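To make Equations 7.180 and 7.182 concrete, the sketch below fits $A$ and a lower-triangular $B$ for the simplest case of a single annual series ($N = 1$), so that $S_{zz}$ is a scalar, and then applies the proportional rescaling of seasonal flows described above. It uses only the Python standard library; the function names and the covariance values in the usage note are hypothetical, and the Cholesky routine assumes $BB^T$ is positive definite (which, as noted above, may fail for untransformed flows).

```python
import math


def cholesky_lower(m):
    """Lower-triangular L with L L^T = m, for a symmetric positive-definite matrix m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def fit_disaggregation(s_xx, s_xz, s_zz):
    """Fit A and B for one annual series: A = S_xz / S_zz, B B^T = S_xx - A S_zz A^T."""
    a = [c / s_zz for c in s_xz]
    n = len(a)
    bbt = [[s_xx[i][j] - a[i] * s_zz * a[j] for j in range(n)] for i in range(n)]
    return a, cholesky_lower(bbt)


def rescale_to_annual(seasonal_flows, annual_flow):
    """Scale generated seasonal flows proportionally so that they sum to the annual flow."""
    total = sum(seasonal_flows)
    return [q * annual_flow / total for q in seasonal_flows]
```

For example, with the hypothetical values `s_zz = 1.0`, `s_xz = [0.6, 0.5]` and `s_xx = [[1.0, 0.4], [0.4, 1.0]]`, `fit_disaggregation` returns `A = [0.6, 0.5]` and a `B` whose product `B B^T` plus `A S_zz A^T` recovers `s_xx`.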
Koutsoyiannis and Manetas (1996) improve upon the simple scaling algorithm by including a step that rejects candidate vectors $X_y$ if the required adjustment is too large, and instead generates another vector $X_y$. This reduces the distortion in the monthly flows that results from the adjustment step.

The disaggregation model has substantial data requirements. When the dimension of $Z_y$ is $n$ and the dimension of the generated vector $X_y$ is $m$, the $A$ matrix has $mn$ elements. The lower triangular $B$ matrix and the symmetric $S_{xx}$ matrix, upon which it depends, each have $m(m + 1)/2$ nonzero or non-redundant elements. For example, when disaggregating two aggregate annual flow series to monthly flows at five sites, $n = 2$ and $m = 12 \times 5 = 60$; thus, $A$ has 120 elements while $B$ and $S_{xx}$ each have 1,830 nonzero or non-redundant parameters. As the number of sites included in the disaggregation increases, the size of $S_{xx}$ and $B$ increases rapidly. Very quickly the model can become overly parameterized, and there will be insufficient data to estimate all parameters (Grygier and Stedinger, 1988).

In particular, one can think of Equation 7.177 as a series of linear models generating each monthly flow $X_{ty}^k$ for $k = 1$, $t = 1, \ldots, 12$; $k = 2$, $t = 1, \ldots, 12$; up to $k = n$, $t = 1, \ldots, 12$ that reproduces the correlation of each $X_{ty}^k$ with all $n$ annual flows, $Z_y^k$, and all previously generated monthly flows. Then when one gets to the last flow in the last month, the model will be attempting to reproduce $n + (12n - 1) = 13n - 1$ annual to monthly and cross-correlations. Because the model implicitly includes a constant, this means one needs $k^* = 13n$ years of data to obtain a unique solution for this critical equation. For $n = 3$, $k^* = 39$. One could say that with a record length of forty years, there would be only one degree of freedom left in the residual model error variance described by $B$. That would be unsatisfactory.

When flows at many sites or in many seasons are required, the size of the disaggregation model can be reduced by disaggregation of the flows in stages.
Such condensed models do not explicitly reproduce every season-to-season correlation (Lane, 1979; Stedinger and Vogel, 1984; Grygier and Stedinger, 1988; Koutsoyiannis and Manetas, 1996). Nor do they attempt to reproduce the cross-correlations among all the flow variates at the same site within a year (Lane, 1979; Stedinger et al., 1985). Contemporaneous models, like the Markov model developed earlier in Section 8.5, are models developed for individual sites whose innovation vectors $V_y$ have the needed cross-correlations to reproduce the cross-correlations of the concurrent flows (Camacho et al., 1985), as was done in Equation 7.176. Grygier and Stedinger (1991) describe how this can be done for a condensed disaggregation model without generating inconsistencies.

8.6.2. Aggregation Models

One can start with annual or seasonal flows, and break them down into flows in shorter periods representing months or weeks. Alternatively one can start with a model that describes the shortest time step flows. This latter approach has been referred to as aggregation to distinguish it from disaggregation.

One method for generating multi-season flow sequences is to convert the time series of seasonal flows $Q_{ty}$ into a homogeneous sequence of normally distributed zero-mean unit-variance random variables $Z_{ty}$. These can then be modelled by an extension of the annual flow generators that have already been discussed. This transformation can be accomplished by fitting a reasonable marginal distribution to the flows in each season so as to be able to convert the observed flows $q_{ty}^s$ into their transformed counterparts $Z_{ty}^s$, and vice versa. Particularly when shorter streamflow records are available, these simple approaches may yield a reasonable model of some streams for some studies. However, they implicitly assume that the standardized series is stationary, in the sense that the season-to-season correlations of the flows do not depend on the seasons in question. This assumption seems highly questionable.

This theoretical difficulty with the standardized series can be overcome by introducing a separate streamflow model for each month. For example, the classic Thomas-Fiering model (Thomas and Fiering, 1970) of monthly flows may be written

$$Z_{t+1,y} = \beta_t Z_{ty} + \sqrt{1 - \beta_t^2}\;V_{ty} \tag{7.183}$$

where the $Z_{ty}$'s are standard normal random variables corresponding to the streamflow in season $t$ of year $y$, $\beta_t$ is the season-to-season correlation of the standardized flows, and $V_{ty}$ are independent standard normal random variables.
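The recursion in Equation 7.183 can be sketched in a few lines. This is only an illustration in standardized (zero-mean, unit-variance) space: the function names, the use of Python's `random.Random.gauss` for the innovations, and the convention that `betas[t]` links season `t` to the following season (with the last entry linking the final season to the first season of the next year) are assumptions, not part of the original model description.

```python
import math
import random


def thomas_fiering_step(z, beta, v):
    """One step of Eq. 7.183: next standardized flow from current flow z and innovation v."""
    return beta * z + math.sqrt(1.0 - beta ** 2) * v


def generate_standardized_flows(betas, n_years, rng):
    """Generate n_years * len(betas) standardized seasonal flows.

    betas[t] is the assumed correlation between season t and the season that follows it.
    """
    z = rng.gauss(0.0, 1.0)  # initial flow drawn from the stationary N(0, 1) distribution
    series = []
    for _ in range(n_years):
        for beta in betas:
            series.append(z)
            z = thomas_fiering_step(z, beta, rng.gauss(0.0, 1.0))
    return series


# Hypothetical two-season example with a seeded generator for repeatability.
flows = generate_standardized_flows([0.6, 0.3], n_years=10, rng=random.Random(1))
```

The standardized values would still need to be transformed back to flows using the marginal distribution fitted to each season.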
The problem with this model is that it often fails to reproduce the correlation among non-consecutive months during a year and thus misrepresents the risk of multi-month and multi-year droughts (Hoshi et al., 1978).

For an aggregation approach to be attractive, it is necessary to use a model with greater persistence than the Thomas-Fiering model. A general class of time-series models that allow reproduction of different correlation structures are the Box-Jenkins autoregressive-moving average models (Box et al., 1994). These models are denoted ARMA(p,q) for a model that depends on $p$ previous flows and $q$ extra innovations $V_{ty}$. For example, the simple autoregressive Markov model would be called an AR(1) or AR(1,0) model. A simple ARMA(1,1) model is

$$Z_{t+1} = \phi_1 Z_t + V_{t+1} - \theta_1 V_t \tag{7.184}$$

The correlations of this model have the values

$$\rho_1 = \frac{(1 - \theta_1\phi_1)(\phi_1 - \theta_1)}{1 + \theta_1^2 - 2\phi_1\theta_1} \tag{7.185}$$

for the first lag. For $i > 1$

$$\rho_i = \phi_1\,\rho_{i-1} \tag{7.186}$$

For $\phi_1$ values near 1 and $0 < \theta_1 < \phi_1$, the autocorrelations $\rho_k$ can decay much slower than those of the standard AR(1) model.

The correlation function $\rho_k$ of a general ARMA(p,q) model,

$$Z_{t+1} = \sum_{i=1}^{p} \phi_i Z_{t+1-i} + V_{t+1} - \sum_{j=1}^{q} \theta_j V_{t+1-j} \tag{7.187}$$

is a complex function that must satisfy a number of conditions to ensure the resultant model is stationary and invertible (Box et al., 1994).

ARMA(p,q) models have been extended to describe seasonal flow series by having their coefficients depend upon the season; these are called periodic autoregressive-moving average models, or PARMA. Salas and Obeysekera (1992), Salas and Fernandez (1993), and Claps et al. (1993) discuss the conceptual basis of such stochastic streamflow models. For example, Salas and Obeysekera (1992) found that low-order PARMA models, such as a PARMA(2,1), arise from reasonable conceptual representations of persistence in rainfall, runoff and groundwater recharge and release. Claps et al. (1993) observe that the PARMA(2,2) model, which may be needed if one wants to preserve year-to-year correlation, poses a parameter estimation challenge that is almost unmanageable (see also Rasmussen et al., 1996).
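The slower decay of ARMA(1,1) correlations described by Equations 7.185 and 7.186 can be seen by comparing them with an AR(1) model that has the same lag-1 correlation. The sketch below is illustrative only; the function names and the parameter values $\phi_1 = 0.95$, $\theta_1 = 0.7$ are assumptions.

```python
def arma11_autocorrelations(phi, theta, max_lag):
    """Autocorrelations rho_0..rho_max_lag of an ARMA(1,1) model (Eqs. 7.185 and 7.186)."""
    rho1 = (1.0 - theta * phi) * (phi - theta) / (1.0 + theta ** 2 - 2.0 * phi * theta)
    rhos = [1.0, rho1]
    for _ in range(2, max_lag + 1):
        rhos.append(phi * rhos[-1])
    return rhos[: max_lag + 1]


def ar1_autocorrelations(rho1, max_lag):
    """Autocorrelations of an AR(1) model with lag-1 correlation rho1."""
    return [rho1 ** k for k in range(max_lag + 1)]


# Hypothetical parameters: phi near 1 and 0 < theta < phi.
arma = arma11_autocorrelations(phi=0.95, theta=0.7, max_lag=10)
ar1 = ar1_autocorrelations(arma[1], max_lag=10)
```

With these values both models share a lag-1 correlation of about 0.52, but at lag 10 the ARMA(1,1) correlation is still roughly 0.33 while the AR(1) correlation has fallen to nearly zero.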
The PARMA(1,1) model is more practical and easy to extend to the multivariate case (Hirsch, 1979; Stedinger et al., 1985; Salas, 1993; Rasmussen et al., 1996). Experience has shown that PARMA(1,1) models do a better job of reproducing the correlation of seasonal flows beyond lag 1 than does a Thomas-Fiering PAR(1,0) model (see for example, Bartolini and Salas, 1993).

9. Stochastic Simulation

This section introduces stochastic simulation. Much more detail on simulation is contained in later chapters. As discussed in Chapter 3, simulation is a flexible and widely used tool for the analysis of complex water resources systems. Simulation is trial and error. One must define the system being simulated, both its design and operating policy, and then simulate it to see how it works. If the purpose is to find the best design and operating policy, many such alternatives must be simulated and their results must be compared. When the number of alternatives to simulate becomes too large for the time and money available for such analyses, some kind of preliminary screening, perhaps using optimization models, may be justified. This use of optimization for preliminary screening (that is, for eliminating alternatives prior to a more detailed simulation) is discussed in Chapters 3, 4 and later chapters.

As with optimization models, simulation models may be deterministic or stochastic. One of the most useful tools in water resources systems planning is stochastic simulation. While optimization can be used to help define reasonable design and operating policy alternatives to be simulated, simulations can better reveal how each such alternative will perform. Stochastic simulation of complex water resources systems on digital computers provides planners with a way to define the probability distributions of multiple performance indices of those systems.

When simulating any system, the modeller designs an experiment. Initial flow, storage and water quality conditions must be specified if these are being simulated.
For example, reservoirs can start full, empty or at random representative conditions. The modeller also determines what data are to be collected on system performance and operation, and how they are to be summarized. The length of time the simulation is to be run must be specified and, in the case of stochastic simulations, the number of runs to be made must also be determined. These considerations are discussed in more detail by Fishman (2001) and in other books on simulation. The use of stochastic simulation and the analysis of the output of such models are introduced here primarily in the context of an example to illustrate what goes into a stochastic simulation model and how one can deal with the information that is generated.

9.1. Generating Random Variables

Included in any stochastic simulation model is some provision for the generation of sequences of random numbers that represent particular values of events such as rainfall, streamflows or floods. To generate a sequence of values for a random variable, the probability distribution for the random variable must be specified. Historical data and an understanding of the physical processes are used to select appropriate distributions and to estimate their parameters (as discussed in Section 7.2).

Most computers have algorithms for generating random numbers uniformly distributed (equally likely) between zero and one. This uniform distribution of random numbers is defined by its cdf and pdf:

$$F_U(u) = 0 \ \text{for } u \le 0, \quad u \ \text{for } 0 \le u \le 1, \quad \text{and } 1 \ \text{if } u \ge 1 \tag{7.188}$$

so that

$$f_U(u) = 1 \ \text{if } 0 \le u \le 1 \ \text{and } 0 \ \text{otherwise} \tag{7.189}$$

These uniform random variables can then be transformed into random variables with any desired distribution. If $F_Q(q_t)$ is the cumulative distribution function of a random variable $Q_t$ in period $t$, then $Q_t$ can be generated using the inverse of the distribution:

$$Q_t = F_Q^{-1}[U_t] \tag{7.190}$$

Here $U_t$ is the uniform random number used to generate $Q_t$.
This is illustrated in Figure 7.15. Analytical expressions for the inverse of many distributions, such as the normal distribution, are not known, so special algorithms are employed to efficiently generate deviates with these distributions (Fishman, 2001).
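A minimal sketch of the inverse-transform idea in Equation 7.190 follows. Two cases are shown as assumptions for illustration: an exponential distribution, whose inverse cdf is available in closed form, and a lognormal flow, for which the normal inverse cdf is evaluated numerically by the standard library's `statistics.NormalDist.inv_cdf` (Python 3.8 or later). The function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def exponential_from_uniform(u, mean):
    """Closed-form inverse of F(q) = 1 - exp(-q / mean), applied to a uniform u in [0, 1)."""
    return -mean * math.log(1.0 - u)


def lognormal_from_uniform(u, mu_x, sigma_x, tau=0.0):
    """Q = tau + exp(X), with X obtained from the numerical normal inverse cdf.

    inv_cdf requires 0 < u < 1, so u must lie strictly inside the unit interval.
    """
    x = NormalDist(mu_x, sigma_x).inv_cdf(u)
    return tau + math.exp(x)


def sample_lognormal_flows(n, mu_x, sigma_x, rng, tau=0.0):
    """Draw n flows by transforming uniform random numbers one at a time."""
    flows = []
    while len(flows) < n:
        u = rng.random()
        if u > 0.0:  # random() can return exactly 0.0, which inv_cdf rejects
            flows.append(lognormal_from_uniform(u, mu_x, sigma_x, tau))
    return flows


# Hypothetical usage with a seeded generator so that runs are repeatable.
flows = sample_lognormal_flows(5, mu_x=0.0, sigma_x=0.25, rng=random.Random(42))
```

Seeding the generator makes a simulation experiment repeatable, which is useful when comparing alternative designs against the same synthetic sequences.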
Figure 7.15. The probability distribution of a random variable can be inverted to produce values of the random variable. (Axes: $q_t$, horizontal; $u_t = F_Q(q_t)$ from 0 to 1, vertical; a particular value of $Q_t$ is obtained as $q_t = F_Q^{-1}(u_t)$.)

9.2. River Basin Simulation

An example will demonstrate the use of stochastic simulation in the design and analysis of water resources systems. Assume that farmers in a particular valley have been plagued by frequent shortages of irrigation water. They currently draw water from an unregulated river to which they have water rights. A government agency has proposed construction of a moderate-size dam on the river upstream of the points where the farmers withdraw water. The dam would be used to increase the quantity and reliability of irrigation water available to the farmers during the summer growing season.

After preliminary planning, a reservoir with an active capacity of [value missing] m³ has been proposed for a natural dam site. It is anticipated that, because of the increased reliability and availability of irrigation water, the quantity of water desired will grow from an initial level of [value missing] m³/yr after construction of the dam to [value missing] m³/yr within six years. After that, demand will grow more slowly to [value missing] m³/yr, the estimated maximum reliable yield. The projected demand for summer irrigation water is shown in Table 7.12.

Table 7.12. Projected water demand for irrigation water. (Columns: year; water demand, × 10⁷ m³/yr.)

A simulation study can evaluate how the system will be expected to perform over a twenty-year planning period. Table 7.13 contains statistics that describe the hydrology at the dam site. The estimated moments are computed from the forty-five-year historic record.

Using the techniques discussed in the previous section, a Thomas-Fiering model is used to generate twenty-five lognormally distributed synthetic streamflow sequences.
The statistical characteristics of the synthetic flows are those listed in Table 7.13. Use of only the forty-five-year historic flow sequence would not allow examination of the system's performance over the large range of streamflow sequences that could occur during the twenty-year planning period. Jointly, the synthetic sequences should be a description of the range of inflows that the system might experience. A larger number of sequences could be generated.
Table 7.13. Characteristics of the river flow. (Rows: mean flow, × 10⁷ m³; standard deviation, × 10⁷ m³; correlation of winter flows with the following summer and of summer flows with the following winter. Columns: winter, summer, annual.)

9.3. The Simulation Model

The simulation model is composed primarily of continuity constraints and the proposed operating policy. The volumes of water stored in the reservoir at the beginning of seasons 1 (winter) and 2 (summer) in year $y$ are denoted by $S_{1y}$ and $S_{2y}$. The reservoir's winter operating policy is to store as much of the winter's inflow $Q_{1y}$ as possible. The winter release $R_{1y}$ is determined by the rule

$$R_{1y} = \begin{cases} S_{1y} + Q_{1y} - K & \text{if } S_{1y} + Q_{1y} - R_{\min} > K \\ R_{\min} & \text{if } K \ge S_{1y} + Q_{1y} - R_{\min} \ge 0 \\ S_{1y} + Q_{1y} & \text{otherwise} \end{cases} \tag{7.191}$$

where $K$ is the reservoir capacity of [value missing] m³ and $R_{\min}$ is [value missing] m³, the minimum release to be made if possible. The volume of water in storage at the beginning of the year's summer season is

$$S_{2y} = S_{1y} + Q_{1y} - R_{1y} \tag{7.192}$$

The summer release policy is to meet each year's projected demand or target release $D_y$, if possible, so that

$$R_{2y} = \begin{cases} S_{2y} + Q_{2y} - K & \text{if } S_{2y} + Q_{2y} - D_y > K \\ D_y & \text{if } 0 \le S_{2y} + Q_{2y} - D_y \le K \\ S_{2y} + Q_{2y} & \text{otherwise} \end{cases} \tag{7.193}$$

This operating policy is illustrated in Figure 7.16. The volume of water in storage at the beginning of the next winter season is

$$S_{1,y+1} = S_{2y} + Q_{2y} - R_{2y} \tag{7.194}$$

Figure 7.16. Summer reservoir operating policy. The shaded area denotes the feasible region of reservoir releases. (Axes: water available $S_{2y} + Q_{2y}$, horizontal; reservoir release, vertical. The reservoir empties, $S_{1,y+1} = 0$, along $R_{2y} = S_{2y} + Q_{2y}$ and fills, $S_{1,y+1} = K$, along $R_{2y} = S_{2y} + Q_{2y} - K$.)

9.4. Simulation of the Basin

The question to be addressed by this simulation study is how well the reservoir will meet the farmers' water requirements. Three steps are involved in answering this question. First, one must define the performance criteria or indices to be used to describe the system's performance. The appropriate indices will, of course, depend on the problem at hand and the specific concerns of the users and managers of a water resources system.
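The release rules and continuity equations of Equations 7.191 to 7.194 translate directly into code. The sketch below is one possible rendering: because the winter and summer rules share the same form, a single `release` function is used with either $R_{\min}$ or $D_y$ as the target. The function names, the failure-frequency helper and every numerical value in the usage note are hypothetical and are not the values of the example basin.

```python
def release(available, target, capacity):
    """Release rule shared by Eqs. 7.191 and 7.193 for water available = storage + inflow."""
    if available - target > capacity:
        return available - capacity  # reservoir would overfill: spill the excess
    if available - target >= 0.0:
        return target                # target can be met and the remainder stored
    return available                 # shortage: release everything, reservoir empties


def simulate_year(s1, q1, q2, demand, capacity, r_min):
    """One winter-summer cycle; returns (winter release, summer release, next winter storage)."""
    r1 = release(s1 + q1, r_min, capacity)   # Eq. 7.191
    s2 = s1 + q1 - r1                        # Eq. 7.192
    r2 = release(s2 + q2, demand, capacity)  # Eq. 7.193
    s1_next = s2 + q2 - r2                   # Eq. 7.194
    return r1, r2, s1_next


def failure_frequency(winter_flows, summer_flows, demands, capacity, r_min, s1=0.0):
    """Fraction of years in which the summer release falls short of the target."""
    failures = 0
    for q1, q2, d in zip(winter_flows, summer_flows, demands):
        _, r2, s1 = simulate_year(s1, q1, q2, d, capacity, r_min)
        if r2 < d:
            failures += 1
    return failures / len(demands)
```

For instance, `simulate_year(0.0, 5.0, 1.0, 3.0, 4.0, 0.5)` fills a reservoir of capacity 4 during the winter, meets the summer target of 3, and carries 2 units of storage into the next winter.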
In this example of a reservoir-irrigation system, several indices will be used relating to the reliability with which target releases are met and the severity of any shortages.

The second step is to simulate the proposed system to evaluate the specified indices. For our reservoir-irrigation system, the reservoir's operation was simulated twenty-five times using the twenty-five synthetic streamflow sequences, each twenty years in length. Each of the twenty simulated years consisted of first a winter and then a summer season. At the beginning of the first winter season, the reservoir was taken to be empty ($S_{1y} = 0$ for $y = 1$) because construction would just have been completed. The target release or demand for water in each year is given in Table 7.12.

The third and final step, after simulating the system, is to interpret the resulting information so as to gain an understanding of how the system might perform both with the proposed design and operating policy and with modifications in either the system's design or its operating policy. To see how this may be done, consider the operation of our example reservoir-irrigation system.

The reliability $p_y$ of the target release in year $y$ is the probability that the target release $D_y$ is met or exceeded in that year:

$$p_y = \Pr[R_{2y} \ge D_y] \tag{7.195}$$

The system's reliability is a function of the target release $D_y$, the hydrology of the river, the size of the reservoir and the operating policy of the system. In this example, the reliability also depends on the year in question. Figure 7.17 shows the total number of failures that occurred in each year of the twenty-five simulations. In three of these, the reservoir did not contain sufficient water after the initial winter season to meet the demand the first summer. After year 1, few failures occur in years 2 through 9 because of the low demand. Surprisingly few failures occur in years 10 and 13, when demand has reached its peak; this is because the reservoir was normally full at the beginning of this period as a result of lower demand in the earlier years. From year 14 onward, failures occurred more frequently because of the higher demand placed on the system.

Figure 7.17. Number of failures in each year of twenty-five twenty-year simulations. (Axes: year of simulation, horizontal; number of failures in each year, vertical.)
Thus one has a sense of how the reliability of the target releases changes over time.

9.5. Interpreting Simulation Output

Table 7.14 contains several summary statistics of the twenty-five simulations. Column 2 of the table contains the average failure frequency in each simulation, which equals the number of years the target release was not met divided by twenty, the number of years simulated. At the bottom of column 2 and the other columns are several statistics that summarize the twenty-five values of the different performance indices. The sample estimates of the mean and variance of each index are given as one way of summarizing the distribution of the observations. Another approach is specification of the sample median, the approximate inter-quartile range $x_{(6)}$–$x_{(20)}$, and/or the range $x_{(1)}$–$x_{(25)}$ of the $n$ observations, where $x_{(i)}$ is the $i$th largest observation. Either set of statistics could be used to describe the centre and spread of each index's distribution.

Suppose that one is interested in the distribution of the system's failure frequency or, equivalently, the reliability with which the target can be met. Table 7.14 reports that the mean failure rate for the twenty-five simulations is 0.084, implying that the average reliability over the twenty-year period is 1 − 0.084 = 0.916, or 92%. The median failure rate is 0.05, implying a median reliability of 95%. These are both reasonable estimates of the centre of the distribution of the failure frequency. Note that the actual failure frequency ranged from 0 (seven times) to 0.30. Thus the system's reliability ranged from 100% to as low as 70%, 75% and 80% in runs 17, 8, and 11, respectively. Obviously, the farmers are interested not only in knowing the mean failure frequency but also the range of failure frequencies they are likely to experience.

If one knew the form of the distribution function of the failure frequency, one could use the mean and standard deviation of the observations to determine an interval within which the observations would fall with some pre-specified probability.
For example, if the observations are normally distributed, there is a 90% probability that the index falls within the interval $\mu_x \pm 1.65\sigma_x$. Thus, if the simulated failure rates are normally distributed, then there is about a 90% probability that the actual failure rate observed in any simulation is within the interval $\bar{x} \pm 1.65 s_x$. In our case this interval would be $[0.084 - 1.65(0.081),\ 0.084 + 1.65(0.081)] = [-0.050,\ 0.218]$. Clearly, the failure rate cannot be less than zero, so this interval makes little sense in our example.

[Table 7.14. Results of 25 twenty-year simulations. Columns: simulation number, $i$; frequency of failure to meet the target; frequency of failure to meet 80% of the target; total shortage, TS ($\times 10^7$ m³); average deficit, AD. Summary rows: mean $\bar{x}$; standard deviation of values $s_x$; median; approximate interquartile range $x_{(6)}$ to $x_{(20)}$; range $x_{(1)}$ to $x_{(25)}$.]

A more reasonable approach to describing the distribution of a performance index whose probability distribution function is not known is to use the observations themselves. If the observations are of a continuous random variable, the interval $x_{(i)}$ to $x_{(n+1-i)}$ provides a reasonable estimate of an interval within which the random variable falls with probability

$$P = \frac{n+1-i}{n+1} - \frac{i}{n+1} = \frac{n+1-2i}{n+1} \tag{7.196}$$

In our example, the range $x_{(1)}$ to $x_{(25)}$ of the twenty-five observations is an estimate of an interval in which a continuous random variable falls with probability $(25+1-2)/(25+1) = 92\%$, while $x_{(6)}$ to $x_{(20)}$ corresponds to probability $(25+1-2\times 6)/(25+1) = 54\%$. Table 7.14 reports both intervals for the failure frequency: $x_{(1)}$ to $x_{(25)}$ runs from 0 to 0.30, and $x_{(6)}$ to $x_{(20)}$ also has a lower endpoint of zero.

Reflection on how the failure frequencies are calculated reminds us that the failure frequency can only take on the discrete, non-negative values $0, 1/20, 2/20, \ldots, 20/20$. Thus, the random variable $X$ cannot be less than zero. Hence, if the lower endpoint of an interval is zero, as is the case here, then 0 to $x_{(k)}$ is an estimate of an interval within which the random variable falls with a probability of at least $k/(n+1)$. For $k$ equal to 20 and 25, the corresponding probabilities are 77% and 96%.

Often, the analysis of a simulated system's performance centres on the average value of performance indices, such as the failure rate. It is important to know the accuracy with which the mean value of an index approximates the true mean. This is done by the construction of confidence intervals. A confidence interval is an interval that will contain the unknown value of a parameter with a specified probability. Confidence intervals for a mean are constructed using the $t$ statistic,

$$t = \frac{\bar{x} - \mu_x}{s_x/\sqrt{n}} \tag{7.197}$$

which, for large $n$, has approximately a standard normal distribution.
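The interval arithmetic in this passage is easy to check numerically. The following is a minimal Python sketch, not part of the original text; the function names are hypothetical, and the inputs 0.084 and 0.081 are the mean and standard deviation of the failure frequency quoted from Table 7.14.

```python
def central_coverage(n: int, i: int) -> float:
    """Estimated probability that a continuous variable falls between the
    i-th and (n + 1 - i)-th ordered observations (Equation 7.196)."""
    return (n + 1 - 2 * i) / (n + 1)


def lower_bounded_coverage(n: int, k: int) -> float:
    """Lower bound on the probability of falling in [0, x_(k)] when the
    variable cannot be negative."""
    return k / (n + 1)


def normal_band(mean: float, std: float, z: float = 1.65) -> tuple[float, float]:
    """Interval mean +/- z*std; z = 1.65 covers about 90% of a normal variable."""
    return mean - z * std, mean + z * std


n = 25
print(f"{central_coverage(n, 1):.0%} {central_coverage(n, 6):.0%}")               # 92% 54%
print(f"{lower_bounded_coverage(n, 20):.0%} {lower_bounded_coverage(n, 25):.0%}")  # 77% 96%

low, high = normal_band(0.084, 0.081)  # values quoted from Table 7.14
print(f"[{low:.3f}, {high:.3f}]")      # [-0.050, 0.218]
```

The negative lower limit of the normal band is the reason the text prefers intervals built from the ordered observations for a quantity that cannot fall below zero.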
Certainly, $n = 25$ is not very large, but the approximation to a normal distribution may be sufficiently good to obtain a rough estimate of how close the average frequency of failure $\bar{x}$ is likely to be to $\mu_x$. A $100(1 - 2\alpha)\%$ confidence interval for $\mu_x$ is, approximately,

$$\bar{x} - t_\alpha \frac{s_x}{\sqrt{n}} \le \mu_x \le \bar{x} + t_\alpha \frac{s_x}{\sqrt{n}}$$

or

$$0.084 - t_\alpha \frac{0.081}{\sqrt{25}} \le \mu_x \le 0.084 + t_\alpha \frac{0.081}{\sqrt{25}} \tag{7.198}$$

If $\alpha = 0.05$, then using a normal distribution $t_\alpha = 1.65$ and Equation 7.198 becomes $0.057 \le \mu_x \le 0.11$. Hence, based on the simulation output, one can be about 90% sure that the true mean failure frequency lies between 5.7% and 11%. This corresponds to a reliability of between 89% and 94%. By performing additional simulations to increase the size of $n$, the width of this confidence interval can be decreased. However, this increase in accuracy may be an illusion because the uncertainty in the parameters of the streamflow model has not been incorporated into the analysis.

Failure frequency or system reliability describes only one dimension of the system's performance. Table 7.14 contains additional information on the system's performance related to the severity of shortages. Column 3 lists the frequencies with which the shortage exceeded 20% of that year's demand. This occurred in approximately 2% of the years, or in 24% of the years in which a failure occurred. Taking another point of view, failures in excess of 20% of demand occurred in nine out of twenty-five, or in 36%, of the simulation runs.

Columns 4 and 5 of Table 7.14 contain two other indices that pertain to the severity of the failures. The total shortfall, TS, in Column 4 is calculated as the sum of the positive differences between the demand and the release in the summer season over the twenty-year period:

$$TS = \sum_{y=1}^{20} [D_y - R_y]^+ \quad \text{where } [Q]^+ = Q \text{ if } Q > 0;\ 0 \text{ otherwise} \tag{7.199}$$

The total shortfall equals the total amount by which the target release is not met in years in which shortages occur.
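Returning to the confidence interval of Equation 7.198, the short Python sketch below (an illustration added here, with a hypothetical function name) reproduces the quoted limits from the Table 7.14 values and shows how the half-width shrinks with the square root of the number of runs. It assumes, as the text does, that the normal approximation with $t_\alpha = 1.65$ is adequate.

```python
import math


def mean_confidence_interval(mean, std, n, t_alpha=1.65):
    """Approximate 100(1 - 2*alpha)% interval for the true mean (Eq. 7.198).

    t_alpha = 1.65 corresponds to alpha = 0.05 under a normal approximation.
    """
    half_width = t_alpha * std / math.sqrt(n)
    return mean - half_width, mean + half_width


# Mean and standard deviation of failure frequency quoted from Table 7.14
low, high = mean_confidence_interval(0.084, 0.081, 25)
print(f"{low:.3f} <= mu_x <= {high:.3f}")   # 0.057 <= mu_x <= 0.111

# Hypothetical experiment with four times as many runs and the same s_x:
low4, high4 = mean_confidence_interval(0.084, 0.081, 100)
print(f"width ratio: {(high4 - low4) / (high - low):.2f}")  # 0.50
```

Quadrupling the number of runs only halves the interval width, and, as noted above, it does nothing about uncertainty in the streamflow model's parameters.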
Related to the total shortfall is the average deficit. The deficit is defined as the shortfall in any year divided by the target release in that year. The average deficit, AD, is

$$AD = \frac{1}{m} \sum_{y=1}^{20} \frac{[D_y - R_y]^+}{D_y} \tag{7.200}$$

where $m$ is the number of failures (deficits) or nonzero terms in the sum.

Both the total shortfall and the average deficit measure the severity of shortages. The mean total shortfall TS, equal to 1.00 for the twenty-five simulation runs, is a difficult number to interpret. While no shortage occurred in seven runs, the total shortage was 4.7 in run 8, in which the shortfall in two different years exceeded 20% of the target. The median of the total shortage values, equal to 0.76, is an easier number to interpret in that one knows that half the time the total shortage was greater and half the time less than this value.

The mean average deficit AD is 0.106, or 11%. However, this mean includes an average deficit of zero in the seven runs in which no shortages occurred. The average deficit in the eighteen runs in which shortages occurred is $(11\%)(25/18) = 15\%$. The average deficit in individual simulations in which shortages occurred ranges from 4% to 43%, with a median of 11.5%.

After examining the results reported in Table 7.14, the farmers might determine that the probability of a shortage exceeding 20% of a year's target release is higher than they would like. They can deal with more frequent minor shortages, not exceeding 20% of the target, with little economic hardship, particularly if they are warned at the beginning of the growing season that less than the targeted quantity of water will be delivered. Then they can curtail their planting or plant crops requiring less water. In an attempt to find out how better to meet the farmers' needs, the simulation program was re-run with the same streamflow sequences and a new operating policy in which only 80% of the growing season's target release is provided (if possible) if the reservoir is less than 80% full at the end of the previous winter season.
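The two severity indices defined above (Equations 7.199 and 7.200) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the five-year demand and release record are hypothetical, not data from Table 7.14. Runs with no shortage are given an average deficit of zero, matching the convention the text describes.

```python
def total_shortfall(demands, releases):
    """Sum of positive differences between demand and release (Eq. 7.199)."""
    return sum(max(d - r, 0.0) for d, r in zip(demands, releases))


def average_deficit(demands, releases):
    """Mean of shortfall/demand over the m years with a shortage (Eq. 7.200).

    Returns 0.0 when no shortage occurs, the convention used for the
    shortage-free runs in the text.
    """
    deficits = [(d - r) / d for d, r in zip(demands, releases) if d > r]
    return sum(deficits) / len(deficits) if deficits else 0.0


# Hypothetical five-year record (arbitrary volume units)
demands = [10.0, 10.0, 12.0, 12.0, 14.0]
releases = [10.0, 9.0, 12.0, 9.0, 14.0]

ts = total_shortfall(demands, releases)   # 1 + 3 = 4.0
ad = average_deficit(demands, releases)   # (0.10 + 0.25) / 2 = 0.175
print(ts, round(ad, 3))

# Rescaling the mean AD to the 18 of 25 runs that had shortages
ad_given_shortage = 0.106 * 25 / 18
print(f"{ad_given_shortage:.0%}")         # 15%
```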
This gives the farmers time to adjust their planting schedules and may increase the quantity of water stored in the reservoir to be used the following year if the drought persists.

As the simulation results with the new policy in Table 7.15 demonstrate, this new operating policy appears to have the expected effect on the system's operation. With the new policy, only six severe shortages in excess of 20% of demand occur in the twenty-five twenty-year simulations, as opposed to ten such shortages with the original policy. In addition, these severe shortages are all less severe than the corresponding shortages that occur with the same streamflow sequence when the original policy is followed.

The decrease in the severity of shortages is obtained at a price. The overall failure frequency has increased from 8.4% to 14.2%. However, the latter value is misleading because in fourteen of the twenty-five simulations, a failure occurs in the first simulation year with the new policy, whereas only three failures occur with the original policy. Of course, these first-year failures occur because the reservoir starts empty at the beginning of the first winter and often does not fill that season. Ignoring these first-year failures, the failure rates with the two policies over the subsequent nineteen years are 8.2% and 12.0%. Thus the frequency of failures in excess of 20% of demand is decreased from 2.0% to 1.2% by increasing the frequency of all failures after the first year from 8.2% to 12.0%. Reliability decreases, but so does vulnerability. If the farmers are willing to put up with more frequent minor shortages, then it appears that they can reduce their risk of experiencing shortages of greater severity.

The preceding discussion has ignored the statistical issue of whether the differences between the indices obtained in the two simulation experiments are of sufficient statistical reliability to support the analysis.
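The percentages in the policy comparison above follow from simple counts over 25 runs of 20 years each. The Python sketch below is an added illustration (the helper name is hypothetical); it recovers the failure counts implied by the quoted overall rates and then removes the first-year failures.

```python
RUNS, YEARS = 25, 20


def rate_after_first_year(overall_rate, first_year_failures):
    """Failure rate over years 2..20, given the overall rate and the
    number of runs that failed in year 1."""
    total_failures = round(overall_rate * RUNS * YEARS)
    later_failures = total_failures - first_year_failures
    return later_failures / (RUNS * (YEARS - 1))


original = rate_after_first_year(0.084, 3)    # (42 - 3) / 475
modified = rate_after_first_year(0.142, 14)   # (71 - 14) / 475
print(f"{original:.1%} {modified:.1%}")       # 8.2% 12.0%

# Severe shortages (more than 20% of demand) per simulated year
severe_original = 10 / (RUNS * YEARS)
severe_modified = 6 / (RUNS * YEARS)
print(f"{severe_original:.1%} {severe_modified:.1%}")  # 2.0% 1.2%
```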
If care is not taken, observed changes in a performance index from one simulation experiment to another may be due to sampling fluctuations rather than to modifications of the water resource system's design or operating policy.

As an example, consider the change that occurred in the frequency of shortages. Let $X_{1i}$ and $X_{2i}$ be the simulated failure rates using the $i$th streamflow sequence with the original and modified operating policies. The random variables $Y_i = X_{1i} - X_{2i}$ for $i$ equal 1 through 25 are independent of each other if the streamflow sequences are generated independently, as they were.

One would like to confirm that the random variable $Y$ tends to be negative more often than it is positive, and hence, that policy 2 indeed results in more failures overall. A direct test of this theory is provided by the sign test. Of the twenty-five paired simulation runs, $y_i < 0$ in twenty-one cases and $y_i = 0$ in four cases. We can ignore the times when $y_i = 0$. Note that if $y_i < 0$ and $y_i > 0$ were equally likely, then the probability of observing $y_i < 0$ in all twenty-one cases when $y_i \ne 0$ is $2^{-21}$, or about $5 \times 10^{-7}$. This is exceptionally strong evidence that the new policy has increased the failure frequency.

[Table 7.15. Results of 25 twenty-year simulations with modified operating policy to avoid severe shortages. Columns: simulation number, $i$; frequency of failure to meet the target; frequency of failure to meet 80% of the target; total shortage, TS ($\times 10^7$ m³); average deficit, AD. Summary rows: mean $\bar{x}$; standard deviation of values $s_x$; median; approximate interquartile range $x_{(6)}$ to $x_{(20)}$; range $x_{(1)}$ to $x_{(25)}$.]

A similar analysis can be made of the frequency with which the release is less than 80% of the target. Failure frequencies differ in the two policies in only four of the twenty-five simulation runs. However, in all four cases where they differ, the new policy resulted in fewer severe failures. The probability of such a lopsided result, were it equally likely that either policy would result in a lower frequency of failures in excess of 20% of the target, is $2^{-4} = 0.0625$. This is fairly strong evidence that the new policy indeed decreases the frequency of severe failures.

Another approach to this problem is to ask if the difference between the average failure rates $\bar{x}_1$ and $\bar{x}_2$ is statistically significant; that is, can the difference between $\bar{X}_1$ and $\bar{X}_2$ be attributed to the fluctuations that occur in the average of any finite set of random variables? In this example the significance of the difference between the two means can be tested using the random variable $Y_i$ defined as $X_{1i} - X_{2i}$ for $i$ equal 1 through 25. The mean of the observed $y_i$'s is

$$\bar{y} = \frac{1}{25} \sum_{i=1}^{25} (x_{1i} - x_{2i}) = \bar{x}_1 - \bar{x}_2 = 0.084 - 0.142 = -0.058 \tag{7.201}$$

and their variance is

$$s_y^2 = \frac{1}{25} \sum_{i=1}^{25} (x_{1i} - x_{2i} - \bar{y})^2 = (0.040)^2 \tag{7.202}$$

Now, if the sample size $n$, equal to 25 here, is sufficiently large, then $t$ defined by

$$t = \frac{\bar{y} - \mu_Y}{s_y/\sqrt{n}} \tag{7.203}$$

has approximately a standard normal distribution. The closer the distribution of $Y$ is to that of the normal distribution, the faster the convergence of the distribution of $t$ is to the standard normal distribution with increasing $n$. If $X_{1i} - X_{2i}$ is normally distributed, which is not the case here, then each $Y_i$ has a normal distribution and $t$ has Student's $t$-distribution.
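The sign-test probabilities quoted above come from a binomial distribution with success probability one half, applied to the pairs that are not tied. The following Python sketch is an added illustration with a hypothetical function name; it computes the one-sided probability of a result at least as lopsided as the one observed.

```python
from math import comb


def sign_test_p_value(n_negative: int, n_positive: int) -> float:
    """One-sided sign test: probability of at least n_negative negative
    differences among the non-tied pairs when both signs are equally likely.
    Tied pairs (y_i = 0) are dropped before calling this function."""
    n = n_negative + n_positive
    return sum(comb(n, k) for k in range(n_negative, n + 1)) / 2 ** n


# Overall failure frequency: 21 negative differences, 0 positive, 4 ties
p_overall = sign_test_p_value(21, 0)
print(f"{p_overall:.1e}")   # 4.8e-07, i.e. about 5 x 10^-7

# Severe failures: the 4 runs that differ all favour the new policy
# (count them as the 'lopsided' sign)
p_severe = sign_test_p_value(4, 0)
print(p_severe)             # 0.0625
```

With only four informative pairs, the smallest probability the test can return is 0.0625, which is why the evidence for fewer severe failures is weaker than the evidence for more failures overall.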
If $E[X_{1i}] = E[X_{2i}]$, then $\mu_Y$ equals zero, and upon substituting the observed values of $\bar{y}$ and $s_y$ into Equation 7.203, one obtains

$$t = \frac{-0.0580}{0.0400/\sqrt{25}} = -7.25 \tag{7.204}$$

The probability of observing a value of $t$ equal to $-7.25$ or smaller is less than 0.1% if $n$ is sufficiently large that $t$ is normally distributed. Hence it appears very improbable that $\mu_Y$ equals zero.

This example provides an illustration of the advantage of using the same streamflow sequences when simulating both policies. Suppose that different streamflow sequences were used in all the simulations. Then the expected value of $Y$ would not change, but its variance would be given by

$$\begin{aligned}
\mathrm{Var}(Y) &= E\{[X_1 - X_2 - (\mu_1 - \mu_2)]^2\} \\
&= E[(X_1 - \mu_1)^2] - 2E[(X_1 - \mu_1)(X_2 - \mu_2)] + E[(X_2 - \mu_2)^2] \\
&= \sigma_{x_1}^2 - 2\,\mathrm{Cov}(X_1, X_2) + \sigma_{x_2}^2
\end{aligned} \tag{7.205}$$

where $\mathrm{Cov}(X_1, X_2) = E[(X_1 - \mu_1)(X_2 - \mu_2)]$ is the covariance of the two random variables. The covariance between $X_1$ and $X_2$ will be zero if they are independently distributed, as they would be if different randomly generated streamflow sequences were used in each simulation. Estimating $\sigma_{x_1}^2$ and $\sigma_{x_2}^2$ by their sample estimates, an estimate of what the variance of $Y$ would be if $\mathrm{Cov}(X_1, X_2)$ were zero is

$$\hat{\sigma}_y^2 = \hat{\sigma}_{x_1}^2 + \hat{\sigma}_{x_2}^2 = (0.081)^2 + (0.087)^2 = (0.119)^2 \tag{7.206}$$

The actual sample estimate $\hat{\sigma}_y$ equals 0.040 (Equation 7.202); if independent streamflow sequences are used in all simulations, $\hat{\sigma}_y$ will take a value near 0.119 rather than 0.040. A standard deviation of 0.119, with $\mu_Y = 0$, yields a value of the $t$ test statistic

$$t = \frac{\bar{y} - \mu_Y}{0.119/\sqrt{25}} = -2.44 \tag{7.207}$$

If $t$ is normally distributed, the probability of observing a value less than $-2.44$ is about 0.8%. This illustrates that use of the same streamflow sequences in the simulation of both policies allows one to better distinguish the differences in the policies' performance. By using the same streamflow sequences, or other random inputs, one can construct a simulation experiment in which variations in performance caused by different random inputs are confused as little as possible with the differences in performance caused by changes in the system's design or operating policy.
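The comparison between paired and independent streamflow sequences can be reproduced from the summary statistics alone. The Python sketch below is an added illustration with hypothetical function names. The value 0.087 for the standard deviation of the failure frequency under the modified policy is an assumption here: it is the value implied by the stated 0.081 and 0.119 in Equation 7.206, not one read from Table 7.15.

```python
import math


def t_statistic(mean_diff, std_diff, n, mu=0.0):
    """t = (ybar - mu) / (s / sqrt(n)), as in Equation 7.203."""
    return (mean_diff - mu) / (std_diff / math.sqrt(n))


def std_of_difference(s1, s2, covariance=0.0):
    """Standard deviation of X1 - X2 from Equation 7.205."""
    return math.sqrt(s1 ** 2 + s2 ** 2 - 2.0 * covariance)


n = 25
y_bar = 0.084 - 0.142            # difference of mean failure rates
s1, s2 = 0.081, 0.087            # s2 is implied by Eq. 7.206 (assumption)

# Same streamflow sequences for both policies: observed s_y = 0.040
t_paired = t_statistic(y_bar, 0.040, n)

# Independent sequences: covariance is zero, so the variances add
s_indep = std_of_difference(s1, s2)
t_indep = t_statistic(y_bar, s_indep, n)

# Covariance implied by the paired result, and the matching correlation
implied_cov = (s1 ** 2 + s2 ** 2 - 0.040 ** 2) / 2.0
implied_corr = implied_cov / (s1 * s2)

print(round(t_paired, 2), round(s_indep, 3), round(t_indep, 2))  # -7.25 0.119 -2.44
print(round(implied_corr, 2))                                     # 0.89
```

Under these assumptions the paired runs are strongly positively correlated, which is what shrinks the standard deviation of the difference from about 0.119 to 0.040 and sharpens the test.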
10. Conclusions

This chapter has introduced statistical concepts that analysts use to describe the randomness or uncertainty of their data. Most of the data used by water resources systems analysts are uncertain. This uncertainty comes from not understanding as well as we would like how our water resources systems (including their ecosystems) function, as well as from not being able to forecast the future perfectly. It is that simple. We do not know the exact amounts, qualities and their distributions over space and time of either the supplies of water we manage or the water demands we try to meet. We also do not know the benefits and costs, however measured, of any actions we take to manage both water supply and water demand.

The chapter began with an introduction to probability concepts and methods for describing random variables and parameters of their distributions. It then reviewed some of the commonly used probability distributions and how to determine the distributions of sample data, how to work with censored and partial duration series data, methods of regionalization, stochastic processes and time-series analyses.

The chapter concluded with an introduction to a range of univariate and multivariate stochastic models that are used to generate stochastic streamflow, precipitation depths, temperatures and evaporation. These methods are used to generate temporal and spatial stochastic processes that serve as inputs to stochastic simulation models for system design, for system operations studies, and for evaluation of the reliability and precision of different estimation algorithms. The final section of this chapter provides an example of stochastic simulation, and the use of statistical methods to summarize the results of such simulations.

This is merely an introduction to some of the statistical tools available for use when dealing with uncertain data.
Many of the concepts introduced in this chapter will be used in the chapters that follow on constructing and implementing various types of optimization, simulation and statistical models. The references cited in the reference section provide additional and more detailed information.

Although many of the methods presented in this and in some of the following chapters can describe many of the characteristics and consequences of uncertainty, it is unclear whether society knows exactly what to do with such information. Nevertheless, there seems to be an increasing demand from stakeholders involved in planning processes for information related to the uncertainty associated with the impacts predicted by models. The challenge is not only to quantify that uncertainty, but also to communicate it in effective ways that inform, and not confuse, the decision-making process.
Realizatios of daily weather i forecast seasoal climate. Joural of Hydrometeorology, No. 3, pp WILKS, D.S Multi-site geeralizatio of a daily stochastic precipitatio geeratio model. Joural of Hydrology, No. 10, pp YOUNG, G.K Discussio of Mathematical assessmet of sythetic hydrology by N.C. Matalas ad reply. Water Resources Research, Vol. 4, No. 3, pp ZELLNER, A A itroductio to Bayesia iferece i ecoometrics, New York, Wiley. ZRINJI, Z. ad BURN, D.H Flood Frequecy aalysis for ugauged sites usig a regio of ifluece approach. Joural of Hydrology, No. 13, pp. 1 1.
