Keywords: L-moments, frequency analysis, goodness-of-fit test, stochastic simulation, acceptance regions, Pearson type III distribution.


Establishing acceptance regions for L-moments based goodness-of-fit tests for the Pearson type III distribution

Yii-Chen Wu, Jun-Jih Liou, Ke-Sheng Cheng*

Department of Bioenvironmental Systems Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.

*Corresponding author. Tel: +---1, Fax: +---, rslab@ntu.edu.tw.

Abstract

Goodness-of-fit tests based on the L-moment-ratios diagram for selection of appropriate distributions for hydrological variables have had many applications in recent years. For such applications, sample-size-dependent acceptance regions need to be established in order to take into account the uncertainties induced by sample L-skewness and L-kurtosis. Acceptance regions of two-parameter distributions such as the normal and Gumbel distributions have been developed. However, many hydrological variables are better characterized by three-parameter distributions such as the Pearson type III and generalized extreme value distributions. Establishing acceptance regions for these three-parameter distributions is more complicated since their L-moment-ratios diagrams plot as curves, rather than as the unique points of two-parameter distributions. Through stochastic simulation we established sample-size-dependent % acceptance regions for the Pearson type III distribution. The proposed approach involves two key elements: the conditional distribution of population L-skewness given a sample L-skewness, and the conditional distribution of sample L-kurtosis given a sample L-skewness. The established % acceptance regions of the Pearson type III distribution were further validated through two types of validity check, and were found to be applicable for goodness-of-fit tests for random samples of any sample size between 0 and 00 and coefficient of skewness not exceeding.0.

Keywords: L-moments, frequency analysis, goodness-of-fit test, stochastic simulation, acceptance regions, Pearson type III distribution.
Introduction

Hydrological frequency analysis requires identifying adequate types of distribution for collected random samples. This is commonly done by conducting goodness-of-fit (GOF) tests. Traditionally, the chi-square test and the Kolmogorov-Smirnov test have been used for selection of probability distributions for hydrological variables (Haan, 00). Another GOF method is based on ordinary moment-ratio diagrams (D'Agostino and Stephens, 1). Moment ratios are unique properties of probability distributions, and sample moment ratios of ordinary skewness and kurtosis have been used for selection of probability distributions (Kottegoda, ; D'Agostino and Stephens, 1). Nevertheless, Tsukatani and Shigemitsu () indicated that ordinary moment-ratio diagrams are not always appropriate because estimation of the fourth moment lacks sufficient accuracy. They therefore proposed using a combination of the variance coefficient and the skewness coefficient (see Tsukatani and Shigemitsu () for definitions), which can be derived from the first three moments of the data, for GOF tests of different types of Pearson distribution.

Hosking and Wallis (1) demonstrated that the L-skewness and L-kurtosis are much less biased than the ordinary skewness and kurtosis. As a result, the L-moment-ratio diagram (LMRD) of L-skewness and L-kurtosis has been suggested as a useful tool for discriminating between candidate distributions (Hosking, ; Hosking and Wallis, 1; Vogel and Fennessey, 1; Hosking and Wallis, 1). L-moment-ratio diagrams of different probability distributions have been derived by Hosking and Wallis (1). As shown in Figure 1, a two-parameter distribution with a location and a scale parameter (such as the normal or Gumbel distribution) plots as a single point on the LMRD, whereas a three-parameter distribution with location, scale and shape parameters (such as the Pearson type III (PE) distribution or the generalized extreme value (GEV) distribution) plots as a curve on the LMRD. However, these theoretical points or curves are of limited use in practical applications, since they cannot accommodate the uncertainties induced by parameter estimation from random samples.

To circumvent this problem, Liou et al. (00) established sample-size-dependent acceptance regions for L-moments-based GOF tests for the normal and Gumbel distributions using stochastic simulation. As illustrated in Figure, the acceptance regions are characterized by a set of acceptance ellipses which correspond to various sample sizes. Empirical formulas were also developed to determine the acceptance regions for any sample size within the [0, 00] range.

While the acceptance regions established for the normal and Gumbel distributions may be useful for many applications, hydrological frequency analysis often encounters variables which are better characterized by three-parameter distributions with location, scale and shape parameters. For example, the U.S. Water Resources Council recommended that the log Pearson type III (LPE) distribution be used for flood flow frequency analysis (US-WRC, ). In Taiwan, the LPE distribution was also suggested for rainfall frequency analysis (TW-WRA, 000). Flood flow in Northern Tunisia was shown to be best represented by the GEV distribution (Abida and Elouze, 00). To our knowledge, acceptance regions for L-moments-based GOF tests for three-parameter distributions such as PE and GEV have not been established.

Establishing LMRD acceptance regions for three-parameter distributions is more complicated than for the normal and Gumbel distributions. For a two-parameter distribution with a location and a scale parameter, there exists a unique point characterizing its LMRD. For a three-parameter distribution with location, scale and shape parameters, by contrast, the LMRD plots as a curve and each point on the curve represents a valid parameter vector. Following the same concept used to establish the acceptance region for the normal distribution, it appears that the acceptance region for an L-moments-based GOF test of a three-parameter distribution will vary with both the sample size and the L-skewness. Thus, the objectives of this study are: (1) to demonstrate how the acceptance region of an L-moments-based GOF test of a three-parameter distribution can be established, and (2) to investigate the effect of sample size on the acceptance region using stochastic simulation. The PE distribution is chosen to exemplify the proposed method.

This paper is organized as follows. Section "L-moments and the L-moment ratio diagram of the PE distribution" defines the L-moment ratios, their probability-weighted-moment estimators, and the L-moment-ratio diagram of the PE distribution. Section "Stochastic simulation of the Pearson type III distribution" describes details of random sample generation for the PE distribution using stochastic simulation. Detailed discussions on the bias of the sample L-skewness and L-kurtosis are also given. Section "Establishing acceptance regions for GOF test of the PE distribution" explains the rationale for establishing the acceptance regions and gives a detailed account of how the % acceptance regions of GOF tests based on L-moment ratios of the PE distribution can be established.
Section "Validation of the LMRD acceptance regions" describes the procedures and results of two types of validity check of the sample-size-dependent % acceptance regions. An illustrative example is also given to demonstrate the use of the sample-size-dependent acceptance regions for GOF tests.

L-moments and the L-moment ratio diagram of the PE distribution

Since their introduction by Hosking (), L-moments have found ever-increasing application in hydrology. L-moments are defined as linear combinations of expected values of order statistics of a random variable. In terms of linear combinations of order statistics, the first four L-moments ($\lambda_r$, $r = 1, 2, 3, 4$) can be expressed by

$$\lambda_1 = E\left[X_{1:1}\right] \tag{1}$$

$$\lambda_2 = \tfrac{1}{2}\,E\left[X_{2:2} - X_{1:2}\right] \tag{2}$$

$$\lambda_3 = \tfrac{1}{3}\,E\left[X_{3:3} - 2X_{2:3} + X_{1:3}\right] \tag{3}$$

$$\lambda_4 = \tfrac{1}{4}\,E\left[X_{4:4} - 3X_{3:4} + 3X_{2:4} - X_{1:4}\right] \tag{4}$$

where $X_{k:n}$ is the k-th order statistic from a random sample of size n. Similar to the ordinary moment ratios, the L-moment ratios are defined by

$$\tau_r = \lambda_r / \lambda_2, \quad r = 3, 4, \ldots \tag{5}$$

Explicit expressions for the relationships between L-skewness ($\tau_3$) and L-kurtosis ($\tau_4$), i.e. the L-moment ratio diagram, for commonly used probability distributions have been given by Hosking () and can be expressed by the following polynomial approximations:

$$\tau_4 = \sum_{k=0}^{8} A_k\,\tau_3^{\,k} \tag{6}$$

For the PE distribution, $A_0$ = 0.1, $A_2$ = 0.0, $A_4$ = 0.1, $A_6$ = -0., and $A_8$ = 0.1. All other $A_i$ values are zero.

L-moment ratios can be estimated from random samples using the probability-weighted-moment estimator or the plotting-position estimator (Hosking and Wallis, 1; Liou et al., 00). In general, the probability-weighted-moment estimators are preferred over the plotting-position estimators since they generally have lower bias (Hosking and Wallis, 1). In this study only the probability-weighted-moment estimators were calculated and used for construction of the LMRD acceptance regions. Given a random sample $x_1, x_2, \ldots, x_n$, an unbiased estimator of the probability-weighted moment is given by

$$b_r = \frac{1}{n} \sum_{j=r+1}^{n} \frac{(j-1)(j-2)\cdots(j-r)}{(n-1)(n-2)\cdots(n-r)}\; x_{j:n} \tag{7}$$

The sample L-moments ($\hat{\lambda}_r$) and sample L-moment ratios ($t_r$) can then be calculated by

$$\hat{\lambda}_1 = b_0 \tag{8}$$

$$\hat{\lambda}_2 = 2b_1 - b_0 \tag{9}$$

$$\hat{\lambda}_3 = 6b_2 - 6b_1 + b_0 \tag{10}$$

$$\hat{\lambda}_4 = 20b_3 - 30b_2 + 12b_1 - b_0 \tag{11}$$

$$t_r = \hat{\lambda}_r / \hat{\lambda}_2 \tag{12}$$

Stochastic simulation of the Pearson type III distribution

In this study we chose the standard PE distribution (zero mean and unit standard deviation) to exemplify the proposed approach for construction of acceptance regions for GOF tests of a three-parameter distribution based on the L-moment ratio diagram. The probability density function of the PE distribution is expressed as

$$f(x) = \frac{1}{\theta\,\Gamma(\beta)} \left(\frac{x-\xi}{\theta}\right)^{\beta-1} e^{-(x-\xi)/\theta}, \quad x \ge \xi \tag{13}$$

where $\theta$, $\beta$, and $\xi$ are respectively the scale, shape, and location parameters. These parameters are related to the expected value ($\mu$), standard deviation ($\sigma$), and coefficient of skewness ($\gamma$) of the distribution through the following equations:

$$\theta = \sigma\gamma/2 \tag{14}$$

$$\beta = (2/\gamma)^2 \tag{15}$$

$$\xi = \mu - 2\sigma/\gamma \tag{16}$$

For the PE distribution, rational-function approximations can be used to express the L-moment ratios $\tau_3$ and $\tau_4$ as functions of the shape factor $\beta$ (Hosking and Wallis, 1). If $\beta \ge 1$ (or equivalently $\gamma \le 2$),

$$\tau_3 = \beta^{-1/2}\, \frac{A_0 + A_1\beta^{-1} + A_2\beta^{-2} + A_3\beta^{-3}}{1 + B_1\beta^{-1} + B_2\beta^{-2}} \tag{17}$$

$$\tau_4 = \frac{C_0 + C_1\beta^{-1} + C_2\beta^{-2} + C_3\beta^{-3}}{1 + D_1\beta^{-1} + D_2\beta^{-2}}; \tag{18}$$

if $\beta < 1$ (or equivalently $\gamma > 2$),

$$\tau_3 = \frac{1 + E_1\beta + E_2\beta^2 + E_3\beta^3}{1 + F_1\beta + F_2\beta^2 + F_3\beta^3} \tag{19}$$

$$\tau_4 = \frac{1 + G_1\beta + G_2\beta^2 + G_3\beta^3}{1 + H_1\beta + H_2\beta^2 + H_3\beta^3}. \tag{20}$$

Coefficients of the above equations are shown in Table 1. Evidently, among the three distribution parameters only the shape factor $\beta$ affects the L-moment ratios $\tau_3$ and $\tau_4$. From Eq. (15) we see that the shape factor is totally dependent on the coefficient of skewness. Thus, stochastic simulation of the PE distribution was conducted by setting $\mu$ = 0, $\sigma$ = 1, and varying $\gamma$ from 0.0 to at an increment of 0.0, with an additional case of $\gamma$ = 0.001, making a total of 1 parameter settings. Such parameter settings cover a wide range of L-skewness ($\tau_3$) and corresponding L-kurtosis. Each parameter setting ($\mu$ = 0, $\sigma$ = 1, $\gamma$) corresponds to a specific combination of the scale, shape, and location parameters ($\theta = \sigma\gamma/2$, $\beta = (2/\gamma)^2$, $\xi = \mu - 2\sigma/\gamma$), from which random samples of the PE distribution are generated.
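The sample generation and the probability-weighted-moment estimators described above can be illustrated with a minimal sketch. It assumes positive skewness and draws PE variates by shifting and scaling a standard gamma variate from Python's `random.gammavariate`; the function names `pe3_sample` and `sample_lmoment_ratios` are hypothetical and are not taken from the paper.

```python
import random


def sample_lmoment_ratios(x):
    """Return (l1, l2, t3, t4) from the unbiased PWM estimators b0..b3."""
    xs = sorted(x)
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b = [0.0, 0.0, 0.0, 0.0]
    for j, xj in enumerate(xs, start=1):
        w = 1.0  # running product (j-1)...(j-r) / ((n-1)...(n-r))
        b[0] += xj
        for r in (1, 2, 3):
            w *= (j - r) / (n - r)  # becomes 0 whenever j <= r
            b[r] += w * xj
    b = [v / n for v in b]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3 / l2, l4 / l2


def pe3_sample(gamma, n, rng, mu=0.0, sigma=1.0):
    """Draw n PE variates with mean mu, std sigma, skewness gamma > 0.

    Assumption of this sketch: only positive skewness is handled.
    """
    if gamma <= 0:
        raise ValueError("this sketch handles positive skewness only")
    shape = (2.0 / gamma) ** 2        # shape factor from the skewness
    scale = sigma * gamma / 2.0       # scale parameter
    loc = mu - 2.0 * sigma / gamma    # location parameter
    return [loc + scale * rng.gammavariate(shape, 1.0) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed, for repeatability only
    x = pe3_sample(gamma=1.0, n=50, rng=rng)
    print(sample_lmoment_ratios(x))
```

The running product `w` avoids recomputing the factorial-like weights for every order r, and it vanishes automatically for the terms that the summation from j = r + 1 excludes.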

In order to assess the effect of sample size on acceptance regions, for each of the 1 parameter settings, 0,000 random samples were generated for each of the 1 sample sizes n = 0,,,,, 0,,,,, 0, 0, 0, 0, 0,, 00, 0 and 00. Such simulations resulted in a total of () sets of (n, γ) combinations, each having 0,000 random samples. For every (n, γ) combination, sample L-skewness ($t_3$) and L-kurtosis ($t_4$) were calculated for each of the 0,000 samples, using the probability-weighted-moment estimator (Eq. (12)). Figures and respectively illustrate the scattering of sample L-skewness and L-kurtosis, with sample size n = 0 and 0, using 0,000 random samples of the standard PE distributions of various coefficients of skewness. Also shown in Figures and are the % equiprobable density contours of ($t_3$, $t_4$), assuming $t_3$ and $t_4$ form a bivariate normal distribution. For smaller skewness (for example, 0. 1 for sample size n = 0), the scattering of ($t_3$, $t_4$) approximates a bivariate normal distribution. As the coefficient of skewness increases, approximation by the bivariate normal distribution becomes less satisfactory. In particular, for n = 0 and, the % equiprobable density contour even covers an area with no ($t_3$, $t_4$) points.

For each (n, γ) combination, means of the 0,000 sample estimates of L-skewness and L-kurtosis, respectively represented by $\hat{\mu}(t_3)$ and $\hat{\mu}(t_4)$, were calculated (see Table 2). For a specific coefficient of skewness, biases of the probability-weighted-moment estimators ($t_3$ and $t_4$) decrease with increasing sample size, as shown in Figure. From the simulated random samples, we found that the biases of $t_3$ and $t_4$ depend on both the sample size n and the coefficient of skewness γ (or equivalently the L-skewness $\tau_3$) and can be respectively approximated by

$$\tau_{3,\mathrm{bias}}(\gamma, n) = E(t_3) - \tau_3 \approx \hat{\mu}(t_3) - \tau_3, \tag{21a}$$

$$\tau_{3,\mathrm{bias}}(\gamma, n) \approx \frac{b(\gamma, t_{3,\mathrm{bias}})}{n}, \tag{21b}$$

and

$$\tau_{4,\mathrm{bias}}(\gamma, n) = E(t_4) - \tau_4 \approx \hat{\mu}(t_4) - \tau_4, \tag{22a}$$

$$\tau_{4,\mathrm{bias}}(\gamma, n) \approx \frac{b(\gamma, t_{4,\mathrm{bias}})}{n}. \tag{22b}$$

The regression coefficients $b(\gamma, t_{3,\mathrm{bias}})$ and $b(\gamma, t_{4,\mathrm{bias}})$ depend only on the coefficient of skewness (see Figure), and Eqs. (21a)-(22b) can be used for calculating the biases of $t_3$ and $t_4$ for any sample size n between 0 and 00 and coefficient of skewness γ not exceeding.0. It is noteworthy that for the PE distribution $t_3$ is a biased estimator of $\tau_3$, with negative bias for all 0 < γ. In other words, $t_3$ tends to underestimate $\tau_3$, although the bias is generally negligible. In contrast, $t_4$ is positively biased for smaller skewness (γ < 0.) and negatively biased for 0. < γ.

Establishing acceptance regions for GOF test of the PE distribution

Rationale for establishment of acceptance regions

For the normal distribution, acceptance regions for the L-moment-ratio-based GOF test can be expressed by a set of unique ellipses which are almost centered on the theoretical point ($\tau_3$ = 0, $\tau_4$ = 0. 1) and depend only on the sample size (Liou et al., 00). However, the LMRD of the three-parameter PE distribution plots as a curve, and every point on the curve characterizes a specific set of distribution parameters. Even if the scattering of ($t_3$, $t_4$) can be well approximated by a bivariate normal distribution, it will be necessary to establish different sets of ellipses for different points ($\tau_3$, $\tau_4$) on the LMRD curve, as illustrated in Figure. As a result, for a given sample pair of L-moment ratios ($t_3$, $t_4$), one will need to identify an ellipse in order to conduct the GOF test. Unfortunately, unlike in the case of the normal distribution, for which $\tau_3$ = 0 and $\tau_4$ = 0. 1, the population L-moment ratios ($\tau_3$, $\tau_4$) for the given sample pair ($t_3$, $t_4$) are never known, and thus the ($\tau_3$, $\tau_4$)-specific acceptance ellipses approach cannot be adopted for a GOF test of the PE distribution.

The above argument indicates that, in conducting an L-moment-ratios-based GOF test for a three-parameter distribution such as PE or GEV, one needs to consider all possible points in the parameter space which may give rise to random samples with a certain sample pair of L-moment ratios ($t_3$, $t_4$). As illustrated in Figure, a sample pair of L-moment ratios ($t_3 = t_3^*$, $t_4 = t_4^*$) may originate from PE distributions with $\tau_3$ varying mostly between a and b (or the corresponding range for the coefficient of skewness γ). Such PE distributions can be identified by considering the conditional density of $\tau_3$ given $t_3 = t_3^*$, i.e. $f(\tau_3 \mid t_3 = t_3^*)$, as shown in Figure.
Additionally, PE random samples with sample L-skewness $t_3 = t_3^*$ are associated with a range of sample L-kurtosis which can be characterized by the conditional density of $t_4$ given $t_3 = t_3^*$, i.e. $g(t_4 \mid t_3 = t_3^*)$.

Given a random sample of size n, a GOF test for a probability distribution at level of significance $\alpha$ will result in a $100\alpha\%$ chance of rejecting the null hypothesis if the random sample is indeed from that distribution. Thus, given a large number of PE random samples of size n with sample L-skewness $t_3 = t_3^*$, the $100(1-\alpha)\%$ acceptance region of the L-moment-ratios-based GOF test can be determined from the conditional density $g(t_4 \mid t_3 = t_3^*)$, and it should result in a rejection rate of nearly $100\alpha\%$. Hereafter, the acceptance region for random samples of a specific sample L-skewness (for example, $t_3 = t_3^*$) is referred to as the $t_3$-specific acceptance region. The same procedure can then be repeated for a range of $t_3$ values, and finally an acceptance band with respect to the sample size n can be constructed by fitting a set of upper- and lower-bound curves to the $100(1-\alpha)\%$ upper and lower bounds of the $t_3$-specific acceptance regions.

Conditional density of $\tau_3$ given $t_3$

As described in the previous section, the $t_3$-specific acceptance region can be determined from the conditional density of $t_4$ given $t_3$, i.e. $g(t_4 \mid t_3)$. To achieve a good estimate of $g(t_4 \mid t_3 = t_3^*)$, we need a large set of random samples (with sample L-skewness $t_3 = t_3^*$) generated from PE distributions of all possible values of $\tau_3$. While it is impossible to generate random samples from PE distributions of all possible $\tau_3$ values, we can at least settle for a range of $\tau_3$ which accounts for a very large portion of the possible $\tau_3$ values. This is achieved by investigating the conditional density of $\tau_3$ given $t_3$, i.e. $f(\tau_3 \mid t_3)$, using simulated random samples of various sample sizes n and coefficients of skewness (γ) through the following procedure.

For a specific sample size (say n = 0), our simulation of PE random samples with various coefficients of skewness results in a huge data set of ($t_3$, $t_4$) sample pairs which form the data cloud shown in Figure. For a specific sample L-skewness $t_3^*$, all simulated samples with sample L-skewness falling within a small range around $t_3^*$ ( in our study) were collected. Each simulated sample within this range corresponds to a PE population with a certain L-skewness $\tau_3$. Thus a relative histogram of $\tau_3$ can be constructed and is considered an estimate of the conditional density of $\tau_3$ given $t_3^*$, i.e. $f(\tau_3 \mid t_3 = t_3^*)$.
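The windowing procedure just described can be sketched in a few lines. The sketch assumes the simulation output is available as a list of (population $\tau_3$, sample $t_3$) pairs; the function names, the bin layout, and the window half-width argument are hypothetical choices for illustration, not values from the paper.

```python
import math


def conditional_tau3(pairs, t3_star, half_width):
    """Population tau3 of all samples whose t3 lies within t3_star +/- half_width.

    pairs: iterable of (tau3_population, t3_sample) tuples.
    """
    return [tau3 for tau3, t3 in pairs if abs(t3 - t3_star) <= half_width]


def relative_histogram(values, lo, hi, bins):
    """Relative frequencies of values over `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            k = min(int((v - lo) / width), bins - 1)
            counts[k] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]


def mean_and_sd(values):
    """Sample mean and standard deviation (n - 1 denominator)."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, math.sqrt(var)
```

Applied to the full simulated data cloud, `mean_and_sd(conditional_tau3(...))` would give the centre and spread of the estimated conditional density for one choice of $t_3^*$ and n.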
The same procedures were applied to simulated random samples of different sample sizes n and different $t_3^*$ values, yielding estimates of $f(\tau_3 \mid t_3)$ for various sample L-skewness $t_3$. Through the Kolmogorov-Smirnov GOF test we found that the conditional densities $f(\tau_3 \mid t_3)$ are normally distributed, with expected value almost equal to $t_3$ and standard deviation ($\sigma_{\tau_3|t_3}$) decreasing with increasing sample size. The standard deviation appears to depend only on the sample size and, as demonstrated by Figure, for larger sample sizes (n 0) the dependency approaches

$$\sigma_{\tau_3|t_3}(n) = \frac{0.}{\sqrt{n}}. \tag{23}$$

Equation (23) indicates that % of PE samples with a specific sample L-skewness $t_3$ come from PE distributions with population parameter $\tau_3$ falling within the range ($t_3 - 1.\,\sigma_{\tau_3|t_3}(n)$, $t_3 + 1.\,\sigma_{\tau_3|t_3}(n)$). This result is important in that it affects the upper limit for establishing $t_3$-specific acceptance regions using our simulated data. For example, at sample size n = 0, $\sigma_{\tau_3|t_3}$ equals 0.1 and % of PE samples with a specific $t_3$ come from PE distributions with $\tau_3$ falling within the range ($t_3$ - 0., $t_3$ + 0.). Since our random samples were simulated for γ not exceeding (or equivalently $\tau_3 \le$ 0.0), the upper limit of the $t_3$-specific acceptance region for sample size n = 0 is $t_3$ = 0. (0.0 - 0.), which is approximately equivalent to γ = (or $\tau_3$ = 0.). If establishment of $t_3$-specific acceptance regions is desired for higher $t_3$ values, it will be necessary to generate random samples of PE distributions with coefficient of skewness higher than.

Determining acceptance regions

The conditional density $g(t_4 \mid t_3)$ is the basis for determining the $t_3$-specific acceptance region. Given a specific sample L-skewness $t_3^*$, all simulated samples with sample L-skewness falling within the small range around $t_3^*$ were collected and considered as a random sample of $g(t_4 \mid t_3^*)$. The $\alpha/2$ and $1 - \alpha/2$ quantiles of this random sample were then extracted from its empirical cumulative distribution function, and they respectively represent the lower and upper bounds of the $100(1-\alpha)\%$ acceptance region with respect to the given sample L-skewness. The acceptance regions determined in this manner depend not only on $t_3$ but also on the sample size n. Figure demonstrates the % acceptance regions for sample size n = 0, 0, and 0. It is also worth noting that, unlike $f(\tau_3 \mid t_3)$, which is normally distributed, the normality assumption for the conditional density $g(t_4 \mid t_3)$ is not generally valid. Figure 1 demonstrates histograms of $t_4$ derived from random samples with sample size n = 0 and 0, given certain $t_3$ values. For higher sample L-skewness, for example $t_3$ = 0., the conditional density $g(t_4 \mid t_3)$ appears to be positively skewed even for sample size n = 0.

In order for the acceptance regions to be applicable for arbitrary sample size n and sample L-skewness $t_3$, the upper and lower bounds of the acceptance regions need to be expressed as functions of the sample size and sample L-skewness. We first calculate deviations of the upper and lower bounds from the theoretical $\tau_3$~$\tau_4$ curve (see $\Delta_U(t_3, n)$ and $\Delta_L(t_3, n)$ in Figure) and then express these bound deviations as functions of the sample L-skewness through the following regression models:

$$\Delta_U(t_3, n) = a_0(n) + a_1(n)\,t_3 + a_2(n)\,t_3^2 + a_3(n)\,t_3^3 \tag{24}$$

$$\Delta_L(t_3, n) = b_0(n) + b_1(n)\,t_3 + b_2(n)\,t_3^2 + b_3(n)\,t_3^3 \tag{25}$$

Coefficients of the above equations with respect to 1 sample sizes are listed in Table 3. From the very high $R^2$ values of the regression models, we can be sure that the dependence of the bound deviations on sample L-skewness is well characterized by the models. However, the regression coefficients listed in Table 3 are specific to the few sample sizes we have used. In order for Equations (24) and (25) to be generally applicable, the following regression model is adopted to express those coefficients ($a_i$, $b_i$) as functions of the sample size n:

$$y = y(n;\; w_0, w_1, w_2) \tag{26}$$

where n is the sample size and y represents one of the regression coefficients in Equations (24) and (25). Regression coefficients of Equation (26) are listed in Table 4. Combination of Equations (17)-(20) and (24)-(26) allows us to establish the % acceptance regions of the PE distribution with respect to any sample size between 0 and 00 and sample L-skewness $t_3$ not exceeding 0..

Validation of the LMRD acceptance regions

The sample-size-dependent acceptance regions established in the previous section are further checked for their validity.
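As a reminder of how the bounds being validated were constructed, the following minimal sketch extracts the two quantiles from the empirical distribution of $t_4$ within a $t_3$ window and evaluates a cubic bound-deviation model. It assumes the simulated samples are given as ($t_3$, $t_4$) pairs; the function names and the inverse-empirical-CDF quantile convention are assumptions of this sketch, and the coefficients passed to `bound_deviation` are placeholders rather than the paper's regression coefficients.

```python
import math


def empirical_quantile(sorted_values, p):
    """Inverse empirical CDF: smallest value v with F_n(v) >= p."""
    n = len(sorted_values)
    k = max(1, math.ceil(p * n))
    return sorted_values[k - 1]


def t3_specific_bounds(pairs, t3_star, half_width, alpha):
    """Lower and upper bounds of the 100(1 - alpha)% region of t4 near t3_star.

    pairs: iterable of (t3, t4) tuples from simulated samples of one size n.
    """
    t4 = sorted(t4 for t3, t4 in pairs if abs(t3 - t3_star) <= half_width)
    if not t4:
        raise ValueError("no simulated samples fall inside the t3 window")
    lower = empirical_quantile(t4, alpha / 2)
    upper = empirical_quantile(t4, 1 - alpha / 2)
    return lower, upper


def bound_deviation(t3, coeffs):
    """Cubic model c0 + c1*t3 + c2*t3**2 + c3*t3**3 for a bound deviation."""
    return sum(c * t3 ** k for k, c in enumerate(coeffs))
```

Because the bounds come from empirical quantiles, no distributional form is imposed on the conditional density of $t_4$, which matters when that density is skewed.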
Two types of validation, namely the $t_3$-specific and $\tau_3$-specific validity checks, were conducted. The $t_3$-specific validity check began by stochastically generating 0,000 standardized PE samples ($\mu$ = 0, $\sigma$ = 1), with the same sets of coefficient of skewness γ and sample size n as described in the section "Stochastic simulation of the Pearson type III distribution". Sample L-skewness and L-kurtosis ($t_3$, $t_4$) of each random sample were calculated, and the LMRD-based GOF test was conducted using the sample-size-dependent % acceptance regions established in the previous section. Then various ($t_3 \pm$ 0.00) intervals for $t_3$ not exceeding 0. (corresponding to γ = ) were selected. For a specific sample size n, each selected ($t_3 \pm$ 0.00) interval is composed of a large number, say N, of sample ($t_3$, $t_4$) pairs. The rejection rate, i.e. the type I error, is then calculated as $\hat{\alpha} = N_r / N$, where $N_r$ represents the number of rejected random samples within the ($t_3 \pm$ 0.00) interval. For the sample-size-dependent % acceptance regions to be valid, the rejection rate $\hat{\alpha}$ should be very close to the level of significance ($\alpha$ = 0.0), or equivalently the acceptance rate should be very close to 0.. Results of the $t_3$-specific validity check are shown in Figure 1 for sample size n = 0, 0, 0, 0, 0, and 00. The acceptance rates all fall within a very small range of [0., 0.]. Such results strongly indicate the validity of the sample-size-dependent % acceptance regions established in this study, and that they can be applied to GOF tests for PE random samples of any sample size between 0 and 00 and sample L-skewness not exceeding 0..

The above validation is done with respect to sample L-skewness $t_3$. This is normally the case since the population L-skewness is never known. However, for a complete demonstration of its applicability, we also check the validity of the established acceptance regions for random samples of known population parameters ($\mu$, $\sigma$, γ), i.e. validation with respect to population L-skewness $\tau_3$. Again, we simulated 0,000 random samples from the PE distribution with respect to each combination of ($\mu$ = 1., $\sigma$ = 1., γ), with 1 values of skewness γ the same as described in previous sections, for sample size n = 0, 0 and 0. Each combination of ($\mu$, $\sigma$, γ) corresponds to a specific pair of ($\tau_3$, $\tau_4$), since population L-skewness and L-kurtosis only depend on the coefficient of skewness γ.
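The $t_3$-specific rejection rate $\hat{\alpha} = N_r / N$ can be sketched as follows. Several things here are assumptions of the sketch rather than the paper's implementation: the polynomial coefficients are the Pearson type III values as recalled from Hosking and Wallis's published table and should be verified before use; the bound deviations are passed in as plain numbers (in the paper they depend on $t_3$ and n); and the lower bound is taken as the theoretical $\tau_4$ minus the lower deviation.

```python
# Assumed PE polynomial coefficients A_k (recalled from Hosking and Wallis's
# table; verify against the original before relying on them).
PE3_A = {0: 0.12240, 2: 0.30115, 4: 0.95812, 6: -0.57488, 8: 0.19383}


def tau4_pe3(tau3):
    """Theoretical L-kurtosis on the PE curve for a given L-skewness."""
    return sum(a * tau3 ** k for k, a in PE3_A.items())


def is_accepted(t3, t4, dev_upper, dev_lower):
    """True if t4 lies inside [tau4(t3) - dev_lower, tau4(t3) + dev_upper]."""
    centre = tau4_pe3(t3)
    return centre - dev_lower <= t4 <= centre + dev_upper


def rejection_rate(pairs, t3_star, half_width, dev_upper, dev_lower):
    """Fraction N_r / N of (t3, t4) pairs in the t3 window that are rejected."""
    window = [(t3, t4) for t3, t4 in pairs if abs(t3 - t3_star) <= half_width]
    if not window:
        raise ValueError("no simulated samples fall inside the t3 window")
    rejected = sum(
        1 for t3, t4 in window if not is_accepted(t3, t4, dev_upper, dev_lower)
    )
    return rejected / len(window)
```

For valid acceptance regions, the value returned by `rejection_rate` over a large simulated window should be close to the chosen level of significance.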
For every pair of ($\tau_3$, $\tau_4$), the GOF test using the established acceptance regions was conducted for a total of 0,000 random samples with sample size n = 0, 0 and 0. Results of this $\tau_3$-specific validity check are shown in Figure 1. Except for a few cases of sample size n = 0 and 0., acceptance rates all fall within the (0. 0.) range, further confirming the validity of the established sample-size-dependent acceptance regions. Although the PE distribution was targeted for establishment of the % acceptance regions for the L-moment-ratios-based GOF test in this study, we believe the same treatment can be applied to establishing acceptance regions of other three-parameter distributions such as the GEV distribution.

We end this paper by exemplifying usage of the established acceptance regions for the LMRD-based GOF test of the PE distribution. Suppose a random sample of size n = is available and its sample L-skewness and L-kurtosis are calculated as $t_3$ = 0. and $t_4$ = 0.1, respectively. Using Eq. (26), the corresponding regression coefficients for Equations (24) and (25) are calculated as $a_0$ = ., $a_1$ = 0. 1, $a_2$ = ., $a_3$ = and $b_0$ = 0.1, $b_1$ = 0. 0, $b_2$ = 1., $b_3$ = . The upper and lower bound deviations from the theoretical $\tau_3$~$\tau_4$ curve are $\Delta_U$ = 0. and $\Delta_L$ = 0. based on Equations (24) and (25). Corresponding to $\tau_3$ = 0., we have $\tau_4$ = 0. from Equation (1). Thus, the upper and lower bounds of the % acceptance region for $t_4$ are 0. (0.+0.) and 0.01 ( ) respectively. The sample L-kurtosis $t_4$ = 0.1 falls within this acceptance region and therefore the null hypothesis (the sample is from a PE distribution) is not rejected.

Summary and conclusions

In this paper we present a detailed account of a stochastic simulation approach for establishing % acceptance regions of the L-moment-ratios-based GOF test for the PE distribution. Unlike the normal and Gumbel distributions, the LMRD of the PE distribution plots as a curve, and $t_3$-specific acceptance regions need to be established through stochastic simulation. Properties of the sample L-skewness and L-kurtosis estimated by the probability-weighted-moment estimator are discussed. The sample-size-dependent % acceptance regions of the L-moments-based GOF tests for the PE distribution are further validated through a stochastic simulation process. A few concluding remarks are drawn as follows:

1. For Pearson type III distributions of smaller coefficient of skewness, the scattering of sample L-skewness and L-kurtosis can be approximated by a bivariate normal distribution. As the coefficient of skewness increases, the bivariate normal approximation becomes less satisfactory.
2. For the Pearson type III distribution, the asymptotic biases of the sample L-skewness and L-kurtosis of the probability-weighted-moment estimator are dependent on both the sample size and coefficient of skewness.
More specifically, sample L-skewness tends to underestimate the population L-skewness, although the bias is generally negligible. In contrast, sample L-kurtosis is positively biased for smaller skewness and negatively biased for γ > 0..
3. PE random samples of the same sample L-skewness can come from PE distributions of various population L-skewness. The likelihood of these population L-skewness values can be characterized by a conditional normal density whose standard deviation is inversely proportional to the square root of the sample size.
4. The sample-size-dependent % acceptance regions established in this study have been thoroughly validated through a stochastic simulation process and can be applied to GOF tests for random samples of any sample size between 0 and 00 and sample skewness not exceeding.
5. Although the PE distribution was targeted for establishment of the % acceptance regions for the L-moment-ratios-based GOF test in this study, the same treatment can be applied to establishing acceptance regions of other three-parameter distributions such as the GEV distribution.

Acknowledgements

We gratefully acknowledge the National Science Council of Taiwan, R.O.C. for financially supporting this research.

References

Abida, H. and Elouze, M., 00. Probability distribution of flood flows in Tunisia. Hydrology and Earth System Sciences Discussions, 1.
D'Agostino, R.B. and Stephens, M.A., 1. Goodness-of-Fit Techniques. Marcel Dekker, New York.
Haan, C.T., 00. Statistical Methods in Hydrology. Iowa State Press, Ames, Iowa.
Hosking, J.R.M.,. L-moments: analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society Series B (1), -1.
Hosking, J.R.M. and Wallis, J.R., 1. Some statistics useful in regional frequency analysis. Water Resources Research (), 1-1.
Hosking, J.R.M. and Wallis, J.R., 1. Regional frequency analysis: an approach based on L-moments. Cambridge University Press, Cambridge, U.K.
Kottegoda, N.T.,. Stochastic water resources technology. Macmillan, London, U.K.
Liou, J.J., Wu, Y.C., and Cheng, K.S., 00. Establishing acceptance regions for L-moments based goodness-of-fit tests by stochastic simulation. Journal of Hydrology,.
Tsukatani, T. and Shigemitsu, K.,. Simplified Pearson distributions applied to air pollutant concentration. Atmospheric Environment 1,.
U.S. Water Resources Council (US-WRC),. Guidelines for determining flood flow frequency, bulletin 1B. Office of Water Data Coordination, U.S. Geological Survey, Reston, Virginia.
Vogel, R.M. and Fennessey, N.M., 1. L-moments diagrams should replace product moment diagrams. Water Resources Research (), 1-1.
Taiwan Water Resources Agency (TW-WRA), 000. Technical Code for Hydrological Design (in Chinese). Ministry of Economic Affairs, Taipei, Taiwan.

Table 1. Coefficients of the rational-function approximations of L-skewness and L-kurtosis of the Pearson type III distribution.

A_0 = .01-1      C_0 =
A_1 =            C_1 = .0 -
A_2 = . -        C_2 = . -
A_3 = -. -       C_3 = 1. -
B_1 = . -1       D_1 =
B_2 = .0-1       D_2 = .0-1
E_1 = .0         G_1 = .1
E_2 = 1.1        G_2 = .1
E_3 =            G_3 = .1
F_1 = .1         H_1 = .01
F_2 = .          H_2 = . 1
F_3 = 1.0        H_3 = .1

Table 2. Sample means of $t_3$ and $t_4$ of the Pearson type III distributions with different values of coefficient of skewness.

Columns: sample size n; $\hat{\mu}(t_3)$ and $\hat{\mu}(t_4)$, each for three coefficients of skewness, headed 1 (τ =0.1), (τ =1), and (τ =0.0).

Table 3. Regression coefficients of the empirical relationships between the bound deviations and the sample L-skewness.

Columns: sample size n; coefficients for the upper bound deviations (Eq. (24)): $a_0$, $a_1$, $a_2$, $a_3$, $R^2$; coefficients for the lower bound deviations (Eq. (25)): $b_0$, $b_1$, $b_2$, $b_3$, $R^2$.

Table 4. Regression coefficients of Equation (26).

Columns: y; $w_0$; $w_1$; $w_2$; $R^2$. Rows: $a_0$, $a_1$, $a_2$, $a_3$, $b_0$, $b_1$, $b_2$, $b_3$.

19 Figure 1. L-moment-ratio diagram of the normal, Gumbel, generalized extreme value (GEV), and Pearson type III (PT) distributions. 1

20 Normal Gumbel Figure. % acceptance regions of L-moments-based GOF test for the normal and Gumbel distributions. Acceptance ellipses correspond to various sample sizes (n = 0, 0, 0, 0, 0,, 0,, 0, 00, and 1,000). [Liou et al., 00] 1 1 0

21 Figure. Scatter plot of sample (t, t ) pairs with respect to various skewness. Each plot is constructed based on 0,000 PE random samples of size n = 0. The ellipses represent % equiprobable density contour of (t, t ), assuming t and t form a bivariate normal distribution. Red dots mark the theoretical value of (, ) of the PE distribution. 1

Figure. Scatter plots of sample ($t_3$, $t_4$) pairs for various coefficients of skewness. Each plot is based on 0,000 PE random samples of size n = 0. The ellipses represent the % equiprobable density contours of ($t_3$, $t_4$), assuming $t_3$ and $t_4$ follow a bivariate normal distribution. Red dots mark the theoretical values of ($\tau_3$, $\tau_4$) of the PE distribution.
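An equiprobable density contour of a bivariate normal distribution can be traced from the sample mean vector and covariance matrix. The sketch below is illustrative only: the function names are hypothetical, and the default level of 0.95 is an assumption because the percentage is not legible in the captions. It relies on the fact that the squared Mahalanobis distance of a bivariate normal vector follows a chi-square distribution with 2 degrees of freedom, whose quantile is $-2\ln(1-p)$.

```python
import numpy as np

def normal_contour_ellipse(points, level=0.95, n_vertices=200):
    """Return (center, cov, vertices) of the equiprobable contour enclosing
    probability `level` under a bivariate normal fitted to `points`."""
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)
    # Chi-square(2) quantile in closed form: squared contour "radius".
    r2 = -2.0 * np.log(1.0 - level)
    # Principal axes of the ellipse are the eigenvectors of the covariance.
    evals, evecs = np.linalg.eigh(cov)
    theta = np.linspace(0.0, 2.0 * np.pi, n_vertices)
    circle = np.vstack([np.cos(theta), np.sin(theta)])
    vertices = center[:, None] + evecs @ (np.sqrt(r2 * evals)[:, None] * circle)
    return center, cov, vertices.T

def inside_contour(point, center, cov, level=0.95):
    """True if `point` lies within the contour (squared Mahalanobis distance test)."""
    d = np.asarray(point, dtype=float) - center
    return float(d @ np.linalg.solve(cov, d)) <= -2.0 * np.log(1.0 - level)
```

For ($t_3$, $t_4$) pairs from simulated samples, the fraction of pairs for which `inside_contour` is true can be compared with the nominal level as a rough check on the bivariate normal assumption.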

Figure. Biases of sample L-skewness ($t_{3,bias}$) and L-kurtosis ($t_{4,bias}$) of the Pearson type III distribution as functions of sample size n. See Figure for $b(\gamma, t_{3,bias})$ and $b(\gamma, t_{4,bias})$.

Figure. Dependence of $b(\gamma, t_{3,bias})$ and $b(\gamma, t_{4,bias})$ on the coefficient of skewness γ. (Horizontal axis: coefficient of skewness, γ.)

Figure. Conceptual illustration of % acceptance ellipses of three points $(\tau_3^{(i)}, \tau_4^{(i)})$, $i = 1, 2, 3$, on the LMRD of the PE distribution. The point ($t_3$, $t_4$) represents a sample pair of L-moment ratios.

Figure. Conceptual illustration of the % acceptance region for PE random samples having sample L-skewness $t_3$. (Labels in the figure: upper bound and lower bound of the % acceptance region of $t_4$ given $t_3$, and the associated conditional distributions.)

Figure. Data cloud of ($t_3$, $t_4$) using simulated PE random samples with sample size n = 0 and various coefficients of skewness. The relative histogram corresponding to all random samples with a given $t_3$, which serves as an estimate of the conditional density, is also shown in the figure.

Figure. Standard deviation of the conditional density f expressed as a function of sample size n.

Figure. Upper and lower bounds, $U(t_3, n)$ and $L(t_3, n)$, of the $t_3$-specific % acceptance regions with respect to sample size n = 0, 0, and 0. The solid line represents the theoretical $\tau_3$ ~ $\tau_4$ curve of the PE distribution.
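Bounds of this kind can be approximated empirically by brute force. The sketch below is a simplified stand-in, not the procedure of this study: it pools simulated Pearson type III samples over a few coefficients of skewness with equal weight, keeps those whose sample L-skewness falls in a narrow window around a target value, and reads off empirical quantiles of the sample L-kurtosis. The function names, the window half-width, the skewness grid, the equal weighting, and the 0.95 level are all assumptions made for illustration.

```python
import numpy as np

def lmoment_ratios_rows(samples):
    """Sample L-skewness and L-kurtosis for each row of a 2-D array (row length >= 4)."""
    x = np.sort(np.asarray(samples, dtype=float), axis=1)
    n = x.shape[1]
    j = np.arange(1, n + 1)
    # Weights of the unbiased probability-weighted moments b1, b2, b3.
    w1 = (j - 1) / (n - 1)
    w2 = w1 * (j - 2) / (n - 2)
    w3 = w2 * (j - 3) / (n - 3)
    b0 = x.mean(axis=1)
    b1 = (x * w1).mean(axis=1)
    b2 = (x * w2).mean(axis=1)
    b3 = (x * w3).mean(axis=1)
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l3 / l2, l4 / l2

def empirical_t4_bounds(n, t3_target, half_width=0.01,
                        skews=(0.5, 1.0, 1.5, 2.0),
                        n_rep=20000, level=0.95, seed=0):
    """Empirical (lower, upper, count) for t4 among samples with t3 near t3_target."""
    rng = np.random.default_rng(seed)
    t3_all, t4_all = [], []
    for skew in skews:
        shape = 4.0 / skew**2  # gamma shape with skewness `skew` (skew > 0)
        # L-moment ratios are location- and scale-invariant, so raw gamma
        # variates stand in for standardized Pearson type III variates.
        g = rng.gamma(shape, 1.0, size=(n_rep, n))
        t3, t4 = lmoment_ratios_rows(g)
        t3_all.append(t3)
        t4_all.append(t4)
    t3 = np.concatenate(t3_all)
    t4 = np.concatenate(t4_all)
    keep = np.abs(t3 - t3_target) <= half_width
    if not keep.any():
        raise ValueError("no simulated sample has t3 inside the window")
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(t4[keep], [tail, 1.0 - tail])
    return float(lower), float(upper), int(keep.sum())
```

The equal weighting over the skewness grid is the main simplification here; a faithful implementation would weight the contributing populations by the conditional distribution of the population L-skewness given the sample L-skewness.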

Figure 1. Empirical histograms of $t_4$ for different given values of $t_3$.

Figure 1. % acceptance regions of ($t_3$, $t_4$) for PE random samples of size n = 0, 0, 0, 00, and 00. The theoretical curve represents the $\tau_3$ ~ $\tau_4$ relationship of the PE distribution.

Figure 1. Acceptance rates of the $t_3$-specific validity check for sample-size-dependent % acceptance regions of sample L-skewness and L-kurtosis pairs.

Figure 1. Acceptance rates of -specific validity check for sample-size-dependent % acceptance regions of sample L-skewness and L-kurtosis pairs.

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: Density Curve A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

CHI-SQUARE: TESTING FOR GOODNESS OF FIT

CHI-SQUARE: TESTING FOR GOODNESS OF FIT CHI-SQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

More information

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions. Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

More information

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Gamma Distribution Fitting

Gamma Distribution Fitting Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

More information

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

More information

Pearson's Correlation Tests

Pearson's Correlation Tests Chapter 800 Pearson's Correlation Tests Introduction The correlation coefficient, ρ (rho), is a popular statistic for describing the strength of the relationship between two variables. The correlation

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives. The Normal Distribution C H 6A P T E R The Normal Distribution Outline 6 1 6 2 Applications of the Normal Distribution 6 3 The Central Limit Theorem 6 4 The Normal Approximation to the Binomial Distribution

More information

The Variability of P-Values. Summary

The Variability of P-Values. Summary The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 boos@stat.ncsu.edu August 15, 2009 NC State Statistics Departement Tech Report

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

Lecture Notes Module 1

Lecture Notes Module 1 Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

More information

Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

More information

Measuring Line Edge Roughness: Fluctuations in Uncertainty

Measuring Line Edge Roughness: Fluctuations in Uncertainty Tutor6.doc: Version 5/6/08 T h e L i t h o g r a p h y E x p e r t (August 008) Measuring Line Edge Roughness: Fluctuations in Uncertainty Line edge roughness () is the deviation of a feature edge (as

More information

UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE TWO-WAY ANOVA UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

More information

Ordinal Regression. Chapter

Ordinal Regression. Chapter Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

More information

COMPARISON BETWEEN ANNUAL MAXIMUM AND PEAKS OVER THRESHOLD MODELS FOR FLOOD FREQUENCY PREDICTION

COMPARISON BETWEEN ANNUAL MAXIMUM AND PEAKS OVER THRESHOLD MODELS FOR FLOOD FREQUENCY PREDICTION COMPARISON BETWEEN ANNUAL MAXIMUM AND PEAKS OVER THRESHOLD MODELS FOR FLOOD FREQUENCY PREDICTION Mkhandi S. 1, Opere A.O. 2, Willems P. 3 1 University of Dar es Salaam, Dar es Salaam, 25522, Tanzania,

More information

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

More information

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation Parkland College A with Honors Projects Honors Program 2014 Calculating P-Values Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating P-Values" (2014). A with Honors Projects.

More information

Week 4: Standard Error and Confidence Intervals

Week 4: Standard Error and Confidence Intervals Health Sciences M.Sc. Programme Applied Biostatistics Week 4: Standard Error and Confidence Intervals Sampling Most research data come from subjects we think of as samples drawn from a larger population.

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

Association Between Variables

Association Between Variables Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

More information

PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION

PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION Chin-Diew Lai, Department of Statistics, Massey University, New Zealand John C W Rayner, School of Mathematics and Applied Statistics,

More information

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready Mathematical Process Standards The South Carolina College- and Career-Ready (SCCCR)

More information

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:

More information

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

More information

Multiple Regression: What Is It?

Multiple Regression: What Is It? Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in

More information

Estimating Industry Multiples

Estimating Industry Multiples Estimating Industry Multiples Malcolm Baker * Harvard University Richard S. Ruback Harvard University First Draft: May 1999 Rev. June 11, 1999 Abstract We analyze industry multiples for the S&P 500 in

More information

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm Error Analysis and the Gaussian Distribution In experimental science theory lives or dies based on the results of experimental evidence and thus the analysis of this evidence is a critical part of the

More information

4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions 4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

More information

Zeros of a Polynomial Function

Zeros of a Polynomial Function Zeros of a Polynomial Function An important consequence of the Factor Theorem is that finding the zeros of a polynomial is really the same thing as factoring it into linear factors. In this section we

More information

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

More information

MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing!

MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing! MATH BOOK OF PROBLEMS SERIES New from Pearson Custom Publishing! The Math Book of Problems Series is a database of math problems for the following courses: Pre-algebra Algebra Pre-calculus Calculus Statistics

More information

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

More information

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium

More information

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model. Polynomial Regression POLYNOMIAL AND MULTIPLE REGRESSION Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model. It is a form of linear regression

More information

CHAPTER 14 ORDINAL MEASURES OF CORRELATION: SPEARMAN'S RHO AND GAMMA

CHAPTER 14 ORDINAL MEASURES OF CORRELATION: SPEARMAN'S RHO AND GAMMA CHAPTER 14 ORDINAL MEASURES OF CORRELATION: SPEARMAN'S RHO AND GAMMA Chapter 13 introduced the concept of correlation statistics and explained the use of Pearson's Correlation Coefficient when working

More information

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS TEST DESIGN AND FRAMEWORK September 2014 Authorized for Distribution by the New York State Education Department This test design and framework document

More information

The correlation coefficient

The correlation coefficient The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative

More information

The Dangers of Using Correlation to Measure Dependence

The Dangers of Using Correlation to Measure Dependence ALTERNATIVE INVESTMENT RESEARCH CENTRE WORKING PAPER SERIES Working Paper # 0010 The Dangers of Using Correlation to Measure Dependence Harry M. Kat Professor of Risk Management, Cass Business School,

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

More information

12.5: CHI-SQUARE GOODNESS OF FIT TESTS

12.5: CHI-SQUARE GOODNESS OF FIT TESTS 125: Chi-Square Goodness of Fit Tests CD12-1 125: CHI-SQUARE GOODNESS OF FIT TESTS In this section, the χ 2 distribution is used for testing the goodness of fit of a set of data to a specific probability

More information

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators... MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

International Statistical Institute, 56th Session, 2007: Phil Everson

International Statistical Institute, 56th Session, 2007: Phil Everson Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

More information

Journal of Statistical Software

Journal of Statistical Software JSS Journal of Statistical Software September 2009, Volume 31, Code Snippet 2. http://www.jstatsoft.org/ A SAS/IML Macro for Computing Percentage Points of Pearson Distributions Wei Pan University of Cincinnati

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance Principles of Statistics STA-201-TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

More information

CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont

CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont To most people studying statistics a contingency table is a contingency table. We tend to forget, if we ever knew, that contingency

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

TEACHING SIMULATION WITH SPREADSHEETS

TEACHING SIMULATION WITH SPREADSHEETS TEACHING SIMULATION WITH SPREADSHEETS Jelena Pecherska and Yuri Merkuryev Deptartment of Modelling and Simulation Riga Technical University 1, Kalku Street, LV-1658 Riga, Latvia E-mail: merkur@itl.rtu.lv,

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

Nonparametric statistics and model selection

Nonparametric statistics and model selection Chapter 5 Nonparametric statistics and model selection In Chapter, we learned about the t-test and its variations. These were designed to compare sample means, and relied heavily on assumptions of normality.

More information

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 13 and Accuracy Under the Compound Multinomial Model Won-Chan Lee November 2005 Revised April 2007 Revised April 2008

More information
