Randomization Based Confidence Intervals For Cross Over and Replicate Designs and for the Analysis of Covariance


Winston Richards
Schering-Plough Research Institute
JSM, Aug, 2002

Abstract

Randomization or permutation tests have been studied extensively under the finite model framework in comparative experiments, to assess whether one treatment differs significantly from the other against the null hypothesis of no treatment difference. We explore the use of the randomization principle to construct confidence intervals for the treatment differences (considered as shifts) through the conceptual recovery of the null (hypothesis) state from the alternative in the sample. In the linear model framework, the intervals may be constructed by generating a null distribution around the estimated shift (under least squares theory, say) using the finite model rerandomization sample space of estimates derived from residuals. These confidence intervals do not require any assumption of normality, which is quite questionable in small samples. They are obtained under a very simple assumption of additivity in the finite model setting. Illustrative coverage probabilities are presented for some applications of the methodology. We consider replicate (cross over) designs, where the confidence intervals obtained using randomization theory are compared with those from the usual normal theory assumptions and from the mixed model approach with factor analytic covariance structure in the FDA guidance on average bioequivalence for pharmacokinetics.

1 Introduction

In situations where a distribution is assumed for the underlying population of interest (e.g., the normal distribution), the validity of the procedure used, and of the probability statements based on the resulting sampling distribution, will depend, perhaps critically, on the assumptions. Violations of the assumptions may have very serious consequences for the correctness or justification of the inference or decision involved. In the practical world, however, one's experience or prior information about a data situation may be quite inadequate to justify the assumptions. Besides, in many cases the sample size may be too small, and the appeal to general large sample theory may be without basis. In these situations, distribution free or non-parametric methods may provide suitable procedures for analyzing the data. Below we give a brief and coherent description of finite model randomization methodology.

Fisher, in his 1926 paper (1) and in his now classical book "The Design of Experiments" (1935) (2), put forward an idea, the principle of randomization, that has dominated comparative experiment methodology for the past 75 years. The principle of randomization under a finite model setting was illustrated in testing the null hypothesis of no treatment difference in his Lady Tasting Tea experiment. There Fisher used a restricted randomization or permutation test conditional on fixed sample sizes for the two treatments. In this article the principle of randomization under a finite model setup is used as the framework from which we construct confidence intervals when the response variables are either ordinal or dichotomous (Richards and Gogate (7)).

The principle of randomization was expounded in Fisher (1935) (2) and in greater detail by Kempthorne (1952) (3). Kempthorne and Folks (1971) (4) defined consonance intervals in the context of the inversion of tests of significance. However, only recently has the direct application of this principle in data analysis received widespread attention, owing to advances in computer technology. Lehmann (1997) (5) has discussed the construction of a confidence interval using this concept for a shift in the location parameter of an infinite population, as an approximation; the shift may be considered as the treatment effect in a comparative trial.

Our motivation is, inter alia, to challenge the rather complex assumptions and complicated analyses that are now rampant in the mixed model arena for making an inference about the comparison of treatment effects. We suggest that the inference from the finite model approach is sound. It embraces a wide range of underlying realities under approximate additivity of the treatment effect, even under the infinite model distributional assumptions. We extend this concept to construct a confidence interval for the (additive) treatment difference in the (repeated measures) simple cross over design and the replicate design for the case of two treatments. We investigate the extension of this approach to the analysis of covariance in the linear model framework. We also suggest a way of obtaining confidence intervals for the odds in dichotomous contexts where the link function is considered linear in the concomitant variables and additive in the treatment effect, as in logistic regression, say.
2 Finite Model Framework

An important feature of finite model theory is its usefulness in making inferences in situations where the probability distributions of the observations are unknown. Incorporating randomization into the experimental design gives a strong basis for statistical inference, particularly if additivity, as defined later, holds. The probability statements and associated inferences have definite relations to what is conceptually observable in a situation. The reader is referred to the work of Kempthorne for the derivation of models that may be represented in a form similar to the parametric forms for the factorial analysis of variance, say, with main effects and interactions. However, these definitions are given in terms of the conceptual population defined below. They reflect algebraic partitions (combinations) of the basal responses of the experimental units, without appeal to added distributional assumptions about the errors. In our discussion below we will use the usual parametric forms of the models in our finite model estimation procedures.

Suppose there are $N$ experimental units $U_1, U_2, \ldots, U_N$. Since each unit can receive any of the $t$ (sequences of) treatments, the conceptual population of responses will consist of $Nt$ possible (vector) responses. In an actual experiment, however, each unit is assigned to only one sequence of treatments, so that we have a restricted sample from the conceptual population. We must use this sample to make a statistical inference about the difference of treatment means, for example via a test of the null hypothesis or confidence interval estimation.

We restrict our discussion to the case of two treatments, since that is of primary interest to us. The extension to a sequence of treatments follows logically. Let $\gamma_1$ and $\gamma_2$ denote the true treatment effects (e.g., means or proportions) associated with the two treatments, say A (new or active treatment) and B (control or standard of therapy). Let $Z_i$ denote the basal vector response of unit $U_i$. Under additivity, the response $C_{ik}$ (of unit $U_i$ under treatment sequence vector $k$) can be expressed as $C_{ik} = Z_i + \gamma_k$. In the sample, $y_{ik}$, the $i$-th vector response under treatment sequence $k$, is then given by

$$y_{ik} = \sum_j C_{jk}\,\delta_j^{k(i)},$$

where

$$\delta_j^{k(i)} = \begin{cases} 1 & \text{if unit } j \text{ is the } i\text{-th replicate unit of treatment sequence } k, \\ 0 & \text{otherwise.} \end{cases}$$

Note that

$$P\big(\delta_j^{k(i)} = 1\big) = \frac{1}{N}, \qquad P\big(\delta_j^{k(i)} = 1,\ \delta_{j'}^{k'(i')} = 1\big) = \begin{cases} \dfrac{1}{N(N-1)} & \text{for } j \neq j' \text{ and } i \neq i', \\ 0 & \text{otherwise.} \end{cases}$$

Let $\bar y_{.k}$ denote the sample mean of the observations $y_{ik}$. Based on the above representation of $y_{ik}$, it follows that $E(\bar y_{.k}) = \bar Z_{.} + \gamma_k$, and hence the contrast $c_1 \bar y_{.1} - c_2 \bar y_{.2}$ is an unbiased estimate of the true difference $\Delta = c_1\gamma_1 - c_2\gamma_2$. Also notice that the average of any subset of the sample observations on sequence $k$ is an unbiased estimate of the population mean $\bar Z_{.} + \gamma_k$, and consequently linear contrasts, such as differences, of these subset averages are unbiased estimates of the true difference. This property suggests a natural way of constructing confidence intervals as quantiles of the empirical distribution of the difference of population means, as described in Section 3.

3 Construction of Confidence Intervals

Conceptually, we use an inversion process to test a hypothesis under the finite model setup. The equivalence (or one to one correspondence) between rejection and acceptance regions shows the structure of confidence sets as the totality of parameter values $\Delta_0$ for which the hypothesis $H(\Delta_0)$ is accepted when a sample is observed (see, for example, Section 3.5 of Lehmann (1997) (5) for more details for the case where each unit has a single response). This property may be used in constructing confidence intervals from randomization (permutation) tests. However, a conceptually equivalent approach is to use an empirical distribution of linear contrasts of the mean vector response per sequence, based on various subsamples of the two samples, to obtain a confidence interval for the true difference (of the parameters under consideration). Here the key property of unbiasedness of the difference in sample means under the finite model setup, as described in the previous section, is used as the basis for obtaining the empirical distribution.

We will derive confidence intervals for the difference in treatment effects. The usual least squares (or generalized least squares under compound symmetry) estimates for the model parameters may be shown to yield unbiased estimates of differences in direct treatment effects under randomization for the finite model with additivity. We may use this general approach to obtain the point estimate, and randomization theory to obtain the confidence interval estimates (relative to the point estimate) using the residuals.

Inversion method under the finite model set up

We consider scalar responses to illustrate the theory. The generalization to vector responses under simple additivity of the component treatment effects follows. Suppose $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ are sets of scalar observations under treatment sequences C and T respectively, with true population means $\gamma_1$ and $\gamma_2$. We assume that the active treatment and the control appear in different periods for sequence B and sequence A. Without loss of generality, we assume in what follows that the first $m$ observations are assigned to treatment A and the remaining $n$ observations have received treatment B. Our interest is to obtain a confidence interval for the true contrast $\Delta = c_1\gamma_1 - c_2\gamma_2$. Note that, under the finite model setup with additivity, the $X$'s and $Y$'s are a realization of the original basal responses $Z$, where the actives are shifted by an amount $\Delta$, as laid out in the following table.

Treatment Unit Responses

$$\begin{array}{l|ccc|cccc}
\text{Unit} & U_1 & \cdots & U_m & U_{m+1} & U_{m+2} & \cdots & U_{m+n} \\ \hline
\text{Treat. C} & X_1 & \cdots & X_m & & & & \\
\text{Treat. T} & & & & Y_1 & Y_2 & \cdots & Y_n \\
\text{Basal} & Z_1 & \cdots & Z_m & Z_{m+1} + \Delta & Z_{m+2} + \Delta & \cdots & Z_{m+n} + \Delta
\end{array}$$

If $\Delta$ were known, then by subtracting $\Delta$ from the $Y$'s we would be in a null hypothesis situation of no treatment difference, and the resulting observations would be one of the $\binom{m+n}{m}$ possible (treatment group) realizations of the original $Z$'s, restricted only by the observed sample sizes. Therefore, a natural way of finding a confidence interval for $\Delta$ with confidence coefficient $1 - \alpha$ is to find the totality of values $\Delta_0$ such that, after subtracting $\Delta_0$ from the $Y$'s, the null hypothesis of no treatment difference is not rejected at significance level $\alpha$ by a randomization test (based on the modified or adjusted values). The lower and upper bounds of the values $\Delta_0$ then constitute a confidence interval, as noted earlier. The algorithmic steps can thus be summarized as follows.

1. Compute the modified or adjusted responses from the observed values by choosing an initial value $\Delta_0$ to create a "null" hypothesis situation.
2. Compute the estimate for each of the $\binom{m+n}{m}$ possible assignments.
3. Compute the significance level by comparing the new values of the test statistic with the value obtained for the original observations from which $\Delta_0$ has been subtracted.
4. Retain the value of $\Delta_0$ if the significance value is less than or equal to the value corresponding to the desired confidence coefficient. Otherwise, repeat the above steps until the desired significance level is reached, perhaps by using the method of bisection or the method of tangents.

The procedure may have to be carried out separately to obtain the upper and lower confidence limits. Note that it may not be feasible to attain the exact value of $\alpha$, because the distribution is necessarily discrete.

Remark 1. The above algorithm is given mainly for pedagogic purposes. A logically equivalent and more efficient way of obtaining confidence intervals is to apply the empirical differences method described in Lehmann (1997) (5) in the finite model setup. Since each observation is an unbiased estimate of the population mean, so is the average of those observations taken $r$ at a time, where $r \in \{1, 2, \ldots, \min(m, n)\}$. In turn, the difference of two such averages is an unbiased estimate of the difference of the population means. Generate the distribution of all such possible differences. The quantiles $Q_{\alpha/2}$ and $Q_{1-\alpha/2}$ of this empirical distribution constitute a $1 - \alpha$ confidence interval for $\Delta$. Equivalently, generate the null distribution of all such possible differences using the residuals from the least squares fit. The quantiles $Q_{\alpha/2}$ and $Q_{1-\alpha/2}$ of this null empirical distribution, shifted by the point estimate, will produce the $1 - \alpha$ confidence interval for $\Delta$ previously described.

Let us consider the replicate design with sequences 1, TRTR, and 2, RTRT. Under additivity, the conceptual vector response for patient $k$ under sequences 1 and 2 may be written $Z_{1k} = (u_{k1}, u_{k2}, u_{k3}, u_{k4}) + \tau(1, 0, 1, 0)$ and $Z_{2k} = (u_{k1}, u_{k2}, u_{k3}, u_{k4}) + \tau(0, 1, 0, 1)$, respectively, where $\tau$ is the additive difference between the test treatment T and the control R.
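As a hedged illustration of the inversion steps of Section 3 for the scalar two-sample case, the following Python sketch scans a grid of candidate shifts, subtracts each candidate from the treated responses, and keeps the candidates that an exact randomization test does not reject. The function names, the difference-in-means test statistic, the two-sided p-value, and the grid scan (used here in place of bisection) are assumptions made for illustration only; they are not taken from the paper.

```python
from itertools import combinations
from statistics import mean


def randomization_p_value(x, y, delta0):
    """Two-sided randomization p-value for H0: shift equals delta0.

    x: control responses, y: treated responses (hypothetical names).
    """
    # Subtract the candidate shift from the treated units to recover a "null" data set.
    pooled = list(x) + [v - delta0 for v in y]
    m, total_n = len(x), len(x) + len(y)
    observed = mean(pooled[m:]) - mean(pooled[:m])
    grand_total = sum(pooled)

    extreme = 0
    assignments = 0
    # Enumerate every way of choosing which m units form the control group.
    for idx in combinations(range(total_n), m):
        control_sum = sum(pooled[i] for i in idx)
        stat = (grand_total - control_sum) / (total_n - m) - control_sum / m
        if abs(stat) >= abs(observed) - 1e-12:  # small tolerance for float ties
            extreme += 1
        assignments += 1
    return extreme / assignments


def randomization_ci(x, y, alpha=0.05, n_grid=201):
    """Smallest and largest grid values of delta0 that are not rejected."""
    estimate = mean(y) - mean(x)
    span = max(max(x), max(y)) - min(min(x), min(y))
    step = 2 * span / (n_grid - 1)
    grid = [estimate - span + i * step for i in range(n_grid)]
    kept = [d for d in grid if randomization_p_value(x, y, d) > alpha]
    return min(kept), max(kept)


x = [1.0, 2.0, 3.0, 4.0]   # toy control responses
y = [6.5, 7.0, 8.5, 9.0]   # toy treated responses
p_zero = randomization_p_value(x, y, 0.0)
lo, hi = randomization_ci(x, y)
```

With m = n = 4 there are only 70 possible assignments, so the smallest attainable two-sided p-value in this toy example is 2/70. This is the discreteness that can prevent the exact value of α from being attained.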
If $\tau$ were known, subtracting $\tau(1, 0, 1, 0)$ from the vector response of unit $k$ under sequence 1, or correspondingly $\tau(0, 1, 0, 1)$ under sequence 2, would recover the basal null vector response for unit $k$, i.e., $U_k = (u_{k1}, u_{k2}, u_{k3}, u_{k4})$.

Under the assumption of a compound symmetry covariance structure, the best linear unbiased estimate of the direct treatment difference between treatments T and R, adjusting for possible first order carryover effects, is given by the inner product

$$T_0 = C\,(Y_{1.} - Y_{2.}) = (\ \ \ )\,\big[(y_{1.1}, y_{1.2}, y_{1.3}, y_{1.4}) - (y_{2.1}, y_{2.2}, y_{2.3}, y_{2.4})\big] / 20,$$

where the $y_{i.p}$ are the (scalar) components for period $p$ of the mean vector responses over subjects in sequences $i = 1, 2$.

A $1 - \alpha$ confidence interval for $\tau$ is given by $(t_1, t_2)$ such that modifying the observed vector responses as described above, with $\tau$ equal to $t_1$ and to $t_2$, results in observed significance levels for the statistic $T_0$ of $\alpha_1$ and $\alpha_2$, respectively, under the randomization distribution of the modified values, with $\alpha = \alpha_1 + \alpha_2$. As before, this is equivalent to generating the distribution of $T_0$ over all subsets taken $r$ at a time from each sequence for $r = 1, 2, \ldots, l$, where $l$ is the minimum of $n_1$ and $n_2$, and selecting the appropriate quantiles. Alternatively, we may use the residuals from the least squares fit instead of the responses and shift the resulting null distribution by the point estimate.

The Analysis of Covariance

In situations where concomitant values are considered to influence the response, the analysis of covariance model for the response of the $j$-th replicate unit on treatment $i$ is usually represented in the form

$$Y_{ij} = \mu + \alpha_i + X_{ij}\beta + \varepsilon_{ij},$$

with the ANCOVA table and resulting ANOVA as follows.

Sum of Squares

$$\begin{array}{lcccc}
\text{Source} & \text{df} & xx & xy & yy \\ \hline
\text{Treatments} & t-1 & T_{xx} & T_{xy} & T_{yy} \\
\text{Residual} & N-t & R_{xx} & R_{xy} & R_{yy} \\
\text{Total} & N-1 & G_{xx} & G_{xy} & G_{yy}
\end{array}$$

ANOVA

$$\begin{array}{lcl}
\text{Source} & \text{df} & \text{Sum of squares} \\ \hline
X \mid \text{Treat} & p & R_{xy}' R_{xx}^{-1} R_{xy} \\
\text{Treat} \mid X & t-1 & G_{yy} - R_{yy} - G_{xy}' G_{xx}^{-1} G_{xy} + R_{xy}' R_{xx}^{-1} R_{xy} \\
\text{Error} & N-t-p & R_{yy} - R_{xy}' R_{xx}^{-1} R_{xy}
\end{array}$$

Our interest here is to obtain a confidence interval estimate of the comparison of the treatment effects after adjusting for the nuisance parameter $\beta$, estimated under least squares by $b = R_{xx}^{-1} R_{xy}$, which for scalar $b$ is equal to

$$b = \frac{\sum_{ij} (y_{ij} - \bar y_{i.})(x_{ij} - \bar x_{i.})}{\sum_{ij} (x_{ij} - \bar x_{i.})^2}.$$

We note that, even under a single null grouping, the true value of $\beta$ cannot in general be determined via the finite model with additivity. However, we could assign its value by convention to be equal to $G_{xx}^{-1} G_{xz}$, where $Z$ is the conceptual population of responses under basal conditions. The least squares estimate of the comparison between two treatments $k$ and $k'$ from the ANCOVA model is then given by

$$a_k - a_{k'} = \bar y_{k.} - \bar y_{k'.} - (\bar x_{k.} - \bar x_{k'.})\,b.$$
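To make the covariate-adjusted comparison concrete, here is a small Python sketch for a single covariate. It computes the pooled within-treatment slope b and the adjusted difference between two treatment groups. The function names and the toy data are hypothetical, chosen only so that the arithmetic can be checked by hand.

```python
from statistics import mean


def pooled_slope(groups):
    """Pooled within-group slope b = R_xy / R_xx for a scalar covariate.

    groups: list of (x_values, y_values) pairs, one pair per treatment.
    """
    r_xy = 0.0
    r_xx = 0.0
    for xs, ys in groups:
        x_bar, y_bar = mean(xs), mean(ys)
        # Residual (within-treatment) cross products and squares.
        r_xy += sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        r_xx += sum((x - x_bar) ** 2 for x in xs)
    return r_xy / r_xx


def adjusted_difference(group_k, group_k2, b):
    """a_k - a_k' = (ybar_k - ybar_k') - (xbar_k - xbar_k') * b."""
    (xk, yk), (xk2, yk2) = group_k, group_k2
    return (mean(yk) - mean(yk2)) - (mean(xk) - mean(xk2)) * b


# Toy data: both groups follow y = 2x, with the second group shifted up by 3.
control = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
treated = ([2.0, 3.0, 4.0], [7.0, 9.0, 11.0])
b = pooled_slope([control, treated])
diff = adjusted_difference(treated, control, b)
print(b, diff)  # prints 2.0 3.0
```

The unadjusted difference in means is 5; removing the covariate imbalance leaves the shift of 3 that was built into the toy data.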

EMS ANOVA Under Randomization Theory

For the simple case of equal sample sizes for two treatments, it may be shown under randomization theory that both estimates below are unbiased for the treatment effect:

$$E(a_k - a_{k'}) = \tau_k - \tau_{k'} \quad \text{(unbiased)},$$

$$E(\bar y_{k.}) - E(\bar y_{k'.}) = \tau_k - \tau_{k'} \quad \text{(unbiased)}.$$

However, for the analysis of variance table, the expectations of the mean squares for treatment effects and for error are not equal under null conditions, since

$$E_R\big(R_{xy}' R_{xx}^{-1} R_{xy}\big) \neq G_{xz}' G_{xx}^{-1} G_{xz} \cdot \frac{N-t-p}{N-1-p} \quad \text{and} \quad E_R\big(R_{xx}^{-1} R_{xy}\big) \neq G_{xx}^{-1} G_{xz}.$$

Nevertheless, as the main interest is the comparison of the treatment effects, the analysis of covariance model may be written in the analysis of variance form

$$w_{ij} = Y_{ij} - X_{ij}\beta = \mu + \alpha_i + \varepsilon_{ij}.$$

Conditional on the value of the nuisance parameter $\beta$, estimated by $b$, we can modify the initial value of the response by removing the nuisance effects and proceed, as in the analysis of variance above, to obtain the randomization confidence interval from the adjusted values $\{w_{ki}\} = \{y_{ki} - x_{ki} b\}$.

Confidence Intervals for the Treatment Effect in the Analysis of Covariance

As in the previously described sections, the procedure for the analysis of covariance is equivalent to using the residuals from the least squares fit to generate the null distribution of the treatment difference as described above, and shifting the resulting quantiles of the distribution by the point estimate. The rerandomization approach may also be used to determine confidence intervals in binary situations where the link function is additive in the treatment effect.

Randomization Confidence Intervals for the Treatment Effect in Binary Logistic Regression

For the case of two treatments where the response rate is considered approximately as increasing in a binary logistic regression, the model may be approximated as

$$E(Y_{ij}) = \pi(x) = \frac{\exp(\delta_i + X_{ij}\beta)}{1 + \exp(\delta_i + X_{ij}\beta)},$$

where $\beta$ may be considered as nuisance parameters and $\delta_i = 0$ or $\delta$, with $\delta$ the effect of one treatment relative to the other. As in the analysis of covariance, the true value of $\beta$ cannot in general be determined via the finite model with additivity. However, we could proceed to estimate $\delta$ and $\beta$ jointly, and obtain a confidence interval for $\delta$ accounting for $X$ or conditional on $\beta$. Hirji, Mehta and Patel (1987) (6) and Mehta and Patel (1995) (6a) have derived and examined maximum likelihood estimates and exact likelihood ratio estimates based on the sufficient statistics.

A minor modification to their approach enables a randomization confidence interval estimate for $\delta$ to be obtained, assuming the nuisance parameters $\beta$ are known and equal to their estimated values, $b$. The limits would be given by the quantiles of the distribution of solutions for $\delta$, obtained from the likelihood function or estimating equations on the subsets of the data of sizes 1 to $m$ as described above, with $\beta$ set equal to the estimated values $b$. Treatment subgroups will need both types of outcome to yield a solution for the treatment effect. Subgroups that violate this criterion may require an imposed modification of the relevant equations, or may be assigned null or extreme value solutions for the treatment effect, depending on whether the patterns are similar or not. The coverage based on this approach is to be compared with that of the exact CI of Mehta and Patel.

Results and Conclusions

Replicate Designs

a) For the sample sizes investigated, n = 12, 16, and 20, under the finite model (permutations) the coverage of our procedure is equal to the nominal value, within the tolerance of a finite discrete distribution. The coverage of confidence intervals obtained from the usual cross over model with first order carryover and normal assumptions under compound symmetry, and from the mixed model procedure recommended in the FDA guidance for bioequivalence, exceeds the nominal values.

b) Sampling from an infinite model framework shows that our procedure produces coverage that is close to the nominal value. The coverage of confidence intervals obtained under the usual normal assumptions with compound symmetry falls slightly below the nominal values, while the coverage from the mixed model procedure of the FDA guidance for bioequivalence slightly exceeds the nominal values.

c) Preliminary work on coverage for the analysis of covariance under (a) the finite model (permutations) construction and (b) the infinite model was performed for small sample sizes (12, 16 and 20). The simulations were done using truncated pseudo normal variables and various values of the slope. The results indicate that the coverage of our procedure for ANOVA is close to the nominal value for the 99%, 95% and 90% confidence intervals, and a little smaller than the nominal values in ANCOVA for the 95% and 90% confidence intervals. The width of the interval in ANCOVA is smaller than for ANOVA. Sampling from the finite model framework, the coverage of confidence intervals obtained using the usual normal estimates is close to the nominal values. The finite model rerandomization procedure gives coverages that are equal to the nominal values for ANOVA and slightly less than the nominal values in ANCOVA for the 95% and 90% confidence intervals. Situations where the data come from underlying nonnormal distributions are being investigated.

e) Computation issues

As the sample sizes increase, the number of outcomes in the sample space becomes computationally prohibitive. In application, given the data, we would estimate the confidence intervals by selecting a weighted random sample of subsets of sizes 1 to $m$ as defined above. The weights are determined by the relative frequency of the subset sizes in the randomization sample space.

Monte Carlo Results for the Replicate Design

The results given below are for illustrative purposes. The infinite model data were generated to reflect various degrees of inter subject and intra subject heterogeneity for the treatments. Results are based on approximately 2000 samples each. In the finite model table, results were obtained for underlying actual finite model additive data. The coverage examples for sample size 12:6-6* are based on the complete restricted set of 924 assignments of the 12 subjects to treatment groups of size 6. For sample sizes 16:8-8 and 20:10-10 the results are based on 2000 random selections from the complete restricted set of assignments of the subjects to treatment groups of equal sizes.

Infinite model results: sample size and coverage (/2000)
Sample sizes (N:n1-n2): 12:6-6*, 16:8-8, 20:10-10
Confidence levels reported for each sample size: 99%, 95%, 90%
Methods: finite model; glm w. carry-over; mixed-fda

Finite model results: sample size and coverage (/2000)
Sample sizes (N:n1-n2): 12:6-6, 16:8-8, 20:10-10
Confidence levels reported for each sample size: 99%, 95%, 90%
Methods: finite model; glm w. carry-over; mixed-fda
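The subset sampling described under Computation issues might be sketched as follows in Python. Subset sizes r are drawn with weights proportional to C(m, r)·C(n, r), which is one reading (an assumption here) of "the relative frequency of the subset sizes in the randomization sample space". The function names, the number of draws, and the simple order-statistic quantiles are likewise illustrative choices, not the procedure used for the reported simulations.

```python
import random
from math import comb


def sampled_differences(x, y, n_draws=2000, seed=0):
    """Monte Carlo sample of differences of subset means (treated minus control)."""
    rng = random.Random(seed)
    m, n = len(x), len(y)
    sizes = list(range(1, min(m, n) + 1))
    # Assumed weights: number of (x-subset, y-subset) pairs of common size r.
    weights = [comb(m, r) * comb(n, r) for r in sizes]
    diffs = []
    for r in rng.choices(sizes, weights=weights, k=n_draws):
        x_sub = rng.sample(list(x), r)  # r control units without replacement
        y_sub = rng.sample(list(y), r)  # r treated units without replacement
        diffs.append(sum(y_sub) / r - sum(x_sub) / r)
    return diffs


def quantile_interval(diffs, alpha=0.05):
    """Empirical alpha/2 and 1 - alpha/2 quantiles from order statistics."""
    ordered = sorted(diffs)
    k = len(ordered)
    lower = ordered[int(k * alpha / 2)]
    upper = ordered[min(k - 1, int(k * (1 - alpha / 2)))]
    return lower, upper


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # toy control responses
y = [v + 5.0 for v in x]             # toy treated responses, shifted by 5
diffs = sampled_differences(x, y)
lo, hi = quantile_interval(diffs)
```

Fixing the seed makes the sketch reproducible; in practice the number of draws would be chosen to balance Monte Carlo error against computing time.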

Monte Carlo Results for the ANOVA and ANCOVA

Again, the results given below are for illustrative purposes. The infinite model data were generated using a pseudonormal random generator truncated at 2 standard deviations from the means. In the finite model table, results were obtained for underlying actual finite model additive data. The data were derived using subsets from the samples above and unique rerandomization assignments of the units into treatment groups. The coverage examples for sample size 12:6-6* are based on the complete restricted set of 924 assignments of the 12 subjects to treatment groups of size 6. For sample sizes 16:8-8 and 20:10-10 the results are based on 2000 random selections from the complete restricted set of assignments of the subjects to treatment groups of equal sizes.

Finite model table: sample size and coverage (/2000)
Sample sizes: 12:6-6, 16:8-8, 20:10-10
Confidence levels reported for each sample size: 99%, 95%, 90%
Methods: finite ANOVA; finite ANCOVA; GLM ANOVA; GLM ANCOVA

Infinite model table: sample size and coverage (/1000)
Sample sizes: 12:6-6, 16:8-8, 20:10-10
Confidence levels reported for each sample size: 99%, 95%, 90%
Methods: finite ANOVA; finite ANCOVA; GLM ANOVA; GLM ANCOVA

References

1 Fisher, R. A. (1926) The Arrangement of Field Experiments. J. Min. Agric. Eng. 33:
2 Fisher, R. A. (1935) The Design of Experiments. Oliver and Boyd, Edinburgh.
3 Kempthorne, Oscar (1952) Design and Analysis of Experiments, 2nd ed. John Wiley and Sons, Inc., New York.
4 Kempthorne, Oscar and Leroy Folks (1971) Probability, Statistics and Data Analysis. Ames, Iowa: Iowa State Press.
5 Lehmann, E. L. (1997) Testing Statistical Hypotheses, 3rd ed. John Wiley and Sons, Inc., New York.
6 Hirji, Karim F., Mehta, Cyrus R., and Patel, Nitin R. (1987) Computing Distributions for Exact Logistic Regression. JASA, 82.
6a Mehta, C. R. and Patel, N. R. (1995) Exact logistic regression: theory and examples. Statistics in Medicine.
7 Richards, W. and J. Gogate (2000) Finite Model Confidence Intervals for Dichotomous Data, unpublished manuscript.


More information

MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

More information

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

The Null Hypothesis. Geoffrey R. Loftus University of Washington

The Null Hypothesis. Geoffrey R. Loftus University of Washington The Null Hypothesis Geoffrey R. Loftus University of Washington Send correspondence to: Geoffrey R. Loftus Department of Psychology, Box 351525 University of Washington Seattle, WA 98195-1525 gloftus@u.washington.edu

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +

More information

3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples 3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

More information

Exact Nonparametric Tests for Comparing Means - A Personal Summary

Exact Nonparametric Tests for Comparing Means - A Personal Summary Exact Nonparametric Tests for Comparing Means - A Personal Summary Karl H. Schlag European University Institute 1 December 14, 2006 1 Economics Department, European University Institute. Via della Piazzuola

More information

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

Chapter 1 Introduction. 1.1 Introduction

Chapter 1 Introduction. 1.1 Introduction Chapter 1 Introduction 1.1 Introduction 1 1.2 What Is a Monte Carlo Study? 2 1.2.1 Simulating the Rolling of Two Dice 2 1.3 Why Is Monte Carlo Simulation Often Necessary? 4 1.4 What Are Some Typical Situations

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Confidence Intervals for the Difference Between Two Means

Confidence Intervals for the Difference Between Two Means Chapter 47 Confidence Intervals for the Difference Between Two Means Introduction This procedure calculates the sample size necessary to achieve a specified distance from the difference in sample means

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

Logit Models for Binary Data

Logit Models for Binary Data Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response

More information

Ordinal Regression. Chapter

Ordinal Regression. Chapter Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

More information

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

More information

From the help desk: Bootstrapped standard errors

From the help desk: Bootstrapped standard errors The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

More information

4.5 Linear Dependence and Linear Independence

4.5 Linear Dependence and Linear Independence 4.5 Linear Dependence and Linear Independence 267 32. {v 1, v 2 }, where v 1, v 2 are collinear vectors in R 3. 33. Prove that if S and S are subsets of a vector space V such that S is a subset of S, then

More information

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

More information

UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA)

UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA) UNDERSTANDING ANALYSIS OF COVARIANCE () In general, research is conducted for the purpose of explaining the effects of the independent variable on the dependent variable, and the purpose of research design

More information

1 Simple Linear Regression I Least Squares Estimation

1 Simple Linear Regression I Least Squares Estimation Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

More information

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

More information

An introduction to Value-at-Risk Learning Curve September 2003

An introduction to Value-at-Risk Learning Curve September 2003 An introduction to Value-at-Risk Learning Curve September 2003 Value-at-Risk The introduction of Value-at-Risk (VaR) as an accepted methodology for quantifying market risk is part of the evolution of risk

More information

Illustration (and the use of HLM)

Illustration (and the use of HLM) Illustration (and the use of HLM) Chapter 4 1 Measurement Incorporated HLM Workshop The Illustration Data Now we cover the example. In doing so we does the use of the software HLM. In addition, we will

More information

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing Chapter 8 Hypothesis Testing 1 Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing 8-3 Testing a Claim About a Proportion 8-5 Testing a Claim About a Mean: s Not Known 8-6 Testing

More information

Notes on Applied Linear Regression

Notes on Applied Linear Regression Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat 1 1081 BT Amsterdam The Netherlands phone: +31 (0)20 444-8935 email:

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

SIMULATION STUDIES IN STATISTICS WHAT IS A SIMULATION STUDY, AND WHY DO ONE? What is a (Monte Carlo) simulation study, and why do one?

SIMULATION STUDIES IN STATISTICS WHAT IS A SIMULATION STUDY, AND WHY DO ONE? What is a (Monte Carlo) simulation study, and why do one? SIMULATION STUDIES IN STATISTICS WHAT IS A SIMULATION STUDY, AND WHY DO ONE? What is a (Monte Carlo) simulation study, and why do one? Simulations for properties of estimators Simulations for properties

More information

Point Biserial Correlation Tests

Point Biserial Correlation Tests Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable

More information

Nominal and ordinal logistic regression

Nominal and ordinal logistic regression Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome

More information

Standard Deviation Estimator

Standard Deviation Estimator CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

More information

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

More information

Confidence Intervals for Spearman s Rank Correlation

Confidence Intervals for Spearman s Rank Correlation Chapter 808 Confidence Intervals for Spearman s Rank Correlation Introduction This routine calculates the sample size needed to obtain a specified width of Spearman s rank correlation coefficient confidence

More information

Detection of changes in variance using binary segmentation and optimal partitioning

Detection of changes in variance using binary segmentation and optimal partitioning Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the

More information

LOGISTIC REGRESSION ANALYSIS

LOGISTIC REGRESSION ANALYSIS LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic

More information

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

More information

Basics of Statistical Machine Learning

Basics of Statistical Machine Learning CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

More information

Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

More information

Experimental Designs (revisited)

Experimental Designs (revisited) Introduction to ANOVA Copyright 2000, 2011, J. Toby Mordkoff Probably, the best way to start thinking about ANOVA is in terms of factors with levels. (I say this because this is how they are described

More information

NCSS Statistical Software. One-Sample T-Test

NCSS Statistical Software. One-Sample T-Test Chapter 205 Introduction This procedure provides several reports for making inference about a population mean based on a single sample. These reports include confidence intervals of the mean or median,

More information

CHI-SQUARE: TESTING FOR GOODNESS OF FIT

CHI-SQUARE: TESTING FOR GOODNESS OF FIT CHI-SQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

More information

Linear Models for Continuous Data

Linear Models for Continuous Data Chapter 2 Linear Models for Continuous Data The starting point in our exploration of statistical models in social research will be the classical linear model. Stops along the way include multiple linear

More information

Principles of Hypothesis Testing for Public Health

Principles of Hypothesis Testing for Public Health Principles of Hypothesis Testing for Public Health Laura Lee Johnson, Ph.D. Statistician National Center for Complementary and Alternative Medicine johnslau@mail.nih.gov Fall 2011 Answers to Questions

More information

statistics Chi-square tests and nonparametric Summary sheet from last time: Hypothesis testing Summary sheet from last time: Confidence intervals

statistics Chi-square tests and nonparametric Summary sheet from last time: Hypothesis testing Summary sheet from last time: Confidence intervals Summary sheet from last time: Confidence intervals Confidence intervals take on the usual form: parameter = statistic ± t crit SE(statistic) parameter SE a s e sqrt(1/n + m x 2 /ss xx ) b s e /sqrt(ss

More information

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

More information

Quantitative Methods for Finance

Quantitative Methods for Finance Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni 1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed

More information

Statistiek II. John Nerbonne. October 1, 2010. Dept of Information Science j.nerbonne@rug.nl

Statistiek II. John Nerbonne. October 1, 2010. Dept of Information Science j.nerbonne@rug.nl Dept of Information Science j.nerbonne@rug.nl October 1, 2010 Course outline 1 One-way ANOVA. 2 Factorial ANOVA. 3 Repeated measures ANOVA. 4 Correlation and regression. 5 Multiple regression. 6 Logistic

More information

MULTIPLE REGRESSION WITH CATEGORICAL DATA

MULTIPLE REGRESSION WITH CATEGORICAL DATA DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting

More information

1 Teaching notes on GMM 1.

1 Teaching notes on GMM 1. Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in

More information

Rank-Based Non-Parametric Tests

Rank-Based Non-Parametric Tests Rank-Based Non-Parametric Tests Reminder: Student Instructional Rating Surveys You have until May 8 th to fill out the student instructional rating surveys at https://sakai.rutgers.edu/portal/site/sirs

More information

Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

More information

Chapter 4. Probability and Probability Distributions

Chapter 4. Probability and Probability Distributions Chapter 4. robability and robability Distributions Importance of Knowing robability To know whether a sample is not identical to the population from which it was selected, it is necessary to assess the

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

How To Check For Differences In The One Way Anova

How To Check For Differences In The One Way Anova MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information