The Chi-square test when the expected frequencies are less than 5
Wai Wan Tsang and Kai Ho Cheng
Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong
{tsang, khcheng3}@cs.hku.hk

Summary. In the chi-square test, it is required that the expected frequency of each cell is at least 5. This condition ensures that the CDF of the test statistic ($\chi^2$) can be closely approximated by the chi-square distribution. This paper describes two methods to compute the CDF of $\chi^2$ directly. The first method computes the exact probabilities for all attainable values of $\chi^2$. It is effective when both the number of samples and the number of cells are small. The second method approximates the CDF with an empirical distribution function that has three digits of accuracy. The second method complements the first one when the number of cells is large. A C program that uses these two methods to compute the CDF of $\chi^2$ is implemented. With this program, one can carry out the chi-square test even when some or all expected frequencies are less than 5.

Key words: goodness-of-fit test, chi-square test

1 Introduction

The chi-square goodness-of-fit test is used to check whether a set of samples fits a purported discrete distribution. The null hypothesis is that the samples follow the distribution. Suppose that the possible outcomes of an experiment are 1, 2, ..., k, with probabilities $p_1, p_2, \ldots, p_k$, respectively. The experiment is carried out n times independently. Let $o_1, o_2, \ldots, o_k$ be the numbers of 1's, 2's, ..., k's, respectively, in the n outcomes. Note that $\sum o_i = n$ and $\sum p_i = 1$. The chi-square statistic is defined as

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - n p_i)^2}{n p_i} \qquad (1)$$

Here $o_i$ is called the observed frequency of cell i and $n p_i$ is the expected frequency. When the null hypothesis is true and all expected frequencies are at least 5, the CDF of $\chi^2$ is closely approximated by the chi-square distribution of k - 1 degrees of freedom, denoted as Chisq(x, k - 1). Let p-value = Chisq($\chi^2$, k - 1).
If the p-value is greater than a pre-set threshold, say 0.95, the null hypothesis is rejected. Otherwise, it is accepted.

$\chi^2$ takes discrete values, whereas the chi-square distribution is continuous. Figure 1a shows the true CDF of $\chi^2$ when all $n p_i$'s are 5 (the staircase) and Chisq(x, 5) (the smooth curve). They are close to each other. Figure 1b shows the staircase and the curve again when all $n p_i$'s are 2. In this graph, the curve deviates noticeably from the staircase. To ensure that the CDF can be closely approximated by the chi-square distribution, the chi-square test requires all expected frequencies to be at least 5.

Fig. 1. The CDFs of $\chi^2$ and their approximation, Chisq(x, k - 1). (a) k = 6, n = 30 and all $p_i$ = 1/6. (b) k = 6, n = 12 and all $p_i$ = 1/6.

The chi-square test was suggested by Karl Pearson in 1900 [PK00]. The approximation of the CDF of $\chi^2$ with the chi-square distribution was crucial before the computer era. With today's computing technology, we can actually compute the CDF of $\chi^2$ on the fly, at least when n and k are small. In doing so, we can relax the at-least-5 requirement on the expected frequencies. The relaxation is important in applications where testing samples are scarce or very expensive, e.g., in medical or genomic research.

This paper describes two methods for computing the CDF of $\chi^2$, one analytical and one empirical. The first method computes the exact CDF but is inefficient when k or n is large. The second method computes an empirical distribution function (EDF) of $\chi^2$ using 11 million trials. The resulting probabilities have at least three digits of accuracy. A C program that uses these two methods to compute the CDF of $\chi^2$ is implemented. With this program, one can carry out the chi-square test even when some or all expected frequencies are less than 5.
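As an illustration of the statistic in (1) and of the p-value defined through the chi-square CDF, the following is a minimal Python sketch, not the authors' C program. The function names `chisq_stat` and `chisq_cdf` are hypothetical, and the CDF is evaluated with a plain power series for the regularized incomplete gamma function, which is an assumption made here for self-containment and is only adequate for moderate arguments.

```python
import math

def chisq_stat(observed, probs):
    """Pearson's statistic: sum of (o_i - n*p_i)^2 / (n*p_i) over all cells."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

def chisq_cdf(x, df):
    """CDF of the chi-square distribution with df degrees of freedom.

    Uses the series P(a, z) = z^a e^(-z) / Gamma(a) * sum_j z^j / (a (a+1) ... (a+j))
    with a = df/2 and z = x/2 (an illustrative choice, not the authors' routine).
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    j = 0
    while term > 1e-16 * total:
        j += 1
        term *= z / (a + j)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

# Hypothetical sample: k = 6 equally likely cells, n = 12, so every expected frequency is 2.
observed = [4, 1, 2, 3, 0, 2]
probs = [1 / 6] * 6
x2 = chisq_stat(observed, probs)
p_value = chisq_cdf(x2, len(probs) - 1)   # Chisq(chi^2, k - 1)
print(x2, p_value)
```

With expected frequencies of 2, this p-value is exactly the kind of approximation that Figure 1b shows to be unreliable, which motivates computing the CDF of $\chi^2$ directly.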
2 The analytical method

It is easy to see that when k = 2, a test instance, specified by $[o_1, o_2]$, follows the binomial distribution. That is, the probability that there are $o_1$ 1's and $o_2$ 2's is

$$\frac{n!}{o_1!\, o_2!}\; p_1^{o_1} p_2^{o_2} \qquad (2)$$

When k > 2, $[o_1, o_2, \ldots, o_k]$ follows the multinomial distribution, a generalization of the binomial distribution. The probability, p, that $[o_1, o_2, \ldots, o_k]$ occurs is

$$\frac{n!}{o_1!\, o_2! \cdots o_k!}\; p_1^{o_1} p_2^{o_2} \cdots p_k^{o_k} \qquad (3)$$

The following sketches a straightforward way to compute the CDF of $\chi^2$ using the above formula.

1. For each instance, $[o_1, o_2, \ldots, o_k]$, compute the $\chi^2$ value and p.
2. Sort the pairs $[\chi^2, p]$ in ascending order of the $\chi^2$ values.
3. Combine the pairs that have identical $\chi^2$ values. The p in the new pair is the sum of the p's in the pairs being combined. For example, [0.65, 0.01] and [0.65, 0.02] are combined into [0.65, 0.03]. The resulting list gives the density distribution of $\chi^2$.
4. Accumulate the p's in the density distribution to form the CDF.

A C program that computes the CDF using this method has been implemented. The test instances, $[o_1, o_2, \ldots, o_k]$, are enumerated using recursion. For efficiency, the powers of the $p_i$'s and the factorials in the formula of the multinomial distribution are pre-computed. To verify the correctness of our program, we plot the computed CDFs together with the corresponding chi-square distributions in Figure 2. As expected, they are very close to each other.

To demonstrate the effectiveness, we used the program to compute the CDFs for the chi-square test of k = 2, 3, ..., 10 cells. For each k, we found the largest n such that the computation could end within 1 minute on a PC with a 2.26 GHz Pentium 4 processor. Table 1 shows the n's recorded.

Table 1. The largest n's found for various k's such that the program ends in 1 minute (columns: k, largest n).

3 The empirical method

The analytical method is inefficient when k is large.
For such cases, we can estimate the EDF of $\chi^2$ using simulation. This approach was suggested by Professor G. Marsaglia in 2005 [MAR05]. We have implemented a C program for the task. In our program, random numbers are generated using a combination of the multiply-with-carry generator [MZ91] and the 3-shift generator [MAR03]. Discrete variates are obtained using the method suggested in [MTW04].

Fig. 2. The CDFs of $\chi^2$ and their corresponding chi-square distributions. (a) k = 4, n = 40 and all $p_i$ = 1/4. (b) k = 8, n = 40 and all $p_i$ = 1/8.

The maximum absolute error (MAE) in an EDF has the same distribution as the Kolmogorov statistic [TW04]. Suppose that an EDF is obtained using m trials. Using the asymptotic distribution of the Kolmogorov statistic given in [KOL33], the mean and standard deviation of the MAE are $0.87/\sqrt{m}$ and $0.26/\sqrt{m}$, respectively. In our program, m = 11,000,000. The mean plus three standard deviations is $(0.87 + 3 \times 0.26)/\sqrt{m} \approx 0.0005$. Therefore, it is very safe to claim that the EDF is accurate up to the third digit.

To verify the correctness of our program, we plot the estimated EDFs together with the true CDFs computed using the analytical method in Figure 3. The EDFs coincide with the CDFs in the graphs.

We used our program to estimate the EDFs of $\chi^2$ for different n's and different hypothetical distributions having 5 values (k = 5). The execution times are shown in Table 2. As expected, the execution time increases roughly linearly with n but is insensitive to the distribution.

p_1, p_2, p_3, p_4, p_5              n=20   n=30   n=40   n=50
1/5, 1/5, 1/5, 1/5, 1/5              18 s   24 s   31 s   38 s
1/15, 2/15, 3/15, 4/15, 5/15         19 s   25 s   32 s   38 s
1/25, 2/25, 4/25, 7/25, 11/25        20 s   25 s   32 s   38 s

Table 2. Execution times for computing the EDFs for various n's and distributions.
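The four enumeration steps of Section 2 and the simulation idea of Section 3 can be sketched together in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' C program: the names `compositions`, `exact_cdf`, `simulated_edf` and `mae_bound` are hypothetical, Python's built-in generator replaces the combined multiply-with-carry and 3-shift generator, `random.choices` replaces the discrete-variate method of [MTW04], $\chi^2$ values are rounded to 9 decimals so that floating-point ties merge, and only 20,000 trials are used instead of 11 million to keep the run short.

```python
import math
import random
from bisect import bisect_right
from collections import defaultdict

def compositions(n, k):
    """Yield every [o_1, ..., o_k] of non-negative integers summing to n (recursion)."""
    if k == 1:
        yield [n]
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield [first] + rest

def chisq(obs, n, probs):
    return round(sum((o - n * q) ** 2 / (n * q) for o, q in zip(obs, probs)), 9)

def exact_cdf(n, probs):
    """Return the attainable chi-square values (ascending) and the exact CDF at each."""
    density = defaultdict(float)
    for obs in compositions(n, len(probs)):            # step 1: every instance
        coef = math.factorial(n)
        for o in obs:
            coef //= math.factorial(o)                 # multinomial coefficient
        p = coef * math.prod(q ** o for q, o in zip(probs, obs))
        density[chisq(obs, n, probs)] += p             # step 3: merge equal values
    xs = sorted(density)                               # step 2: ascending order
    cdf, total = [], 0.0
    for x in xs:                                       # step 4: accumulate
        total += density[x]
        cdf.append(total)
    return xs, cdf

def simulated_edf(n, probs, trials, rng):
    """Return the EDF of chi-square estimated from `trials` simulated experiments."""
    k = len(probs)
    values = []
    for _ in range(trials):
        counts = [0] * k
        for cell in rng.choices(range(k), weights=probs, k=n):
            counts[cell] += 1
        values.append(chisq(counts, n, probs))
    values.sort()
    return lambda x: bisect_right(values, x) / trials

def mae_bound(m):
    """Mean plus three standard deviations of the MAE for m trials."""
    return (0.87 + 3 * 0.26) / math.sqrt(m)

probs = [1 / 6] * 6                                    # the case of Figure 3a
xs, cdf = exact_cdf(12, probs)
edf = simulated_edf(12, probs, trials=20_000, rng=random.Random(1))
max_err = max(abs(edf(x) - c) for x, c in zip(xs, cdf))
print(f"largest |EDF - CDF| = {max_err:.4f}, bound for 20,000 trials = {mae_bound(20_000):.4f}")
```

For m = 11,000,000 the same `mae_bound` formula gives a value just under 0.0005, which is the basis of the three-digit accuracy claim. The enumeration cost grows quickly with n and k, which is consistent with the limits the authors record in Table 1.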
Fig. 3. The CDFs of $\chi^2$ and the EDFs obtained using our program. (a) k = 6, n = 12 and all $p_i$ = 1/6. (b) k = 8, n = 40 and all $p_i$ = 1/8.

Table 3 shows the execution times of computing the EDFs for different k's when n = 200. In the experiment, the hypothetical distributions are uniform, i.e., all $p_i$'s are equal. The results show that the execution time is insensitive to k.

Table 3. Execution times for computing the EDF for n = 200 and k = 20, 40, 60, 80 and 100.

4 Discussion

A C program that evaluates the CDF of $\chi^2$ (p-value) in the chi-square test has been developed. If all expected frequencies are at least 5, the p-value is computed from the chi-square distribution of k - 1 degrees of freedom as usual. Otherwise, if k ≤ 10 and n is less than or equal to the value shown in Table 1, the p-value is computed using the analytical method; in all other cases the empirical method is used. If the empirical method is used, the estimated execution time is printed on the console. This program can be downloaded from the website at tsang/chisq.c.

We are still tuning the program for efficiency. A dynamic programming approach is being considered for computing the true CDF of $\chi^2$. For the empirical method, certain random number generators that are faster than the combined generator currently used are being tested for suitability.

$\chi^2$ is a discrete variable but is treated as a continuous variable in the chi-square test. The appropriateness depends on the sizes of k and n. When k is very small, the quantum jumps in the CDF of $\chi^2$ are obvious even when all expected frequencies are at least 5. Figure 4a shows an extreme case where k = 2, n = 10 and $p_1 = p_2 = 1/2$. The quantum jumps are bigger when the at-least-5 requirement is not satisfied or the $p_i$'s are not equal, or both, as shown in Figure 4b where k = 2, n = 6, $p_1$ = 1/4 and $p_2$ = 3/4. The effects of the discreteness on Type I error, Type II error and the power of the chi-square test are worth further investigation.

Fig. 4. Two CDFs with large quantum jumps. (a) k = 2, n = 10 and $p_1 = p_2 = 1/2$. (b) k = 2, n = 6, $p_1$ = 1/4 and $p_2$ = 3/4.
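The size of the quantum jumps in the two k = 2 cases of Figure 4 can be checked directly from the binomial formula (2). The Python sketch below is only an illustration; the function name `binomial_chisq_jumps` is hypothetical, and rounding the $\chi^2$ values to 9 decimals is an assumption used to merge equal values.

```python
import math
from collections import defaultdict

def binomial_chisq_jumps(n, p1):
    """Map each attainable chi-square value of a two-cell test to its probability."""
    p2 = 1 - p1
    jumps = defaultdict(float)
    for o1 in range(n + 1):
        o2 = n - o1
        prob = math.comb(n, o1) * p1 ** o1 * p2 ** o2          # formula (2)
        x2 = (o1 - n * p1) ** 2 / (n * p1) + (o2 - n * p2) ** 2 / (n * p2)
        jumps[round(x2, 9)] += prob                            # merge equal values
    return dict(sorted(jumps.items()))

fig4a = binomial_chisq_jumps(10, 0.5)    # k = 2, n = 10, p1 = p2 = 1/2
fig4b = binomial_chisq_jumps(6, 0.25)    # k = 2, n = 6, p1 = 1/4, p2 = 3/4

for name, jumps in (("4a", fig4a), ("4b", fig4b)):
    print(name, {x: round(p, 4) for x, p in jumps.items()})
```

In the first case $\chi^2$ takes only six values and the largest single jump is 420/1024, about 0.41, attained at $\chi^2$ = 0.4. In the second case there are only five attainable values and the largest jump is larger still, in line with the remark about Figure 4b.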
References

[KOL33] Kolmogorov, A.: Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari, 4 (1933)
[MAR03] Marsaglia, G.: Xorshift RNGs. Journal of Statistical Software, 8, Issue 14 (2003)
[MAR05] Marsaglia, G.: Monkeying with the Goodness-of-Fit Test. Journal of Statistical Software, 14, Issue 13 (2005)
[MTW04] Marsaglia, G., Tsang, W.W. and Wang, J.: Fast Generation of Discrete Random Variables. Journal of Statistical Software, 11, Issue 3 (2004)
[MZ91] Marsaglia, G. and Zaman, A.: A new class of random number generators. The Annals of Applied Probability, 1 (1991)
[PK00] Pearson, K.: On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is such that it can be Reasonably Supposed to have Arisen from Random Sampling. Philosophical Magazine, 50, Issue 5 (1900)
[TW04] Tsang, W.W. and Wang, J.: Evaluating the CDF of the Kolmogorov statistic for normality testing. Proceedings of COMPSTAT 2004, 16th Symposium of IASC, Prague, August (2004)
More information6.4 Normal Distribution
Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under
More informationCorrelation key concepts:
CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)
More informationGoodness of Fit. Proportional Model. Probability Models & Frequency Data
Probability Models & Frequency Data Goodness of Fit Proportional Model Chi-square Statistic Example R Distribution Assumptions Example R 1 Goodness of Fit Goodness of fit tests are used to compare any
More informationHomework 4 - KEY. Jeff Brenion. June 16, 2004. Note: Many problems can be solved in more than one way; we present only a single solution here.
Homework 4 - KEY Jeff Brenion June 16, 2004 Note: Many problems can be solved in more than one way; we present only a single solution here. 1 Problem 2-1 Since there can be anywhere from 0 to 4 aces, the
More informationMind on Statistics. Chapter 12
Mind on Statistics Chapter 12 Sections 12.1 Questions 1 to 6: For each statement, determine if the statement is a typical null hypothesis (H 0 ) or alternative hypothesis (H a ). 1. There is no difference
More informationProbability Distributions
CHAPTER 5 Probability Distributions CHAPTER OUTLINE 5.1 Probability Distribution of a Discrete Random Variable 5.2 Mean and Standard Deviation of a Probability Distribution 5.3 The Binomial Distribution
More informationSPC Data Visualization of Seasonal and Financial Data Using JMP WHITE PAPER
SPC Data Visualization of Seasonal and Financial Data Using JMP WHITE PAPER SAS White Paper Table of Contents Abstract.... 1 Background.... 1 Example 1: Telescope Company Monitors Revenue.... 3 Example
More informationNon-Inferiority Tests for Two Proportions
Chapter 0 Non-Inferiority Tests for Two Proportions Introduction This module provides power analysis and sample size calculation for non-inferiority and superiority tests in twosample designs in which
More informationChi-square test Fisher s Exact test
Lesson 1 Chi-square test Fisher s Exact test McNemar s Test Lesson 1 Overview Lesson 11 covered two inference methods for categorical data from groups Confidence Intervals for the difference of two proportions
More information11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationNonparametric Statistics
Nonparametric Statistics J. Lozano University of Goettingen Department of Genetic Epidemiology Interdisciplinary PhD Program in Applied Statistics & Empirical Methods Graduate Seminar in Applied Statistics
More informationSTATISTICS 8, FINAL EXAM. Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4
STATISTICS 8, FINAL EXAM NAME: KEY Seat Number: Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4 Make sure you have 8 pages. You will be provided with a table as well, as a separate
More informationAP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics
Ms. Foglia Date AP: LAB 8: THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,
More informationGeneral Method: Difference of Means. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n 1, n 2 ) 1.
General Method: Difference of Means 1. Calculate x 1, x 2, SE 1, SE 2. 2. Combined SE = SE1 2 + SE2 2. ASSUMES INDEPENDENT SAMPLES. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n
More informationStat 411/511 THE RANDOMIZATION TEST. Charlotte Wickham. stat511.cwick.co.nz. Oct 16 2015
Stat 411/511 THE RANDOMIZATION TEST Oct 16 2015 Charlotte Wickham stat511.cwick.co.nz Today Review randomization model Conduct randomization test What about CIs? Using a t-distribution as an approximation
More informationTHE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.
THERE ARE TWO WAYS TO DO HYPOTHESIS TESTING WITH STATCRUNCH: WITH SUMMARY DATA (AS IN EXAMPLE 7.17, PAGE 236, IN ROSNER); WITH THE ORIGINAL DATA (AS IN EXAMPLE 8.5, PAGE 301 IN ROSNER THAT USES DATA FROM
More informationIntroduction to Regression and Data Analysis
Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it
More informationStatCrunch and Nonparametric Statistics
StatCrunch and Nonparametric Statistics You can use StatCrunch to calculate the values of nonparametric statistics. It may not be obvious how to enter the data in StatCrunch for various data sets that
More information