# statistics Chi-square tests and nonparametric Summary sheet from last time: Hypothesis testing Summary sheet from last time: Confidence intervals

Save this PDF as:

Size: px
Start display at page:

Download "statistics Chi-square tests and nonparametric Summary sheet from last time: Hypothesis testing Summary sheet from last time: Confidence intervals"

## Transcription

1 Summary sheet from last time: Confidence intervals Confidence intervals take on the usual form: parameter = statistic ± t crit SE(statistic) parameter SE a s e sqrt(1/n + m x 2 /ss xx ) b s e /sqrt(ss xx ) y (mean) s e sqrt(1/n + (x o m x ) 2 /ss xx ) y new s e sqrt(1/n + (x new m x ) 2 /ss xx + 1) (individual) Where t crit is with N-2 degrees of freedom, and s e = sqrt(σ(y i y i ) 2 /(N-2)) = sqrt((ss yy b ss xy )/(n-2)) Summary sheet from last time: Hypothesis testing Of course, any of the confidence intervals on the previous slide can be turned into hypothesis tests by computing t obt = (observed expected)/se, and comparing with tcrit. Testing H0: ρ=0: t obt = r sqrt(n-2)/sqrt(1-r 2 ) Compare with tcrit for N-2 degrees of freedom. Testing H0: ρ=ρ o : Need to use a different test statistic for this. Statistics Chi-square tests and nonparametric statistics /13/2004 Descriptive Graphs Frequency distributions Mean, median, & mode Range, variance, & standard deviation Inferential Confidence intervals Hypothesis testing 1

2 Two categories of inferential statistics Parametric What we ve done so far Nonparametric What we ll talk about in this lecture Parametric statistics Limited to quantitative sorts of dependent variables (as opposed to, e.g., categorical or ordinal variables) Require dependent variable scores are normally distributed, or at least their means are approximately normal. There exist some parametric statistical procedures that assume some other, non-normal distribution, but mostly a normal distribution is assumed. The general point: parametric statistics operate with some assumed form for the distribution of the data. Sometimes require that population variances are equal Parametric statistics Best to design a study that allows you to use parametric procedures when possible, because parametric statistics are more powerful than nonparametric. Parametric procedures are robust they will tolerate some violation of their assumptions. But if the data severely violate these assumptions, this may lead to an increase in a Type I error, i.e. you are more likely to reject H0 when it is in fact true. Nonparametric statistics Use them when: The dependent variable is quantitative, but has a very non-normal distribution or unequal population variances The dependent variable is categorical Male or female Democrat, Republican, or independent Or the dependent variable is ordinal Child A is most aggressive, child B is 2nd most aggressive 2

3 Nonparametric statistics The design and logic of nonparametric statistics are very similar to those for parametric statistics: Would we expect to see these results by chance, if our model of the population is correct (one-sample tests)? Do differences in samples accurately represent differences in populations (two-sample tests)? H 0, H a, sampling distributions, Type I & Type II errors, alpha levels, critical values, maximizing power -- all this stuff still applies in nonparametric statistics. Chi-square goodness-of-fit test A common nonparametric statistical test Considered nonparametric because it operates on categorical data, and because it can be used to compare two distributions regardless of the distributions Also written χ 2 As an introduction, first consider a parametric statistical test which should be quite familiar to you by now Coin flip example (yet again) You flip a coin 0 times, and get 60 heads. You can think of the output as categorical (heads vs. tails) However, to make this a problem we could solve, we considered the dependent variable to be the number of heads Coin flip example Null hypothesis: the coin is a fair coin Alternative hypothesis: the coin is not fair What is the probability that we would have seen 60 or more heads in 0 flips, if the coin were fair? 3

4 Coin flip example We could solve this using either the binomial formula to compute p(60/0) + p(61/0) +, or by using the z- approximation to binomial data Either way, we did a parametric test Solving this problem via z- approximation z obt = ( )/sqrt(0.5 2 /0) = 2 p If our criterion were α=0.05, we would decide that this coin was unlikely to be fair. What if we were rolling a die, instead of tossing a coin? A gambler rolls a Number on die 60 times, and die gets the results shown on the right Is the gambler using a fair die? Number of rolls We cannot solve this problem in a good way using the parametric procedures we ve seen thus far Why not use the same technique as for the coin flip example? In the coin flip example, the process was a binomial process However, a binomial process can have only two choices of outcomes on each trial: success (heads) and failure (tails) Here was have 6 possible outcomes per trial 4

5 The multinomial distribution Can solve problems like the fair die exactly using the multinomial distribution This is an extension of the binomial distribution However, this can be a bear to calculate particularly without a computer So statisticians developed the χ 2 test as an alternative, approximate method Determining if the die is loaded Null hypothesis: the die is fair Alternative hypothesis: the die is loaded As in the coin flip example, we d like to determine whether the die is loaded by comparing the observed frequency of each outcome in the sample to the expected frequency if the die were fair In 60 throws, we expect of each outcome Some of these lines may look suspicious (e.g. freqs 3 & 4 seem high) However, with 6 lines in the table, it s likely at least one of them will look suspicious, even if the die is fair You re essentially doing multiple statistical tests by eye Number on die Sum: Observed frequency Expected frequency 60 Basic idea of the χ 2 test for goodness of fit For each line of the table, there is a difference between the observed and expected frequencies We will combine these differences into one overall measure of the distance between the observed and expected values The bigger this combined measure, the more likely it is that the model which gave us the expected frequencies is not a good fit to our data The model corresponds to our null hypothesis (that the die is fair). If the model is not a good fit, we reject the null hypothesis 5

6 The χ 2 statistic = rows 2 2 ( observed frequency i expected frequency i ) χ i = 1 expected frequency i The χ 2 statistic When the observed frequency is far from the expected frequency (regardless of whether it is too small or two large) the corresponding term is large. When the two are close, the term is small. Clearly the χ 2 statistic will be larger the more terms you add up (the more rows in your table), and we ll have to take into account the number of rows somehow. Number on die Sum: Back to our example Observed frequency Expected frequency 60 χ 2 = (4-) 2 / + (6-) 2 / + (17-) 2 / + (16-) 2 / + (8-) 2 / + (9-) 2 / = 142/ = 14.2 What s the probability of getting χ if the die is actually fair? Probability that χ if the die is actually fair With some assumptions about the distribution of observed frequencies about expected frequencies (not too hard to come by, for the die problem), one can solve this problem exactly. When Pearson invented the χ 2 test, however (1900), computers didn t exist, so he developed a method for approximating this probability using a new distribution, called the distribution. χ 2 6

7 The χ 2 distribution The χ 2 distribution The statistic χ 2 is approximately distributed according to a χ 2 distribution As with the t-distribution, there is a different χ 2 distribution for each number of degrees of freedom In this case, however, the χ 2 distribution changes quite radically as you change d.f. Random χ 2 factoids: Mean = d.f. Mode = d.f. 2 Median d.f. -.7 Skewed distribution, skew as d.f df = 4 df = Figure by MIT OCW. The χ 2 test Use the χ 2 tables at the back of your book, for the appropriate d.f. and α. area (along top of table) # in the table Degrees of freedom 99% 95% 5% 1% What degrees of freedom do you use? Let k = # categories = # expected frequencies in the table = # terms in the equation for χ 2 For the die example, k=6, for the 6 possible side of the die Then, d.f. = k 1 d Where d is the number of population parameters you had to estimate from the sample in order to compute the expected frequencies 7

8 Huh? This business about d will hopefully become more clear later, as we do more examples. For now, how did we compute the expected frequencies for the die? With 60 throws, for a fair die we expect throws for each possible outcome This didn t involve any estimation of population parameters from the sample d = 0 Degrees of freedom for the die example So, d.f. = k 1 = 6 1 = 5 Why the -1? We start off with 6 degrees of freedom (the 6 expected frequencies, e i, i=1:6), but we know the total number of rolls (Σe i = 60), which removes one degree of freedom I.E. if we know e i, i=1:5, these 5 and the total number of rolls determine e 6, so there are only 5 degrees of freedom Back to the example We found χ 2 = 14.2 What is p( χ )? Looking up in the χ 2 tables, for 5 d.f: Degrees of freedom 5 99% % % 1% > 11.07, but < p is probably a little bit bigger than 0.01 ( it s actually about 0.014, if we had a better table)) So, is the die fair? We re fairly certain (p<0.05) that the die is loaded. 8

9 Rule of thumb Once again, this χ 2 distribution is only an approximation to the distribution of χ 2 values we d expect to see As a rule of thumb, the approximation can be trusted when the expected frequency in each line of the table, e i, is 5 or more When would this not be a good approximation? Example: suppose your null hypothesis for the die game was instead that you had the following probabilities of each outcome: (1: 0.01), (2, 0.01), (3, 0.01), (4, 0.95), (5, 0.01), (6, 0.01) In 0 throws of the die, your expected frequencies would be (1, 1, 1, 95, 1, 1) These 1 s are too small for the approximation to be good. For this model, you d need at least 500 throws to be confident in using the χ 2 tables for the χ 2 test Other assumptions for the χ 2 test Summary of steps Each observation belongs to one and only one category Independence: each observed frequency is independent of all the others One thing this means: χ 2 is not for repeatedmeasures experiments in which, e.g., each subject gets rated on a task both when they do it right-handed, and when they do it left-handed. Collect the N observations (the data) Compute the k observed frequencies, oi Select the null hypothesis Based upon the null hypothesis and N, predict the expected frequencies for each outcome, e i, i=1:k χ 2 statistic, = k 2 2 ( o i ei ) Compute the χ i = 1 e i 9

10 Summary of steps, continued For a given a, find χ 2 crit for k 1 d degrees of freedom Note that the number of degrees of freedom depends upon the model, not upon the data If χ 2 > χ 2 crit, then reject the null hypothesis -- decide that the model is not a good fit. Else maintain the null hypothesis as valid. An intuition worth checking How does N affect whether we accept or reject the null hypothesis? Recall from the fair coin example that if we tossed the coin times and 60% of the flips were heads, we were not so concerned about the fairness of the coin. But if we tossed the coin 00 times, and still 60% of the tosses were heads, then, based on the law of averages, we were concerned about the fairness of the coin. Surely, all else being equal, we should be more likely to reject the null hypothesis if N is larger. Is this true for the χ 2 test? On the face of it, it s not obvious. d.f. is a function of k, not N. Checking the intuition about the effect of N Suppose we double N, and keep the same relative observed frequencies i.e. if we observed instances of outcome i with N observations, in 2N observations we observe 20 instances of outcome i. What happens to χ 2? k (2o 2 ) k 4 ( ) 2 ( χ ) = i ei o = i ei new = 2 χ i = 1 2 e i i = 1 2 e i Effect of N on χ 2 All else being equal, increasing N increases χ 2 So, for a given set of relative observed frequencies, for larger N we are more likely to reject the null hypothesis This matches with our intuitions about the effect of N

11 What happens if we do the coin flip example the χ 2 way instead of the z-test way? 0 coin flips, see 60 heads. Is the coin fair? Table: Observed Outcome frequency Heads Tails Sum: Expected frequency 0 χ 2 = (60-50) 2 /50 + (40-50) 2 /50 = 4 Looking it up in the χ 2 table under d.f.=1, it looks like p is just slightly < 0.05 We conclude that it s likely that the coin is not fair Look at this a bit more closely Our obtained value of the χ 2 statistic was 4 z obt was 2 (zobt) 2 = χ 2 Also, z 2 = (χ 2 crit ) crit for d.f. = 1 So, we get the same answer using the z-test as we do using the χ 2 test This is always true when we apply the χ 2 test to the binomial case (d.f. = 1) Another use of the χ 2 test Suppose you take N samples from a distribution, and you want to know if the samples seem to come from a particular distribution, e.g. a normal distribution E.G. heights of N students We can use the χ 2 test to check whether this data is well fit by a normal distribution A χ 2 test for goodness of fit to a normal distribution First, must bin the data into a finite number of categories, so we can apply the χ 2 test: Height (inches) h< h< h< h< h Sum: Observed frequency Expected frequency 11

12 A χ 2 test for goodness of fit to a normal distribution Next, we find the candidate parameters of the normal distribution, from the mean height, and its standard deviation It turns out that for this data, mh = 67.45, and s = 2.92 A χ 2 test for goodness of fit to a normal distribution Given that mean and standard deviation, find the z-scores for the category boundaries We will need to know what fraction of the normal curve falls within each category, so we can predict the expected frequencies Boundary z Get the area under the normal curve from - to z, for each boundary value of z See hyperstat/ z_table.html for a convenient way to get these values Boundary z Area below z What fraction of the area under the normal curve is in each category? Boundary z Area below z Height (inches) h< h< h< h< h Sum: Observed frequency Fraction of the area

13 What frequency do we expect for each category? Multiply the expected fraction by the total number of samples, 0 If you do any rounding, make sure the sum = N = 0! Height Observed Fraction of the Expected (inches) frequency area frequency h< h< h< h< h Sum: (Don t worry too much about one bin out of 5 having slightly fewer than 5 counts) Now (finally) we re ready to do the χ 2 test Compute χ 2 = (5-4) 2 /4 + (18-21) 2 /21 + (42-39) 2 /39 + (27-28) 2 /28 + (8-8) 2 /8 = ¼ + 9/21 + 9/39 + 1/28 = Degrees of freedom = k 1 d = 2 k = 5 ( = # categories = # terms in χ 2 eq n) d = 2 ( = number of population parameters estimated from the data in computed expected frequencies. Estimated µ and σ.) Find critical value for α=0.05: χ 2 crit = 5.99 The normal distribution seems to be a very good fit to the data (we maintain the null hypothesis as viable) Another example How to combine results across independent experiments Results in the lower tail of the χ 2 distribution (as opposed to the upper tail, which is what we ve looked at so far) The lower tail of the χ 2 distribution In the previous examples, we looked in the upper tail of the χ 2 distribution, to judge whether the disagreement between observed and expected frequencies was great enough that it was unlikely to have occurred by chance, if the null hypothesis model was correct We can also look in the lower tail, and judge whether the agreement between data and model is too good to be true 13

14 Gregor Mendel and genetics In 1865, Mendel published an article providing a scientific explanation for heredity. This eventually led to a revolution in biology. Gregor Mendel and genetics Mendel ran a series of experiments on garden peas Pea seeds are either yellow or green. Seed color is an indicator, even before you plant it, of the genes of the child plant the seed would grow into. Mendel bred a strain that had only yellow seeds, and another that had only green seeds. Then he crossed the two, to get a 1st generation yellowgreen hybrid: all seeds yellow Then he crossed pairs of 1st generation hybrids, to get 2 nd generation hybrids: approximately 75% of seeds were yellow, 25% were green Genetic model Mendel postulated a theory in which y (yellow) is dominant, and g (green) is recessive y/y, y/g, and g/y make yellow g/g makes green One gene is chosen at random from each parent First generation parents are either gy or yg. So how many green seeds and how many yellow seeds do we expect from them? Genetic model The model which describes the number of green and yellow seeds should (Mendel) postulated) be one in which you randomly choose, with replacement, a yellow or green ball from a box with 3 yellow balls and one green ball In the long run, you d expect the seeds to be roughly 75% yellow, 25% green 14

15 One of Mendel s experiments In fact, this is very close to the results that Mendel got. For example, on one experiment, he obtained nd -generation hybrid seeds He expected 8023/ green seeds, and got 2001 Very close to what s predicted by the model! Mendel, genetics, and goodness-offit Mendel s made a great discovery in genetics, and his theory has survived the rigor of repeated testing and proven to be extremely powerful However, R.A. Fisher complained that Mendel s fits were too good to have happened by chance. He thought (being generous) that Mendel had been deceived by a gardening assistant who knew what answers Mendel was expecting Pooling the results of multiple experiments The thing is, the experimental results just mentioned have extremely good agreement with theoretical expectations This much agreement is unlikely, but nonetheless could happen by chance But it happened for Mendel on every one of his experiments but one! Fisher used the χ 2 test to pool the results of Mendel s experiments, and test the likelihood of getting so much agreement with the model Pooling the results of multiple experiments When you have multiple, independent experiments, the results can be pooled by adding up the separate χ 2 statistics to get a pooled χ 2. The degrees of freedom also add up, to get a pooled d.f. 15

16 Pooling the results of multiple experiments: mini-example Exp. 1 gives χ 2 =5.8, d.f.=5 Exp. 2 gives χ 2 =3.1, d.f.=2 The two experiments pooled together give χ 2 = =8.9, with 5+2=7 d.f. Fisher s results For Mendel s data, Fisher got a pooled χ 2 value under 42, with 84 degrees of freedom If one looks up the area to the left of 42 under the χ 2 curve with 84 degrees of freedom, it is only about 4/0,000 The probability of getting such a small value of χ 2, i.e. such good agreement with the data, is only Fisher s results Fisher took Mendel s genetic model for granted that wasn t what he was testing Based on his results, Fisher rejected his null hypothesis that Mendel s data were gathered honestly in favor of his alternative hypothesis that Mendel s data were fudged A lower tail example testing whether our data fits a normal distribution Recall our earlier example in which we wanted to know if our height data fit a normal distribution We got a χ 2 value of , with 2 degrees of freedom, and concluded from looking in the area of the upper tail of the χ 2 distribution ( χ 2 >0.9451) that we could not reject the null hypothesis The data seemed to be well fit by a normal distribution Could the fit have been too good to be true? 16

17 A lower tail example testing whether our data fits a normal distribution To test whether the fit was too good, we look at the area in the lower tail of the χ 2 distribution With α=0.05, we get a critical value of χ = is considerably larger than this critical value We maintain the null hypothesis, and reject the alternative hypothesis = the fit is too good to be true. The data seem just fine. One-way vs. two-way χ 2 Our examples so far have been one-way χ 2 tests. Naming of tests is like for factorial designs One-way means one factor E.G. factor = number on die E.G. factor = color of peas E.G. factor = into which category did our sample fall? Two-way χ 2 : two factors Two-way χ 2 Often used to test independence E.G: are handedness (right-handed vs. left-handed vs. ambidextrous) and gender (male vs. female) independent? In other words, is the distribution of handedness the same for men as for women? Test of independence = test of whether two distributions are equal Handedness vs. gender Suppose you have a sample of men and women in the population, and you ask them whether they are left-handed, right-handed, or ambidextrous. Get data something like this: Men Women Right-handed Left-handed Ambidextrous 20 8 An m n = Total: table 17

18 Is the distribution of handedness the same for men as for women? It s hard to judge from the previous table, because there are more men than women. Convert to percentages, i.e. what percent of men (women) are right-handed, etc? Right-handed Left-handed Ambidextrous Sum: Men 87.5%.6% 1.9% 0% Women 91.5% 7.9% 0.7% ~0% (numbers in table don t quite add up right due to rounding don t worry about it, we re just eyeballing the percentages) Is the distribution of handedness the same for men as for women? From this table, it seems that the distribution in the sample is not the same for men as for women. Women are more likely to be right-handed, less likely to be lefthanded or ambidextrous Do women have more developed left brains (are they more rational? ) Do women feel more social pressure to conform and be right-handed? Or is the difference just due to chance? Even if the distributions are the same in the population, they might appear different just due to chance in the sample. Using the χ 2 test to test if the observed difference between the distributions is real or due to chance Basic idea: Construct a table of the expected frequencies for each combination of handedness and gender, based on the null hypothesis of independence (Q: how? ) Compute χ 2 as before = m n 2 2 ( o ij eij ) χ e i = 1j = 1 Compare the computed χ 2 with the critical value from the χ 2 table (Q: how many degrees of freedom?) ij Computing the table of expected frequencies First, take row and column totals in our original table: Right-handed Left-handed Ambidextrous Total: Men Women Total

19 Right-handed Left-handed Ambidextrous Total: Men Women From this table, the percentage of right-handers in the sample is 2004/ % If handedness and sex are independent, the number of right-handed men in the sample should be 89.6% of the number of men (67) (0.896)(67) 956 Similarly, the number of left-handed women should be 205/ , and so on for the other table entries Total The full table, and χ 2 computation Observed Expected Men Women Men Women Right-handed Left-handed Ambidextrous χ 2 = ( ) 2 /956 + (70-48) 2 /48 + (113-98) 2 /98 + (92-7) 2 /7 + (20-13) 2 /13 + (8-15) 2 /15 12 Degrees of freedom When testing independence in an m n table, the degrees of freedom is (m-1) (n-1) There s again a -d term if you had to estimate d population parameters in the process of generating the table. Estimating the % of right-handers in the population doesn t count There wont be a -d term in any of our two-way examples or homework problems So, d.f. = (3-1)(2-1) = 2 χ 2 So, are the two distributions different? = 12, d.f. = 2 Take α = 0.01 χ 2 crit = > p<0.01 We reject the null hypothesis that handedness is independent of gender (alt: that the distribution of handedness is the same for men as for women), and conclude that there seems to be a real difference in handedness for men vs. women. 19

20 Summary One-way χ 2 test for goodness-of-fit d.f. = k 1 d for k categories (k rows in the one-way table) Two-way χ 2 test for whether two variables are independent (alt: whether two distributions are the same) d.f. = (m-1)(n-1) d for an m n table Can test both the upper tail (is the data poorly fit by the expected frequencies? ) and the lower tail (is the data fit too good to be true?) χ 2 = sum[(observed expected) 2 / expected] 20

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### Chapter 7 Section 7.1: Inference for the Mean of a Population

Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used

### Chapter 5. Discrete Probability Distributions

Chapter 5. Discrete Probability Distributions Chapter Problem: Did Mendel s result from plant hybridization experiments contradicts his theory? 1. Mendel s theory says that when there are two inheritable

### Math 58. Rumbos Fall 2008 1. Solutions to Review Problems for Exam 2

Math 58. Rumbos Fall 2008 1 Solutions to Review Problems for Exam 2 1. For each of the following scenarios, determine whether the binomial distribution is the appropriate distribution for the random variable

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Normality Testing in Excel

Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

### Appendix 2 Statistical Hypothesis Testing 1

BIL 151 Data Analysis, Statistics, and Probability By Dana Krempels, Ph.D. and Steven Green, Ph.D. Most biological measurements vary among members of a study population. These variations may occur for

### Chapter 19 The Chi-Square Test

Tutorial for the integration of the software R with introductory statistics Copyright c Grethe Hystad Chapter 19 The Chi-Square Test In this chapter, we will discuss the following topics: We will plot

### Chapter 9. Two-Sample Tests. Effect Sizes and Power Paired t Test Calculation

Chapter 9 Two-Sample Tests Paired t Test (Correlated Groups t Test) Effect Sizes and Power Paired t Test Calculation Summary Independent t Test Chapter 9 Homework Power and Two-Sample Tests: Paired Versus

### 9. Sampling Distributions

9. Sampling Distributions Prerequisites none A. Introduction B. Sampling Distribution of the Mean C. Sampling Distribution of Difference Between Means D. Sampling Distribution of Pearson's r E. Sampling

### Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative

### MONT 107N Understanding Randomness Solutions For Final Examination May 11, 2010

MONT 07N Understanding Randomness Solutions For Final Examination May, 00 Short Answer (a) (0) How are the EV and SE for the sum of n draws with replacement from a box computed? Solution: The EV is n times

### In the past, the increase in the price of gasoline could be attributed to major national or global

Chapter 7 Testing Hypotheses Chapter Learning Objectives Understanding the assumptions of statistical hypothesis testing Defining and applying the components in hypothesis testing: the research and null

### MA 1125 Lecture 14 - Expected Values. Friday, February 28, 2014. Objectives: Introduce expected values.

MA 5 Lecture 4 - Expected Values Friday, February 2, 24. Objectives: Introduce expected values.. Means, Variances, and Standard Deviations of Probability Distributions Two classes ago, we computed the

### Elementary Statistics

lementary Statistics Chap10 Dr. Ghamsary Page 1 lementary Statistics M. Ghamsary, Ph.D. Chapter 10 Chi-square Test for Goodness of fit and Contingency tables lementary Statistics Chap10 Dr. Ghamsary Page

### Name: Date: Use the following to answer questions 3-4:

Name: Date: 1. Determine whether each of the following statements is true or false. A) The margin of error for a 95% confidence interval for the mean increases as the sample size increases. B) The margin

### Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone:

### STAT 145 (Notes) Al Nosedal anosedal@unm.edu Department of Mathematics and Statistics University of New Mexico. Fall 2013

STAT 145 (Notes) Al Nosedal anosedal@unm.edu Department of Mathematics and Statistics University of New Mexico Fall 2013 CHAPTER 18 INFERENCE ABOUT A POPULATION MEAN. Conditions for Inference about mean

### Confidence Intervals on Effect Size David C. Howell University of Vermont

Confidence Intervals on Effect Size David C. Howell University of Vermont Recent years have seen a large increase in the use of confidence intervals and effect size measures such as Cohen s d in reporting

### 3. Analysis of Qualitative Data

3. Analysis of Qualitative Data Inferential Stats, CEC at RUPP Poch Bunnak, Ph.D. Content 1. Hypothesis tests about a population proportion: Binomial test 2. Chi-square testt for goodness offitfit 3. Chi-square

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### Introduction to Statistics with GraphPad Prism (5.01) Version 1.1

Babraham Bioinformatics Introduction to Statistics with GraphPad Prism (5.01) Version 1.1 Introduction to Statistics with GraphPad Prism 2 Licence This manual is 2010-11, Anne Segonds-Pichon. This manual

### TABLE OF CONTENTS. About Chi Squares... 1. What is a CHI SQUARE?... 1. Chi Squares... 1. Hypothesis Testing with Chi Squares... 2

About Chi Squares TABLE OF CONTENTS About Chi Squares... 1 What is a CHI SQUARE?... 1 Chi Squares... 1 Goodness of fit test (One-way χ 2 )... 1 Test of Independence (Two-way χ 2 )... 2 Hypothesis Testing

### Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)

Cairo University Faculty of Economics and Political Science Statistics Department English Section Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study) Prepared

### Chi Square Distribution

17. Chi Square A. Chi Square Distribution B. One-Way Tables C. Contingency Tables D. Exercises Chi Square is a distribution that has proven to be particularly useful in statistics. The first section describes

### STATISTICS 8, FINAL EXAM. Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4

STATISTICS 8, FINAL EXAM NAME: KEY Seat Number: Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4 Make sure you have 8 pages. You will be provided with a table as well, as a separate

### Chapter 7. One-way ANOVA

Chapter 7 One-way ANOVA One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks

### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

### NCSS Statistical Software

Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

### 5 Cumulative Frequency Distributions, Area under the Curve & Probability Basics 5.1 Relative Frequencies, Cumulative Frequencies, and Ogives

5 Cumulative Frequency Distributions, Area under the Curve & Probability Basics 5.1 Relative Frequencies, Cumulative Frequencies, and Ogives We often have to compute the total of the observations in a

### Basic Probability and Statistics Review. Six Sigma Black Belt Primer

Basic Probability and Statistics Review Six Sigma Black Belt Primer Pat Hammett, Ph.D. January 2003 Instructor Comments: This document contains a review of basic probability and statistics. It also includes

There are three kinds of people in the world those who are good at math and those who are not. PSY 511: Advanced Statistics for Psychological and Behavioral Research 1 Positive Views The record of a month

### Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

### AP Statistics 7!3! 6!

Lesson 6-4 Introduction to Binomial Distributions Factorials 3!= Definition: n! = n( n 1)( n 2)...(3)(2)(1), n 0 Note: 0! = 1 (by definition) Ex. #1 Evaluate: a) 5! b) 3!(4!) c) 7!3! 6! d) 22! 21! 20!

### 7. Normal Distributions

7. Normal Distributions A. Introduction B. History C. Areas of Normal Distributions D. Standard Normal E. Exercises Most of the statistical analyses presented in this book are based on the bell-shaped

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### CHAPTER IV FINDINGS AND CONCURRENT DISCUSSIONS

CHAPTER IV FINDINGS AND CONCURRENT DISCUSSIONS Hypothesis 1: People are resistant to the technological change in the security system of the organization. Hypothesis 2: information hacked and misused. Lack

### Parametric and Nonparametric: Demystifying the Terms

Parametric and Nonparametric: Demystifying the Terms By Tanya Hoskin, a statistician in the Mayo Clinic Department of Health Sciences Research who provides consultations through the Mayo Clinic CTSA BERD

### One-Way Analysis of Variance

One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We

### Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Occam s razor.......................................................... 2 A look at data I.........................................................

### Statistical Functions in Excel

Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.

### An analysis method for a quantitative outcome and two categorical explanatory variables.

Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that

### STT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012)

STT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012) TA: Zhen (Alan) Zhang zhangz19@stt.msu.edu Office hour: (C500 WH) 1:45 2:45PM Tuesday (office tel.: 432-3342) Help-room: (A102 WH) 11:20AM-12:30PM,

### Analysis of Questionnaires and Qualitative Data Non-parametric Tests

Analysis of Questionnaires and Qualitative Data Non-parametric Tests JERZY STEFANOWSKI Instytut Informatyki Politechnika Poznańska Lecture SE 2013, Poznań Recalling Basics Measurment Scales Four scales

### Results from the 2014 AP Statistics Exam. Jessica Utts, University of California, Irvine Chief Reader, AP Statistics jutts@uci.edu

Results from the 2014 AP Statistics Exam Jessica Utts, University of California, Irvine Chief Reader, AP Statistics jutts@uci.edu The six free-response questions Question #1: Extracurricular activities

### Recommend Continued CPS Monitoring. 63 (a) 17 (b) 10 (c) 90. 35 (d) 20 (e) 25 (f) 80. Totals/Marginal 98 37 35 170

Work Sheet 2: Calculating a Chi Square Table 1: Substance Abuse Level by ation Total/Marginal 63 (a) 17 (b) 10 (c) 90 35 (d) 20 (e) 25 (f) 80 Totals/Marginal 98 37 35 170 Step 1: Label Your Table. Label

### Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

### Session 8 Probability

Key Terms for This Session Session 8 Probability Previously Introduced frequency New in This Session binomial experiment binomial probability model experimental probability mathematical probability outcome

### Northumberland Knowledge

Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

### Chapter 23 Inferences About Means

Chapter 23 Inferences About Means Chapter 23 - Inferences About Means 391 Chapter 23 Solutions to Class Examples 1. See Class Example 1. 2. We want to know if the mean battery lifespan exceeds the 300-minute

### Introduction to Statistics and Quantitative Research Methods

Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### Calculate and interpret confidence intervals for one population average and one population proportion.

Chapter 8 Confidence Intervals 8.1 Confidence Intervals 1 8.1.1 Student Learning Objectives By the end of this chapter, the student should be able to: Calculate and interpret confidence intervals for one

### Solutions to Homework 6 Statistics 302 Professor Larget

s to Homework 6 Statistics 302 Professor Larget Textbook Exercises 5.29 (Graded for Completeness) What Proportion Have College Degrees? According to the US Census Bureau, about 27.5% of US adults over

### Chapter 20: chance error in sampling

Chapter 20: chance error in sampling Context 2 Overview................................................................ 3 Population and parameter..................................................... 4

### business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

### Lecture 14. Chapter 7: Probability. Rule 1: Rule 2: Rule 3: Nancy Pfenning Stats 1000

Lecture 4 Nancy Pfenning Stats 000 Chapter 7: Probability Last time we established some basic definitions and rules of probability: Rule : P (A C ) = P (A). Rule 2: In general, the probability of one event

### 13.0 Central Limit Theorem

13.0 Central Limit Theorem Discuss Midterm/Answer Questions Box Models Expected Value and Standard Error Central Limit Theorem 1 13.1 Box Models A Box Model describes a process in terms of making repeated

### One-Way Analysis of Variance (ANOVA) Example Problem

One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means

### STAT 35A HW2 Solutions

STAT 35A HW2 Solutions http://www.stat.ucla.edu/~dinov/courses_students.dir/09/spring/stat35.dir 1. A computer consulting firm presently has bids out on three projects. Let A i = { awarded project i },

### Reporting Statistics in Psychology

This document contains general guidelines for the reporting of statistics in psychology research. The details of statistical reporting vary slightly among different areas of science and also among different

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Analysis of Variance. MINITAB User s Guide 2 3-1

3 Analysis of Variance Analysis of Variance Overview, 3-2 One-Way Analysis of Variance, 3-5 Two-Way Analysis of Variance, 3-11 Analysis of Means, 3-13 Overview of Balanced ANOVA and GLM, 3-18 Balanced

### Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln Log-Rank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)

### Chapter 2: Systems of Linear Equations and Matrices:

At the end of the lesson, you should be able to: Chapter 2: Systems of Linear Equations and Matrices: 2.1: Solutions of Linear Systems by the Echelon Method Define linear systems, unique solution, inconsistent,

### Analyzing Research Data Using Excel

Analyzing Research Data Using Excel Fraser Health Authority, 2012 The Fraser Health Authority ( FH ) authorizes the use, reproduction and/or modification of this publication for purposes other than commercial

### www.rmsolutions.net R&M Solutons

Ahmed Hassouna, MD Professor of cardiovascular surgery, Ain-Shams University, EGYPT. Diploma of medical statistics and clinical trial, Paris 6 university, Paris. 1A- Choose the best answer The duration

Name: OUTLINE SOLUTIONS University of Chicago Graduate School of Business Business 41000: Business Statistics Solution Key Special Notes: 1. This is a closed-book exam. You may use an 8 11 piece of paper

### A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

CHAPTER 5. A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING 5.1 Concepts When a number of animals or plots are exposed to a certain treatment, we usually estimate the effect of the treatment

### Interpreting Data in Normal Distributions

Interpreting Data in Normal Distributions This curve is kind of a big deal. It shows the distribution of a set of test scores, the results of rolling a die a million times, the heights of people on Earth,

### Introduction to Statistics with SPSS (15.0) Version 2.3 (public)

Babraham Bioinformatics Introduction to Statistics with SPSS (15.0) Version 2.3 (public) Introduction to Statistics with SPSS 2 Table of contents Introduction... 3 Chapter 1: Opening SPSS for the first

### IBM SPSS Statistics for Beginners for Windows

ISS, NEWCASTLE UNIVERSITY IBM SPSS Statistics for Beginners for Windows A Training Manual for Beginners Dr. S. T. Kometa A Training Manual for Beginners Contents 1 Aims and Objectives... 3 1.1 Learning

### This document contains Chapter 2: Statistics, Data Analysis, and Probability strand from the 2008 California High School Exit Examination (CAHSEE):

This document contains Chapter 2:, Data Analysis, and strand from the 28 California High School Exit Examination (CAHSEE): Mathematics Study Guide published by the California Department of Education. The

### Generalized Linear Models

Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

### The F distribution and the basic principle behind ANOVAs. Situating ANOVAs in the world of statistical tests

Tutorial The F distribution and the basic principle behind ANOVAs Bodo Winter 1 Updates: September 21, 2011; January 23, 2014; April 24, 2014; March 2, 2015 This tutorial focuses on understanding rather

### List of Examples. Examples 319

Examples 319 List of Examples DiMaggio and Mantle. 6 Weed seeds. 6, 23, 37, 38 Vole reproduction. 7, 24, 37 Wooly bear caterpillar cocoons. 7 Homophone confusion and Alzheimer s disease. 8 Gear tooth strength.

### Testing a Hypothesis about Two Independent Means

1314 Testing a Hypothesis about Two Independent Means How can you test the null hypothesis that two population means are equal, based on the results observed in two independent samples? Why can t you use

### THE UNIVERSITY OF TEXAS AT TYLER COLLEGE OF NURSING COURSE SYLLABUS NURS 5317 STATISTICS FOR HEALTH PROVIDERS. Fall 2013

THE UNIVERSITY OF TEXAS AT TYLER COLLEGE OF NURSING 1 COURSE SYLLABUS NURS 5317 STATISTICS FOR HEALTH PROVIDERS Fall 2013 & Danice B. Greer, Ph.D., RN, BC dgreer@uttyler.edu Office BRB 1115 (903) 565-5766

### Chapter 9: Two-Sample Inference

Chapter 9: Two-Sample Inference Chapter 7 discussed methods of hypothesis testing about one-population parameters. Chapter 8 discussed methods of estimating population parameters from one sample using

### Mathematics. Probability and Statistics Curriculum Guide. Revised 2010

Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

### The Math. P (x) = 5! = 1 2 3 4 5 = 120.

The Math Suppose there are n experiments, and the probability that someone gets the right answer on any given experiment is p. So in the first example above, n = 5 and p = 0.2. Let X be the number of correct

### Main Effects and Interactions

Main Effects & Interactions page 1 Main Effects and Interactions So far, we ve talked about studies in which there is just one independent variable, such as violence of television program. You might randomly

### Case Study in Data Analysis Does a drug prevent cardiomegaly in heart failure?

Case Study in Data Analysis Does a drug prevent cardiomegaly in heart failure? Harvey Motulsky hmotulsky@graphpad.com This is the first case in what I expect will be a series of case studies. While I mention

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### Guide to Microsoft Excel for calculations, statistics, and plotting data

Page 1/47 Guide to Microsoft Excel for calculations, statistics, and plotting data Topic Page A. Writing equations and text 2 1. Writing equations with mathematical operations 2 2. Writing equations with

### Basic Probability Theory II

RECAP Basic Probability heory II Dr. om Ilvento FREC 408 We said the approach to establishing probabilities for events is to Define the experiment List the sample points Assign probabilities to the sample

### STATISTICS 8: CHAPTERS 7 TO 10, SAMPLE MULTIPLE CHOICE QUESTIONS

STATISTICS 8: CHAPTERS 7 TO 10, SAMPLE MULTIPLE CHOICE QUESTIONS 1. If two events (both with probability greater than 0) are mutually exclusive, then: A. They also must be independent. B. They also could

### socscimajor yes no TOTAL female 25 35 60 male 30 27 57 TOTAL 55 62 117

Review for Final Stat 10 (1) The table below shows data for a sample of students from UCLA. (a) What percent of the sampled students are male? 57/117 (b) What proportion of sampled students are social

### Descriptive and Inferential Statistics

General Sir John Kotelawala Defence University Workshop on Descriptive and Inferential Statistics Faculty of Research and Development 14 th May 2013 1. Introduction to Statistics 1.1 What is Statistics?

### CHAPTER 12. Chi-Square Tests and Nonparametric Tests LEARNING OBJECTIVES. USING STATISTICS @ T.C. Resort Properties

CHAPTER 1 Chi-Square Tests and Nonparametric Tests USING STATISTICS @ T.C. Resort Properties 1.1 CHI-SQUARE TEST FOR THE DIFFERENCE BETWEEN TWO PROPORTIONS (INDEPENDENT SAMPLES) 1. CHI-SQUARE TEST FOR

### 6.042/18.062J Mathematics for Computer Science. Expected Value I

6.42/8.62J Mathematics for Computer Science Srini Devadas and Eric Lehman May 3, 25 Lecture otes Expected Value I The expectation or expected value of a random variable is a single number that tells you

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

### Education & Training Plan Accounting Math Professional Certificate Program with Externship

University of Texas at El Paso Professional and Public Programs 500 W. University Kelly Hall Ste. 212 & 214 El Paso, TX 79968 http://www.ppp.utep.edu/ Contact: Sylvia Monsisvais 915-747-7578 samonsisvais@utep.edu

### People like to clump things into categories. Virtually every research

05-Elliott-4987.qxd 7/18/2006 5:26 PM Page 113 5 Analysis of Categorical Data People like to clump things into categories. Virtually every research project categorizes some of its observations into neat,