# Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1:

Save this PDF as:

Size: px
Start display at page:

Download "Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1:"

## Transcription

1 Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 000: Page 1: CHI-SQUARE TESTS: When to use a Chi-Square test: Usually in psychological research, we aim to obtain one or more scores from each subject. However, sometimes data consist merely of the frequencies with which certain categories of event occurred. Instead of each subject providing a score, subjects contribute to a "head count": they fall into one category or another. In these circumstances, where one has categorical data, one of the family of Chi-Squared tests is the appropriate test to use. To make this clear, let's consider a concrete example. Imagine that you were interested in the type of transport used by people to arrive at the university. Do people prefer some forms of transport rather than others? One way to tackle this question might be to provide people with a set of alternatives (car, bus, motorcycle, bicycle, etc) and ask them which method of transport they habitually used. Out of 100 people, you might end up with 60 who said they used their car, 0 who said they came by bus, 10 who came by motorcycle and 10 who came by bicycle. Each person asked would therefore fall into one of these categories. We would now have a set of observed frequencies of certain things happening. We might now ask: are these observed frequencies similar to what we might expect to find by chance, or is there some non-random pattern to them? In the example above, do people show no particular preference for one form of transport over another, or are some forms of transport preferred and others disliked? In this particular case, it's pretty clear-cut: just looking at the frequencies would enable us to claim that most people come by car and that other forms of transport are relatively unpopular. But what if the frequencies had been 30,, 3, 5? Given these obtained frequencies, we might not be as confident that cars were the most popular form of transport. The Chi-Square test helps us to decide whether or not our observed frequencies are due to chance, by comparing our observed frequencies to the frequencies that we might expect to obtain purely by chance. Chi-Square is a very versatile statistic that crops up in lots of different circumstances. We will concentrate on two applications of it. (a) Chi-Square "Goodness of Fit" test: This is used when you want to compare an observed frequency-distribution to a theoretical frequency-distribution. The most common situation is where your data consist of the observed frequencies for a number of mutually-exclusive categories, and you want to know if they all occur equally frequently (so that our theoretical frequency-distribution contains categories that all have the same number of individuals in them). Our example above would be appropriate for this test: we have one independent variable ("type of transport habitually used"), and a number of different levels of it (car, bus, bike, etc). We wanted to know if some forms of transport were preferred over others, or whether there were no preferences. In this context, "no preferences" means "all categories occurred with equal frequencies". Another example might involve applications to study psychology at Sussex: we might want to see if certain schools of study are more popular with applicants than others. The independent variable might be "school of study", and the levels of this variable would be the different schools - SOC, BIOLS, COGS, AFRAS and CCS. Our data would consist of the number of applicants to each school. Again, we would be comparing these observed frequencies to those that would occur if all schools were equally popular. A third example: we might be interested in the relationship between age of driver and likelihood of being involved in an accident. We could obtain the records of 1000 accidents, and see how many of the drivers involved fell into each of the following age-categories: 1-30, 31-40, 41-50, 51-60, 61-70, and If there is no relationship between accident-rate and age, then there should be similar numbers of drivers in each category (all categories should occur with equal frequency). If on the other hand, younger drivers are more likely to have accidents, then there would be a large number of accidents in the younger age-categories and a low number of accidents in the older age-categories.

2 Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 000: Page : Chi-Square Goodness of Fit test - a worked example: We select a random sample of 100 UCAS psychology applicants to Sussex, and find that they are distributed across five Schools of Study in the following way (fictional data, I hasten to add!) School of Study: CCS SOC AFRAS COGS BIOLS total Observed frequency: Are all Schools equally popular, as far as psychology applications are concerned? Or do some Schools attract more applications than others? Looking at the observed frequencies, it appears that CCS and SOC are much more popular than the other Schools. We can perform a Chi-Square test on these data to find out if this is true, i.e. to see the pattern of applications deviates significantly from what we might expect to obtain if all schools were equally popular with applicants. Our statistical hypothesis: As with most statistical tests, we are faced with two possibilities. One is that there is no difference between the Schools in terms of application rate; any differences between our observed frequencies and those which we would expect to obtain, are just minor discrepancies which are due to chance. The alternative possibility is that there is a real difference between Schools in the application rates; the observed discrepancies are so large that they are unlikely to have arisen merely by chance - it is more likely that they reflect some "real" phenomenon, in this case the fact that some Schools are truly preferred over others. We start off by assuming that the former possibility is true: this is the so-called "null hypothesis", that any discrepancies between our observed frequencies and those frequencies which we would expect to get, are due merely to chance. Only if the evidence is strong enough, will we reject the null hypothesis in favour of the alternative hypothesis - that our data really are different from the expected pattern of frequencies. What we mean by "evidence" in the context of the Chi-Square test, is that the discrepancies between the observed and expected frequencies are so large that they cannot be "explained away" as having occurred merely due to random variation. Here is the Chi-Square formula: χ = ( O E) E Step 1: calculate the "expected frequencies" (the frequencies which we would expect to obtain if there were no School preferences). total number of instances Expected frequency = number of categories We have 100 applications and 5 categories, so the Expected Frequency for each category is 0 (i.e., 100/5): CCS SOC AFRAS COGS BIOLS total: Observed frequency: Expected frequency:

3 Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 000: Page 3: Step : subtract, from each observed frequency, its associated expected frequency (i.e., work out (O-E) ): O - E : Step 3: square each value of (O-E): (O - E) : ( O E) E Step 4: divide each of the values obtained in step 3, by its associated expected frequency: : Step 5: add together all of the values obtained in step 4, to get your value of Chi-Square: χ = = 5.5 This is our obtained value of Chi-Square; it is a single-number summary of the discrepancy between our obtained frequencies, and the frequencies which we would expect if all of the categories had equal frequencies. The bigger our obtained Chi-Square, the greater the difference between the observed and expected frequencies. How do we assess whether this value represents a "real" departure of our obtained frequencies from the expected frequencies? To do this, we need to know how likely it is to obtain various values of Chi-Square by chance. Assessing the size of our obtained Chi-Square value: (1) What you do, in a nutshell... (a) Find a table of "critical Chi-Square values" (at the back of most statistics textbooks). (b) Work out how many "degrees of freedom" (d.f.) you have. For the Goodness of Fit test, this is simply the number of categories minus one. (c) Decide on a probability level. (d) Find the critical Chi-Square value in the table (at the intersection of the appropriate d.f. row and probability column). If your obtained Chi-Square value is bigger than the one in the table, then you conclude that your obtained Chi-Square value is too large to have arisen by chance; it is more likely to stem from the fact that there were real differences between the observed and expected frequencies. In other words, contrary to our null hypothesis, the categories did not occur with similar frequencies. If, on the other hand, your obtained Chi-Square value is smaller than the one in the table, you conclude that there is no reason to think that the observed pattern of frequencies is not due simply to chance (i.e., we retain our initial assumption that the discrepancies between the observed and expected frequencies are due merely to random sampling variation, and hence we have no reason to believe that the categories did not occur with equal frequency). For our worked example... (a) We have an obtained Chi-Square value of 5.5. (b) We have five categories, so there are 5-1 = 4 degrees of freedom. (c) Consult a table of "critical values of Chi-Square". Here's an excerpt from a typical table:

4 Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 000: Page 4: probability level: d.f etc. etc (d) The values in each column are "critical" values of Chi-Square. These values would be expected to occur by chance with the probabiity shown at the top of the column. We are interested in the values in the 4 d.f. row, since our obtained Chi-Square has 4 d.f..(if we had a different number of d.f., we would use a different row of the table: so, with 5 d.f., you use the 5 d.f. row, 10 d.f. you use the 10 d.f. row, and so on)). With 4 d.f., obtained values of Chi-Square as large as 9.49 will occur by chance with a probability of 0.05: in other words, you would expect to get a Chi-Square value of 9.49 or more, only 5 times in a hundred Chi-Squared tests. Thus, values of Chi-Square that are this large do occur by chance, but not very often. Looking along the same row, we find that Chi-Square values of 13.8 are even more uncommon: they occur by chance once in a hundred times (0.01). Values of Chi-Square as large as are extremely unlikely to crop up by chance: the probability of them doing so is less than once in a thousand Chi-Square tests. (e) We compare our obtained Chi-Square to these tablulated values. If our obtained Chi- Square is larger than a value in the table, it implies that our obtained value is even less likely to occur by chance than is the value in the table. Our obtained value of 5.5 is much bigger than the critical value of 18.46; and so we can conclude that the chances of our obtained Chi-Squared value having occurred by chance are less than one in a thousand (written formally, "p<0.001"). We can therefore be relatively confident in concluding that our observed frequencies are significantly different from the frequencies that we would expect to obtain if all categories were equally popular. In other words, not all Schools are equally popular with UCAS applicants. () Why we do it - the Sampling Distribution of Chi-Square: In essence, our obtained Chi-Square value represents the sum of the discrepancies between the obtained and expected frequencies. In interpreting it, our problem is that our data come from a random sample which has been taken from a population. Random variation in our sample means that our observed pattern of category frequencies may or may not be a good reflection of what's going on in the parent population. In the present example, we have a sample of 100 UCAS applicants, and the pattern within that sample may or may not be a good reflection of the pattern of preferences within the whole population of UCAS applicants to Sussex. In short, there is a "true" state of affairs within the population. We are trying to work out on the basis of our sample, what that state of affairs is - but our sample is all we actually have to work with. Suppose for the moment that the expected frequencies are the true ones within the population - in the present example, that there really is no difference between the Schools in terms of application rates. In practice, even if this were so, this state of affairs is unlikely to be reflected perfectly accurately in our sample. Imagine that we took lots of different random samples from the population of UCAS applicants to Sussex. Due to random variation from sample to sample, the observed and expected frequencies would rarely match exactly; more often, they would be fairly close; and occasionally, we would get a weird sample and the observed and expected frequencies would not appear to match at all. Although the true state of affairs was that all of the categories occurred with equal frequency, purely by chance a sample would sometimes give rise to an odd-looking frequency distribution and hence produce a large value of Chi-Square. If we took this large value of Chi-Square at face value, we might falsely conclude that there was a "real" difference in the frequencies of our categories, when in fact our obtained pattern of frequencies was distorted and really there was no difference in the parent population. We clearly need to know how likely it is to obtain small, moderate and large values of Chi- Square from sample data, when the true state of affairs in the parent population is that all categories occur with equal frequencies. Fortunately, we don't need to actually do this for ourselves: statisticians have worked it all out for us.

5 Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 000: Page 5: The following graph shows what values of Chi-Square you get, when you take repeated random samples from a population in which the expected frequencies are true (i.e., when in fact there is no preference at all in the parent population for some categories rather than others). high relative frequency of obtaining a particular value of χ : low χ The first thing to note is that the sampling distribution of Chi-Square looks rather different depending on the number of degrees of freedom. When we have 4 d.f., we are most likely to get a value of Chi-Square that is close to 4; sometimes Chi-Square will be much larger than 4 and sometimes it will be much smaller. The larger (or smaller) the value of Chi-Square, the less likely it is to occur. When we have 10 d.f., we are most likely to get a value of Chi-Square that is about 10; again, smaller and larger values are progressively less likely to be obtained. These distributions reflect the fact that very small and very large discrepancies between the observed and expected frequencies are unlikely to occur by chance. Note that, with small d.f., the Chi-Square distribution becomes progressively more asymmetrical, or "skewed": for example, with 4 d.f., large values of Chi-Square are much easier to obtain than very small values. The graph shows that in assessing the size of our obtained Chi-Square, we need to take account of how many d.f. we have. The graph also shows that, for a given value of d.f., large Chi- Square values are unlikely to arise by chance - although the possibility always remain that they can do so! In other words, if we get a large value of Chi-Square, there are two possibilities: (a) it reflects the true state of affairs in the parent population: hence there is a difference between the categories in terms of how many instances occur in each, and our theoretical frequency distribution is not a good description of the population; (b) the observed discrepancies between the observed and expected frequencies in our sample are just due to sampling variation; we have no reason to suspect that the theoretical frequency distribution is not a good description of the population. It seems more likely that our sample poorly reflects the characteristics of the population from which it was taken. We can never entirely be sure which of these two possibilities is true. However, the larger the value of Chi-Square, the more likely (a) becomes and the less likely (b) becomes. Very large Chi- Square values (and hence large discrepancies between obsrved and expected frequencies) are highly unlikely to have arisen by chance, although the possibility always remains that they have done so and are merely freak results. This fits in with our intuitions: the more marked the discrepancy betwen the frequencies which we actually obtain (from our sample), and the frequencies we expected to obtain (based on our ideas of what the parent population is like), then the more confident we would be in concluding that the discrepancy was not simply due to chance. If the discrepancies were small, we could dismiss them as being due to random variation, and maintain a belief that the charactristics of the populationn were being imperfectly reflected in our particular sample ; but there comes a point at which the discrepancies are so great that we cannot explain them away in this manner. Instead we have to accept that our sample does reflect the characteristics of the parent population; it's just that the population has different characteristics (a diffrent frequency distibruiotn) to those we originally thought it had.

6 Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 000: Page 6: (b) Chi-Square Test of Association between two variables: This is appropriate to use when you have nominal (categorical) data for two independent variables, and you want to see if there is an association between them. (b) Chi-Square Test of Association - worked example: We could take our hypothetical 100 UCAS applicants and categorise them on two independent variables: educational background (two levels: Science versus Arts) and School choice (five levels: COGS, SOC, CCS, BIOLS or CCS). What we want to know is: is there a non-random association between these two variables: for example, do students with an Arts background show a different pattern of applications to that shown by students with a Science background? Our observed frequencies can be put into a contingency table, as follows: SCHOOL OF STUDY: COGS SOC AFRAS BIOLS CCS column total: ARTS SCIENCE row totals: These are our observed frequencies. In this particular case, simply looking at the observed frequencies gives us an idea that there is a systematic relationship between our two independent variables: CCS seems very popular with Arts students and unpopular with Science students, and the opposite seems to be true for BIOLS. In real life, however, things are often not as clear-cut as this. As with the one-variable Chi-Square test, our aim is to see if the pattern of observed frequencies is signficantly different from the pattern of frequencies which we would exect to see by chance - i.e., would expect to obtain if there were no relationship between the two variables in question. With respect to the example above, "no relationship" would mean that the pattern of school preference shown by Arts students was no different to that shown by Science students. The Chi-Square formula is exactly the same as for the one-variable test described earlier; the only difference is in how you calculate the expected frequencies. Step 1: calculate the "expected frequencies" (the frequencies which we would expect to obtain if there were no association between School preference and educational background). row total * column total Expected frequency = grand total Do this for each cell in the table above. Here is that table again, with the expected frequencies in brackets below the obtained frequencies: COGS SOC AFRAS BIOLS CCS row total: ARTS (11.) (9.8) (4.) (16.1) (8.7) SCIENCE (4.8) (4.) (1.8) (6.9) (1.3) column totals: Thus the expected frequency for Arts students applying to COGS is 70 (the row total representing the total number of Arts students) times 16 (the column total representing the total number of applications to COGS), divided by the grand total (100). (70 * 16)/100 = 11.. The expected frequency for Science students applying to BIOLS is 30 (the row total corresponding to the total number of Science students) times 3 (the total number of students

7 Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 000: Page 7: applying to BIOLS) divided by the grand total (100). (30 * 3)/100 = 6.9. And so on for the rest of the table. Note that the expected frequencies should add up to the same total as the observed frequencies (100 in this case), give or take a small rounding error; if they do not, you have made a mistake in the arithmetic somewhere. Step : subtract, from each observed frequency, its associated expected frequency (i.e., work out (O-E) ): COGS SOC AFRAS BIOLS CCS ARTS O: E: O-E: SCIENCE O: E: O-E: Step 3: square each value of (O-E): ARTS (O-E) : SCIENCE (O-E) : (NB: It's just a coincidence that for this particular example, the (O-E) values are identical for the Arts and Science students; there's no reason why they should be). Step 4: divide each of the values obtained in step 3, by its associated expected frequency: ARTS: ( O E) E 1.44/ / / / /8.7 = 0.13 = = 0.15 = = SCIENCE: ( O E) 1.44/ / / / /1.3 E = = 1.15 = = = Step 5: add together all of the values obtained in step 4, to get your value of Chi-Square: χ = χ = 5.94 This is our obtained value of Chi-Square; it is a single-number summary of the discrepancy between our obtained frequencies, and the frequencies which we would expect if there was no association between our two variables.the bigger our obtained Chi-Square, the greater the difference between the observed and expected frequencies.

8 Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 000: Page 8: Interpreting Chi-Square: To assess whether this value represents a "real" departure of our obtained frequencies from the expected frequencies, we compare our obtained Chi-Square value to the appropriate "critical" value of Chi-Square obtained from the same table as we used before. To do this, we need to know how many degrees of freedom we have. The d.f. are given by the following formula: d.f. = (number of rows - 1) * (number of columns - 1) In the present example, we have two rows and five columns, so we have (-1) * (5-1) = 4 d.f. If our obtained Chi-Square is larger than a value in the table, it implies that our obtained value is even less likely to occur by chance than is the value in the table. Our obtained value of 5.94 is much bigger than the critical value of 18.46; and so we can conclude that the chances of our obtained Chi-Squared value having occurred by chance are less than one in a thousand (written formally, "p<0.001"). We can therefore be relatively confident in concluding that our observed frequencies are significantly different from the frequencies that we would expect to obtain if there were no association between the two variables. In other words, the pattern of applications to various Schools is different for Arts and Science students. Note that the Chi-Square test merely tells you that there is some relationship between the two variables in question: it does not tell you what that relationship is, and most importantly, it does not tell you anyhting about the causal relationship between the two variables. In the present example, it would be tempting to interpret the results as showing that possession of an Arts or Science background causes people to apply to different Schools of Study. However, the direction of causality could equally well go the other way: possibly students begin by choosing a preferred School of Study, and then pick A-levels which will be most apporopriate for that School. Chi-Square merely tells you that the two variables are associated in some way: precisely what that association is, and what it means, is for you to decide - the test does not do it for you. Another problem in interpreting Chi-Square is that the test tells you only that the observed frequencies are different from the expected frequencies in some way: the test does not tell you where this difference comes from. Sometimes, it might be that all of the observed frequencies are significantly discrepant from the expected frequencies, whereas on other occasions it might be that the significant Chi-Square value has arisen because of one particular marked discrepancy between the observed and expected frequencies. Usually, you can get some idea of why the Chi-Square was significant by looking at the size of the discrepancies between the observed and expected frequencies. In the present example, the discrepancies between observed and expected frequencies are relatively minor for Arts and Science students applying to COGS, SOC and AFRAS; about as many students apply to these Schools as you would expect if there were no differences between Arts and Science students in application-preference. The major discrepancies occur for CCS and BIOLS: inspection of the O-E discrepancies suggests that Arts students apply to CCS much more often than one would expect, and apply to BIOLS much less often than one would expect.for example, one would expect 16 applications to BIOLS by the Arts students, but in practice there wre only three. The Science students show the opposite pattern with their CCS and BIOLS preferences. In many cases, however, Chi- Square contingency tables are not so easy to interpret! Assumptions of the Chi-Square test: For a Chi-Square test to be used, the following assumptions must hold true: 1. Your data are a random sample from the population about which inferences are to be made.. Observations must be independent: each subject must give one and only one data point (i.e., they must contribute to one and only one category). 3. Problems arise when the expected frequencies are very small. As a rule of thumb, Chi- Square should not be used if more than 0% of the expected frequencies have a vlaue of less than 5. (It does not matter what the observed frequencies are). Note that both of the examples used in this handout violate this rule: out of 5 of the expected frequencies in the goodness of fit example, and 4 out of the 10 expected frequencies in the contingency table example, are less than 5! You can get around this problem in two ways: either combine some categories (if this is meaningful, in your experiment), or obtain more data (make the sample size bigger).

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Independent samples t-test. Dr. Tom Pierce Radford University

Independent samples t-test Dr. Tom Pierce Radford University The logic behind drawing causal conclusions from experiments The sampling distribution of the difference between means The standard error of

### Chi Square (χ 2 ) Statistical Instructions EXP 3082L Jay Gould s Elaboration on Christensen and Evans (1980)

Chi Square (χ 2 ) Statistical Instructions EXP 3082L Jay Gould s Elaboration on Christensen and Evans (1980) For the Driver Behavior Study, the Chi Square Analysis II is the appropriate analysis below.

### Crosstabulation & Chi Square

Crosstabulation & Chi Square Robert S Michael Chi-square as an Index of Association After examining the distribution of each of the variables, the researcher s next task is to look for relationships among

### Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1:

Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 000: Page 1: DESCRIPTIVE STATISTICS - FREQUENCY DISTRIBUTIONS AND AVERAGES: Inferential and Descriptive Statistics: There are four

### First-year Statistics for Psychology Students Through Worked Examples

First-year Statistics for Psychology Students Through Worked Examples 1. THE CHI-SQUARE TEST A test of association between categorical variables by Charles McCreery, D.Phil Formerly Lecturer in Experimental

### Association Between Variables

Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

### Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Features Of The Chi-Square Statistic The chi-square test is non-parametric. That is, it makes no assumptions about the distribution

### Eight things you need to know about interpreting correlations:

Research Skills One, Correlation interpretation, Graham Hole v.1.0. Page 1 Eight things you need to know about interpreting correlations: A correlation coefficient is a single number that represents the

### Having a coin come up heads or tails is a variable on a nominal scale. Heads is a different category from tails.

Chi-square Goodness of Fit Test The chi-square test is designed to test differences whether one frequency is different from another frequency. The chi-square test is designed for use with data on a nominal

### How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

### Unit 29 Chi-Square Goodness-of-Fit Test

Unit 29 Chi-Square Goodness-of-Fit Test Objectives: To perform the chi-square hypothesis test concerning proportions corresponding to more than two categories of a qualitative variable To perform the Bonferroni

### The Kruskal-Wallis test:

Graham Hole Research Skills Kruskal-Wallis handout, version 1.0, page 1 The Kruskal-Wallis test: This test is appropriate for use under the following circumstances: (a) you have three or more conditions

### CHAPTER 11 CHI-SQUARE: NON-PARAMETRIC COMPARISONS OF FREQUENCY

CHAPTER 11 CHI-SQUARE: NON-PARAMETRIC COMPARISONS OF FREQUENCY The hypothesis testing statistics detailed thus far in this text have all been designed to allow comparison of the means of two or more samples

### Testing Research and Statistical Hypotheses

Testing Research and Statistical Hypotheses Introduction In the last lab we analyzed metric artifact attributes such as thickness or width/thickness ratio. Those were continuous variables, which as you

### Using SPSS to perform Chi-Square tests:

Using SPSS to perform Chi-Square tests: Graham Hole, January 2006: page 1: Using SPSS to perform Chi-Square tests: This handout explains how to perform the two types of Chi-Square test that were discussed

### Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1:

Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 000: Page 1: NON-PARAMETRIC TESTS: What are non-parametric tests? Statistical tests fall into two kinds: parametric tests assume that

### PASS Sample Size Software

Chapter 250 Introduction The Chi-square test is often used to test whether sets of frequencies or proportions follow certain patterns. The two most common instances are tests of goodness of fit using multinomial

### CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

### Chi Square Tests. Chapter 10. 10.1 Introduction

Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square

### One-Way Analysis of Variance (ANOVA) Example Problem

One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means

### II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

### Chi Square Distribution

17. Chi Square A. Chi Square Distribution B. One-Way Tables C. Contingency Tables D. Exercises Chi Square is a distribution that has proven to be particularly useful in statistics. The first section describes

### Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

### Contingency Tables and the Chi Square Statistic. Interpreting Computer Printouts and Constructing Tables

Contingency Tables and the Chi Square Statistic Interpreting Computer Printouts and Constructing Tables Contingency Tables/Chi Square Statistics What are they? A contingency table is a table that shows

### Recommend Continued CPS Monitoring. 63 (a) 17 (b) 10 (c) 90. 35 (d) 20 (e) 25 (f) 80. Totals/Marginal 98 37 35 170

Work Sheet 2: Calculating a Chi Square Table 1: Substance Abuse Level by ation Total/Marginal 63 (a) 17 (b) 10 (c) 90 35 (d) 20 (e) 25 (f) 80 Totals/Marginal 98 37 35 170 Step 1: Label Your Table. Label

### LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

### T-TESTS: There are two versions of the t-test:

Research Skills, Graham Hole - February 009: Page 1: T-TESTS: When to use a t-test: The simplest experimental design is to have two conditions: an "experimental" condition in which subjects receive some

### 4. CHI-SQUARE: INTRODUCING THE GOODNESS OF FIT TEST AND THE TEST OF ASSOCIATION

4. : INRODUCING HE GOODNESS OF FI ES AND HE ES OF ASSOCIAION Dr om Clark & Dr Liam Foster Department of Sociological Studies University of Sheffield CONENS 4. Chi-square *So now you should be able to undertake

### Sampling Distribution of the Mean & Hypothesis Testing

Sampling Distribution of the Mean & Hypothesis Testing Let s first review what we know about sampling distributions of the mean (Central Limit Theorem): 1. The mean of the sampling distribution will be

### Chapter 15 Multiple Choice Questions (The answers are provided after the last question.)

Chapter 15 Multiple Choice Questions (The answers are provided after the last question.) 1. What is the median of the following set of scores? 18, 6, 12, 10, 14? a. 10 b. 14 c. 18 d. 12 2. Approximately

### DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS. Posc/Uapp 816 CONTINGENCY TABLES

DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 816 CONTINGENCY TABLES I. AGENDA: A. Cross-classifications 1. Two-by-two and R by C tables 2. Statistical independence 3. The interpretation

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### Multiple Hypothesis Testing: The F-test

Multiple Hypothesis Testing: The F-test Matt Blackwell December 3, 2008 1 A bit of review When moving into the matrix version of linear regression, it is easy to lose sight of the big picture and get lost

### The Chi-Square Test. STAT E-50 Introduction to Statistics

STAT -50 Introduction to Statistics The Chi-Square Test The Chi-square test is a nonparametric test that is used to compare experimental results with theoretical models. That is, we will be comparing observed

### LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Period Date LAB : THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

### Chi-Square Test. Contingency Tables. Contingency Tables. Chi-Square Test for Independence. Chi-Square Tests for Goodnessof-Fit

Chi-Square Tests 15 Chapter Chi-Square Test for Independence Chi-Square Tests for Goodness Uniform Goodness- Poisson Goodness- Goodness Test ECDF Tests (Optional) McGraw-Hill/Irwin Copyright 2009 by The

### Hypothesis Testing Level I Quantitative Methods. IFT Notes for the CFA exam

Hypothesis Testing 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction... 3 2. Hypothesis Testing... 3 3. Hypothesis Tests Concerning the Mean... 10 4. Hypothesis Tests

### Test Positive True Positive False Positive. Test Negative False Negative True Negative. Figure 5-1: 2 x 2 Contingency Table

ANALYSIS OF DISCRT VARIABLS / 5 CHAPTR FIV ANALYSIS OF DISCRT VARIABLS Discrete variables are those which can only assume certain fixed values. xamples include outcome variables with results such as live

### SCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES

SCHOOL OF HEALTH AND HUMAN SCIENCES Using SPSS Topics addressed today: 1. Differences between groups 2. Graphing Use the s4data.sav file for the first part of this session. DON T FORGET TO RECODE YOUR

### Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

### Hypothesis Testing. Bluman Chapter 8

CHAPTER 8 Learning Objectives C H A P T E R E I G H T Hypothesis Testing 1 Outline 8-1 Steps in Traditional Method 8-2 z Test for a Mean 8-3 t Test for a Mean 8-4 z Test for a Proportion 8-5 2 Test for

### The correlation coefficient

The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative

### Scientific Method. 2. Design Study. 1. Ask Question. Questionnaire. Descriptive Research Study. 6: Share Findings. 1: Ask Question.

Descriptive Research Study Investigation of Positive and Negative Affect of UniJos PhD Students toward their PhD Research Project : Ask Question : Design Study Scientific Method 6: Share Findings. Reach

### A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic

A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic Report prepared for Brandon Slama Department of Health Management and Informatics University of Missouri, Columbia

### Statistical tests for SPSS

Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly

### Module 5 Hypotheses Tests: Comparing Two Groups

Module 5 Hypotheses Tests: Comparing Two Groups Objective: In medical research, we often compare the outcomes between two groups of patients, namely exposed and unexposed groups. At the completion of this

### Using Excel for inferential statistics

FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied

### Measuring the Power of a Test

Textbook Reference: Chapter 9.5 Measuring the Power of a Test An economic problem motivates the statement of a null and alternative hypothesis. For a numeric data set, a decision rule can lead to the rejection

### Variables Control Charts

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. Variables

### 6.4 Normal Distribution

Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

### Investigating the Investigative Task: Testing for Skewness An Investigation of Different Test Statistics and their Power to Detect Skewness

Investigating the Investigative Task: Testing for Skewness An Investigation of Different Test Statistics and their Power to Detect Skewness Josh Tabor Canyon del Oro High School Journal of Statistics Education

### AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Ms. Foglia Date AP: LAB 8: THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

### Odds ratio, Odds ratio test for independence, chi-squared statistic.

Odds ratio, Odds ratio test for independence, chi-squared statistic. Announcements: Assignment 5 is live on webpage. Due Wed Aug 1 at 4:30pm. (9 days, 1 hour, 58.5 minutes ) Final exam is Aug 9. Review

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### Using SPSS 20: Handout 2. Descriptive statistics:

Research Skills One: Using SPSS 20, Handout 2: Descriptive Statistics: Page 1: Using SPSS 20: Handout 2. Descriptive statistics: An essential preliminary to any statistical analysis is to obtain some descriptive

### Chapter 23. Two Categorical Variables: The Chi-Square Test

Chapter 23. Two Categorical Variables: The Chi-Square Test 1 Chapter 23. Two Categorical Variables: The Chi-Square Test Two-Way Tables Note. We quickly review two-way tables with an example. Example. Exercise

### 1. Comparing Two Means: Dependent Samples

1. Comparing Two Means: ependent Samples In the preceding lectures we've considered how to test a difference of two means for independent samples. Now we look at how to do the same thing with dependent

### Book Review of Rosenhouse, The Monty Hall Problem. Leslie Burkholder 1

Book Review of Rosenhouse, The Monty Hall Problem Leslie Burkholder 1 The Monty Hall Problem, Jason Rosenhouse, New York, Oxford University Press, 2009, xii, 195 pp, US \$24.95, ISBN 978-0-19-5#6789-8 (Source

### Chi-square test Fisher s Exact test

Lesson 1 Chi-square test Fisher s Exact test McNemar s Test Lesson 1 Overview Lesson 11 covered two inference methods for categorical data from groups Confidence Intervals for the difference of two proportions

### Statistical Inference and t-tests

1 Statistical Inference and t-tests Objectives Evaluate the difference between a sample mean and a target value using a one-sample t-test. Evaluate the difference between a sample mean and a target value

### THE LOGIC OF HYPOTHESIS TESTING. The general process of hypothesis testing remains constant from one situation to another.

THE LOGIC OF HYPOTHESIS TESTING Hypothesis testing is a statistical procedure that allows researchers to use sample to draw inferences about the population of interest. It is the most commonly used inferential

### Confidence intervals

Confidence intervals Today, we re going to start talking about confidence intervals. We use confidence intervals as a tool in inferential statistics. What this means is that given some sample statistics,

### Chapter 7 Part 2. Hypothesis testing Power

Chapter 7 Part 2 Hypothesis testing Power November 6, 2008 All of the normal curves in this handout are sampling distributions Goal: To understand the process of hypothesis testing and the relationship

### 4. Introduction to Statistics

Statistics for Engineers 4-1 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation

### Goodness of Fit. Proportional Model. Probability Models & Frequency Data

Probability Models & Frequency Data Goodness of Fit Proportional Model Chi-square Statistic Example R Distribution Assumptions Example R 1 Goodness of Fit Goodness of fit tests are used to compare any

### MEASURES OF VARIATION

NORMAL DISTRIBTIONS MEASURES OF VARIATION In statistics, it is important to measure the spread of data. A simple way to measure spread is to find the range. But statisticians want to know if the data are

### TABLE OF CONTENTS. About Chi Squares... 1. What is a CHI SQUARE?... 1. Chi Squares... 1. Hypothesis Testing with Chi Squares... 2

About Chi Squares TABLE OF CONTENTS About Chi Squares... 1 What is a CHI SQUARE?... 1 Chi Squares... 1 Goodness of fit test (One-way χ 2 )... 1 Test of Independence (Two-way χ 2 )... 2 Hypothesis Testing

### CHI-SQUARE: TESTING FOR GOODNESS OF FIT

CHI-SQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity

### Study Guide for the Final Exam

Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

### Chi Squared and Fisher's Exact Tests. Observed vs Expected Distributions

BMS 617 Statistical Techniques for the Biomedical Sciences Lecture 11: Chi-Squared and Fisher's Exact Tests Chi Squared and Fisher's Exact Tests This lecture presents two similarly structured tests, Chi-squared

### individualdifferences

1 Simple ANalysis Of Variance (ANOVA) Oftentimes we have more than two groups that we want to compare. The purpose of ANOVA is to allow us to compare group means from several independent samples. In general,

### SPSS: Expected frequencies, chi-squared test. In-depth example: Age groups and radio choices. Dealing with small frequencies.

SPSS: Expected frequencies, chi-squared test. In-depth example: Age groups and radio choices. Dealing with small frequencies. Quick Example: Handedness and Careers Last time we tested whether one nominal

### Comparing three or more groups (one-way ANOVA...)

Page 1 of 36 Comparing three or more groups (one-way ANOVA...) You've measured a variable in three or more groups, and the means (and medians) are distinct. Is that due to chance? Or does it tell you the

### UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA)

UNDERSTANDING ANALYSIS OF COVARIANCE () In general, research is conducted for the purpose of explaining the effects of the independent variable on the dependent variable, and the purpose of research design

### Statistical Inference

Statistical Inference Idea: Estimate parameters of the population distribution using data. How: Use the sampling distribution of sample statistics and methods based on what would happen if we used this

### SPSS: Descriptive and Inferential Statistics. For Windows

For Windows August 2012 Table of Contents Section 1: Summarizing Data...3 1.1 Descriptive Statistics...3 Section 2: Inferential Statistics... 10 2.1 Chi-Square Test... 10 2.2 T tests... 11 2.3 Correlation...

### Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

### Lecture Notes Module 1

Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Chapter 8. Hypothesis Testing

Chapter 8 Hypothesis Testing Hypothesis In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test (or test of significance) is a standard procedure for testing

### Comparing two groups (t tests...)

Page 1 of 33 Comparing two groups (t tests...) You've measured a variable in two groups, and the means (and medians) are distinct. Is that due to chance? Or does it tell you the two groups are really different?

### Descriptive Statistics and Measurement Scales

Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

### DESCRIPTIVE STATISTICS & DATA PRESENTATION*

Level 1 Level 2 Level 3 Level 4 0 0 0 0 evel 1 evel 2 evel 3 Level 4 DESCRIPTIVE STATISTICS & DATA PRESENTATION* Created for Psychology 41, Research Methods by Barbara Sommer, PhD Psychology Department

### Using SPSS 20, Handout 3: Producing graphs:

Research Skills 1: Using SPSS 20: Handout 3, Producing graphs: Page 1: Using SPSS 20, Handout 3: Producing graphs: In this handout I'm going to show you how to use SPSS to produce various types of graph.

### Statistics. Annex 7. Calculating rates

Annex 7 Statistics Calculating rates Rates are the most common way of measuring disease frequency in a population and are calculated as: number of new cases of disease in population at risk number of persons

### Age to Age Factor Selection under Changing Development Chris G. Gross, ACAS, MAAA

Age to Age Factor Selection under Changing Development Chris G. Gross, ACAS, MAAA Introduction A common question faced by many actuaries when selecting loss development factors is whether to base the selected

### Week 4: Standard Error and Confidence Intervals

Health Sciences M.Sc. Programme Applied Biostatistics Week 4: Standard Error and Confidence Intervals Sampling Most research data come from subjects we think of as samples drawn from a larger population.

### ANOVA Analysis of Variance

ANOVA Analysis of Variance What is ANOVA and why do we use it? Can test hypotheses about mean differences between more than 2 samples. Can also make inferences about the effects of several different IVs,

### UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

### Module 9: Nonparametric Tests. The Applied Research Center

Module 9: Nonparametric Tests The Applied Research Center Module 9 Overview } Nonparametric Tests } Parametric vs. Nonparametric Tests } Restrictions of Nonparametric Tests } One-Sample Chi-Square Test

### Sampling and Hypothesis Testing

Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus

### , for x = 0, 1, 2, 3,... (4.1) (1 + 1/n) n = 2.71828... b x /x! = e b, x=0

Chapter 4 The Poisson Distribution 4.1 The Fish Distribution? The Poisson distribution is named after Simeon-Denis Poisson (1781 1840). In addition, poisson is French for fish. In this chapter we will

### Standard Deviation Estimator

CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

### Factorial Analysis of Variance

Chapter 560 Factorial Analysis of Variance Introduction A common task in research is to compare the average response across levels of one or more factor variables. Examples of factor variables are income

### Hypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test...

Hypothesis Testing COMP 45 STATISTICS Dr N A Heard Contents 1 Hypothesis Testing 1.1 Introduction........................................ 1. Error Rates and Power of a Test.............................

### CHAPTER 14 NONPARAMETRIC TESTS

CHAPTER 14 NONPARAMETRIC TESTS Everything that we have done up until now in statistics has relied heavily on one major fact: that our data is normally distributed. We have been able to make inferences

### Single sample hypothesis testing, II 9.07 3/02/2004

Single sample hypothesis testing, II 9.07 3/02/2004 Outline Very brief review One-tailed vs. two-tailed tests Small sample testing Significance & multiple tests II: Data snooping What do our results mean?