EPP 750 Traditional Analysis: Alternating Treatments

Remember there are two primary types of statistical applications: descriptive and inferential. In descriptive analysis, measures of central tendency (e.g., mean, median), measures of variability (e.g., standard deviation, range), and sometimes measures of relationship (e.g., correlation coefficient) are used simply to summarize a group of numbers, usually obtained to answer some question.
For example, suppose the question is whether students in the school psychology program are taller than students in the school counseling program. (Okay, it's a dumb question, but it's an easy illustration). We obtain measures of height for all students in both programs and find that the school psychology students (Mean = 69 inches) are taller than school counseling students (Mean = 67 inches).
If all we are interested in is comparing school psychology and school counseling students at UNLV, we are finished. No additional statistical analysis is needed. School psychology students are taller.
More often, though, our intent in the question would go beyond the boundaries of UNLV. The school psychology and school counseling students would be serving as just a sample of their two respective populations, perhaps school psychology and school counseling students in the U.S. This moves us into the area of inferential statistics, using the descriptive data from the sample to infer (guess) something about the population.
The statistical procedures to accomplish this usually involve computations with the means, standard deviations, and the number of persons in the sample groups. Although the computations are a bit complex, that really doesn't matter because computers do them for us. Understanding the outcome of the computations is not complicated. Just remember that they are designed to give us the answer to a simple question:
What are the odds that the difference found in our samples could have occurred if there were really no differences in the total populations? The answer to this simple question is found in a probability statement called a p value.
Probably the simplest form of inferential statistic to compare two mean scores from sample groups is the t test. If we applied a t test to our height question, the answer might look something like this: t(49) = 1.34, p > .05
Although that equation may look a bit complicated, remember you're primarily interested only in using the p value at the end to answer a simple question. In this case the question is: what are the odds that the difference (69 in. - 67 in. = 2 in.) found in our sample could have occurred if there were really no differences in the total populations (no difference in average height if we had measured all students in the two groups)?
In this example, the odds were greater than 5 in 100 of getting this much difference between our samples if there were really no difference in the populations.
t(49) = 1.34, p > .05 The first letter in the equation identifies that it is a t test. The number in parentheses is technically the "degrees of freedom". Think of it as a number that tells you approximately how many people were involved in the study (in this case about 50). The number after the equal sign is the result of the calculation of the t, interpreted by the p value which follows it.
t(49) = 1.34, p > .05 The letter p in this equation stands for probability, specifically the probability of getting the difference seen with the samples if the population difference is zero. Traditionally, statisticians use a p value of .05 as the "cutting score" to claim that the difference between sample groups would also be found in the populations they represent. If the p value is less than .05, the "educated guess" is that the populations are also different. If the p value is equal to or higher than .05, the conclusion is that these sample data do not allow a claim of differences in the populations.
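To make the arithmetic concrete, here is a minimal sketch of the pooled two-sample t statistic. The height data are hypothetical, made up for this illustration; statistical software would also attach the p value to the result.

```python
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Two-sample t statistic using a pooled (equal-variance) standard error."""
    n1, n2 = len(group_a), len(group_b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    sp2 = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    t = (mean(group_a) - mean(group_b)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    df = n1 + n2 - 2
    return t, df

# Hypothetical heights (inches) for two small program samples.
psych = [69, 71, 68, 70, 67, 69]
counsel = [67, 66, 68, 65, 67, 69]
t, df = pooled_t(psych, counsel)
print(f"t({df}) = {t:.2f}")  # → t(10) = 2.45; software would report the p value
```

Notice that the degrees of freedom (10) behave just as described above: roughly the number of people in the study (12).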
The same general procedure is applied when an analysis of variance (ANOVA) is used as the inferential statistic. Only two groups can be compared with a t test; two or more groups can be compared with ANOVA. Suppose, staying with the same example, we wanted to compare the height of school psychology, school counseling, and special education students. The equation for the ANOVA might look something like this: F(2,99) = 2.34, p < .05
F(2,99) = 2.34, p < .05 The first letter, F, simply identifies that it is an analysis of variance (ANOVA). The numbers in parentheses are again the "degrees of freedom". The first numeral is the number of means being compared minus one. Interpret the second numeral, as in the t test, as approximately the number of people in the study.
F(2,99) = 2.34, p < .05 The number after the equal sign is the result of the calculation of the F ratio, as with the t test, interpreted by the p value which follows it. And the letter p in this equation again stands for probability, specifically the probability of getting the differences seen with the sample if the population difference is zero. In this example, the probability is less than 5 in 100, so the null hypothesis is rejected.
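The F ratio itself is built from two pieces: variability between the group means divided by variability within the groups. A minimal sketch, again with hypothetical heights, this time for three programs:

```python
from statistics import mean

def f_oneway(*groups):
    """One-way ANOVA F ratio: between-groups mean square over within-groups mean square."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df1, df2 = k - 1, n - k          # (means - 1) and (people - groups)
    return (ss_between / df1) / (ss_within / df2), df1, df2

# Hypothetical heights for school psychology, school counseling, special education.
f, df1, df2 = f_oneway([69, 71, 68, 70], [67, 66, 68, 65], [68, 67, 69, 68])
print(f"F({df1},{df2}) = {f:.2f}")  # → F(2,9) = 6.75
```

As in the text, the first degrees-of-freedom numeral is the number of means minus one, and the second is close to the number of people in the study.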
Single-Case Statistical Analysis with an Alternating Treatments Design

Step One: If baseline data were gathered before starting the alternating treatments, determine whether there was only random variation in the baseline, using the simple time-series C statistic.
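The C statistic is simple enough to sketch by hand. The formulation below is one common version, often attributed to Tryon (1982); the eight baseline scores are hypothetical and stand in for whatever your analysis program would accept.

```python
from statistics import mean

def c_statistic(series):
    """Simplified time-series C statistic (one common formulation, Tryon, 1982).

    C compares successive-point differences to overall variability; dividing C
    by its standard error gives a z for testing the no-trend null hypothesis.
    """
    n = len(series)
    xbar = mean(series)
    successive = sum((series[i + 1] - series[i]) ** 2 for i in range(n - 1))
    c = 1 - successive / (2 * sum((x - xbar) ** 2 for x in series))
    se = ((n - 2) / ((n - 1) * (n + 1))) ** 0.5
    return c, c / se

# Hypothetical 8-day baseline of comprehension scores (0-10 correct).
baseline = [4, 5, 3, 4, 5, 4, 3, 4]
c, z = c_statistic(baseline)
print(f"C = {c:.2f}, z = {z:.2f}")  # |z| < 1.96 suggests only random variation
```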
Step Two: Now you have a decision. You can use a parametric analysis (more likely to find a difference if one really exists), or you can use a nonparametric technique (not as precise, but less likely to be influenced by peculiarities in the data).
To make this decision with the repeated measures in single-case data, your instructor suggests that you first do an autocorrelation analysis within each treatment phase. If the p values from the autocorrelation are less than .05, you should probably use a nonparametric procedure to test the null hypothesis. The analysis program provides a procedure: the Mann-Whitney U.
The null hypothesis for the autocorrelation analysis is (no surprise) that there is no autocorrelation. If the p values from the autocorrelation for each treatment phase are equal to or greater than .05, you do not reject the null hypothesis. It is then safe to use a t test (two mean scores being compared) or an ANOVA (two or more mean scores) rather than the Mann-Whitney U.
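The quantity being tested is the lag-1 autocorrelation: the correlation between each score and the score that follows it. A minimal sketch with hypothetical phase scores; the analysis program, not this sketch, supplies the p value for the decision.

```python
from statistics import mean

def lag1_autocorrelation(series):
    """Lag-1 autocorrelation: how strongly each score predicts the next one."""
    xbar = mean(series)
    paired = sum((series[i] - xbar) * (series[i + 1] - xbar)
                 for i in range(len(series) - 1))
    return paired / sum((x - xbar) ** 2 for x in series)

# Hypothetical scores from one treatment phase.
listen = [7, 8, 7, 9, 8, 7, 8, 9]
print(f"r1 = {lag1_autocorrelation(listen):.2f}")
```

Values near zero are consistent with independent observations; the test of significance (and thus the parametric-versus-nonparametric decision) comes from the program's p value.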
To illustrate all of this, assume that the sample data in the analysis program came from a classroom study of the relative effectiveness of two approaches for enhancing reading comprehension. During each of the first eight days, the student reads a story without any advance preparation and then answers ten questions about the story (variable 1). This is followed by introduction of two treatment conditions. Condition LISTEN (variable 2): the teacher reads an assigned passage orally while the student follows along, and the teacher intersperses explanations about the story; the student then reads the story and answers a 10-question test.
Condition TAPE (variable 3): the student is instructed to turn on a tape recorder and follow along as the passage is read aloud, again with periodic explanations. The student then reads the story and answers the 10-question test. The dependent variable is the number of questions answered correctly. After the initial baseline data are gathered, the two treatments, LISTEN and TAPE, which constitute the independent variable, are randomly interspersed.
One More Concept: The Effect Size

The effect size statistic is a tool once used almost exclusively in meta-analysis, a quantitative investigation combining results from many studies for joint analysis. It is now also reported for individual studies that compare mean scores using inferential statistics, for example the t test and ANOVA.
An effect size is a standardized difference between groups, calculated in its simplest form by subtracting the mean of one group from the mean of another and dividing the answer by a standard deviation. The p value indicates whether a difference in mean scores is likely to have occurred by chance alone (statistical significance). The effect size provides a quantitative indicator of the practical significance of the difference.
There are several ways to calculate the effect size, but the most popular technique is Cohen's d. When comparing mean scores, a d value of .80 is interpreted as a large difference, a d value of .50 as a medium difference, and a d value of .20 as a small difference.
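In code, Cohen's d is a one-line calculation once the pooled standard deviation is in hand. The comprehension scores below are hypothetical, invented for the LISTEN and TAPE conditions of the running example.

```python
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group_a), len(group_b)
    sp2 = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    return (mean(group_a) - mean(group_b)) / sp2 ** 0.5

# Hypothetical 10-question test scores under each condition.
listen = [8, 9, 7, 8, 9, 8]
tape = [7, 8, 6, 7, 8, 7]
print(f"d = {cohens_d(listen, tape):.2f}")  # → d = 1.33, a large difference
```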
Calculators for effect size are available online (use your search engine with the key words "effect size calculator").
Remember that the statistical significance of a difference in mean scores, indicated by the p value, depends on the size of the difference AND on the sample size. Effect size does not take into account the size of the sample and thus often provides a better indicator of the importance of the difference in the mean scores.
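A quick sketch of that last point: duplicating a sample leaves the effect size unchanged but inflates the t statistic. The scores are hypothetical, and a population-form variance is used here as an illustrative simplification so that the duplicated sample keeps an identical d.

```python
from statistics import mean, pvariance

def pooled_t_and_d(a, b):
    """Pooled t statistic and Cohen's d for two groups.

    Population-form variances are an illustrative simplification: they make the
    duplicated sample's d exactly match the original's.
    """
    n1, n2 = len(a), len(b)
    sp2 = (n1 * pvariance(a) + n2 * pvariance(b)) / (n1 + n2)
    d = (mean(a) - mean(b)) / sp2 ** 0.5
    t = d / (1 / n1 + 1 / n2) ** 0.5   # t grows with sample size at a fixed d
    return t, d

small_a, small_b = [8, 9, 7, 8], [7, 8, 6, 7]   # hypothetical scores
big_a, big_b = small_a * 10, small_b * 10       # same pattern, ten times the n
for a, b in [(small_a, small_b), (big_a, big_b)]:
    t, d = pooled_t_and_d(a, b)
    print(f"n per group = {len(a):2d}: t = {t:.2f}, d = {d:.2f}")
```

The mean difference and d never change, yet t roughly triples when n is multiplied by ten: statistical significance reflects sample size, while the effect size does not.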