F TESTS AND SUMS OF SQUARES


Dennis Roberts

Some of you will present your class a small end-of-year unit (as extra material) on simple Analysis of Variance, or ANOVA for short. This is a great idea given A) that it nicely follows from simple two group t tests of differences in means AND B) ANOVA's popularity in empirical research (especially experiments). As part of this introduction, I have found in the past that a good way to start this ANOVA material is by looking at the concept of the SUM OF SQUARES. In ANOVA, variability in an overall data set is partitioned into parts that reflect how much variation there is in the sample MEANS... and how much variation there is WITHIN GROUPS, which is commonly called error or experimental error. I have found that IF students catch on to this notion of sums of squares, the road thru a beginning unit on ANOVA is a much easier and more worthwhile trip. So, this short unit focuses on sums of squares and how that links to ANOVA.

SIMPLE t TEST FOR DIFFERENCES IN MEANS

The first thing we need to do is to recall the basic setup for a simple t test of the differences in means and to see what the NUMERATOR and the DENOMINATOR of the t test tell us. Recall that the basic two sample t is:

t = (Mean1 - Mean2) / SEdifference in means

The numerator reflects how different the two sample means are... or, in a nutshell, our estimate of how much difference in effect the treatments had. Assuming that the downstairs term stays constant, the larger the difference in the sample means, the larger our t test value will be. The larger that t test value is, the greater chance we have to REJECT the null hypothesis. The denominator reflects sampling error. If you look at the formula for the SEdifference in means, the main ingredients in it are the estimates of the variances for EACH of the two groups. In some cases, we assume that there is but ONE common variance for the two populations from which we are sampling... and hence pool the variance estimates into one common term. This term reflects how much variation there is WITHIN the experimental groups.
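To make those upstairs/downstairs roles concrete, here is a minimal Python sketch of a pooled two sample t statistic. The scores are made up purely for illustration (the handout itself uses Minitab, not Python).

```python
from statistics import mean, variance  # variance() uses the n - 1 denominator

# Hypothetical scores for two treatment groups (made up for illustration)
group1 = [5, 4, 3, 2, 1]
group2 = [7, 6, 5, 4, 3]
n1, n2 = len(group1), len(group2)

# Numerator: how far apart the sample means are (the treatment effect estimate)
mean_diff = mean(group1) - mean(group2)

# Denominator: pool the two within-group variances, then turn the pooled
# variance into a standard error of the difference in means
pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
se_diff = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5

t = mean_diff / se_diff
print(t)  # -2.0 for these made-up scores
```

Notice that anything that shrinks se_diff (less within group spread, or bigger ns) inflates t, exactly as described above.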

Now, given some difference in means in the upstairs term, the greater the within experimental groups variance, the SMALLER your t test value will be. Thus, having a lot of within group variability is a BAD thing for us... if we want to reject the null hypothesis. So, in a nutshell, the t test could also be shown as:

t = Variance in means / Experimental error variance
t = How much means vary / How much variance there is within groups

So, the hope that we have when we do some analysis of the two sample difference in means case is that our numerator, which reflects treatment effects, will be relatively large... while the denominator, which reflects within group variance or error, will be relatively small. The larger the top part is in relation to the bottom part, the larger the t value will be and the better our chances of rejecting the null.

SUMS OF SQUARES

To examine more carefully the numerator and denominator of a t test, we need to re-examine the basic definition of a variance. In words, the variance is the average of the squared deviations around the mean. The variance can be written as:

Variance = (Sum of Squared Deviations Around Mean) / (n - 1)

In a given set of data, as the scores spread out more and more from the mean, the variance becomes a larger and larger quantity. In the variance formula, there is a numerator... and a denominator. What is the numerator?

Numerator of Variance = Sum of Squared Deviations around the Mean

So, if we have a simple set of data like 5, 4, 3, 2, and 1... the mean is 3... the deviations will be 2, 1, 0, -1, and -2 AND the squared deviations will be 4, 1, 0, 1, and 4... and the SUM of the squared deviations will be 10. Thus, for this set of 5 values, our variance calculation would be:

Variance = 10 / 4 = 2.5

2.5 is the variance. What is the value of 10? 10 is the NUMERATOR OF THE VARIANCE FORMULA... and is called the SUM OF SQUARES.
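The same arithmetic, as a short Python sketch using the five scores from the text:

```python
from statistics import mean

scores = [5, 4, 3, 2, 1]
m = mean(scores)                                  # mean = 3
deviations = [x - m for x in scores]              # 2, 1, 0, -1, -2
sum_of_squares = sum(d ** 2 for d in deviations)  # SS = 10
var = sum_of_squares / (len(scores) - 1)          # 10 / 4 = 2.5
print(sum_of_squares, var)
```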

Variance = Sum of Squares / (n - 1)

In short, the sum of squares... sometimes listed as SS... is simply the numerator of the variance. Nothing more than that. What is the SS or sum of squares for the following data?

10, 8, 6, 4, 2

The mean is 6. The deviations around the mean are: 4, 2, 0, -2, -4. The squared deviations are: 16, 4, 0, 4, 16. The SUM of the squared deviations is: 40. Hence, the SS or sum of squares is: 40. If you answered 40, you are correct! Finally, to know the SS... is to know the NUMERATOR of the variance formula.

PARTITIONING SUMS OF SQUARES

To start off with, I want to show a very simple experiment type of data layout and then show what SS quantities are found and how that is done. Then I will focus more on what happens to these quantities as the data set changes in certain ways. What if we had a very simple 3 group experiment where 15 Ss have been assigned at random... n=5 to each of 3 treatments? Here are some possible data.

E1  E2  C
 5   7   6
 4   6   5
 3   5   4
 2   4   3
 1   3   2

Normally, we would show basic descriptive stats on our groups:

Variable  N   Mean  StDev
E1        5  3.000  1.581
E2        5  5.000  1.581
C         5  4.000  1.581

What if we just look at ALL of the data... what do we notice? What is seen is that there IS variation in the entire data set... a score of 1 was seen (lowest) in E1... and a score of 7 was seen in E2, which is the highest value. We will name this total variation in a bit. Also notice that there are differences in the means OF the 3 groups... 3 in E1, 5 in E2, and 4 in C. We will later call this variation across the groups. If all the means were the same, that variation would be 0. But it's not 0 in this case. Finally, you see that there is variation WITHIN each group... scores vary in E1, for example, from 5 to 1. In E2, they go from 7 to 3... and in C they go from 6 to 2. We will call this variation WITHIN groups. For this example, there is overall variation, there is variation across (or between) the groups, and there is variation within the groups. Let's be a bit more precise.

Total Sum of Squares (SS Tot)

The first thing we would calculate is the total sum of squares. Remember, we defined SS as simply the numerator of the variance formula... so, what we need to do here is to find the sum of the squared deviations around the mean for the TOTAL set of data. Thus, here we will have all 15 scores as a set... we need to find the mean of that overall set... and then get the squared deviations around the mean of that overall set. BTW, that overall mean is sometimes called the GRAND mean. GRAND only means that it's for all the data together.

Sco  Dev  SqDev
 5    1     1
 4    0     0
 3   -1     1
 2   -2     4
 1   -3     9
 7    3     9
 6    2     4
 5    1     1
 4    0     0
 3   -1     1
 6    2     4
 5    1     1
 4    0     0
 3   -1     1
 2   -2     4

Mean = 4    Sum of SqDev = 40

All I did was to put all 15 values into ONE column... find the mean (GRAND mean = 4), subtract 4 from all 15 values to get the deviations around the GRAND mean, then square those deviations and add them up (40). By pooling ALL the data together and finding the SS for that overall set of data, we have what is called the total sum of squares or... SS Tot for short. SS Tot = 40 for this data set. SS Tot is the sum of the squared deviations around the GRAND mean of all data points. Thus, we have the first SS quantity that we need. Now we will look at how we partition or subdivide this SS Tot value into 2 other parts... SS across or between the groups (which reflects how the group means differ) and SS within the groups (which reflects how much variation we see WITHIN the groups). Let's tackle the SS across or between groups next. Here we need to look at the CELL or group means... and compare them to the GRAND mean.

Mean E1 = 3    Mean E2 = 5    Mean C = 4    Grand Mean = 4

How much does the mean of E1 differ from the grand mean? Well, 3 - 4 = -1. The mean of E1 is 1 point BELOW the grand mean. How much does the mean of E2 differ from the grand mean? 5 - 4 = 1. The mean of E2 is 1 point ABOVE the grand mean. Finally, how does the mean of C differ from the grand mean? 4 - 4 = 0. In fact, the mean of C is the same as the grand mean.

So, E1 has a -1 deviation from the GM, E2 has a +1 deviation from the GM, and C has a 0 deviation from the GM. All is fine and good BUT... each group mean is standing in the place of the 5 values in that group. That is... the mean of 3 in E1 is representing ALL 5 values in E1. It's like we could substitute 3 for all the values in E1... and if that were the case, then we would have 5 values in E1... that all differ by -1 from the grand mean or GM. Now, SS values are squared deviation values... so if we thought about all five values in E1 deviating -1 from the GM, we would need to SQUARE all 5 of these -1 values to have the SS term for the mean of the E1 group FROM the GM. Another way to show this is to simply take the deviation of the E1 mean from the GM... square it... BUT then weight it by the n for that group.

SS between E1 mean and GM = (3 - 4)^2 times n = (-1)^2 * 5 = 5

But, what about E2 and C? They would look like:

SS between E2 mean and GM = (5 - 4)^2 times 5 = 5
SS between C mean and GM = (4 - 4)^2 times 5 = 0

Thus, the weighted squared deviations of the group means from the GM sum to 5 + 5 + 0 = 10. SS BG (between groups) is the sum of the squared deviations of the group means from the GM... weighted by the ns in each group. SS BG = 10 for this set of data. We saw that the SS Tot = 40. That is, ignoring variations between the groups or within the groups, the overall SS = 40. Now, we see that PART of that SS Tot can be attributed to the fact that the means of the groups vary (from the grand mean) too. 1/4th of the overall SS Tot... is due to group means varying. Thus, so far we have:

SS Tot = SS BG + ?
40 = 10 + ?

Now in this case, it should be obvious what that ? value will be... it will be 30... BUT we need to see HOW that 30 comes about. What we have not looked at is how much variation there is within each of the groups. Let's look at the E1 group.

E1  devE1  SQdevE1
 5     2      4
 4     1      1
 3     0      0
 2    -1      1
 1    -2      4

Sum of SQdevE1 = 10

For E1, the sum of the squared deviations around the mean of E1 = 10. The SS for the E1 group is 10. What about for group E2? And C?

E2  devE2  SQdevE2      C  devC  SQdevC
 7     2      4         6    2     4
 6     1      1         5    1     1
 5     0      0         4    0     0
 4    -1      1         3   -1     1
 3    -2      4         2   -2     4

For E2, if you add up the squared deviations around the mean of E2, you get 10 for the SS within that group. And, for group C, if you add up the squared deviations around the mean of group C, you get 10 again. So, the SS within group E2 = 10 and the SS within group C = 10. The SS WG is the sum of the squared deviations within EACH group, added together across all groups. The SS WG or within groups = 10 + 10 + 10 = 30. Let's summarize what we have done.

SS Tot = SS BG + SS WG
40 = 10 + 30

This SS expression is fundamental in a simple experimental data case... the overall SS can be partitioned into TWO parts: one that reflects how much the group means vary and one that reflects how much variation there is within the groups. This breakdown or partitioning is HOW ANOVA is done and, ultimately, how the F test is found and interpreted.
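Here is the whole partitioning as a Python sketch, using the three groups from the running example; it verifies that SS Tot = SS BG + SS WG.

```python
from statistics import mean

groups = {
    "E1": [5, 4, 3, 2, 1],
    "E2": [7, 6, 5, 4, 3],
    "C":  [6, 5, 4, 3, 2],
}

all_scores = [x for g in groups.values() for x in g]
gm = mean(all_scores)  # grand mean = 4

# SS Tot: every score's squared deviation from the grand mean
ss_tot = sum((x - gm) ** 2 for x in all_scores)

# SS BG: each group mean's squared deviation from the grand mean, weighted by n
ss_bg = sum(len(g) * (mean(g) - gm) ** 2 for g in groups.values())

# SS WG: each score's squared deviation from its own group mean, summed over groups
ss_wg = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups.values())

print(ss_tot, ss_bg, ss_wg)  # 40, 10, 30 ... and 40 = 10 + 30
```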

Now, recall that in the beginning, I reviewed the basic formula for a t test on the difference in means of 2 groups. We saw that the numerator of the t test reflected how the means in the two groups differed. Hence, the upstairs term in that t test was an estimate of the treatment effect, or how the group means differed. We hoped, of course, to find that term to be relatively large... compared to the denominator term, which reflected how much variation there was WITHIN the two groups... and we called that error or experimental error. The goal in the t test was to find a relatively large numerator... the between group means term... compared to a small denominator term, which reflected sampling error. Differences in means represent treatment effects and variations within groups represent sampling error. We have been working on the partitioning of the total sum of squares into a component that reflects variations across the means of the groups and a component that reflects variations within the groups. It seems logical... if the ANOVA and F tests follow the same general logic and patterns as our simple 2 group t test did... that we will have a numerator term that estimates treatment effects (SS BG) and a denominator term that estimates sampling error (SS WG). And, this is exactly what we do... within an experimental data set like the one we have been working with here, we FIRST find the overall SS Tot and then partition that into a BG part and a WG part. This is where this algebraic equality exists: SS Tot = SS BG + SS WG. All simple ANOVA problems start like this... obtaining and partitioning the SS quantities. THEN, from that point, there are some additional steps we will need to take to transform the SS quantities into an F test or F ratio. The F ratio is used just like t tests or chi square tests or z tests... at some point these test statistics become sufficiently large that we reject the null we are using in a particular analysis context. But, before we do that, let's have a closer look at some different data patterns and how they would impact the partitioning of the SS Tot quantity. What happens when the means get more and more different? What happens when there is more and more within group variation? If you are able to get a feel for what happens to these SS quantities as conditions (between and within the groups) change, you will be more on your way to better understanding how an ANOVA takes place and HOW you interpret the results.

Differences in Means Change

Recall the original data situation: 3 groups (E1, E2, C) and n=5 within each group. The data were:

Row  E1  E2  C
1     5   7   6
2     4   6   5
3     3   5   4
4     2   4   3
5     1   3   2

Variable  N   Mean  StDev
E1        5  3.000  1.581
E2        5  5.000  1.581
C         5  4.000  1.581

Here we saw that the means were 3, 5, and 4. What if we change the mean in one group... say C... to make the differences in means greater? Let's keep the within group variation the same for each group BUT make the mean for C larger by 3 points... that is... change the mean in C by adding 3 points to each value such that the mean will now be 7 rather than 4. Here's the new data.

Row  E1  E2  newC
1     5   7    9
2     4   6    8
3     3   5    7
4     2   4    6
5     1   3    5

Variable  N   Mean  StDev
E1        5  3.000  1.581
E2        5  5.000  1.581
newC      5  7.000  1.581

Now we have means that range from 3 to 7... whereas before we had means that ranged from 3 to 5. Therefore, I have changed the data in terms of the means of the groups so that there is a larger spread of the mean values... while keeping the within group variation the same. What do you think should happen in terms of the SS quantities? Well, if I make the mean differences greater while keeping the same within group variation, the SS BG term should get relatively larger while the SS WG term stays the same. Right? NOTE: To simplify matters, I will let Minitab do the calculations... what I want YOU to do is to focus on the impacts of what I have done... don't worry about how Minitab produces the values. The simplest way will be for me to let Minitab actually do the ANOVA calculations... from which we can see what the SS quantities are and see how they are impacted by the changes I have made. For the original data:

Source   DF     SS     MS     F      P
Factor    2  10.00   5.00  2.00  0.178
Error    12  30.00   2.50
Total    14  40.00

Look at the SS column. We had calculated that the SS Tot = 40, which is what is shown above, and that the partitioning into SS BG and SS WG produced 10 for SS BG and 30 for SS WG. This is what Minitab shows.
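If you want to check this outside of Minitab, SciPy (assuming it is installed) will reproduce the F and p values, though it reports only those two numbers rather than the full SS table:

```python
from scipy.stats import f_oneway

e1 = [5, 4, 3, 2, 1]
e2 = [7, 6, 5, 4, 3]
c  = [6, 5, 4, 3, 2]

# One-way ANOVA on the original three groups
result = f_oneway(e1, e2, c)
print(result.statistic, result.pvalue)  # F = 2.0, p ≈ 0.178, matching the table
```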

For the data where the C group has a mean 3 points larger:

Source   DF     SS     MS     F      P
Factor    2  40.00  20.00  8.00  0.006
Error    12  30.00   2.50
Total    14  70.00

Now, because the overall data have been forced to vary more, our SS Tot is larger... 70 here. But, what is of importance is that when I make the mean of C larger... but keep the within group variation the same... that larger SS Tot is because there are larger differences amongst the group means. In this case, while the SS WG stays the same, the SS BG increases. In a nutshell, if the means vary more, while within group variability is kept the same, the SS BG gets relatively larger compared to the SS WG or error term. THE LARGER THE MEAN DIFFERENCES GET, ASSUMING THAT WITHIN VARIATION IS CONSTANT, THE GREATER WILL BE THE SS BG COMPARED TO THE SS WG.

Variations within Groups Change

Let's revisit the original data set again.

Row  E1  E2  C
1     5   7   6
2     4   6   5
3     3   5   4
4     2   4   3
5     1   3   2

Variable  N   Mean  StDev
E1        5  3.000  1.581
E2        5  5.000  1.581
C         5  4.000  1.581

Here, the means go from 3 to 5. What I want to do is to create a new 3 group data set... where the mean differences are the same as above, but there is more within group variability. First, I add a constant of 10 to every score (this is totally arbitrary)... so here are the results:

Row  E1a  E2a  Ca
1     15   17   16
2     14   16   15
3     13   15   14
4     12   14   13
5     11   13   12

Variable  N    Mean  StDev
E1a       5  13.000  1.581
E2a       5  15.000  1.581
Ca        5  14.000  1.581

While all the data points move up by 10, that does not impact the differences in the means. How can I now change the within group variability? What if I double each score's deviation from its group mean: move the top values from 2 points above the mean to 4 points above, the next values from 1 point above to 2 points above, leave the means or middle values alone... move the next to lowest values to 2 points below... and the lowest values to 4 points below. That would make the data sets look like:

E1a_1  E2a_1  Ca_1
  17     19    18
  15     17    16
  13     15    14
  11     13    12
   9     11    10

Variable  N    Mean  StDev
E1a_1     5  13.000  3.162
E2a_1     5  15.000  3.162
Ca_1      5  14.000  3.162

If you look at the descriptive stats, the differences in the means are the same... but the SDs within each group are larger... each now being 3.16, compared to 1.58 before. The ranges before were 4 points within each group, but now they are 8 points. Clearly, I have spread out the data within each treatment group but kept the mean differences as before. So, what should happen to our SS terms? If the mean differences are the same... then our SS BG should remain the same as originally... but if our within group SDs have gotten larger, then the SS WG should increase. That is, if mean differences are kept constant but within group variation increases, the relative size of the SS WG should be larger compared to the SS BG term. Let's see if that happens. Using the Minitab output:

Source   DF      SS     MS     F      P
Factor    2   10.00   5.00  0.50  0.619
Error    12  120.00  10.00
Total    14  130.00

We see that the SS BG stays at 10, what it was originally, but the SS WG goes from 30 in the first case to 4 times that, or 120, in the current case where I deliberately increased within group variability.
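A short Python sketch confirms both claims at once: doubling each score's deviation from its own group mean leaves SS BG alone and quadruples SS WG.

```python
from statistics import mean

def ss_parts(groups):
    """Return (SS between groups, SS within groups)."""
    all_scores = [x for g in groups for x in g]
    gm = mean(all_scores)
    ss_bg = sum(len(g) * (mean(g) - gm) ** 2 for g in groups)
    ss_wg = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return ss_bg, ss_wg

original = [[15, 14, 13, 12, 11],   # E1a
            [17, 16, 15, 14, 13],   # E2a
            [16, 15, 14, 13, 12]]   # Ca

# Double every score's deviation from its own group mean: the means are
# unchanged, but the within-group spread doubles (so each within SS x4)
spread = [[mean(g) + 2 * (x - mean(g)) for x in g] for g in original]

print(ss_parts(original))  # SS BG = 10, SS WG = 30
print(ss_parts(spread))    # SS BG = 10, SS WG = 120
```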

Consider a set of data... and call the SS BG A and the SS WG B. If we hold within group variation constant... but change the size of the mean differences, we will find:

1. If the mean differences get smaller, the SS BG will get smaller relative to the SS WG.

2. If the mean differences get larger, the SS BG will get larger relative to the SS WG.

If we hold the mean differences constant... but change the size of the within group variation, we will find:

3. If the within group variation gets smaller, the SS WG will get smaller relative to the SS BG.

4. If the within group variation gets larger, the SS WG will get larger relative to the SS BG.

Question: We have talked before about the SS BG being an estimate of treatment effects and the SS WG being an estimate of sampling error. This is like the t test, where upstairs is an estimate of mean differences and downstairs is an estimate of sampling error. To increase our chances of rejecting the null (of no population treatment effects), we would like the SS BG to be relatively large compared to a relatively small SS WG. Right? So, from the options above... 1 to 4... which one(s) is(are) the best situation(s) if we want to improve our chances of rejecting the null? Those who say 2 AND 3... you are the winners. So, there are two general ways to improve your chances of rejecting the null. First, increase your sample sizes... this lessens sampling error. Second, if there were some way to increase the size of the mean differences (making the treatments more potent, for example), this would tend to increase the numerator term that we have been looking at. Of course, in most studies, you have little ability to change how large the differences in means will be, but you DO have control over what your sample sizes in the groups will be. So, here as well as in general, improving the chance to reject the null is more practically accomplished by increasing ns and thereby reducing sampling error.

GETTING FROM SS VALUES TO AN F TEST

The majority of the discussion above has focused on the concept of sums of squares... total, between groups, and within groups. If you are able to look at a data layout and get some feel for how large the group mean differences are and how much variation there is within the groups, you are on your way to thinking about whether you have much of a chance of rejecting the null... which would state that there are NO differences across the population treatments... that is... mu E1 = mu E2 = mu C.

At some point in the overall ANOVA analysis... we will have to make a decision about rejecting or retaining the null. Let's see how that will be done. Look at the following example from above:

E1a_1  E2a_1  Ca_1
  17     19    18
  15     17    16
  13     15    14
  11     13    12
   9     11    10

Variable  N    Mean  StDev
E1a_1     5  13.000  3.162
E2a_1     5  15.000  3.162
Ca_1      5  14.000  3.162

From our Minitab output, we saw that the SS Tot = 130, the SS BG = 10, and therefore the SS WG = 120. Since the SS BG is quite small compared to the SS WG, or estimate of error, things are not looking too good at the moment. What we do is to construct an ANOVA summary table. It looks like the following:

Source               df   SS   MS   F ratio   p value
BG (treat. effects)
WG (samp. error)
TOTAL

Let's work our way thru this table. df = degrees of freedom. For BG, it is # groups - 1 = 3 - 1 = 2. For WG, we have 5 - 1 within EACH group... so there are 4 dfs within each group... but we have 3 groups of 4 each. The df for WG in our example will be 12. What about df for the total? Well, there are 15 values all together, so 15 - 1 = 14 df for the total. So, here you will see that just like the SS terms... the df values are partitioned too.

Total df = BG df + WG df
14 = 2 + 12

Let's put these in the table.

Source               df   SS   MS   F ratio   p value
BG (treat. effects)   2
WG (samp. error)     12
TOTAL                14

Next in the table are the SS values. From the above calculations, let's put these in the table too.

Source               df    SS   MS   F ratio   p value
BG (treat. effects)   2    10
WG (samp. error)     12   120
TOTAL                14   130

df values are like ns... if we think about the # of groups, the n - 1 value would be # groups - 1, or 2. For WG, the n for each group is 5, so 5 - 1 = 4 dfs within EACH group, and across 3 groups we have 4*3 = 12 dfs. What about overall? Well, N - 1 = 14 if we are thinking of the overall set of data and deviations around the grand mean (GM). Then we have the SS terms, and keep in mind that the SS terms are LIKE the numerator parts of variance formulas. The next column in the table is the MS, or Mean Square (the MEAN of the SQUARED deviations): MS = SS / df, that is, an SS divided by its n-like df value. So, the MS BG = 10/2 = 5... and the MS WG = 120/12 = 10. NOTE: we don't really compute the Total MS value. So, let's fill in the table with what we have so far.

Source               df    SS   MS   F ratio   p value
BG (treat. effects)   2    10    5
WG (samp. error)     12   120   10
TOTAL                14   130

Now, each MS term has a numerator that is the SS part of a variance formula AND a denominator that is like an n value. So, if we divide a numerator SS value by a denominator that is like n... don't we have a VARIANCE value? Yes... we do.
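In code form, the whole summary table boils down to a few lines of arithmetic (a sketch for this particular 3 group, n = 5 layout):

```python
# SS values from the partitioning of the spread-out data set
ss_bg, ss_wg = 10, 120
k, n_per_group = 3, 5              # number of groups, Ss per group

df_bg = k - 1                      # 2
df_wg = k * (n_per_group - 1)      # 12
df_tot = k * n_per_group - 1       # 14 = 2 + 12

ms_bg = ss_bg / df_bg              # 10 / 2   = 5
ms_wg = ss_wg / df_wg              # 120 / 12 = 10

print(df_bg, df_wg, ms_bg, ms_wg)
```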

So, the MS terms ARE like variances... in fact, they ARE variance estimates. In the case of the MS BG, this is an estimate of the variance of the treatment group means... whereas the MS WG is an estimate of the within group or ERROR variance. This is sounding a lot like the t test we first reviewed, where the upstairs term reflects treatment mean differences or variability and the downstairs term reflects our estimate of sampling error. Now, as Michael Buffer (famous boxing ring announcer) would say: LET'S GET READY TO RUMBLE... AND CALCULATE THAT F STATISTIC! The F ratio is, er, a ratio of TWO variance estimates:

F ratio = (Estimate of Treatment Effect Variance) / (Estimate of Sampling Error Variance)

So, the F ratio is = MS BG / MS WG = 5/10 = .5. Now, unlike t tests that have one df value, F tests (used in ANOVA) have a df value associated with the numerator and a df value associated with the denominator. So, for the ANOVA summary table above, there are 2 dfs for the upstairs term and 12 dfs for the downstairs term. What we say in this case is: we have an F ratio of .5, with 2 and 12 degrees of freedom. So, what does an F distribution with 2 and 12 degrees of freedom look like? Here is what it looks like from Minitab.

[Figure: the F distribution with 2 and 12 degrees of freedom.]

As you can see, the F distribution (this is just one of a huge family) is rather radically + skewed. The question is... how do we interpret the F value that we got... which was .5? Where does it fit within this F distribution? Well, .5 is way over to the LEFT side... close to 0. Since the F ratio is a ratio of 2 variances, and variances cannot be negative, the F ratio is always a + value. It could approach a value of 0 if the top variance estimate (differences in means) were very, very small while the bottom variance (sampling error) were very, very large. What we have for our data is a small numerator and a larger denominator, for which the F ratio is LESS than 1. Now, here's a basic principle when dealing with F ratios in the context of ANOVA. The null hypothesis in this case is that the population means for the E1, E2, and C treatments are all the same. Hence, mu E1 = mu E2 = mu C. What would this F ratio look like IF the null were really true? The expected value for the F ratio when the null is true is 1. Any F ratios less than 1 we can ignore in terms of a hypothesis test in this simple ANOVA case. F ratios that are really small suggest that the differences in the means, under the null model, do not even vary as much from study to study as you might expect by chance alone. Remember, even if the null is true, we will see variations in the mean values from study to study... that's sampling error. So, what we are really looking for in this case... would be an F ratio that is LARGER than 1 (which we don't have, of course). In a way, this is like a chi square problem... we look for large chi square values before we are willing to reject the null. The same thing applies here... to reject the null, we will look for a relatively large F ratio... one that is way out along the X scale... to the right. If we were thinking of using an alpha level of .05, what we could do is to find the value along the baseline to the right... where 5% or less of the F ratio values would fall. If we think about starting at the extreme LEFT... what we need is the 95th percentile rank in this distribution. We need the 95th percentile in an F distribution with 2 and 12 degrees of freedom. Using Minitab, here is that value.

F distribution with 2 DF in numerator and 12 DF in denominator

P( X <= x )        x
       0.95  3.88529

To reject the null in this case, we need an F ratio = or > than 3.89. Clearly, we are nowhere near that value for our data, so we retain the null.
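The same percentile (and the tail area for our observed F of .5) can be pulled from SciPy, assuming it is installed:

```python
from scipy.stats import f

# 95th percentile (upper 5% cutoff) of F with 2 and 12 degrees of freedom
print(f.ppf(0.95, 2, 12))  # about 3.885, matching the Minitab value

# Upper-tail probability of our observed F ratio of 0.5
print(f.sf(0.5, 2, 12))    # about 0.62 ... nowhere near the .05 region
```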

If these were real experimental data, we would say that there is not sufficient evidence to reject the null in favor of an alternative that says that not all treatment means are the same. Perhaps we don't like this result, but... them's the breaks! Let's look at one additional example. What if we were interested in the differences, if any, amongst 3 different instructional methods for teaching a unit in AP Statistics? We randomly assign 6 Ss to each of the 3 treatments. Our data look as follows:

[Data listing and descriptive statistics for meth1, meth2, and meth3 are not reproduced in this transcription.]

Let's look at the summary data for a moment. We see, first, that the means from the treatment groups are different. Hence, we know that when we find the SS BG, it will NOT be 0. We also see that there is variation within each treatment group... so we know that the SS WG value will NOT be 0. What we need to do is to partition the SS Tot... into the component parts of SS BG and SS WG. While Minitab will do all of this for us... what would the SS Tot represent? Well, if we found the mean of all 18 values... that would be called the grand mean or GM: add up all 18 values and divide by 18. So, we would subtract the GM from each and every one of the 18 values... square those deviations... and then sum up the 18 squared deviation values. To obtain the SS BG, we would need to subtract the GM or grand mean from EACH of the treatment group means... square each deviation... and then multiply by 6, since EACH treatment mean is standing in for and representing the 6 values in that treatment group. Finally, to find the SS WG, we would need to go group by group... subtract each group mean from all the values IN that group... square the deviations... and add. We do this for each of the 3 treatment groups.
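The recipe just described, as a Python sketch. NOTE: the actual method scores did not survive in this copy of the handout, so the numbers below are hypothetical stand-ins; only the structure (3 groups, n = 6 per group, weighting by 6) mirrors the example.

```python
from statistics import mean

# Hypothetical stand-in scores: 3 methods, n = 6 per group
meth1 = [70, 72, 68, 75, 71, 74]
meth2 = [80, 78, 82, 79, 81, 84]
meth3 = [75, 73, 77, 74, 76, 72]
groups = [meth1, meth2, meth3]

all_scores = [x for g in groups for x in g]
gm = mean(all_scores)  # grand mean: sum of all 18 values divided by 18

ss_tot = sum((x - gm) ** 2 for x in all_scores)
ss_bg = sum(6 * (mean(g) - gm) ** 2 for g in groups)  # weight each mean by n = 6
ss_wg = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)

print(ss_tot, ss_bg, ss_wg)  # ss_tot equals ss_bg + ss_wg
```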

Here is what Minitab provides us:

[Minitab ANOVA output not reproduced in this transcription; the error line showed 15 df and an MS of 9.21.]

So, we see that SS Tot = SS BG + SS WG here as well. Since there are 3 groups, there are 3 - 1 = 2 dfs BG. Since there are 6 Ss in each group, we have 6 - 1 = 5 dfs within EACH group, but since there are 3 groups, we have 15 dfs WG. The df Total would be 18 - 1 = 17. Again, df values are additive too:

Tot df = df BG + df WG
17 = 2 + 15

Dividing the SS by the appropriate df value produces the MS BG and MS WG terms. Then, to find the F ratio, we divide the MS BG by the MS WG of 9.21. How will we evaluate the F ratio from this study? For this experiment, we will use an F distribution with 2 and 15 degrees of freedom. As in the first example, here is what that theoretical F distribution looks like.

[Figure: the F distribution with 2 and 15 degrees of freedom.]

As you can see, the F distribution with 2 and 15 dfs looks about the same as the one with 2 and 12 dfs. In general, F distributions are radically + skewed. What tends to differ is how far out along the X baseline that + skewed distribution extends, OR what the range of values is between the low and high ends. This F test is a one tail test... we need to isolate the upper 5% if we are using an alpha of .05. The critical value in this case will be:

P( X <= x )        x
       0.95  3.68232

Notice that the critical value here is about the same as it was for the other problem. Since our F ratio is much larger than the CV of 3.68, we REJECT the null hypothesis of equal treatment means in favor of the alternative that says not all means are the same. But, PLEASE NOTE: rejecting the null does NOT mean that all treatment means are different. By rejecting the null, however, we can be sure that at least ONE is different from the others... while the others could be equal. In order to detect where the specific differences are, we do what are called follow up tests... which are beyond the scope of this handout. But, be assured that software like Minitab and other good packages do these easily.

QUICK SUMMARY

1. I first reviewed the notion of a two sample t test and tried to get across the principle that the t value is a ratio of our estimate of mean differences compared to our estimate of sampling error. In one sense, the numerator is the good part whereas the denominator is the bad part.

2. We next looked at the notion of sums of squares. SS is nothing more than the numerator of a formula for the variance. SS has a lower limit of 0 (when all values are the same); beyond that, the sky is the limit, depending on the actual data.

3. I tried to develop the idea that within a set of experimental data (like 3 groups of 5 Ss each), the SS Tot = SS BG + SS WG. That is, we can look at the overall sum of squares using the entire set of data... and see that we can subdivide or partition that into 2 parts: one part, SS BG, reflects how the treatment means vary, and the other part, SS WG, reflects how much sampling error there is (within group variation).

4. As the means get more and more divergent, the SS BG increases relative to the SS WG. As the variation within the groups gets larger and larger, the SS WG increases relative to the SS BG.

5. SS BG is a numerator concept and SS WG is a denominator concept. While there is not much we can do about the SS BG in terms of impacting or controlling it, we CAN increase n in the downstairs term to impact and control (within the limits of resources) sampling error.

6. ANOVA partitions data into the components of BG and WG variation. The summary table turns the SS BG and SS WG terms into variance estimates.

7. The F ratio is a ratio of 2 variance estimates: one variance estimates the treatment mean differences and the other variance estimates the amount of sampling error.

8. In an experimental data situation, our expectation is that the F ratio will be 1 IF the null of no treatment effects is true.

9. We use an F distribution with df BG and df WG to find the critical value needed to reject the null. The F test is a one tail test (for ANOVA purposes). If our F ratio is = or > than our right end critical value, we reject the null. If it is smaller, we retain.

10. Rejecting the null says that at least ONE of the treatment means is different from the others. They all might be different, but not necessarily.

SOME FINISHING EXERCISES

1. What do the numerator and denominator of a 2 sample t test tell us?

2. What part of a variance formula is the sum of squares term?

3. What are the min and max possible values for a SS term?

4. How do we get the SS Total for a set of data... say 4 groups and 10 Ss in each group?

5. For #4, how would we obtain the SS BG and SS WG terms?

6. What term is impacted if we change the variation within groups?

7. What term is impacted if we change how large/small the differences in the group means are?

8. How do we get the degrees of freedom for the BG and WG terms?

9. How are the MS BG and MS WG terms found?

10. How do we get the F ratio?

11. Is the F test one tailed or two?

12. What does rejecting the null mean for the F test?

Comments or questions about the handout above can be directed to: Dennis Roberts, dmr@psu.edu


LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

Chapter 14: Repeated Measures Analysis of Variance (ANOVA)

Chapter 14: Repeated Measures Analysis of Variance (ANOVA) Chapter 14: Repeated Measures Analysis of Variance (ANOVA) First of all, you need to recognize the difference between a repeated measures (or dependent groups) design and the between groups (or independent

More information

The Kruskal-Wallis test:

The Kruskal-Wallis test: Graham Hole Research Skills Kruskal-Wallis handout, version 1.0, page 1 The Kruskal-Wallis test: This test is appropriate for use under the following circumstances: (a) you have three or more conditions

More information

Rank-Based Non-Parametric Tests

Rank-Based Non-Parametric Tests Rank-Based Non-Parametric Tests Reminder: Student Instructional Rating Surveys You have until May 8 th to fill out the student instructional rating surveys at https://sakai.rutgers.edu/portal/site/sirs

More information

6 3 The Standard Normal Distribution

6 3 The Standard Normal Distribution 290 Chapter 6 The Normal Distribution Figure 6 5 Areas Under a Normal Distribution Curve 34.13% 34.13% 2.28% 13.59% 13.59% 2.28% 3 2 1 + 1 + 2 + 3 About 68% About 95% About 99.7% 6 3 The Distribution Since

More information

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups In analysis of variance, the main research question is whether the sample means are from different populations. The

More information

Lesson 9 Hypothesis Testing

Lesson 9 Hypothesis Testing Lesson 9 Hypothesis Testing Outline Logic for Hypothesis Testing Critical Value Alpha (α) -level.05 -level.01 One-Tail versus Two-Tail Tests -critical values for both alpha levels Logic for Hypothesis

More information

Introduction; Descriptive & Univariate Statistics

Introduction; Descriptive & Univariate Statistics Introduction; Descriptive & Univariate Statistics I. KEY COCEPTS A. Population. Definitions:. The entire set of members in a group. EXAMPLES: All U.S. citizens; all otre Dame Students. 2. All values of

More information

Using Formulas, Functions, and Data Analysis Tools Excel 2010 Tutorial

Using Formulas, Functions, and Data Analysis Tools Excel 2010 Tutorial Using Formulas, Functions, and Data Analysis Tools Excel 2010 Tutorial Excel file for use with this tutorial Tutor1Data.xlsx File Location http://faculty.ung.edu/kmelton/data/tutor1data.xlsx Introduction:

More information

Lecture Notes Module 1

Lecture Notes Module 1 Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

More information

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance Principles of Statistics STA-201-TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis

More information

5.1 Identifying the Target Parameter

5.1 Identifying the Target Parameter University of California, Davis Department of Statistics Summer Session II Statistics 13 August 20, 2012 Date of latest update: August 20 Lecture 5: Estimation with Confidence intervals 5.1 Identifying

More information

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

More information

3. What is the difference between variance and standard deviation? 5. If I add 2 to all my observations, how variance and mean will vary?

3. What is the difference between variance and standard deviation? 5. If I add 2 to all my observations, how variance and mean will vary? Variance, Standard deviation Exercises: 1. What does variance measure? 2. How do we compute a variance? 3. What is the difference between variance and standard deviation? 4. What is the meaning of the

More information

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.

More information

Using Microsoft Excel to Analyze Data from the Disk Diffusion Assay

Using Microsoft Excel to Analyze Data from the Disk Diffusion Assay Using Microsoft Excel to Analyze Data from the Disk Diffusion Assay Entering and Formatting Data Open Excel. Set up the spreadsheet page (Sheet 1) so that anyone who reads it will understand the page (Figure

More information

2.6 Exponents and Order of Operations

2.6 Exponents and Order of Operations 2.6 Exponents and Order of Operations We begin this section with exponents applied to negative numbers. The idea of applying an exponent to a negative number is identical to that of a positive number (repeated

More information

Zeros of a Polynomial Function

Zeros of a Polynomial Function Zeros of a Polynomial Function An important consequence of the Factor Theorem is that finding the zeros of a polynomial is really the same thing as factoring it into linear factors. In this section we

More information

Hypothesis testing - Steps

Hypothesis testing - Steps Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =

More information

Analysis of Variance ANOVA

Analysis of Variance ANOVA Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.

More information