Interpretation of Computer Analysis Output for Fundamental Statistical Tests Volume One T-test P.Y. Cheng



Preface

When I first came to the department in 1985, PCs were still not common at all. There was only an Apple computer in the office for secretarial work. However, we could run SPSS on our university's mainframe computers, which were connected to our department through terminals and an RS232 interface! At that time, we had a very good neighbour on the same floor, the Department of Community Medicine. Their staff ran statistical programs heavily and were very good consultants for us whenever we had questions about statistics!

In the following decades, I have been trying to improve my knowledge of statistics, nearly always carrying statistics books with me and taking annual leave for self-study every summer! We Chinese have a common belief that hard work can compensate for stupidity, and I really want to compensate for my stupidity with decades of hard work! Several years ago, I had the chance to start writing some books to summarize my painful experience with statistical problems over these decades, trying to help others find a quicker way when handling similar problems!

Volumes 1, 2 and 3 of Interpretation of Computer Analysis Output for Fundamental Statistical Tests are the newest books I have written so far, and their characteristics are:

1) Compared with the several books published previously, I have said more about the basic theories underlying the statistical tests! I have tried to use many clear and convincing pictures and graphs to explain the important theories and first principles, avoiding dull or complicated approaches! Even if you cannot digest and understand these theories immediately, you can still use this book as a cook book, fitting your problems to the nearest examples in it to solve them!

2) In our environment, with mostly animal experiments, there are usually only a few animals in each group being compared! I have often heard the worry that the sample size is too small for the t test, ANOVA or even regression to be used correctly! In this book, we discuss this issue and try to clarify whether the statistical tests can still be used when the sample size is rather small, e.g. N = 10 or N = 5, or even fewer!

3) We talk about the non-parametric statistical tests corresponding to the parametric ones, for use when the assumptions of the parametric tests do not hold, or when the distributions are not known at all!

4) For every test run in SPSS, we introduce an alternative way of getting the same results! This might be a manual method, a free Excel Add-In such as PHStat4, or even some web tools, so that readers can still solve their problems without being stopped by the absence of expensive software such as SPSS!

Acknowledgments

First of all, I would like to thank Prof. C.M. Wong and Dr L.M. Ho, who gave me the courage to start writing. They have been consultants for many staff of the University of Hong Kong and are always helpful to any HKU staff who approach them with statistical problems! I would also like to thank everybody who has contributed to the publishing of this book! This includes the publishing company (which is not yet known) and the authors of the reference books and internet material that helped me a lot during the writing and checking process! I would like to thank my friends and relatives who have been encouraging me during the publishing of my books, especially my son (Andy) and my wife (Betty), who have shown their patience and understanding while I concentrated on the production of this new book! Lastly, I would like to thank, in advance, any future readers of this book and hope they can find some useful material to help them solve statistical problems. I also hope they can enjoy reading this book in full colour, with hundreds of brilliant pictures and many cook-book examples!

Medical Faculty, The University of Hong Kong
Cheng Ping Yuen (Senior Technician)
Bachelor of Life Science (BSc), Napier University, UK
Master of Public Health (MPH), Hong Kong University
Certificate of the Hong Kong Statistical Society (HKSS)
Fellow of the Royal Statistical Society (RSS)
Microsoft Certified Professional (MCP)
Hong Kong Registered Medical Technologist (Class I)
Phone : (852) / hrmacpy@hku.hk

Contents

1.1 Some basic concepts
    1.1.a The Normal Distribution
    1.1.b Distribution of Sample Means
    1.1.c The Standard Normal Distribution and T-Distribution
    1.1.d Area (Probability) under Z-Distribution and T-Distribution
    1.1.e Testing of Hypothesis
    1.1.f The worry about sample size N being too small for statistical tests
1.2 Computer Analysis of t-distribution
    1.2.a One Sample T-Test
    1.2.b Two Samples T-Test
    1.2.c Paired T-Test
1.3 Sample Size N and Power for T-Test
    1.3.a Using a hand calculator
    1.3.b Sample size N in estimating a population average µ, by PHStat4
    1.3.c Calculation of Sample Size and Power using G Power
        i) Calculation of Sample Size using the free software G Power
        ii) Calculation of Power using the free software G Power
1.4 Testing Hypothesis for Proportions (instead of Means)
    1.4.a Approximation of a binomial distribution by a normal distribution
    1.4.b Using PHStat4 to solve the problem in 1.4.a
    1.4.c Using PHStat4 to solve problems with two proportions
1.5 Non-parametric tests corresponding to various T-Tests
    1.5.a Wilcoxon signed rank test (corresponding to the 1-sample t-test)
    1.5.b Nonparametric tests for 2 samples (corresponding to the 2-independent-samples t-test)
        i) Mann-Whitney Test for 2 independent samples in SPSS
        ii) Wilcoxon Rank Sum Test for 2 independent samples in PHStat4
    1.5.c Wilcoxon matched-pairs signed-rank test for paired samples (corresponding to the paired t-test)
    1.5.d Calculation of Sample Size and Power of Wilcoxon Tests using G Power


1.1 Some basic concepts

1.1.a The Normal Distribution

The normal distribution is the most important distribution in statistics, not only because so many natural phenomena (e.g. weight, height, class mark, IQ score) follow it, but also because it can be used to solve many other statistical problems! The probability density function of a normal distribution is:

f(x) = (1 / (σ√(2π))) e^( −(x − µ)² / (2σ²) )

The curve is, thus, determined by two parameters: 1. the population mean µ, and 2. the population standard deviation σ (or σ², the variance).

1.1.b Distribution of Sample Means

If we could always measure every individual of a population (e.g. the height of all children born in 1995 in the UK), then we might not need statistical tests to draw conclusions about it! However, it is usually impossible, or too costly, to make such a measurement! We usually take a sample from the population, run a statistical test on the sample data, and use distribution and probability theory to decide whether to accept a hypothesis or not! This is also called making an inference about the population from a sample. The following is a sample of heights from the population (e.g. children born in the UK in 1995).

Imagine we could measure an infinite number of sample means (although in practice we never would) and plot the frequency histogram: we would then get a distribution of sample means (histogram of sample averages against frequency).

Formation of a distribution of sample means (of size N) (*please don't mix this up with the z or t distributions discussed later): the individual sample curves have peaks lower than the population curve (since N < total population), and the peak height of the combined curve increases as these sample curves are accumulated. It is important to learn that the larger the sample size, the smaller the spread of the distribution curve of the sample means (the standard error of the mean), σ_m = σ_p/√N.

Because of the property above, it is always better to use a larger sample size rather than a smaller one! With a larger sample size, we reduce the possible deviation of the measured mean from the real mean of the population! In the graph above, we assume there is an unlimited number of sample means for plotting the distribution, so the sampling error approaches zero! But what happens if only a limited number of sample means is available?

Sampling error when a limited number of sample means is used: the sampling distribution might be positively skewed, negatively skewed or not skewed, depending on chance. But when the number of samples increases, the skewness decreases and the distribution approaches normal!

For example, for the following population of 4,000 students, using 20 sample means (each of size N = 10) to plot the distribution curve gives a smaller possible error than just plotting the 200 individual data points of 200 students (N = 1). The mean from the 20 samples would be closer to the population mean of the 4,000 students than the mean obtained from the individual data of the 200 students alone! (Figure: distribution of a limited number of sample means of different size N.)

1.1.c The Standard Normal Distribution and T-Distribution

The standard normal distribution and the t-distribution are extremely important because of the following three conditions:

1) If the population is normal and the variance is known, then the random variable z = (x̄ − µ)/(σ/√n) is exactly standard normal (mean = 0, S.D. = 1), no matter how small the sample size is.

2) If the population is normal and the variance is unknown, the random variable t = (x̄ − µ)/(s/√n) has exactly a t-distribution (mean = 0, S.D. approaching 1 as n increases) with n − 1 degrees of freedom, no matter how small the sample size is.

3) If the population is not normal and the variance may or may not be known, the random variable z = (x̄ − µ)/(σ/√n), or the random variable t = (x̄ − µ)/(s/√n) (which one is used depends on whether the variance is known or unknown), is approximately standard normal if the sample size is sufficiently large (at least thirty).

Where:
x̄ is the mean of the sample
µ is the mean of the population
σ is the known standard deviation of the population
n is the sample size
s is the standard deviation calculated from the sample

The calculation of these random variables is also called the standardization of the original distributions. Without the possibility of standardizing normal (or non-normal) distributions to obtain a standard normal distribution, it would be impossible (or very difficult) to make use of these most important distributions in statistics to test various hypotheses!

1) If the population is normal and the variance is known, then the random variable z = (x̄ − µ)/(σ/√n) is exactly standard normal (mean = 0, S.D. = 1), no matter how small the sample size is.

All four distributions in the figure above are normal distributions, but only the GREEN one is a standard normal distribution, with µ = 0 and σ² = 1 (σ = 1)!

2) If the population is normal and the variance is unknown, the random variable t = (x̄ − µ)/(s/√n) has exactly a t-distribution (mean = 0, S.D. approaching 1 as n increases) with n − 1 degrees of freedom, no matter how small the sample size is. Here s, the sample standard deviation, is used instead of the population standard deviation!

Please notice that when the underlying population is normal, we can apply the t-distribution for statistical tests no matter how small the sample size (degrees of freedom) is!! A t-distribution is similar to the z-distribution in that both are symmetric and bell-shaped, but its central peak is lower and its two tails are higher! As df (N − 1) increases, it becomes more and more like a z-distribution; by df = 120, we might say there is almost no difference at all!

** Please don't mix this up with the 3rd condition below, in which the underlying population is NOT NORMAL and we can only apply the z-distribution approximation when N is at least 30, without involving any t-distribution!!

3) If the population is not normal and the variance may or may not be known, the random variable z = (x̄ − µ)/(σ/√n) or t = (x̄ − µ)/(s/√n) (the one used depends on whether the variance is known or unknown) is approximately standard normal if the sample size is sufficiently large (at least thirty). This is also called the Central Limit Theorem! Please notice that the sample size N must be equal to or greater than 30 for this Central Limit Theorem to apply, and with it we can use the z-distribution approximation for running statistical tests!

A graph like the one below might be confusing if it is not clearly stated whether the underlying population is normal or not. If it is normal, then the t-distribution curves can be applied no matter how small N is! Some books suggest switching to the z-distribution approximation when N >= 30, but I myself don't feel this is necessary (why not just use the t-distribution with N − 1 degrees of freedom?). If it is not normal, apply the z-distribution approximation when N >= 30, and just don't use any t-distributions!
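The three conditions above can be checked numerically. The following is a minimal Python sketch (Python/SciPy is not used in this book, which works with SPSS and Excel; the sample values here are made up for illustration), showing the two standardizations z = (x̄ − µ)/(σ/√n) and t = (x̄ − µ)/(s/√n):

```python
# A minimal sketch of the two standardizations above (hypothetical numbers).
import numpy as np
from scipy import stats

sample = np.array([78.0, 85.0, 82.0, 80.0, 88.0])  # hypothetical sample
mu = 80.0        # hypothesized population mean
sigma = 5.0      # population S.D. (assumed known for the z case)
n = len(sample)
xbar = sample.mean()

# Condition 1: population normal, sigma known -> z is exactly N(0, 1)
z = (xbar - mu) / (sigma / np.sqrt(n))

# Condition 2: population normal, sigma unknown -> t has a t-distribution, df = n - 1
s = sample.std(ddof=1)                # sample S.D. (n - 1 in the denominator)
t = (xbar - mu) / (s / np.sqrt(n))

print(z, t, n - 1)
```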

1.1.d Area (Probability) under the Z-Distribution and T-Distribution

By knowing the percentage of the area under the z-distribution and the t-distribution (total area = 1), we know the probability of getting the z-value or t-value calculated from the formulas in the three conditions discussed above!

Area under the Z-Distribution (Standard Normal Distribution): the area between the central axis (mean = 0) and 1 standard deviation is 0.3413 of the total area (which equals 1). This implies that the probability of z falling between zero and 1σ is 0.3413!! Under the standard normal curve σ = 1, so 1σ = 1, 2σ = 2, 3σ = 3 on the x-axis!!

Area under the T-Distribution for different df: as t increases, the area in the right tail decreases.
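If a table is not at hand, the areas described in this section can be obtained from SciPy; a small sketch (not part of the book's SPSS/Excel workflow):

```python
# Areas (probabilities) under the z- and t-distributions, as described above.
from scipy import stats

# Standard normal: area between 0 and 1 sigma
print(stats.norm.cdf(1) - stats.norm.cdf(0))     # ~0.3413

# t-distribution: right-tail area beyond t = 2 for several df
for df in (4, 24, 120):
    print(df, stats.t.sf(2.0, df))               # tail area shrinks toward the z value
print(stats.norm.sf(2.0))                        # z right-tail beyond 2, for comparison
```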

1.1.e Testing of Hypothesis (Significance)

With the z-distribution (standard normal distribution) and the t-distribution, whose areas are well known (either from tables or from a computer), we can carry out hypothesis testing, using samples to make inferences about the underlying population! A probability of 0.05 is usually used as the critical probability for hypothesis testing!!

Z-Distribution (figure: 2.5% of the area in each tail for a two-tailed test at 0.05): if the mean and the variance of a population are known, then we can run a normal (z) test on a sample using the z-distribution (standard normal distribution). For example, an education department wants to know whether the average mark of students in Mathematics this year is the same as in past years (mean = 80 and S.D. = 5). A random sample of 25 students is taken, and their marks have mean = 83.

Null hypothesis H0: mean of this year = mean of past years
Alternative hypothesis Ha: mean of this year ≠ mean of past years

z = (83 − 80) / (5/√25) = 3/1 = 3

z = 3 >> 1.96. The probability of getting z > 1.96 or z < −1.96 by chance only (sampling error) is 0.05. Thus the probability of getting such a high z value of 3 by chance only is << 0.05!! The sample mean is significantly different from the population mean used in the z test! So we reject the null hypothesis that the average mark is the same as in past years!
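The same z test, reproduced as a short Python sketch with the numbers above (mean 80, S.D. 5, N = 25, sample mean 83):

```python
# The z test above, reproduced in Python.
import math
from scipy import stats

mu, sigma, n, xbar = 80.0, 5.0, 25, 83.0
z = (xbar - mu) / (sigma / math.sqrt(n))         # = 3.0
p_two_tailed = 2 * stats.norm.sf(abs(z))         # ~0.0027, far below 0.05
print(z, p_two_tailed, stats.norm.ppf(0.975))    # critical value ~1.96
```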

T-Distribution (figure: t curves for different df, including df = 24):

Suppose the S.D. of the underlying population of student marks in the section above is unknown, and the sample standard deviation calculated from the sample data is 6 instead of 5. Then:

t = (83 − 80) / (6/√25) = 3/1.2 = 2.5

t = 2.5 >> 2.064 (from the table: df = 24, 5% two-tailed probability of the same population mean). The probability of getting t > 2.064 or t < −2.064 by chance (sampling error) is 0.05. Thus the probability of getting such a high t value of 2.5 by chance only is << 0.05!! The sample mean is significantly different from the population mean used in the t test! So we reject the null hypothesis that the average mark is the same as in past years!

(Please remember that, to make use of the t-distribution for the probability calculation above, we assume that the underlying population is normal; the t-distribution curves can then be used no matter how small the sample size is!)
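And the t version, with a sample standard deviation of 6, again as a sketch in Python:

```python
# The t version (sigma unknown, sample S.D. 6, N = 25), reproduced in Python.
import math
from scipy import stats

mu, s, n, xbar = 80.0, 6.0, 25, 83.0
df = n - 1
t = (xbar - mu) / (s / math.sqrt(n))             # = 2.5
t_crit = stats.t.ppf(0.975, df)                  # ~2.064 (two-tailed, alpha = 0.05)
p_two_tailed = 2 * stats.t.sf(abs(t), df)        # ~0.02 < 0.05
print(t, t_crit, p_two_tailed)
```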

One-Tailed Test and Two-Tailed Test

(Figure: one-tailed and two-tailed critical values for a 5% chance, with regions marked significant / not significant.)

In (a), we test whether the sample mean is greater than or smaller than the population mean, such that the total probability of getting the z value is 0.05, i.e. 0.025 on each side! The probability on each side is only 0.025! In (b), we just test whether the sample mean is greater than the population mean, such that the probability of getting the z value is 0.05 on the right-hand extreme only! This also implies that rejecting the null hypothesis (that there is no real difference in means) is easier to achieve, with twice the chance!! One-tailed significance = two-tailed significance divided by 2!

As in the case shown in the graph above, the test for the difference between the sample mean and the population mean is not significant in a two-tailed test (z = 1.8 < 1.96), but is significant in the one-tailed test (z = 1.8 > 1.645)!!
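A short sketch of the z = 1.8 case above, showing how the one-tailed p-value is half the two-tailed one:

```python
# One-tailed versus two-tailed probability for z = 1.8.
from scipy import stats

z = 1.8
p_two = 2 * stats.norm.sf(z)     # ~0.072 -> not significant at 0.05
p_one = stats.norm.sf(z)         # ~0.036 -> significant at 0.05 (one-tailed)
print(p_two, p_one, stats.norm.ppf(0.975), stats.norm.ppf(0.95))   # 1.96, 1.645
```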

Type I Error and Type II Error

We might say that, by default, the type I error is the error we try to avoid first! This is the error of saying that there is a difference between two groups while there is, in fact, none! (If having a difference were a crime, the type I error would be the error of convicting a person of a crime he has not, in fact, committed!) If we accept the null hypothesis that there is no difference, then we do not run the risk of committing the type I error. However, we are then immediately at risk of committing the type II error, i.e. saying that there is no difference while there is, in fact, one! (Saying that a person has not committed a crime while he, in fact, has.)

(Figure: the critical t-value for rejecting the null hypothesis that there is no real difference between the two groups (only one curve, the red one); the shaded range of t-values is where we would commit the type II error, i.e. say there is no difference while there are, in fact, two curves; the area to its right is 1 − β.)

In the graph above, using α/2 as the critical point, we do not reject the null hypothesis that there is no real difference between the 2 populations while t is less than 2! We accept the null hypothesis since we don't want to commit the type I error (convicting a person of a crime while he is innocent)! We think there is only ONE curve (the red one)!! However, if there is, in fact, a real difference between the two populations (two curves exist), then we have already committed the type II error (letting the accused person go while he has committed the crime)!! The range of t values that would make us commit such a mistake is shown in the graph above! The probability of committing such an error is the area represented by β! This probability depends on how large the real difference is. β is important for the calculation of Power (1 − β) and of the sample size N. We will talk about the calculation of sample size and power in later sections!

1.1.f The worry about sample size N being too small for statistical tests

In our laboratory experiment environment, we often hear the worry that the sample size N in the different groups is too small for running common statistical tests such as the t test! There are often only a few animals, e.g. 5 to 10, in each group being compared with the other groups. A histogram of so few points might hardly look normally distributed! Let's look at the condition of N = 5. As stated previously:

1) If the population is normal and the variance is known, z = (x̄ − µ)/(σ/√n) is exactly normal (0, 1), no matter how small the sample size N is.

2) If the population is normal and the variance is unknown, t = (x̄ − µ)/(s/√n) has exactly a t-distribution with N − 1 degrees of freedom, no matter how small the sample size N is.

(Since N = 5 only, we don't consider the 3rd condition - population not normal and N >= 30!)

As you can see, if we assume the underlying population is normal in 1) and 2) above, we still finally arrive at a standard normal distribution, or a t-distribution, no matter how small N is!! Then what is the role of N in these conditions? For both z and t, the smaller the N, the lower the value of z or t for the same difference, and the lower the chance of exceeding the critical z or t value needed for a significant result to reject the null hypothesis - and vice versa!

Previous example with smaller N:

An education department wants to know whether the average mark of students in Mathematics this year is the same as in past years (mean = 80 and S.D. = 5). A random sample of 25 students is taken, and their marks have mean = 83.

Null hypothesis H0: mean of this year = mean of past years
Alternative hypothesis Ha: mean of this year ≠ mean of past years

z = (83 − 80) / (5/√25) = 3/1 = 3;  z = 3 >> 1.96 (reject the null hypothesis)

If only 5 students are taken instead of 25, then:

z = (83 − 80) / (5/√5) = 3/2.236 ≈ 1.34;  z = 1.34 << 1.96 (accept the null hypothesis)

Suppose the S.D. of the underlying population of student marks is unknown and the sample standard deviation calculated from the sample data is 6 instead of 5. Then:

t = (83 − 80) / (6/√25) = 3/1.2 = 2.5;  t = 2.5 >> 2.064 (from the table: df = 24, α = 0.05, reject the null hypothesis)

If only 5 students are taken instead of 25, then:

t = (83 − 80) / (6/√5) = 3/2.683 ≈ 1.12;  t = 1.12 << 2.776 (from the table: df = 4, α = 0.05, accept the null hypothesis)

Previous example with smaller N but a larger group difference:

An education department wants to know whether the average mark of students in Mathematics this year is the same as in past years (mean = 80 and S.D. = 5). A random sample of 25 students is taken, and their marks have mean = 83.

Null hypothesis H0: mean of this year = mean of past years
Alternative hypothesis Ha: mean of this year ≠ mean of past years

z = (83 − 80) / (5/√25) = 3/1 = 3;  z = 3 >> 1.96 (reject the null hypothesis)

If only 5 students are taken instead of 25, but the sample mean is 93 this time:

z = (93 − 80) / (5/√5) = 13/2.236 ≈ 5.81;  z = 5.81 >> 1.96 (reject the null hypothesis)

Suppose the S.D. of the underlying population is unknown and the sample standard deviation calculated from the sample data is 6 instead of 5. Then:

t = (83 − 80) / (6/√25) = 3/1.2 = 2.5;  t = 2.5 >> 2.064 (from the table: df = 24, reject the null hypothesis)

If only 5 students are taken instead of 25, but the sample mean is 93 this time:

t = (93 − 80) / (6/√5) = 13/2.683 ≈ 4.85;  t = 4.85 >> 2.776 (from the table: df = 4, reject the null hypothesis)

Single subject test

So what happens if only one subject can be measured for comparison?

An education department wants to know whether the average mark of students in Mathematics this year is the same as in past years (mean = 80 and S.D. = 5). A random sample of 25 students is taken, and their marks have mean = 83.

Null hypothesis H0: mean of this year = mean of past years
Alternative hypothesis Ha: mean of this year ≠ mean of past years

z = (83 − 80) / (5/√25) = 3/1 = 3;  z = 3 >> 1.96 (reject the null hypothesis)

If only 1 student is taken instead of 25, and his mark is 83:

z = (83 − 80) / (5/√1) = 3/5 = 0.6;  z = 0.6 << 1.96 (accept the null hypothesis)

If only 1 student is taken instead of 25, but his mark is 93 this time:

z = (93 − 80) / (5/√1) = 13/5 = 2.6;  z = 2.6 >> 1.96 (reject the null hypothesis)

However, if the population standard deviation σ is not known, then t = (x̄ − µ)/(s/√n) cannot be calculated for a single-subject case (the sample standard deviation s needs at least two observations, and df = N − 1 = 0)!

So, should we worry about the sample size N being too small, e.g. only 5? The answer is: a bit, depending on the conditions! We might try to summarize the issue as follows:

1) If the sample size decreases and the group difference stays nearly the same, then it becomes more difficult to get a z or t value larger than the critical value needed for a significant result (see the sketch after this list)!

2) However, if the group difference is large enough, the test can still be significant even when N is not very large; it is quite common to have fewer than 10 animals in a group for comparison!

3) In our experimental environment, increasing the sample size often increases the experimental cost rapidly, so it would be a waste if a smaller N is already enough!

4) Even if the sample size is only three, two or just one, instead of five, the rules of the game still hold: passing the test or not is our own business - there is nothing wrong with the tests themselves! The underlying distribution and probability theories are still valid!

5) Everything above concerns the risk of committing the type I error - we worry about whether N is large enough to avoid saying that there is a difference when there is, in fact, none! However, N is also important for avoiding the type II error, saying that there is no difference when there is, in fact, one! The calculation of the sample size N needed to get enough Power for detecting a real difference is discussed in later sections!

6) Please don't confuse the curve of the distribution of sample means of sample size N (A) with the z or t distributions (B, C) obtained after standardization by z = (x̄ − µ)/(σ/√n) or t = (x̄ − µ)/(s/√n), although they are closely related!
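As promised in item 1, here is a small illustrative Python sketch (using the earlier marks example; the numbers are the same ones worked by hand above) of how the same 3-mark difference translates into z for different sample sizes:

```python
# How the same 3-mark difference translates into z for different sample sizes.
import math
from scipy import stats

mu, sigma, diff = 80.0, 5.0, 3.0
z_crit = stats.norm.ppf(0.975)                    # ~1.96
for n in (1, 5, 10, 25):
    z = diff / (sigma / math.sqrt(n))
    print(n, round(z, 2), "significant" if abs(z) > z_crit else "not significant")
```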

1.2 Computer Analysis of t-distribution

1.2.a One Sample T-Test

The vendor of a new medicine claims that it can bring the depression score below 70 after being given to patients for 2 weeks! A sample of 25 patients is chosen to take the new medicine, and the depression score is measured after two weeks. (The underlying population here is the unlimited collection of averages of such samples of 25 patients!)

1) The resulting scores, in SPSS, are:

2) Analyze, Compare Means, One-Sample T Test
3) You would then see the dialog: move Dep_Score to Test Variable(s), and input 70 as the Test Value
4) Click OK

4) Results for the One-Sample T Test in SPSS:

4a) One-Sample Statistics results, with annotations: Sample Size N = 25, Sample Mean x̄ = 66.36, Sample Std. Deviation s = 4.748, and Std. Error of the Mean s_m = s/√N = 4.748/√25 ≈ 0.950.

4b) One-Sample Test results, with annotations: the t value calculated is compared with the critical value for degrees of freedom = N − 1 = 24; the probability of getting a t value this far out by chance is < 0.05, so we reject H0 that the means are equal. The difference between the sample mean (66.36) and the hypothesized mean (70) is 66.36 − 70 = −3.64, and there is 95% confidence that the mean difference falls between the two confidence limits shown.

Conclusion: the 25 patients taking the new medicine have depression scores with mean 66.36 and S.D. 4.748. A t value of about −3.83 is obtained, which is significant even for a 2-tailed test! We can reject the null hypothesis H0 that the sample mean is the same as the comparison value of 70! The vendor might be right that their new medicine can bring patients to a depression score different from 70!

One-Tailed Test for the example above:

The computer output above is a 2-tailed test output! For a 2-tailed test:
H0: the population mean = 70; Ha: the population mean ≠ 70.
In a 1-tailed test:
H0: the population mean >= 70; Ha: the population mean < 70.

We just want to decide whether the population mean of the depression score is less than 70, without considering whether it might be greater than 70 on average. NOTHING NEEDS TO BE CHANGED IN THE RUNNING OF THE 2-TAILED TEST ABOVE! What you need to know is how to interpret the same computer output. For rejecting the null hypothesis:

Step 1: t must be negative in this case, where the negative tail is being tested; it must be positive if the positive tail is being tested!
Step 2: divide the 2-tailed significance by 2, as the 1-tailed probability is half the 2-tailed one!
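For readers without SPSS or PHStat4, the same one-sample t test can be reproduced from the summary statistics in the output above (N = 25, mean 66.36, S.D. 4.748, test value 70). This is only a sketch, not part of the book's own workflow:

```python
# Reproducing the one-sample t test from the SPSS summary statistics.
import math
from scipy import stats

n, xbar, s, test_value = 25, 66.36, 4.748, 70.0
se = s / math.sqrt(n)                      # standard error of the mean, ~0.95
t = (xbar - test_value) / se               # ~ -3.83
df = n - 1
p_two_tailed = 2 * stats.t.sf(abs(t), df)  # the 2-tailed significance
p_one_tailed = p_two_tailed / 2            # one-tailed: halve, and check the sign of t
print(t, p_two_tailed, p_one_tailed)
```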

One-Sample T-Test using Excel (with the PHStat4 Add-In)

(For installation of the PHStat4 Add-In, please refer to the Appendix: Installation of Free Software.)

1) PHStat, One-Sample Tests, t Test for the Mean, sigma unknown

2) Input the information for running a 2-tailed test.

3) The results are nearly the same as when using SPSS above; small differences may be due to rounding (the t statistic matches the SPSS value of about −3.83). Note that the 2-tailed p-value is 2 times the 1-tailed p-value.

4) Input the information for running a 1-tailed test.

5) The results are again nearly the same as when using SPSS above; small differences may be due to rounding (the t statistic matches the SPSS value of about −3.83). Note that the critical value shifts towards the central axis, and the 1-tailed p-value is 0.5 times the 2-tailed p-value.

1.2.b Two Samples T-Test

This is also called the independent t test, meaning that the two samples do not affect each other in the measurement of the data values. Two t-distributed populations are tested using one sample from each of them. The basic question is: how different must the two means below (Mean of Treatment Group vs Mean of Control Group) be, such that the chance of exceeding the critical t-value is less than 5% (or 2.5% in each tail for a two-tailed test)?

For example, 25 patients are chosen to take a traditional medicine for treating depression (Group 1, control), and another 25 patients are chosen to take the new medicine (Group 2)! The depression score is taken after 2 weeks and input into SPSS as shown.

1) Analyze, Compare Means, Independent-Samples T Test
2) Move Dep_Score into Test Variable(s) and Group into Grouping Variable:

3) Click Define Groups
4) Input the values 1 and 2 for the definition of the Groups
5) Click OK

6) SPSS output:

7a) Group Statistics: Group 2 has a lower mean and Std. Deviation than Group 1.

7b) T Test results (parts A, B and C below)

Part A - Test of the equal-variance assumption and the t value obtained (columns: Levene's Test for Equality of Variances - F, Sig.; t-test for Equality of Means - t, df; rows: Equal variances assumed / Equal variances not assumed)

Levene's test is run to test the hypothesis that the variances of the two groups are equal! The larger the value of F, the higher the chance that the variances are different! Here Sig. < 0.05, meaning that the variances of the two groups are significantly different! This implies that equal variances cannot be assumed, and we should use the Equal variances not assumed row (its t, df, etc.) instead! The t value and df in that row are calculated under the assumption that the variances of the 2 groups are different: the separate variances, instead of a pooled variance, are used!

Part B - Significance, Mean Difference, and Std. Error Difference (columns: Sig. (2-tailed), Mean Difference, Std. Error Difference)

The 2-tailed probability = 0.504 > 0.05, so we cannot reject the null hypothesis that the two population means are equal! The Mean Difference is the Sample Mean of Group 1 minus the Sample Mean of Group 2.

Part C - 95% Confidence Interval of the Difference (columns: Lower, Upper)

We have 95% confidence that the difference between the two groups (Mean of Group 1 − Mean of Group 2) falls between the lower limit shown and 5.039!

One-Tailed Test:

As stated previously, there is no need to make any changes to the running of the test! Just make sure which tail (positive or negative) you are testing, and see whether you can get a significant result after the doubling of the chance! For example, if you just want to test whether the new medicine produces a lower depression score in Group 2, this is the same as asking whether Group 1 produces a higher score than Group 2! Then we are testing the positive tail of (Mean of Group 1 − Mean of Group 2), and we follow step 1 and step 2 below on the same output tables:

Step 1: the t-value must be positive (instead of negative).
Step 2: 0.504/2 = 0.252, still > 0.05, so the 1-tailed test still cannot find a significant difference between the 2 groups!
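A sketch of the same kind of two-sample comparison in Python. The raw scores below are hypothetical stand-ins (the book's actual data live in the SPSS file); the point is the Levene test followed by the unequal-variance (Welch) t test, mirroring the SPSS output:

```python
# Levene's test, then the unequal-variance (Welch) t test, with hypothetical data.
import numpy as np
from scipy import stats

group1 = np.array([68, 72, 65, 70, 66, 71, 64, 69, 67, 63], dtype=float)  # control (hypothetical)
group2 = np.array([66, 60, 70, 64, 68, 62, 65, 69, 63, 67], dtype=float)  # new medicine (hypothetical)

# Levene's test for equality of variances (SPSS's "Levene's Test" columns)
lev_stat, lev_p = stats.levene(group1, group2)

# If lev_p < 0.05, use the unequal-variance (Welch) t test, as in SPSS's
# "Equal variances not assumed" row:
t, p_two_tailed = stats.ttest_ind(group1, group2, equal_var=False)
print(lev_p, t, p_two_tailed, p_two_tailed / 2)   # last value: one-tailed p
```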

Running the 2-samples test with Excel

(For installing the Data Analysis Tools Add-In of Excel, please refer to the Appendix: Installation of Free Software.)

1) DATA, Data Analysis

2) Test the assumption of equal variances
3) Input the required information
4) The results show that the equal-variance assumption does not hold: the p-value is << 0.05, so the variances of the two groups are significantly different!

5) Choose t-Test: Two-Sample Assuming Unequal Variances
6) Input the required information. The Hypothesized Mean Difference is a very useful item! If we just want to test whether there is any difference between the means of the two groups, just leave it blank (meaning zero)! If we want to test whether the 2 groups differ by a particular value, just fill in that value.

7) The results are similar to the results from SPSS; the small differences may be due to rounding. The separate-variance t test statistic is the same as in SPSS (two-tailed).

Two-tailed and one-tailed test results from the PHStat4 Excel Add-In: both outputs list, for each group, the sample size (25), the sample mean (67.6 for the Group 1 sample) and the sample standard deviation, followed by the intermediate calculations (numerator and denominator of the degrees of freedom, total degrees of freedom, degrees of freedom, standard error, difference in sample means) and the separate-variance t test statistic. The two-tail test reports lower and upper critical values and a p-value, and the upper-tail test reports an upper critical value and a p-value; in both cases the conclusion is "Do not reject the null hypothesis".

1.2.c Paired T-Test

A paired T-Test is used when, for example, the same subjects are measured at two time points, or pairs of twins are studied in an experiment, etc. The key point is that we assume there is a particular relationship such that the measured data values are not independent of each other! Simply speaking, it is an analysis of the differences within each pair of data:

t = (d̄ − 0) / (s_d / √n_d)

For example, a company running two shops wants to know whether there is a real difference in income between them. Using SPSS:

1) Analyze, Compare Means, Paired-Samples T Test

2) Put Shop_1 under Variable1 and Shop_2 under Variable2
3) Click OK
4) Output:

Enlarged pictures:

Paired Samples Test - Paired Differences (columns: Mean, Std. Deviation, Std. Error Mean, and the Lower and Upper limits of the 95% Confidence Interval of the Difference, for Pair 1 Shop_1 - Shop_2). Annotations: the Mean is the mean of Shop 1 minus the mean of Shop 2; the Std. Deviation and Std. Error Mean are those of the differences; there is 95% confidence that the difference falls within the interval shown.

Paired Samples Test - t, df, Sig. (2-tailed) for Pair 1 Shop_1 - Shop_2: t (df = 9), Sig. (2-tailed) = 0.049 < 0.05, so the result is significant! We reject the hypothesis that the difference between the incomes of the two shops = 0! Shop 2 has an income different from Shop 1.

One-Tailed Test

Using the same Paired Samples Test output (t, df, Sig. (2-tailed) for Pair 1 Shop_1 - Shop_2):

Step 1: make sure the +/− sign of t agrees with the hypothesis you want to test, i.e. Shop_1 > Shop_2 or Shop_2 > Shop_1! If it is opposite, there is no need to test any more. If it agrees, go to Step 2!

Step 2: divide the 2-tailed probability by two and see whether it is < 0.05 for rejecting the null hypothesis! Here 0.049/2 = 0.0245 < 0.05, so we reject the null hypothesis and accept the alternative hypothesis that Shop 1 has a lower income than Shop 2.
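A sketch of the paired test outside SPSS/Excel. The ten income pairs below are hypothetical (matching only the df = 9 of the output above), so the numbers will not reproduce the book's results:

```python
# Paired t test with SciPy, on hypothetical income data for the two shops.
import numpy as np
from scipy import stats

shop_1 = np.array([52, 48, 50, 47, 55, 49, 51, 46, 53, 50], dtype=float)  # hypothetical
shop_2 = np.array([54, 50, 53, 49, 56, 52, 51, 48, 57, 52], dtype=float)  # hypothetical

t, p_two_tailed = stats.ttest_rel(shop_1, shop_2)   # tests mean(shop_1 - shop_2) = 0
print(t, p_two_tailed, p_two_tailed / 2)            # halve for a one-tailed decision
```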

Example in 1.2.c solved by Excel

1) Data, Data Analysis
2) Choose t-Test: Paired Two Sample for Means

3) Input the required fields. Besides zero, we can also test the hypothesis that variable 1 differs from variable 2 by a certain value!
4) The results are almost the same as in SPSS:

Output of the Excel Add-In PHStat4:

1.3 Sample Size N and Power for T Test

As stated previously in this example, if we cannot get a t value > 2 and we have accepted the null hypothesis that there is only one population mean (the red curve), then we do not commit the type I error, but we are immediately at risk of committing the type II error if there are, in fact, two populations (two curves) with different means!

(Figure: the area β and the area 1 − β; the traditional target value for 1 − β is 80%. If there is, in fact, another population (the blue curve), then a t value in the shaded range would make us commit the type II error of saying that there is only one curve, the red one!)

We might say that the probability of committing the type II error, i.e. of not detecting a real difference in population means, is the area β!! We could also say that 1 − β is the Power of the test for detecting a real difference between the two groups, i.e. for accepting the alternative hypothesis! If the power is too small, we have too high a chance of missing real population differences in a T Test! So we should know how to find its value, in order to know whether it is high enough or not! The calculation might be run before the experiment (planned, a priori) or only after the experiment (post hoc), depending, for example, on whether the population Std. Dev. is known, etc.!

Calculation of Sample Size and Power

The calculation of sample size and power could, by itself, fill a whole school term's course! But, for the simplest random sampling from a normal population, we can work through the following example.

1.3.a Using a hand calculator

When the confidence coefficient = (1 − α) = 95%, the half-width of the interval (the difference we want to be able to detect) = δ = 50, and the standard deviation obtained is σ = 150, the required sample size = [(1.96 × 150)/50]² = 34.6, rounded up to 35.

1.3.b Sample Size N in estimating a population average (µ), by PHStat4

In Excel, you must start the program by clicking its shortcut outside Excel!
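The same hand-calculator formula, n = (z·σ/δ)², written out as a small Python sketch:

```python
# The hand-calculator sample-size formula above, in Python.
import math
from scipy import stats

confidence = 0.95
sigma = 150.0        # standard deviation
delta = 50.0         # half-width of the interval we want to be able to detect
z = stats.norm.ppf(1 - (1 - confidence) / 2)        # ~1.96
n = math.ceil((z * sigma / delta) ** 2)             # (1.96*150/50)^2 = 34.6 -> 35
print(z, n)
```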


Annotations for the PHStat4 dialog: the Population Standard Deviation is the known σ from previous or other studies; if the population Std. Dev. is not available, just use (max. possible value − min. possible value)/4. The Sampling Error is the difference you expect between the population value and the sample value, and the Confidence Level is (1 − α).

Remarks:

1. The maximum possible value of the Std. Dev. occurs when the data points are split equally between the minimum and maximum possible values of the data set, e.g. 1 and 50 as the min. and max. possible values of a test score: with 2 data points, Std. Dev.(1, 50) ≈ 34.6; with 3 data points, Std. Dev.(1, 1, 50) ≈ 28.3 (or Std. Dev.(1, 50, 50) ≈ 28.3); with 4 data points, Std. Dev.(1, 1, 50, 50) ≈ 28.3; with 8 data points, Std. Dev.(1, 1, 1, 1, 50, 50, 50, 50) ≈ 26.2. The minimum possible value of the Std. Dev. is usually 0.

2. Please notice that this sample size of 35 only provides 95% confidence that the mean of the sample obtained will be within the mean of the population ± E, where E is the sampling error you can accept. This is the same result as using the hand calculator!

For the calculation of sample size when there might be a whole different population existing, and for the calculation of power, which is not needed here, please refer to the following sections using the G Power software, which is totally free!

1.3.c Calculation of Sample Size and Power using G Power

(For installation of G Power, please refer to the Appendix: Installation of G Power.)

In the examples above we haven't mentioned the term Power, although, in fact, the narrower the interval the test can detect, the higher its power! In any case, it is common to need to calculate the Power, the probability of detecting a real difference without committing the type II error (without wrongly accepting the null hypothesis)! We would like to introduce the excellent, free G Power software for the calculation of sample size and power!

1.3.c.i Calculation of Sample Size using the free software G Power

Suppose we want to run a test to see whether a high-salt-diet sample can be detected as different from the general population, with a blood pressure 5 mmHg higher, and with a power of 0.8! The Std. Dev. is known to be 10 mmHg. [This value might be found by checking previous studies or by taking some quick samples for measurement. If neither is possible, just take the range (maximum possible value − minimum possible value) and divide it by 4.]

1) Under Statistical test, choose Means: Difference from constant (one sample case)
2) Under Type of power analysis, choose A priori: Compute required sample size

3) Click Determine to see the small window on the right-hand side
4) Input the information and click Calculate
5) Input the calculated Effect size d (0.5 this time) into the main window

6) Input a Power (1 − β) of 0.8, unless otherwise specified.
7) Results: N = 27

8) Select Two (tails) for a two-tailed test
9) Results: N = 34

10) There are other similar tests, e.g. the two-samples test:
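For readers who prefer code to G Power, the a-priori sample-size search for the one-sample t test can be sketched with the noncentral t-distribution (this is my own sketch of the standard calculation, not G Power's code; the lower rejection tail is ignored in the two-tailed case, which is negligible here):

```python
# A-priori sample-size search for the one-sample t test, via the noncentral t.
import math
from scipy import stats

def required_n(d, alpha=0.05, power=0.80, two_tailed=False):
    for n in range(2, 1000):
        df = n - 1
        q = 1 - alpha / 2 if two_tailed else 1 - alpha
        t_crit = stats.t.ppf(q, df)
        achieved = stats.nct.sf(t_crit, df, d * math.sqrt(n))  # upper tail of noncentral t
        if achieved >= power:
            return n, achieved
    return None

print(required_n(0.5, two_tailed=False))   # should reproduce N = 27
print(required_n(0.5, two_tailed=True))    # should reproduce N = 34
```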

1.3.c.ii Calculation of Power with the free software G Power

Suppose that, in the previous blood pressure study, we instead already have 27 subjects measured, the sample mean is found to be 5 mmHg higher than the population mean, and the Std. Dev. is 10 mmHg as before; the power should then come out as before:

1) Under Test family choose t tests, and under Statistical test choose Means: Difference from constant (one sample case)
2) Under Type of power analysis, choose Post hoc: Compute achieved power - given α, sample size and effect size

3) Click Determine to see the small window on the right-hand side
4) Input the information and click Calculate
5) Input the calculated Effect size d (0.5 this time) into the main window

6) Input the Sample Size, 27
7) Result: Power (1 − β) is approximately 0.80, consistent with the sample size calculation above

8) Select Two (tails) for a two-tailed test
9) Result: the Power (1 − β) is lower than in the one-tailed case

10) There are other similar tests, e.g. the two-samples test:
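The corresponding post-hoc power calculation for N = 27 and d = 0.5, again as a noncentral-t sketch rather than G Power itself (the lower rejection tail is ignored in the two-tailed case):

```python
# Post-hoc power for N = 27, d = 0.5, alpha = 0.05, via the noncentral t.
import math
from scipy import stats

d, n, alpha = 0.5, 27, 0.05
df, nc = n - 1, d * math.sqrt(n)

power_one_tailed = stats.nct.sf(stats.t.ppf(1 - alpha, df), df, nc)
power_two_tailed = stats.nct.sf(stats.t.ppf(1 - alpha / 2, df), df, nc)
print(power_one_tailed, power_two_tailed)   # the two-tailed power is the lower of the two
```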

1.4 Testing Hypothesis for Proportions (instead of Means)

1.4.a Approximation of a binomial distribution by a normal distribution

As stated previously, the normal distribution is so important not only because so many natural phenomena follow it, but also because it can be used to solve many other problems by being superimposed on other distributions! Let's look at the following binomial distributions. Before any medicine is available for a disease and patients just recover by bed rest, the chance of recovery is 0.4 (and of failing to recover, 0.6):

As you can see for these binomial distributions, as N increases, the histogram looks more and more like a normal distribution! In fact, the closer p is to 0.5, the smaller N needs to be for a normal distribution to superimpose well on the histogram of the binomial distribution! Generally, we can carry out the approximation when Np and Nq are both greater than 5! Put the other way round, N = 5/p (rounded up) or N = 5/q (rounded up), whichever of p and q is smaller, gives the N required! For example, for p = 0.4 and q = 0.6, we have 5/0.4 = 12.5, rounded up to N = 13, for carrying out the approximation!

The figure below shows a normal (z) distribution. Suppose z = ±1.14; the area beyond each of these values is 12.71%. The area can easily be found in a Normal Distribution Table. The critical value for 0.05 (2-tailed test) is 1.96, and 1.14 < 1.96, so this is not significant!

If the superimposition of the normal distribution on the binomial one is acceptable, the latter can be treated as a normal distribution with mean = Np (= 20 × 0.4 = 8 in this case) and Std. Dev. = √(Npq) (= √(20 × 0.4 × 0.6) = 2.19 in this case)! The x-axis above has two scales: one is the k-value, the actual number of patients recovered, and the other is the z-value in standard deviations of the standard normal distribution curve. (The equation is adjusted by +0.5 if k < µ, and by −0.5 if k > µ; this is the continuity correction for approximating a discrete distribution by a continuous one.)

z = [(k − µ) + 0.5] / σ (if k < µ) and z = [(k − µ) − 0.5] / σ (if k > µ)

Therefore z = [(5 − 8) + 0.5] / 2.19 = −1.14 and z = [(11 − 8) − 0.5] / 2.19 = 1.14.

This means that having 5 or fewer patients recover (or having 11 or more patients recover) would each happen with a chance of about 12.7% under a binomial population with p = 0.4 of recovery with bed rest only!

Suppose now that a medicine under testing is applied to 1,000 patients and 430 of them finally recover; we would like to decide whether p > 0.4, the recovery rate with bed rest only! Without the normal approximation, we can imagine the difficulty of solving such a binomial question, involving factorials such as 1000! and summing hundreds of terms!! But now we can easily solve the problem by superimposing the normal distribution:

µ = 1000 × 0.4 = 400
σ = √(1000 × 0.4 × 0.6) = 15.49
z = [(430 − 400) − 0.5] / 15.49 ≈ 1.90

From the table, the chance of getting a z value of 1.90 or more is about 0.029! This means that the researchers would have about 97% confidence that the medicine increases the probability of recovery above the original rate of 0.4! They should go ahead and consider a large investment for its mass production!
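The same normal-approximation calculation (and, for comparison, the exact binomial tail) as a Python sketch:

```python
# Normal approximation with continuity correction for 430 recoveries out of 1,000.
import math
from scipy import stats

n, k, p0 = 1000, 430, 0.4
mu = n * p0                                  # 400
sigma = math.sqrt(n * p0 * (1 - p0))         # ~15.49
z = ((k - mu) - 0.5) / sigma                 # ~1.90 (k > mu, so subtract 0.5)
p_upper_tail = stats.norm.sf(z)              # ~0.029; about 97% confidence that p > 0.4
print(z, p_upper_tail)

# For comparison, the exact binomial tail probability P(X >= 430):
print(stats.binom.sf(k - 1, n, p0))
```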

1.4.b Using PHStat4 to solve the problem above

In fact, the problem above of superimposing a normal distribution on a binomial distribution amounts to running a z test for a proportion instead of a mean! It can be solved with PHStat as below:

1) Click the PHStat icon on the Desktop to enter Excel, and click Enable Macros
2) Click Add-Ins, PHStat

3) One-Sample Tests, Z Test for Proportion
4) You would then see:

5) The dialog inputs are annotated as follows: the probability (proportion) of patients recovering with bed rest only (0.4); the confidence level for rejecting the null hypothesis; the number of patients recovering after taking the medicine (430); and the total number of patients taking the medicine (1,000). The test is set to the upper tail only, since we are just testing whether p is greater than 0.4, without considering the negative tail.

6) The results are very close to the hand calculation: the p-value is very close to the value calculated by hand previously; the small difference seems to be due to the (−0.5) continuity correction and rounding.

1.4.c Using PHStat4 to solve problems with two proportions from binomial populations

We can also use PHStat4 to compare two proportions from 2 binomial populations. For example, suppose 25 out of 300 patients treated in one hospital passed away due to a certain disease, and 34 out of 350 patients treated for the same disease in another hospital passed away in the same year. Can we say that the proportions of patients passing away in that year are different? The two binomial populations (with proportions p1 and p2) are compared using the normal approximation:

Null hypothesis: (p1 − p2) = 0
Alternative hypothesis: (p1 − p2) ≠ 0
Test statistic: z = (sample p1 − sample p2) / σ(sample p1 − sample p2)
Rejection region (α = 0.05): |z| >= z_{α/2} = 1.96

1) Click the PHStat icon on the Desktop, click Enable Macros

2) Add-Ins, PHStat
3) PHStat, Two-Sample Tests (Summarized Data), Z Test for Differences in Two Proportions

4) You would then see:
5) Dialog inputs, as annotated: the hypothesized difference we want to detect (0); the level of significance for the test; the number of patients who passed away and the number treated in Hospital 1 (25 and 300); the number who passed away and the number treated in Hospital 2 (34 and 350); and a Two-Tail Test, testing both tails (either greater than or smaller than).

6) Results: p ≈ 0.54 > 0.05, so we cannot reject the null hypothesis that the two proportions of patients passing away from the same disease are equal - we cannot say they are different! Even for one tail, p ≈ 0.54/2 = 0.27, still not significant!
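The same two-proportion z test, computed with the pooled standard error as a Python sketch; it should give a z of about −0.61 and a two-tailed p of about 0.54, close to the PHStat4 output:

```python
# Two-proportion z test (25/300 vs 34/350) with the pooled standard error.
import math
from scipy import stats

x1, n1 = 25, 300
x2, n2 = 34, 350
p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se                             # ~ -0.61
p_two_tailed = 2 * stats.norm.sf(abs(z))       # ~0.54, not significant
print(z, p_two_tailed, p_two_tailed / 2)       # one-tailed ~0.27, still not significant
```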

1.5 Non-parametric tests corresponding to various t tests

So far in this book we have been testing means, standard deviations, proportions, etc., which are parameters of a population! For such tests, the population being tested must fulfil certain assumptions, e.g. being normally distributed, having equal variances, etc.! However, we often do not know whether these assumptions are fulfilled! We might not even know what distribution the population follows at all! In these situations, we should run the non-parametric tests corresponding to the various t-tests. They are:

- Wilcoxon signed rank test, corresponding to the one-sample t test
- Mann-Whitney test, or Wilcoxon rank sum test, corresponding to the two-samples t test (two independent samples; the 2 tests give identical results)
- Wilcoxon matched-pairs signed-rank test, corresponding to the paired t test (dependent samples; sometimes also called the Wilcoxon signed rank test in some books and software - be careful!)

(* Before giving up the parametric t test to run non-parametric tests, please also consider the robustness of the t test, which can still give the correct decision even when there are small to moderate violations of the assumptions of the parametric test, especially in equal-variance and equal-sample-size cases!)

1.5.a Wilcoxon signed rank test (corresponding to the one-sample t test)

The only assumptions for running the Wilcoxon signed rank test are:
1) The population is continuous
2) The population has a median
3) The population is symmetric

Running the Wilcoxon signed rank test in SPSS: for example, we have the following data set of 18 observations:

9, 11, 18, 16, 17, 21, 12, 10, 11, 11, 19, 16, 12, 13, 20, 14, 15, 13

We want to test the hypothesis:
H0: Median = 16
Ha: Median ≠ 16

1) Data in SPSS:

2) Analyze, Nonparametric Tests, One Sample
3) Click Assign Manually:

4) Move Data to Continuous:
5) Click OK

6) You would go back to the previous window; select the test again:
7) Select Automatically compare..., click Settings

8) Select Choose Tests, Customize tests, Compare median... (Wilcoxon signed-rank test), and input 16:
9) Choose Test Options; the Significance level and Confidence interval can be left at their defaults if there is no need to change them:

10) Results: p = 0.066 > 0.05, so we can't reject the hypothesis that the population median is equal to 16. But 0.066/2 = 0.033 < 0.05, significant for a 1-tailed test!

Calculation by hand, if SPSS is not available:

i) Subtracting 16 from each observation, we get −7, −5, 2, 0, 1, 5, −4, −6, −5, −5, 3, 0, −4, −3, 4, −2, −1, −3.
ii) Discarding the zeros and ranking the others in order of increasing absolute magnitude, we have 1, −1, 2, −2, 3, −3, −3, −4, −4, 4, −5, 5, −5, −5, −6, −7.
iii) The 1's occupy ranks 1 and 2; the mean (average) of these ranks is 1.5, and each 1 is given a rank of 1.5.
iv) The 2's occupy ranks 3 and 4; the mean of these ranks is 3.5, and each 2 is given a rank of 3.5.
v) In a similar manner, each 3 receives a rank of 6, each 4 a rank of 9, each 5 a rank of 12.5; the −6 is assigned a rank of 15, and the −7 a rank of 16.
vi) The sequence of the signed ranks is now 1.5, −1.5, −3.5, 3.5, 6, −6, −6, −9, −9, 9, −12.5, 12.5, −12.5, −12.5, −15, −16 (the minus signs indicate negative ranks).
vii) The positive rank sum = 32.5 and the negative rank sum = 103.5. The smaller rank sum is taken as T = 32.5.
viii) In the table for the Wilcoxon signed-rank test, in the column headed by α = 0.05 (two-tailed) with n = number of ranks = 16 (18 − 2), the critical value is 29. T = 32.5 is not less than or equal to 29, so we have to accept the null hypothesis that the median is 16! For a one-tailed test at α = 0.05, the critical value is 35, and T = 32.5 <= 35, so we can reject the null hypothesis!
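Finally, the same Wilcoxon signed-rank test as a SciPy sketch, using the data set listed in 1.5.a. SciPy discards zero differences by default, as in the hand calculation; its p-value may differ slightly from SPSS, which uses a normal approximation with corrections for ties and zeros:

```python
# Wilcoxon signed-rank test of median = 16 on the data set from section 1.5.a.
import numpy as np
from scipy import stats

data = np.array([9, 11, 18, 16, 17, 21, 12, 10, 11, 11, 19, 16, 12, 13, 20, 14, 15, 13],
                dtype=float)
hypothesized_median = 16

stat, p_two_tailed = stats.wilcoxon(data - hypothesized_median)
print(stat, p_two_tailed, p_two_tailed / 2)     # halve for the one-tailed decision
```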


More information

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.

More information

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Two-Sample T-Tests Assuming Equal Variance (Enter Means) Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of

More information

UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST

UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST UNDERSTANDING The independent-samples t test evaluates the difference between the means of two independent or unrelated groups. That is, we evaluate whether the means for two independent groups are significantly

More information

Two Related Samples t Test

Two Related Samples t Test Two Related Samples t Test In this example 1 students saw five pictures of attractive people and five pictures of unattractive people. For each picture, the students rated the friendliness of the person

More information

Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:

Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices: Doing Multiple Regression with SPSS Multiple Regression for Data Already in Data Editor Next we want to specify a multiple regression analysis for these data. The menu bar for SPSS offers several options:

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Ecellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters

More information

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as... HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men

More information

Standard Deviation Estimator

Standard Deviation Estimator CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

More information

Chapter 9. Two-Sample Tests. Effect Sizes and Power Paired t Test Calculation

Chapter 9. Two-Sample Tests. Effect Sizes and Power Paired t Test Calculation Chapter 9 Two-Sample Tests Paired t Test (Correlated Groups t Test) Effect Sizes and Power Paired t Test Calculation Summary Independent t Test Chapter 9 Homework Power and Two-Sample Tests: Paired Versus

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217 Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing

More information

Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses

Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses Introduction to Hypothesis Testing 1 Hypothesis Testing A hypothesis test is a statistical procedure that uses sample data to evaluate a hypothesis about a population Hypothesis is stated in terms of the

More information

Independent t- Test (Comparing Two Means)

Independent t- Test (Comparing Two Means) Independent t- Test (Comparing Two Means) The objectives of this lesson are to learn: the definition/purpose of independent t-test when to use the independent t-test the use of SPSS to complete an independent

More information

Statistics Review PSY379

Statistics Review PSY379 Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

More information

CHAPTER 14 NONPARAMETRIC TESTS

CHAPTER 14 NONPARAMETRIC TESTS CHAPTER 14 NONPARAMETRIC TESTS Everything that we have done up until now in statistics has relied heavily on one major fact: that our data is normally distributed. We have been able to make inferences

More information

Non-Inferiority Tests for Two Means using Differences

Non-Inferiority Tests for Two Means using Differences Chapter 450 on-inferiority Tests for Two Means using Differences Introduction This procedure computes power and sample size for non-inferiority tests in two-sample designs in which the outcome is a continuous

More information

Paired T-Test. Chapter 208. Introduction. Technical Details. Research Questions

Paired T-Test. Chapter 208. Introduction. Technical Details. Research Questions Chapter 208 Introduction This procedure provides several reports for making inference about the difference between two population means based on a paired sample. These reports include confidence intervals

More information

SPSS Explore procedure

SPSS Explore procedure SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,

More information

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption

More information

Independent samples t-test. Dr. Tom Pierce Radford University

Independent samples t-test. Dr. Tom Pierce Radford University Independent samples t-test Dr. Tom Pierce Radford University The logic behind drawing causal conclusions from experiments The sampling distribution of the difference between means The standard error of

More information

Difference tests (2): nonparametric

Difference tests (2): nonparametric NST 1B Experimental Psychology Statistics practical 3 Difference tests (): nonparametric Rudolf Cardinal & Mike Aitken 10 / 11 February 005; Department of Experimental Psychology University of Cambridge

More information

Hypothesis testing. c 2014, Jeffrey S. Simonoff 1

Hypothesis testing. c 2014, Jeffrey S. Simonoff 1 Hypothesis testing So far, we ve talked about inference from the point of estimation. We ve tried to answer questions like What is a good estimate for a typical value? or How much variability is there

More information

3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples 3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

NCSS Statistical Software. One-Sample T-Test

NCSS Statistical Software. One-Sample T-Test Chapter 205 Introduction This procedure provides several reports for making inference about a population mean based on a single sample. These reports include confidence intervals of the mean or median,

More information

6.4 Normal Distribution

6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

Normality Testing in Excel

Normality Testing in Excel Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

More information

Opgaven Onderzoeksmethoden, Onderdeel Statistiek

Opgaven Onderzoeksmethoden, Onderdeel Statistiek Opgaven Onderzoeksmethoden, Onderdeel Statistiek 1. What is the measurement scale of the following variables? a Shoe size b Religion c Car brand d Score in a tennis game e Number of work hours per week

More information

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

More information

Describing Populations Statistically: The Mean, Variance, and Standard Deviation

Describing Populations Statistically: The Mean, Variance, and Standard Deviation Describing Populations Statistically: The Mean, Variance, and Standard Deviation BIOLOGICAL VARIATION One aspect of biology that holds true for almost all species is that not every individual is exactly

More information

Two-sample inference: Continuous data

Two-sample inference: Continuous data Two-sample inference: Continuous data Patrick Breheny April 5 Patrick Breheny STA 580: Biostatistics I 1/32 Introduction Our next two lectures will deal with two-sample inference for continuous data As

More information

How To Check For Differences In The One Way Anova

How To Check For Differences In The One Way Anova MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

1 Nonparametric Statistics

1 Nonparametric Statistics 1 Nonparametric Statistics When finding confidence intervals or conducting tests so far, we always described the population with a model, which includes a set of parameters. Then we could make decisions

More information

Using Excel for inferential statistics

Using Excel for inferential statistics FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied

More information

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

More information

Skewed Data and Non-parametric Methods

Skewed Data and Non-parametric Methods 0 2 4 6 8 10 12 14 Skewed Data and Non-parametric Methods Comparing two groups: t-test assumes data are: 1. Normally distributed, and 2. both samples have the same SD (i.e. one sample is simply shifted

More information

Two Correlated Proportions (McNemar Test)

Two Correlated Proportions (McNemar Test) Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with

More information

STAT 350 Practice Final Exam Solution (Spring 2015)

STAT 350 Practice Final Exam Solution (Spring 2015) PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects

More information

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing Chapter 8 Hypothesis Testing 1 Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing 8-3 Testing a Claim About a Proportion 8-5 Testing a Claim About a Mean: s Not Known 8-6 Testing

More information

SPSS/Excel Workshop 3 Summer Semester, 2010

SPSS/Excel Workshop 3 Summer Semester, 2010 SPSS/Excel Workshop 3 Summer Semester, 2010 In Assignment 3 of STATS 10x you may want to use Excel to perform some calculations in Questions 1 and 2 such as: finding P-values finding t-multipliers and/or

More information

Week 4: Standard Error and Confidence Intervals

Week 4: Standard Error and Confidence Intervals Health Sciences M.Sc. Programme Applied Biostatistics Week 4: Standard Error and Confidence Intervals Sampling Most research data come from subjects we think of as samples drawn from a larger population.

More information

THE KRUSKAL WALLLIS TEST

THE KRUSKAL WALLLIS TEST THE KRUSKAL WALLLIS TEST TEODORA H. MEHOTCHEVA Wednesday, 23 rd April 08 THE KRUSKAL-WALLIS TEST: The non-parametric alternative to ANOVA: testing for difference between several independent groups 2 NON

More information

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application

More information

Principles of Hypothesis Testing for Public Health

Principles of Hypothesis Testing for Public Health Principles of Hypothesis Testing for Public Health Laura Lee Johnson, Ph.D. Statistician National Center for Complementary and Alternative Medicine johnslau@mail.nih.gov Fall 2011 Answers to Questions

More information

NONPARAMETRIC STATISTICS 1. depend on assumptions about the underlying distribution of the data (or on the Central Limit Theorem)

NONPARAMETRIC STATISTICS 1. depend on assumptions about the underlying distribution of the data (or on the Central Limit Theorem) NONPARAMETRIC STATISTICS 1 PREVIOUSLY parametric statistics in estimation and hypothesis testing... construction of confidence intervals computing of p-values classical significance testing depend on assumptions

More information

Two-sample hypothesis testing, II 9.07 3/16/2004

Two-sample hypothesis testing, II 9.07 3/16/2004 Two-sample hypothesis testing, II 9.07 3/16/004 Small sample tests for the difference between two independent means For two-sample tests of the difference in mean, things get a little confusing, here,

More information

Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools. Tools for Summarizing Data Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

More information

Chapter 7 Section 1 Homework Set A

Chapter 7 Section 1 Homework Set A Chapter 7 Section 1 Homework Set A 7.15 Finding the critical value t *. What critical value t * from Table D (use software, go to the web and type t distribution applet) should be used to calculate the

More information

Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples

Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples Comparing Two Groups Chapter 7 describes two ways to compare two populations on the basis of independent samples: a confidence interval for the difference in population means and a hypothesis test. The

More information

Chapter 7 Notes - Inference for Single Samples. You know already for a large sample, you can invoke the CLT so:

Chapter 7 Notes - Inference for Single Samples. You know already for a large sample, you can invoke the CLT so: Chapter 7 Notes - Inference for Single Samples You know already for a large sample, you can invoke the CLT so: X N(µ, ). Also for a large sample, you can replace an unknown σ by s. You know how to do a

More information

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

More information

Non-Parametric Tests (I)

Non-Parametric Tests (I) Lecture 5: Non-Parametric Tests (I) KimHuat LIM lim@stats.ox.ac.uk http://www.stats.ox.ac.uk/~lim/teaching.html Slide 1 5.1 Outline (i) Overview of Distribution-Free Tests (ii) Median Test for Two Independent

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

More information

Analysis of Variance ANOVA

Analysis of Variance ANOVA Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.

More information

research/scientific includes the following: statistical hypotheses: you have a null and alternative you accept one and reject the other

research/scientific includes the following: statistical hypotheses: you have a null and alternative you accept one and reject the other 1 Hypothesis Testing Richard S. Balkin, Ph.D., LPC-S, NCC 2 Overview When we have questions about the effect of a treatment or intervention or wish to compare groups, we use hypothesis testing Parametric

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. STT315 Practice Ch 5-7 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Solve the problem. 1) The length of time a traffic signal stays green (nicknamed

More information

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

More information

Analyzing Data with GraphPad Prism

Analyzing Data with GraphPad Prism 1999 GraphPad Software, Inc. All rights reserved. All Rights Reserved. GraphPad Prism, Prism and InStat are registered trademarks of GraphPad Software, Inc. GraphPad is a trademark of GraphPad Software,

More information

2 Sample t-test (unequal sample sizes and unequal variances)

2 Sample t-test (unequal sample sizes and unequal variances) Variations of the t-test: Sample tail Sample t-test (unequal sample sizes and unequal variances) Like the last example, below we have ceramic sherd thickness measurements (in cm) of two samples representing

More information

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

More information

Hypothesis testing - Steps

Hypothesis testing - Steps Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as... HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men

More information

Stat 411/511 THE RANDOMIZATION TEST. Charlotte Wickham. stat511.cwick.co.nz. Oct 16 2015

Stat 411/511 THE RANDOMIZATION TEST. Charlotte Wickham. stat511.cwick.co.nz. Oct 16 2015 Stat 411/511 THE RANDOMIZATION TEST Oct 16 2015 Charlotte Wickham stat511.cwick.co.nz Today Review randomization model Conduct randomization test What about CIs? Using a t-distribution as an approximation

More information

UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE TWO-WAY ANOVA UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

More information

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences. 1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis

More information

Examining Differences (Comparing Groups) using SPSS Inferential statistics (Part I) Dwayne Devonish

Examining Differences (Comparing Groups) using SPSS Inferential statistics (Part I) Dwayne Devonish Examining Differences (Comparing Groups) using SPSS Inferential statistics (Part I) Dwayne Devonish Statistics Statistics are quantitative methods of describing, analysing, and drawing inferences (conclusions)

More information