Non-parametric Tests Using SPSS


Non-parametric Tests Using SPSS
Statistical Package for Social Sciences

Jinlin Fu
January 2016

Contact: Medical Research Consultancy Studio, Australia

Contents

1 INTRODUCTION
UNIVARIATE LOGISTIC REGRESSION
ASSUMPTIONS AND DATA REQUIREMENTS
TESTING FOR NORMALITY
MULTIPLE LOGISTIC REGRESSION
TESTING FOR INDEPENDENCE OF TWO CATEGORICAL VARIABLES
FISHER'S EXACT TEST FOR INDEPENDENCE OF TWO CATEGORICAL VARIABLES
TESTING A THEORETICAL MODEL (GOODNESS OF FIT) USING CHI-SQUARE
FURTHER ANALYSIS USING CHI-SQUARE
BINOMIAL TESTS USING CHI-SQUARE
RUNS TEST
ASSUMPTIONS AND DATA REQUIREMENTS
CONFIRMATION OF APPROPRIATE CUT POINTS
RUNS TEST
ONE-SAMPLE TEST
ASSUMPTIONS AND DATA REQUIREMENTS
ONE-SAMPLE KOLMOGOROV-SMIRNOV TEST
TWO-INDEPENDENT-SAMPLE TESTS
ASSUMPTIONS AND DATA REQUIREMENTS
TWO-INDEPENDENT-SAMPLE MANN-WHITNEY AND WILCOXON TESTS
TWO-INDEPENDENT-SAMPLE KOLMOGOROV-SMIRNOV TEST
MULTI-INDEPENDENT-SAMPLE TESTS
ASSUMPTIONS AND DATA REQUIREMENTS
KRUSKAL-WALLIS TEST
THE MEDIAN TEST
POST HOC TESTS
TWO-RELATED-SAMPLE TESTS
ASSUMPTIONS AND DATA REQUIREMENTS
WILCOXON SIGNED-RANKS TEST
THE SIGN TEST
MCNEMAR TEST
MULTIPLE-RELATED-SAMPLE TESTS
ASSUMPTIONS AND DATA REQUIREMENTS
FRIEDMAN TEST
KENDALL'S W TEST
COCHRAN'S Q TEST
NONPARAMETRIC CORRELATIONS ... Error! Bookmark not defined.
10.1 ASSUMPTIONS AND DATA REQUIREMENTS
SPEARMAN'S RANK CORRELATION
KENDALL'S TAU RANK CORRELATION

1 INTRODUCTION

Regression methods have become an integral component of any data analysis concerned with describing the relationship between a response (or outcome or dependent) variable and one or more explanatory (predictor or independent) variables, or covariates. It is often the case that the outcome variable is discrete, taking on two or more possible values. Over the last two decades logistic regression has become, in many fields, the standard method of analysis in this situation.

Logistic regression allows one to predict a discrete outcome such as group membership from a set of variables that may be continuous, discrete, dichotomous, or a mix. Because of its popularity in the health sciences, the discrete outcome in logistic regression is often disease/no disease. For example, can presence or absence of hay fever be diagnosed from geographic area, season, degree of nasal stuffiness, and body temperature?

Logistic regression makes no assumptions about the distributions of the predictor variables; the predictors do not have to be normally distributed, linearly related, or of equal variance within each group. Unlike multiway frequency analysis, the predictors do not need to be discrete; they can be any mix of continuous, discrete and dichotomous variables. Unlike multiple regression analysis, which also has distributional requirements for predictors, logistic regression cannot produce negative predicted probabilities. There may be two or more outcomes (groups) in logistic regression. If there are more than two outcomes, they may or may not have order (e.g., no hay fever, moderate hay fever, severe hay fever).

Logistic regression emphasizes the probability of a particular outcome for each case. For example, it evaluates the probability that a given person has hay fever, given that person's pattern of responses to questions about geographic area, season, nasal stuffiness, and temperature. Logistic regression analysis is especially useful when the distribution of responses on the dependent variable is expected to be nonlinear with one or more of the independent variables.

Because the model produced by logistic regression is nonlinear, the equations used to describe the outcomes are slightly more complex than those for multiple regression. The outcome variable, Ŷ, is the probability of having one outcome or another based on a nonlinear function of the best linear combination of predictors; with two outcomes:

Ŷi = e^u / (1 + e^u)

where Ŷi is the estimated probability that the ith case (i = 1, ..., n) is in one of the categories and u is the usual linear regression equation:

u = C + β1X1 + β2X2 + ... + βkXk

with constant C, coefficients βj, and predictors Xj for k predictors (j = 1, 2, ..., k). This linear regression equation creates the logit or log of the odds:

ln(Ŷ / (1 − Ŷ)) = C + β1X1 + β2X2 + ... + βkXk

That is, the linear regression equation is the natural log (ln) of the probability of being in one group divided by the probability of being in the other group; for example, the natural log of the probability of being in the disease group divided by the probability of being in the non-disease group.
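
To make these formulas concrete, the short Python sketch below evaluates the logistic function and the logit for a single hypothetical case; the constant, coefficients and predictor values are invented for illustration and are not taken from any data described in this document.

```python
import math

# Hypothetical fitted values chosen for illustration only (not from any data in this text)
C = -2.0                 # constant
betas = [0.8, 1.5]       # coefficients beta_1, beta_2
x = [1.2, 0.0]           # predictor values X_1, X_2 for one case

# u is the usual linear regression equation: u = C + sum(beta_j * X_j)
u = C + sum(b * xj for b, xj in zip(betas, x))

# Logistic transform: estimated probability of being in one of the two groups
y_hat = math.exp(u) / (1 + math.exp(u))

# Taking the log of the odds (the logit) recovers the linear predictor u
logit = math.log(y_hat / (1 - y_hat))

print(f"u = {u:.3f}, predicted probability = {y_hat:.3f}, logit = {logit:.3f}")
```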

Logistic regression can also be used to fit and compare models. The simplest (and worst-fitting) model includes only the constant and none of the predictors. The most complex (and "best"-fitting) model includes the constant, all predictors, and, perhaps, interactions among predictors. Often, however, not all predictors (and interactions) are related to the outcome. The researcher uses goodness-of-fit tests to choose the model that does the best job of prediction with the fewest predictors.

2 UNIVARIATE LOGISTIC REGRESSION

Logistic regression is useful for situations in which you want to be able to predict the presence or absence of a characteristic or outcome based on values of a set of predictor variables. It is similar to a linear regression model but is suited to models where the dependent variable is dichotomous. Logistic regression coefficients can be used to estimate odds ratios for each of the independent variables in the model. Logistic regression is applicable to a broader range of research situations than discriminant analysis.

2.1 ASSUMPTIONS AND DATA REQUIREMENTS

ASSUMPTIONS: Logistic regression does not rely on distributional assumptions. However, the solution may be more stable if the selected predictors have a multivariate normal distribution. Additionally, as with other forms of regression, multicollinearity among the predictors can lead to biased estimates and inflated standard errors. The procedure is most effective when group membership is a truly categorical variable; if group membership is based on values of a continuous variable (for example, "high IQ" versus "low IQ"), you should consider using linear regression to take advantage of the richer information offered by the continuous variable itself.

DATA: The dependent variable should be dichotomous. Independent variables can be interval level or categorical; if categorical, they should be dummy or indicator coded (there is an option in the procedure to recode categorical variables automatically).

The questions we want to answer are: What is the average level of weight in the study sample? To what extent does the weight measurement vary? We will now examine the variable Weight. From the toolbar menus, select Analyse > Descriptive Statistics > Explore. Add Weight to the Dependent List using the arrow button. Your screen should look like the one below:

Click OK. Your output screen should look like the output on the next page:

Interpreting the output

First table (Case Processing Summary): This provides the total number and percentage of observations and any missing values.

Second table (Descriptives): This table lists all the statistics used to describe the variable Weight. We can see that the mean weight of patients is 60.9 kg, with a standard deviation of 14.4 kg and a 95% confidence interval of 56.8 to 65.0 kg. We can also see a median weight of 59.6 kg with an IQR of 11.0 kg, the minimum (43.0 kg), the maximum (136.4 kg) and the range (93.4 kg).

The second table also includes two statistics, Skewness and Kurtosis, which are used to examine the shape of the distribution curve. The skewness and kurtosis values reported in this table are far from 0, which is strong evidence that the distribution of Weight cannot be regarded as a normal distribution.

Definition of Skewness: A measure of the asymmetry of a distribution. The normal distribution is symmetric and has a skewness value of 0. A distribution with a significant positive skewness has a long right tail. A distribution with a significant negative skewness has a long left tail. As a guideline, a skewness value more than twice its standard error is taken to indicate a departure from symmetry.

Definition of Kurtosis: A measure of the extent to which observations cluster around a central point. For a normal distribution, the value of the kurtosis statistic is zero. Positive kurtosis indicates that the observations cluster more and have longer tails than those in the normal distribution, and negative kurtosis indicates that the observations cluster less and have shorter tails.

2.2 TESTING FOR NORMALITY

It is a good habit to examine the distributions of the continuous variables of interest before we start analysing them; doing so helps us choose the right statistical methods for the analyses we want to perform. Many statistical tests require that one or more variables be normally distributed. If a variable is normally distributed, then parametric tests apply. If that is not the case, it is inappropriate to employ parametric tests, because violating the normality assumption may produce misleading results. So, if a variable is non-normally distributed, nonparametric tests, which are as valuable as parametric ones such as the t-test and ANOVA, apply.

The question we want to answer is: Is the variable Weight normally distributed? To check this, select Analyse > Descriptive Statistics > Explore. This time we still select Weight into the Dependent List and then click on Plots. Check that Stem-and-leaf is not selected and that Histogram is selected, and then select Normality plots with tests.

Click Continue and OK. Your output screen should look like the output below:

Interpreting the output

First table (Descriptives): This table is exactly the same as the one in Section 2.1.

Second table (Tests of Normality): The table contains two formal tests for normality: the Kolmogorov-Smirnov and Shapiro-Wilk tests. The Kolmogorov-Smirnov test is only used for datasets with a large number of observations (e.g. > 5000). The Shapiro-Wilk significance level (p-value, labelled Sig.) for Weight is less than 0.001, which is significant, and the histogram (below) looks non-normally distributed. From this output we can see that the variable Weight is non-normally distributed.
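
For readers who want to reproduce this kind of normality check outside SPSS, here is a minimal Python sketch using scipy; the weight values are invented placeholders rather than the study data, and the plain Kolmogorov-Smirnov call shown here does not apply the Lilliefors correction that SPSS uses when parameters are estimated from the sample.

```python
import numpy as np
from scipy import stats

# Invented weights (kg) standing in for the variable Weight; replace with real data
weight = np.array([52.1, 58.4, 61.0, 47.3, 66.8, 55.0, 59.6, 72.4, 49.9, 63.2,
                   57.1, 60.5, 54.8, 68.9, 51.0, 62.3, 58.0, 70.2, 56.4, 95.0])

# Shapiro-Wilk test: the usual choice for small-to-moderate samples
sw_stat, sw_p = stats.shapiro(weight)
print(f"Shapiro-Wilk: W = {sw_stat:.3f}, p = {sw_p:.3f}")

# Kolmogorov-Smirnov test against a normal distribution; estimating the mean and SD
# from the same sample makes this p-value too lenient (hence SPSS's Lilliefors correction)
ks_stat, ks_p = stats.kstest(weight, "norm", args=(weight.mean(), weight.std(ddof=1)))
print(f"Kolmogorov-Smirnov: D = {ks_stat:.3f}, p = {ks_p:.3f}")

# The shape statistics discussed above
print(f"Skewness = {stats.skew(weight):.2f}, Kurtosis = {stats.kurtosis(weight):.2f}")
```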

First graph (Histogram): This graph is a visual summary of the distribution of values. The overlay of the normal curve helps you to assess the skewness and kurtosis. The histogram below shows that the distribution is not symmetric but positively (right) skewed, with a long right tail.

Second graph (Box Plot): The second graph is a box plot. Outliers are identified with a star sign (*). The variable has two outlying values, labelled 27 and 42. The label refers to the row number in the Data Editor where that observation is found.

3 ANALYSING CATEGORICAL DATA: HYPOTHESIS TESTING USING CHI-SQUARE

3.1 TESTING FOR INDEPENDENCE OF TWO CATEGORICAL VARIABLES

The Chi-square analysis can be used to determine whether there is a dependency between two categorical variables. The question we want to answer is: Is the level of exposure independent of gender?

Go to Analyse > Descriptive Statistics > Crosstabs. Put the variable Exposure into the Row box and Gender into the Column box. Click the Statistics button, select Chi-square, and then click Continue.

Click the Cells button and select Percentages: Row. Click Continue and OK. Your output screen should look like the one below:

The results of the Chi-square test do not depend on whether you place Gender or Exposure in the rows or columns; these can be switched around. However, your interpretation of the table tends to depend on the variables you have designated to be the rows and the columns. By custom, the variable you are interested in is designated to the rows. So in this example, our interest is in the level of exposure and we are investigating level of exposure by gender. Exposure is therefore designated as the row variable.

Interpreting the output

First table (Case Processing Summary): This provides the total number and percentage of observations and any missing values.

Second table (Exposure * Gender Cross tabulation): This is a cross-tabulation displaying the two variables of interest, in this case Exposure by Gender. Both the observed values and the row percentages are presented. This table is of benefit when the two variables are dependent, to interpret what the dependency between the two variables is.

Third table (Chi-Square Tests): This shows the results of the Pearson Chi-square test. The p-value that answers the question of independence is in the third column (Asymp. Sig (2-sided)) in the top row; for this test it was not significant, indicating that the two variables (Exposure and Gender) are independent.

Warning: The Chi-square test is not appropriate if the expected values are too small. SPSS will issue a warning below the third table if any of the cells have an expected value of <5. A guideline that is often used is that we should not have any cells with expected values less than 1 and at most one or two cells with expected values less than 5. Essentially you do not have enough data to reliably perform the Chi-square test, given the number of rows and columns in the table, and you do not have enough data upon which to make any reliable conclusions. In these instances you either (i) increase your sample size, (ii) reduce the number of rows and/or columns, or (iii) use a Fisher's exact test (as follows) if there are only two categories for each variable. It is important to note that columns and/or rows can only be reduced if it is theoretically valid to do so. If you require a Fisher's exact test with more than two categories in either of the variables, please contact DM&A.

3.2 FISHER'S EXACT TEST FOR INDEPENDENCE OF TWO CATEGORICAL VARIABLES

SPSS outputs the results of Fisher's exact test within the Chi-square output (Section 3.1) when each variable only has two categories. If this is the case, and the expected values of each cell are too small, use the Fisher's exact p-value instead of the Pearson Chi-Square. In the example below, the Fisher's exact p-value (use the 2-sided value) is not significant, indicating that the two variables (Insurance status and Gender) are independent.
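
The same independence check can be sketched outside SPSS; the Python example below uses scipy on an invented 2x2 table of Exposure by Gender counts, so the numbers are placeholders rather than the study data.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2x2 table of counts: rows = Exposure (no/yes), columns = Gender (male/female)
table = np.array([[12,  9],
                  [14, 15]])

# Pearson Chi-square test of independence (correction=False matches the "Pearson Chi-Square" row)
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"Chi-square = {chi2:.3f}, df = {dof}, p = {p:.3f}")
print("Expected counts:\n", expected.round(2))   # check the "expected count < 5" warning

# Fisher's exact test: preferred when expected counts are small (scipy handles 2x2 tables only)
odds_ratio, p_exact = fisher_exact(table, alternative="two-sided")
print(f"Fisher's exact p (2-sided) = {p_exact:.3f}")
```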

3.3 TESTING A THEORETICAL MODEL (GOODNESS OF FIT) USING CHI-SQUARE

The question we want to answer is: Are there equal numbers of males and females? To perform this test in SPSS, go to Analyse > Nonparametric Tests > Chi-Square. Put Gender in the right-hand Test Variable List box. Your screen should look like the one below:

In the Expected Values section, All categories equal is already selected. Click OK. Your output screen should look like the one below:

Interpreting the output

First table (Gender): Provides the values expected under the assumption of equal numbers by gender. The residual values are calculated as observed - expected.

Second table (Test Statistics): This provides the Pearson Chi-square statistic, the degrees of freedom and the p-value (labelled Asymp. Sig.); here the p-value is not significant, indicating that there are equal numbers of males and females.

Now suppose that a claim has been made that there are twice as many male patients as female patients. We can use the same procedure as before, this time choosing our own expected values. If there are twice as many males as females, then the ratio of males to females is 2:1, so one-third of the patients are females and two-thirds are males. Given that there are 50 patients in total, we would then expect 50/3 ≈ 16.7 females and 16.7 x 2 ≈ 33.3 males. We enter these as expected values, entering 33.3 first since males correspond to the value 0.

Go to Analyse > Nonparametric Tests > Chi-Square and select Values. Type 33.3 and click Add, then type 16.7 and click Add. Note that the expected values must be in the same order as the categories. Your screen should look like the one below:

Click OK. Your output screen should look like the one below:

Interpreting the output

First table (Gender): Provides the values expected under the assumption of twice as many males as females. The residual values are calculated as observed - expected.

Second table (Test Statistics): This provides the Pearson Chi-square statistic, the degrees of freedom and the p-value (labelled Asymptotic Significance); here the p-value is significant, indicating that the numbers of males and females do not match the expected values that we entered.

Note: You can do a Goodness of Fit test for several variables (e.g. Gender, Exposure and Group) in the same procedure by selecting all the variables of interest, as shown in the following window. A scripted sketch of the same goodness-of-fit idea is given below.
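
As a rough cross-check of the goodness-of-fit idea, the sketch below runs the same two comparisons in Python with scipy; the observed counts of 30 males and 20 females are invented for illustration, while the 33.3/16.7 expected split follows the 2:1 claim discussed above.

```python
from scipy.stats import chisquare

# Observed counts of males and females (hypothetical values for 50 patients)
observed = [30, 20]          # males, females

# (a) All categories equal: expected counts default to 25/25
print(chisquare(f_obs=observed))

# (b) Claimed ratio of 2 males : 1 female -> expected 33.3 males and 16.7 females
expected = [50 * 2 / 3, 50 * 1 / 3]
stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"Chi-square = {stat:.3f}, p = {p:.3f}")
```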

Click OK. Your output screen should look like the one below:

3.4 FURTHER ANALYSIS USING CHI-SQUARE

We have already seen how to test for independence between two categorical variables using Chi-square (Section 3.1). If we are asked, "Does insurance status change the situation; is the level of exposure still independent of gender?", we have to further cross-classify by whether patients had public or private insurance. Go to Analyse > Descriptive Statistics > Crosstabs. Put the variable Exposure into the Row box and Gender into the Column box, and this time also put Insurance into the Layer 1 of 1 box. Make the other selections as you did in Section 3.1, and then click OK.

Your output screen should look like the one below:

Interpreting the output

First table (Exposure * Gender * Insurance Cross tabulation): This is a cross-tabulation displaying the two variables of interest, in this case Exposure by Gender at different Insurance levels. Both the observed values and the row percentages are presented. This table is of benefit when the two variables are dependent, to interpret what the dependency between the two variables is at each level of Insurance.

Second table (Chi-Square Tests): This shows the results of the Pearson Chi-square test. The p-values that answer the question of independence are in the third column (Asymp. Sig (2-sided)) in the top row; for this test the result was not significant for public insurance holders and significant for private insurance holders. This indicates that the two variables (Exposure and Gender) are independent for public, but not for private, insurance holders.

Warning: Because 83.3% of cells have an expected count less than 5, you either (i) increase your sample size, (ii) reduce the number of rows and/or columns (noting that columns and/or rows can only be reduced if it is theoretically valid to do so), or (iii) use a Fisher's exact test if there are only two categories for each variable, if you are still using SPSS (version 15 or earlier). If you require a Fisher's exact test with more than two categories in either of the variables, you can go to the IBM website for a solution (IBM has issued an SPSS Exact Tests program on its website), or contact DM&A.

3.5 BINOMIAL TESTS USING SPSS

The Binomial Test procedure compares the observed frequencies of the two categories of a dichotomous variable to the frequencies that are expected under a binomial distribution with a specified probability parameter. A dichotomous variable is a variable that can take only two possible values: yes or no, true or false, 0 or 1, and so on. If a variable is not dichotomous, you must specify a cut point. The cut point assigns cases with values that are greater than the cut point to one group and assigns the rest of the cases to another group.

In the general population, the prevalence of a disease is about 21%, and we are going to test whether our sample patients have a higher rate of the disease. The question we want to answer is: Is the disease prevalence in the study sample the same as that in the general population?

Go to Analyse > Nonparametric Tests > Binomial, and then put Disease into the Test Variable List box. By default, the probability parameter for both groups is 0.5, although this may be changed. To change the probability, you enter a Test proportion for the first group (for this exercise we enter 0.21). The probability for the second group is equal to 1 minus the probability for the first group (1 - 0.21 = 0.79). Click on Options and then tick Descriptive under Statistics. Click Continue and then OK to run the procedure. The outputs are then displayed in the Output window as shown on the next page.

Interpreting the output

First table (Descriptive Statistics): This table displays basic information about the variable Disease. It gives the number of cases (N = 50), the proportion of disease in the sample patients (Mean = 0.28), the standard deviation of the proportion (Std. Deviation = 0.454), and the smallest and largest possible values of the proportion (Minimum = 0 and Maximum = 1).

Second table (Binomial Test): This shows the results of the Binomial test. The first three columns show the category, number of cases and observed proportion at each level of Disease. The fourth column gives the test or reference proportion that you have entered. The p-value that answers the question of whether the prevalence of disease in the sample patients (0.28 or 28%) is different from that of the population (0.21 or 21%) is in the fifth column [Asymp. Sig. (1-tailed)] in the top row; for this test it was not significant, indicating that there is no statistically significant difference in proportions between the sample patients and the population.

We can also use this test to check the proportions at each level of another variable. Say, for the above test, we want to know whether this still holds for males and females separately. Go to Data > Split File, click Compare groups and Sort the file by grouping variables. Put Gender into the Groups Based on box as shown on the next page and then click OK.

Repeat the procedure as we did above for the Binomial Test. Your output screen should look like the one below:

Interpreting the output

First table (Descriptive Statistics): This table displays the basic information for Disease by Gender. The proportions of disease in male and female patients were 0.26 and 0.30, or 26% and 30%, respectively.

Second table (Binomial Test): This shows the results of the Binomial test by Gender. The p-values that answer the question of whether the prevalence of disease in the sample patients (0.26 in males and 0.30 in females) is different from that of the population (0.21 for both males and females) are in the fifth [Exact Sig. (1-tailed)] or sixth column [Asymp. Sig. (1-tailed)] in the top row; for this test neither was significant, indicating that there is no statistically significant difference in proportions between the sample patients and the population, for either males or females.
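
For comparison, the exact binomial calculation can be sketched in Python; the counts below (14 diseased patients out of 50, i.e. the 28% reported above) follow the figures quoted in the text, and the one-tailed alternative mirrors the one-tailed significance reported by SPSS.

```python
from scipy.stats import binomtest

# 14 of 50 patients with disease (28%), tested against a population prevalence of 21%
result = binomtest(k=14, n=50, p=0.21, alternative="greater")
print(f"Observed proportion = {14 / 50:.2f}, exact one-tailed p = {result.pvalue:.3f}")
```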

When the number of cases in one category of gender is less than 30, the Exact Sig. (exact significance level) will be displayed instead of the Asymp. Sig. for that category.

Most often we also want to know whether the patients with disease are heavier, that is, whether the patients were more obese in terms of BMI, than those without disease. So the question is: do those who had the disease tend to be above or below the cut-off value of 25.0 for overweight and obesity?

First, we split our data: go to Data > Split File, click Compare groups and Sort the file by grouping variables. Put Disease into the Groups Based on box and then click OK.

Next, we employ the Binomial test to answer the above question. Go to Analyse > Nonparametric Tests > Binomial, and then put BMI into the Test Variable List box. Enter 24.99 in the Cut Point box and keep the Test Proportion box at its default. (I will explain later, in the output section, why I entered 24.99 as the cut point instead of 25.) Click Options to select Quartiles, and then click Continue. Click OK to run the test. Your output screen should look like the one below:

Interpreting the output

First table (Descriptive Statistics): This table displays the basic information for BMI by Disease. It gives the number of cases (N) and the percentiles [including the 25th, 50th (median) and 75th percentiles] in each category of Disease (Yes and No).

Second table (Binomial Test): This shows the results of the Binomial test. BMI was classified into two subgroups based on the cut-off value (25.0), as shown in the first column: Group 1 contains all cases with BMI less than 25.0 and Group 2 all cases with BMI equal to or greater than 25.0. (If we had entered 25 as the cut point in the Binomial Test window, Group 2 would exclude the value of 25.0. By entering 24.99, the smallest value in Group 2 must be greater than 24.99, which in this data set means it is equal to 25.0, because the next value below 25.0 is 24.9, which is smaller than 24.99. So, with a cut point of 24.99, all BMI values in Group 2 are equal to or greater than 25.0, as we intended.) This table also shows the number of cases (N) and the proportions (Observed Prop.) in each group for each Disease status.

The proportions of cases with BMI equal to or greater than 25.0 (Group 2) in the Disease (Yes) and non-disease (No) groups are 0.21 and 0.44, respectively, but this alone does not tell us whether the difference is statistically significant. The p-values are shown in the fifth [Asymp. Sig. (1-tailed)] and sixth [Exact Sig. (1-tailed)] columns; they indicate whether the proportions of cases in the two BMI groups differ for each Disease status. The outputs show that the proportions of cases on either side of the BMI cut-off of 25.0 were not significantly different, either in diseased cases (p = 0.608) or in non-diseased cases (p = 0.057); that is, based on the current data, there is no evidence that the diseased patients are more overweight or obese.
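
The cut-point rule described above (cases greater than the cut point go to Group 2, the rest to Group 1) can be illustrated with a few lines of Python; the BMI values are invented, except that they deliberately include 24.9 and 25.0 to show why 24.99 is entered rather than 25.

```python
import numpy as np

# Invented BMI values; the text notes the data contain 24.9 and nothing between 24.9 and 25.0
bmi = np.array([21.4, 23.0, 24.9, 25.0, 26.7, 29.3, 31.8])

def split_by_cut_point(values, cut):
    """Group 1: values <= cut point; Group 2: values > cut point (the rule the text describes)."""
    return values[values <= cut], values[values > cut]

for cut in (25.0, 24.99):
    g1, g2 = split_by_cut_point(bmi, cut)
    print(f"cut point {cut}: group 1 = {list(g1)}, group 2 = {list(g2)}")

# With cut = 25.0 the value 25.0 falls into group 1; with cut = 24.99 it moves to group 2,
# so group 2 becomes "BMI >= 25.0", matching the overweight/obesity definition.
```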

4 RUNS TEST

The Runs Test procedure tests whether the order of occurrence of two values of a variable is random. A run is a sequence of like observations. A sample with too many or too few runs suggests that the sample is not random.

4.1 ASSUMPTIONS AND DATA REQUIREMENTS

Assumptions: Nonparametric tests do not require assumptions about the shape of the underlying distribution. Use samples from continuous probability distributions.

Data: The variables must be numeric. To convert string variables to numeric variables, use the Automatic Recode procedure, which is available on the Transform menu.

Many statistical tests assume that the observations in a sample are independent; in other words, that the order in which the data were collected is irrelevant. If the order does matter, then the sample is not random, and you cannot draw accurate conclusions about the population from which the sample was drawn. Therefore, it is prudent to check the data for a violation of this important assumption. You can use the Runs Test procedure to test whether the order of values of a variable is random. The procedure first classifies each value of the variable as falling above or below a cut point and then tests whether there is any order to the resulting sequence.

4.2 CONFIRMATION OF APPROPRIATE CUT POINTS

The cut point is based either on a measure of central tendency (mean, median, or mode) or a custom value. You can obtain descriptive statistics and/or quartiles of the test variable. Go to Graphs > Chart Builder, and then in the Choose from box select the Histogram gallery and choose the first, Simple Bar. Select the variable Status as the x axis and click OK. The bar chart of the test variable appears in the output window as shown below. The scale for Status theoretically ranges from 0 to 20, where 0 = very poor health and 20 = very good health. The actual range of scores is narrower, running from a low of 6 to a high of 14. The histogram shows that Status is non-normally distributed, so we choose the median as the cut point.

[Histogram of Status: Mean = 9.52, Std. Dev. = 2.332, N = 50]

4.3 RUNS TEST

The question we want to answer is: Is the order of values of Status random?

Go to Analyse > Nonparametric Tests > Runs. The median is selected by default, so keep it as it is. Put Status into the Test Variable List box and then click Options. Select Descriptive and Quartiles, and then click Continue. Back in the Runs Test dialog box, click OK to run the test. Your output screen should appear like the one below:

Please note: before you run the procedure, make sure that your records are sorted by their study order (study ID, the order in which the participants actually entered the study).

Interpreting the output

First table (Descriptive Statistics): The statistics table will help you understand more about the distribution of Status by displaying its basic information. While the default table is very wide, you can easily pivot it to column format by following the steps below:

Double-click the table to activate it. From the Viewer menus choose: Pivot > Transpose Rows and Columns. The table is then transposed from rows into columns, as shown below:

Second table (Runs Test): This shows the results of the Runs test. The test value is used as a cut point to dichotomize the sample; in this table, the cut point is the sample median. Of the 50 patients, 21 scored below the median (Cases < Test Value). Think of them as the "negative" cases. The remaining 29 patients (Cases >= Test Value) scored at or above the median. Think of them as the "positive" cases.

The next statistic is a count of the observed runs (Number of Runs) in the test variable. A run is defined as a sequence of cases on the same side of the cut point. If the order of Status were purely random with respect to the median value, you would expect about 26 runs across these 50 cases. Because fewer runs than this were observed, the Z statistic is negative. The 2-tailed significance value [Asymp. Sig. (2-tailed)] is the probability of obtaining a Z statistic as extreme as, or more extreme than (in absolute value), the obtained value, if the order of Status above and below the median is purely random. In other words, the 2-tailed significance value (here p = 0.201) means you cannot reject the null hypothesis that the order of Status is random with respect to the median value of 9. That is to say, the order of Status is random with a cut point of 9.
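
The Runs Test itself is simple enough to sketch by hand; the Python function below implements the large-sample Z approximation around the median (the Wald-Wolfowitz runs test) on invented health-status scores, so the numbers will not match the SPSS output described above.

```python
import numpy as np
from scipy.stats import norm

def runs_test(x, cut=None):
    """Wald-Wolfowitz runs test around a cut point (default: the sample median)."""
    x = np.asarray(x, dtype=float)
    if cut is None:
        cut = np.median(x)
    above = x >= cut                                   # SPSS counts "Cases >= Test Value" as one side
    n1, n2 = int(above.sum()), int((~above).sum())
    runs = 1 + int(np.sum(above[1:] != above[:-1]))    # number of runs in the observed sequence
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / np.sqrt(variance)
    p = 2 * (1 - norm.cdf(abs(z)))                     # two-tailed p-value
    return runs, expected, z, p

# Invented health-status scores in the order participants entered the study
status = [8, 9, 11, 7, 12, 10, 6, 13, 9, 8, 11, 10, 7, 12, 9, 14, 8, 10, 11, 9]
runs, expected, z, p = runs_test(status)
print(f"runs = {runs}, expected = {expected:.1f}, Z = {z:.2f}, p = {p:.3f}")
```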

5 ONE-SAMPLE TEST

The One-Sample Kolmogorov-Smirnov Test procedure compares the observed cumulative distribution function for a variable with a specified theoretical distribution, which may be normal, uniform, Poisson, or exponential. The Kolmogorov-Smirnov Z is computed from the largest difference (in absolute value) between the observed and theoretical cumulative distribution functions. This goodness-of-fit test tests whether the observations could reasonably have come from the specified distribution.

5.1 ASSUMPTIONS AND DATA REQUIREMENTS

Assumptions: The Kolmogorov-Smirnov test assumes that the parameters of the test distribution are specified in advance. This procedure, however, estimates the parameters from the sample: the sample mean and sample standard deviation are the parameters for a normal distribution, the sample minimum and maximum values define the range of the uniform distribution, the sample mean is the parameter for the Poisson distribution, and the sample mean is the parameter for the exponential distribution. Because the parameters are estimated from the data, the power of the test to detect departures from the hypothesized distribution may be seriously diminished. For testing against a normal distribution with estimated parameters, consider the adjusted K-S Lilliefors test (available in the Explore procedure).

Data: Use quantitative variables (interval or ratio level of measurement).

5.2 ONE-SAMPLE KOLMOGOROV-SMIRNOV TEST

The question we want to answer is: Is the variable Status normally distributed, or does it follow one of the other particular distributions? Let us take Status as an example. Go to Analyse > Nonparametric Tests > 1-Sample K-S, and then move Status into the Test Variable List box. Tick all four options in the Test Distribution box and then click Options. Tick Descriptive and Quartiles and then click Continue. Finally, click OK to run the procedure.

Your output screen should look like the two outputs below:

Interpreting the outputs

First table (Descriptive Statistics): This table will help you understand more about the distribution of Status in these data by displaying its basic information. The information includes the number of cases (N), Mean, Standard Deviation, Minimum, Maximum and Percentiles (25th, 50th and 75th).

Second table (One-Sample Kolmogorov-Smirnov Test): This is the default test, the test against a normal distribution. The table shows that the Normal distribution is indexed by two parameters, the mean and standard deviation. The average Status score of the sample is about 9.52, with an SD of about 2.33.

The next three rows fall under the general category Most Extreme Differences. The differences referred to are the largest positive and negative points of divergence between the empirical and theoretical cumulative distribution functions (CDFs). The first difference value, labelled Absolute, is the absolute value of the larger of the two difference values printed directly below it. This value will be required to calculate the test statistic. The Positive difference is the point at which the empirical CDF exceeds the theoretical CDF by the greatest amount.

At the opposite end of the continuum, the Negative difference is the point at which the theoretical CDF exceeds the empirical CDF by the greatest amount. The Z test statistic is the product of the square root of the sample size and the largest absolute difference between the empirical and theoretical CDFs. Unlike much statistical testing, a significant result here is bad news: it means the hypothesized distribution does not fit the data.

The probability of the Z statistic is above 0.05, meaning that the Normal distribution with parameters 9.52 ± 2.33 is a good fit for Status.

Third and fourth tables (One-Sample Kolmogorov-Smirnov Test 2 and 3): These two tables convey similar messages to the second table, for the Uniform and Poisson distributions.

Fifth table (One-Sample Kolmogorov-Smirnov Test 4): This table shows that the probabilities of the Z statistic are all below 0.05, meaning that the Exponential distribution with a parameter of 9.52 (the mean) is not a good fit for Status.

The above outputs indicate that Status is well fitted by the Normal, Uniform and Poisson distributions, but not by the Exponential distribution.
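
The same four one-sample comparisons can be sketched with scipy's kstest; the status scores below are invented, and the parameters are estimated from the sample exactly as the SPSS procedure does (so the same caveat about reduced power applies).

```python
import numpy as np
from scipy import stats

# Invented scores standing in for Status
status = np.array([8, 9, 11, 7, 12, 10, 6, 13, 9, 8, 11, 10, 7, 12, 9, 14, 8, 10, 11, 9],
                  dtype=float)
mean, sd = status.mean(), status.std(ddof=1)

# Normal distribution with the sample mean and SD as parameters
print("Normal:     ", stats.kstest(status, "norm", args=(mean, sd)))

# Uniform distribution over the sample range: scipy uses (loc, scale) = (min, max - min)
print("Uniform:    ", stats.kstest(status, "uniform",
                                   args=(status.min(), status.max() - status.min())))

# Poisson with the sample mean; the K-S test is really meant for continuous distributions,
# so this comparison (like SPSS's) is only approximate
print("Poisson:    ", stats.kstest(status, stats.poisson(mean).cdf))

# Exponential with the sample mean as the scale parameter
print("Exponential:", stats.kstest(status, "expon", args=(0, mean)))
```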

6 TWO-INDEPENDENT-SAMPLE TESTS

The Two-Independent-Samples Tests procedure compares two groups of cases on one variable. The nonparametric tests for two independent samples are useful for determining whether or not the values of a particular variable differ between two groups. This is especially true when the assumptions of the t test are not met. Suppose we want to know whether height differs between private and public patients. In other words: does the categorical (independent) variable Insurance affect the continuous (dependent) variable Height?

6.1 ASSUMPTIONS AND DATA REQUIREMENTS

Assumptions: Use independent, random samples. The Mann-Whitney U test requires that the two tested samples be similar in shape; that is, the variable you are testing should be at least ordinal and its distribution should be similar in both groups.

Data: Use numeric variables that can be ordered. We will assume that the independence assumption is met by the design of the experiment.

6.2 TWO-INDEPENDENT-SAMPLE MANN-WHITNEY AND WILCOXON TESTS

The Mann-Whitney and Wilcoxon statistics can be used to test the null hypothesis that two independent samples come from the same population. Their advantage over the independent-samples t test is that Mann-Whitney and Wilcoxon do not assume normality and can be used to test ordinal variables. Go to Analyse > Nonparametric Tests > 2 Independent Samples. In the dialogue box, add the dependent variable Height to the Test Variable List box. Add the independent variable Insurance to the Grouping Variable box. Click on Define Groups and in Group 1 type 1 and in Group 2 type 2.

Click Continue and OK. Your output screen should look like the one below:

Interpreting the output

First table (Ranks): presents the figures used to calculate the p-value. First, each case is ranked without regard to group membership. Cases tied on a particular value receive the average rank for that value (Mean Rank). After ranking the cases, the ranks are summed within groups (Sum of Ranks).

Second table (Test Statistics): presents the results of the Mann-Whitney U test (otherwise known as the Mann-Whitney-Wilcoxon test or the Wilcoxon rank-sum test). The p-value [Asymp. Sig. (2-tailed)] is 0.041, implying that insurance does have an effect on height.

Note: you can do this test for several variables in the same run by adding them all to the same box as Height. Outputs will be displayed in different panels by variable name.
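
Outside SPSS, the same comparison can be sketched with scipy's Mann-Whitney U test; the heights below are invented values for private and public patients, not the study data.

```python
from scipy.stats import mannwhitneyu

# Invented heights (cm) for private (group 1) and public (group 2) patients
private = [172.0, 168.5, 181.2, 165.0, 177.3, 170.1, 174.8]
public  = [160.2, 166.4, 158.9, 171.0, 163.5, 167.8, 162.2, 169.0]

u_stat, p = mannwhitneyu(private, public, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, two-tailed p = {p:.4f}")
```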

6.3 TWO-SAMPLE KOLMOGOROV-SMIRNOV TEST

The two-sample Kolmogorov-Smirnov test tests the null hypothesis that two samples have the same distribution. It is a very flexible test because no specific shape is assumed for the underlying distribution. However, because the test makes no assumptions, it is sensitive to differences in both location and scale. You may want to center the test variable if you are not interested in location differences; additionally, you may want to standardize the test variable to remove both location and scale differences.

Go to Analyse > Nonparametric Tests > 2 Independent Samples. In the dialogue box, add the dependent variable Height to the Test Variable List box. Add the independent variable Gender to the Grouping Variable box. Do the same as you did in Section 6.2. Deselect Mann-Whitney U, and select Kolmogorov-Smirnov Z. Then click OK to proceed. Your output screen should look like the one below:

Interpreting the output

First table (Ranks): presents the figures used to calculate the p-value.

Second table (Test Statistics): presents the results of the Kolmogorov-Smirnov test. The p-value [Asymp. Sig. (2-tailed)] is 0.208, well above 0.05, implying that the distributions of height for the two genders are not significantly different from each other by that standard.
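
A two-sample Kolmogorov-Smirnov comparison is equally short in Python; again the male and female heights below are invented placeholders.

```python
from scipy.stats import ks_2samp

# Invented heights (cm) for male and female patients
males   = [172.0, 168.5, 181.2, 165.0, 177.3, 170.1, 174.8, 179.5]
females = [160.2, 166.4, 158.9, 171.0, 163.5, 167.8, 162.2, 169.0]

d_stat, p = ks_2samp(males, females)
print(f"Kolmogorov-Smirnov D = {d_stat:.3f}, p = {p:.3f}")
```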

7 MULTI-INDEPENDENT-SAMPLE TESTS

The Tests for Several Independent Samples procedure compares two or more groups of cases on one variable. The nonparametric tests for multiple independent samples are useful for determining whether or not the values of a particular variable differ between two or more groups. This is especially true when the assumptions of ANOVA are not met. Suppose we want to know whether BMI differs depending on which year patients were born in. In other words: does the categorical (independent) variable Year_born affect the continuous (dependent) variable BMI?

7.1 ASSUMPTIONS AND DATA REQUIREMENTS

Assumptions: Use independent, random samples. The Kruskal-Wallis H test requires that the tested samples be similar in shape.

Data: Use numeric variables that can be ordered. We will assume that the independence assumption is met by the design of the experiment.

7.2 KRUSKAL-WALLIS TEST

The Kruskal-Wallis test is a one-way analysis of variance by ranks. It tests the null hypothesis that multiple independent samples come from the same population. Unlike standard ANOVA, it does not assume normality, and it can be used to test ordinal variables. Go to Analyse > Nonparametric Tests > K Independent Samples. In the dialogue box, add the dependent variable BMI to the Test Variable List box. Add the independent variable Year_born to the Grouping Variable box. Click on Define Range and type 1997 for Minimum and 2000 for Maximum.

Click Continue, keep the Test Type as it is (default = Kruskal-Wallis H) and click OK. Your output screen should look like the one below:

Interpreting the output

First table (Ranks): presents the figures used to calculate the p-value.

Second table (Test Statistics): presents the results of the Kruskal-Wallis test. The p-value (Asymp. Sig.) is 0.038, implying that the year of birth does have an effect on BMI.
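
A Kruskal-Wallis comparison across the four birth years can be sketched as follows; the BMI values are invented and grouped only to illustrate the call.

```python
from scipy.stats import kruskal

# Invented BMI values grouped by year of birth (1997-2000)
bmi_1997 = [22.1, 24.5, 26.0, 23.3, 25.2]
bmi_1998 = [21.0, 22.8, 20.5, 23.9, 22.2]
bmi_1999 = [27.4, 25.9, 28.8, 26.1, 24.7]
bmi_2000 = [23.5, 22.0, 24.9, 21.8, 23.1]

h_stat, p = kruskal(bmi_1997, bmi_1998, bmi_1999, bmi_2000)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p:.3f}")
```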

7.3 THE MEDIAN TEST

The median test tests the null hypothesis that two or more independent samples have the same median. It assumes nothing about the distribution of the test variable, making it a good choice when you suspect that the distribution varies by group. Go to Analyse > Nonparametric Tests > K Independent Samples. Do the same as you did in Section 7.2, but this time deselect Kruskal-Wallis H, and select Median as the test type. Click Options and select Quartiles in the Statistics group. Click Continue to go back to the Tests for Several Independent Samples dialog box and then click OK to run the analysis. Your output screen should look like the one below:

Interpreting the output

First table (Descriptive Statistics): presents the number of cases for each of the variables of interest and their percentiles.

Second table (Frequencies): presents the figures used to calculate the p-value, by Year_born and the cut point (median) of BMI.

Third table (Test Statistics): presents the results of the Median test. The p-value (Asymp. Sig.) is 0.024, implying that BMI differs among the years of birth.

7.4 POST HOC TESTS

SPSS does not have a convenient tool for non-parametric post hoc testing. To find where the differences are, use multiple Mann-Whitney tests to compare each pair of categories. Because there are multiple comparisons here, the significance level is no longer 0.05, but rather 0.05 divided by the number of possible pairs. In the example above, there are six possible pairs, so the significance level should be 0.05 / 6 ≈ 0.0083. A sketch of this pairwise comparison approach is given below.
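
The median test and the Bonferroni-adjusted pairwise Mann-Whitney follow-up described above can be sketched together in Python; the groups reuse the invented BMI values from the Kruskal-Wallis sketch, and scipy's median_test stands in for SPSS's Median test.

```python
from itertools import combinations
from scipy.stats import median_test, mannwhitneyu

# Invented BMI values by year of birth (see the Kruskal-Wallis sketch above)
groups = {
    1997: [22.1, 24.5, 26.0, 23.3, 25.2],
    1998: [21.0, 22.8, 20.5, 23.9, 22.2],
    1999: [27.4, 25.9, 28.8, 26.1, 24.7],
    2000: [23.5, 22.0, 24.9, 21.8, 23.1],
}

# Median test across all four groups
stat, p, grand_median, table = median_test(*groups.values())
print(f"Median test: chi-square = {stat:.2f}, p = {p:.3f}, grand median = {grand_median}")

# Post hoc: pairwise Mann-Whitney tests with a Bonferroni-adjusted alpha
pairs = list(combinations(groups, 2))
alpha = 0.05 / len(pairs)                 # 6 pairs -> 0.05 / 6 ~ 0.0083
for a, b in pairs:
    _, p_pair = mannwhitneyu(groups[a], groups[b], alternative="two-sided")
    flag = "significant" if p_pair < alpha else "not significant"
    print(f"{a} vs {b}: p = {p_pair:.4f} ({flag} at alpha = {alpha:.4f})")
```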

8 TWO-RELATED-SAMPLE TESTS

The Two-Related-Samples Tests procedure compares the distributions of two variables. The nonparametric tests for two related samples allow you to test for differences between paired scores when you cannot (or would rather not) make the assumptions required by the paired-samples t test. Procedures are available for testing nominal, ordinal, or scale variables.

8.1 ASSUMPTIONS AND DATA REQUIREMENTS

Assumptions: Although no particular distributions are assumed for the two variables, the population distribution of the paired differences is assumed to be symmetric.

Data: Use numeric variables that can be ordered.

8.2 THE WILCOXON SIGNED-RANKS TEST

The Wilcoxon signed-ranks test is a non-parametric version of the paired-samples t-test. It is used when you have non-parametric data for one group of people measured over two time periods, or under two different conditions, where a dependency exists between the two measures and the test must account for this dependency. The question we want to answer is: Is there a difference in BMI measurements before and after exposure? Go to Analyse > Nonparametric Tests > 2 Related Samples. Click on BMI and BMI2 in the list on the left, then click the arrow button to add them to the box on the right (Paired Variables).

Click OK. Your output screen should look like the one below:

Interpreting the output

First table (Ranks): presents the figures used to calculate the p-value.

Second table (Test Statistics): The results of the non-parametric paired-samples test are displayed here. The p-value (Asymp. Sig.) is < 0.001, implying that there is a difference in BMI depending on exposure.

8.3 THE SIGN TEST

The sign test, like the Wilcoxon signed-ranks test, is a nonparametric statistic that can be used with an ordinally (or above) scaled dependent variable when the independent variable has two levels and the participants have been matched or the samples are correlated. Thus, it is useful when a t-test cannot be employed because its assumptions have been violated. The sign test uses only directional information, while the Wilcoxon test uses both direction and magnitude information. Thus the Wilcoxon test is statistically more powerful than the sign test. However, the Wilcoxon test assumes that the difference between pairs of scores is ordinally scaled, and this assumption is difficult to test.

We repeat the test in Section 8.2 using the sign test. Go to Analyse > Nonparametric Tests > 2 Related Samples. Do the same as you did in Section 8.2, but this time deselect Wilcoxon, and select Sign as the test type, as shown below. Click OK to proceed. Your output screen should look like the one below:

Interpreting the output

First table (Frequencies): presents the figures used to calculate the p-value.

Second table (Test Statistics): The results of the non-parametric paired-samples test are displayed here. The p-value (Asymp. Sig.) is 0.010, implying that there is a difference in BMI depending on exposure.

8.4 THE MCNEMAR TEST

The McNemar test tests the null hypothesis that binary responses are unchanged. As with the Wilcoxon test, the data may be from a single sample measured twice or from two matched samples. The McNemar test is particularly appropriate with nominal or ordinal test variables. The question we want to answer is: Is there a difference in re-admission before and after intervention? Go to Analyse > Nonparametric Tests > 2 Related Samples. Click on Before and After in the list on the left. Deselect Wilcoxon, select McNemar as the test type and then click OK. Your output screen should look like the one below:

Interpreting the output

First table (Re-admission before intervention & Re-admission after intervention): presents the figures used to calculate the p-value.

Second table (Test Statistics): The results of the non-parametric paired-samples test are displayed here. The p-value (Asymp. Sig.) is 0.541, implying that there is no difference in re-admission depending on intervention; that is, re-admission has not changed after the intervention.
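
The three related-samples procedures in this chapter can all be sketched briefly in Python; the paired BMI values and the re-admission table below are invented, the sign test is obtained as a binomial test on the signs of the paired differences, and the McNemar test uses statsmodels.

```python
import numpy as np
from scipy.stats import wilcoxon, binomtest
from statsmodels.stats.contingency_tables import mcnemar

# Invented paired BMI measurements before and after exposure
bmi_before = np.array([24.1, 27.3, 22.8, 30.2, 26.5, 23.9, 28.4, 25.0, 29.1, 21.7])
bmi_after  = np.array([23.0, 26.1, 22.9, 28.5, 25.0, 23.1, 27.0, 24.2, 27.8, 21.5])

# Wilcoxon signed-ranks test (uses direction and magnitude of the paired differences)
w_stat, w_p = wilcoxon(bmi_before, bmi_after)
print(f"Wilcoxon signed-ranks: W = {w_stat:.1f}, p = {w_p:.4f}")

# Sign test (uses direction only): binomial test on the number of positive differences
diffs = bmi_before - bmi_after
n_pos, n_nonzero = int((diffs > 0).sum()), int((diffs != 0).sum())
print(f"Sign test p = {binomtest(n_pos, n_nonzero, p=0.5).pvalue:.4f}")

# McNemar test for paired binary responses (re-admission before vs after intervention)
# 2x2 table of counts: rows = before (no/yes), columns = after (no/yes); invented counts
readmission = [[20, 6],
               [9, 15]]
res = mcnemar(readmission, exact=True)
print(f"McNemar exact p = {res.pvalue:.4f}")
```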

9 MULTIPLE-RELATED-SAMPLE TESTS

The Tests for Several Related Samples procedure compares the distributions of two or more variables. The nonparametric tests for multiple related samples are useful alternatives to a repeated-measures analysis of variance. They are especially appropriate for small samples and can be used with nominal or ordinal test variables.

9.1 ASSUMPTIONS AND DATA REQUIREMENTS

Assumptions: Nonparametric tests do not require assumptions about the shape of the underlying distribution. Use dependent, random samples.

Data: Use numeric variables that can be ordered.

9.2 FRIEDMAN TEST

The Friedman procedure tests the null hypothesis that multiple ordinal responses come from the same population. As with the Wilcoxon test for two related samples, the data may come from repeated measures of a single sample or from the same measure for multiple matched samples. An insurance group is evaluating four health care plans for customers. The fifty patients are asked to rank the plans by how much they would prefer to accept them. The question we want to answer is: Is there a difference in preference for the four health care plans? Go to Analyse > Nonparametric Tests > K Related Samples. Click on Plan 1-4 in the list on the left, and then click the arrow button to add them to the box on the right (Test Variables), as shown below. Click OK. Your output screen should look like the one below:

Interpreting the output

First table (Ranks): presents the figures used to calculate the p-value.

Second table (Test Statistics): The results of the non-parametric Friedman test are displayed here. The p-value (Asymp. Sig.) is < 0.001, implying that there is a difference in preference for the health care plans; that is, the fifty patients do not have equal preference for all four health care plans.

9.3 KENDALL'S W TEST

Kendall's W test is a normalization of the Friedman statistic. Kendall's W is used to assess the degree of agreement among the respondents. Kendall's W ranges from 0 to 1: a value of 1 indicates complete agreement among the raters, and a value of 0 indicates no agreement. Go to Analyse > Nonparametric Tests > K Related Samples. Click on Plan 1-4 in the list on the left, and then click the arrow button to add them to the box on the right (Test Variables). This time deselect Friedman, and select Kendall's W instead. Click OK to proceed. Your output screen should look like the one below:

Interpreting the output

First table (Ranks): presents the figures used to calculate the p-value.

Second table (Test Statistics): The results of the non-parametric Kendall's W test are displayed here. Kendall's Coefficient of Concordance is 0.281, with the corresponding Chi-square statistic on 3 degrees of freedom. The p-value (Asymp. Sig.) is < 0.001, so the test rejects the null hypothesis of no agreement: the agreement among the patients' rankings is statistically significant (p < 0.001). That is to say, the levels of preference for the four health care plans differ among the 50 patients.

9.4 COCHRAN'S Q TEST

The Cochran Q procedure tests the null hypothesis that multiple related proportions are the same; that is, it is used for dichotomous variables that share the same coding. The Cochran test is a multivariate extension of the McNemar test used for two related samples. Fifty patients are asked to perform five tasks on the site, all of which are designed to be equally easy. The question we want to answer is: Is there a difference in the success rates of the 5 tasks?

Go to Analyse > Nonparametric Tests > K Related Samples. Click on Task1 through Task5 in the list on the left and then click the arrow button to add all five of them to the box on the right (Test Variables). Deselect Friedman, select Cochran's Q as the test type and then click Statistics. Tick Descriptive and then click Continue. Click OK to proceed. Your output screen should look like the one below:

Interpreting the output

First table (Descriptive Statistics): presents basic statistics for the 5 tasks. The means here stand for the proportions of users who succeeded at each task.

Second table (Frequencies): presents the figures used to calculate the p-value.

Third table (Test Statistics): The results of the non-parametric Cochran's Q test are displayed here. Cochran's Q is 0.985, with 4 degrees of freedom. The p-value (Asymp. Sig.) is 0.912, implying that all tasks have an equal number of successes; that is, to answer our question, there is no significant difference in the success rates among the five tasks completed by the fifty patients.
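
Finally, the Friedman test, Kendall's W and Cochran's Q can be sketched together; the preference ranks and task-success indicators below are invented, Kendall's W is obtained by normalising the Friedman chi-square as W = chi-square / (n(k - 1)), and Cochran's Q is computed directly from its textbook formula rather than via an SPSS procedure.

```python
import numpy as np
from scipy.stats import friedmanchisquare, chi2

# Invented preference ranks: rows = patients, columns = the four health care plans
ranks = np.array([
    [1, 3, 2, 4],
    [2, 3, 1, 4],
    [1, 4, 2, 3],
    [1, 3, 2, 4],
    [2, 4, 1, 3],
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 3, 1, 4],
])
n, k = ranks.shape

# Friedman test on the k related columns
f_stat, f_p = friedmanchisquare(*ranks.T)
print(f"Friedman chi-square = {f_stat:.2f}, p = {f_p:.4f}")

# Kendall's W is the Friedman statistic normalised to the 0-1 range
w = f_stat / (n * (k - 1))
print(f"Kendall's W = {w:.3f}")

# Cochran's Q for k related dichotomous variables (invented task-success data: 1 = success)
tasks = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 1, 1, 1],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 1, 0],
])
col_sums, row_sums = tasks.sum(axis=0), tasks.sum(axis=1)
k_tasks, grand = tasks.shape[1], tasks.sum()
q = (k_tasks - 1) * (k_tasks * np.sum(col_sums ** 2) - grand ** 2) / (k_tasks * grand - np.sum(row_sums ** 2))
q_p = chi2.sf(q, df=k_tasks - 1)
print(f"Cochran's Q = {q:.2f}, df = {k_tasks - 1}, p = {q_p:.4f}")
```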


More information

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

More information

SPSS: AN OVERVIEW. Seema Jaggi and and P.K.Batra I.A.S.R.I., Library Avenue, New Delhi-110 012

SPSS: AN OVERVIEW. Seema Jaggi and and P.K.Batra I.A.S.R.I., Library Avenue, New Delhi-110 012 SPSS: AN OVERVIEW Seema Jaggi and and P.K.Batra I.A.S.R.I., Library Avenue, New Delhi-110 012 The abbreviation SPSS stands for Statistical Package for the Social Sciences and is a comprehensive system

More information

UNIVERSITY OF NAIROBI

UNIVERSITY OF NAIROBI UNIVERSITY OF NAIROBI MASTERS IN PROJECT PLANNING AND MANAGEMENT NAME: SARU CAROLYNN ELIZABETH REGISTRATION NO: L50/61646/2013 COURSE CODE: LDP 603 COURSE TITLE: RESEARCH METHODS LECTURER: GAKUU CHRISTOPHER

More information

One-Way ANOVA using SPSS 11.0. SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate

One-Way ANOVA using SPSS 11.0. SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate 1 One-Way ANOVA using SPSS 11.0 This section covers steps for testing the difference between three or more group means using the SPSS ANOVA procedures found in the Compare Means analyses. Specifically,

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Nonparametric Statistics

Nonparametric Statistics Nonparametric Statistics J. Lozano University of Goettingen Department of Genetic Epidemiology Interdisciplinary PhD Program in Applied Statistics & Empirical Methods Graduate Seminar in Applied Statistics

More information

Chapter G08 Nonparametric Statistics

Chapter G08 Nonparametric Statistics G08 Nonparametric Statistics Chapter G08 Nonparametric Statistics Contents 1 Scope of the Chapter 2 2 Background to the Problems 2 2.1 Parametric and Nonparametric Hypothesis Testing......................

More information

Using SPSS, Chapter 2: Descriptive Statistics

Using SPSS, Chapter 2: Descriptive Statistics 1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,

More information

An SPSS companion book. Basic Practice of Statistics

An SPSS companion book. Basic Practice of Statistics An SPSS companion book to Basic Practice of Statistics SPSS is owned by IBM. 6 th Edition. Basic Practice of Statistics 6 th Edition by David S. Moore, William I. Notz, Michael A. Flinger. Published by

More information

Difference tests (2): nonparametric

Difference tests (2): nonparametric NST 1B Experimental Psychology Statistics practical 3 Difference tests (): nonparametric Rudolf Cardinal & Mike Aitken 10 / 11 February 005; Department of Experimental Psychology University of Cambridge

More information

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

More information

Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

More information

Comparing Means in Two Populations

Comparing Means in Two Populations Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

More information

Descriptive and Inferential Statistics

Descriptive and Inferential Statistics General Sir John Kotelawala Defence University Workshop on Descriptive and Inferential Statistics Faculty of Research and Development 14 th May 2013 1. Introduction to Statistics 1.1 What is Statistics?

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

Data exploration with Microsoft Excel: analysing more than one variable

Data exploration with Microsoft Excel: analysing more than one variable Data exploration with Microsoft Excel: analysing more than one variable Contents 1 Introduction... 1 2 Comparing different groups or different variables... 2 3 Exploring the association between categorical

More information

Statistics for Sports Medicine

Statistics for Sports Medicine Statistics for Sports Medicine Suzanne Hecht, MD University of Minnesota (suzanne.hecht@gmail.com) Fellow s Research Conference July 2012: Philadelphia GOALS Try not to bore you to death!! Try to teach

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

More information

The Chi-Square Test. STAT E-50 Introduction to Statistics

The Chi-Square Test. STAT E-50 Introduction to Statistics STAT -50 Introduction to Statistics The Chi-Square Test The Chi-square test is a nonparametric test that is used to compare experimental results with theoretical models. That is, we will be comparing observed

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

More information

Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools. Tools for Summarizing Data Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

More information

NCSS Statistical Software. One-Sample T-Test

NCSS Statistical Software. One-Sample T-Test Chapter 205 Introduction This procedure provides several reports for making inference about a population mean based on a single sample. These reports include confidence intervals of the mean or median,

More information

MEASURES OF LOCATION AND SPREAD

MEASURES OF LOCATION AND SPREAD Paper TU04 An Overview of Non-parametric Tests in SAS : When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

More information

Rank-Based Non-Parametric Tests

Rank-Based Non-Parametric Tests Rank-Based Non-Parametric Tests Reminder: Student Instructional Rating Surveys You have until May 8 th to fill out the student instructional rating surveys at https://sakai.rutgers.edu/portal/site/sirs

More information

Analysis of categorical data: Course quiz instructions for SPSS

Analysis of categorical data: Course quiz instructions for SPSS Analysis of categorical data: Course quiz instructions for SPSS The dataset Please download the Online sales dataset from the Download pod in the Course quiz resources screen. The filename is smr_bus_acd_clo_quiz_online_250.xls.

More information

SPSS Notes (SPSS version 15.0)

SPSS Notes (SPSS version 15.0) SPSS Notes (SPSS version 15.0) Annie Herbert Salford Royal Hospitals NHS Trust July 2008 Contents Page Getting Started 1 1 Opening SPSS 1 2 Layout of SPSS 2 2.1 Windows 2 2.2 Saving Files 3 3 Creating

More information

Simple Predictive Analytics Curtis Seare

Simple Predictive Analytics Curtis Seare Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce

More information

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

More information

StatCrunch and Nonparametric Statistics

StatCrunch and Nonparametric Statistics StatCrunch and Nonparametric Statistics You can use StatCrunch to calculate the values of nonparametric statistics. It may not be obvious how to enter the data in StatCrunch for various data sets that

More information

Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics. Measurement. Scales of Measurement 7/18/2012 Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

More information

Paired T-Test. Chapter 208. Introduction. Technical Details. Research Questions

Paired T-Test. Chapter 208. Introduction. Technical Details. Research Questions Chapter 208 Introduction This procedure provides several reports for making inference about the difference between two population means based on a paired sample. These reports include confidence intervals

More information

ADD-INS: ENHANCING EXCEL

ADD-INS: ENHANCING EXCEL CHAPTER 9 ADD-INS: ENHANCING EXCEL This chapter discusses the following topics: WHAT CAN AN ADD-IN DO? WHY USE AN ADD-IN (AND NOT JUST EXCEL MACROS/PROGRAMS)? ADD INS INSTALLED WITH EXCEL OTHER ADD-INS

More information

Table of Contents. Preface

Table of Contents. Preface Table of Contents Preface Chapter 1: Introduction 1-1 Opening an SPSS Data File... 2 1-2 Viewing the SPSS Screens... 3 o Data View o Variable View o Output View 1-3 Reading Non-SPSS Files... 6 o Convert

More information

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS About Omega Statistics Private practice consultancy based in Southern California, Medical and Clinical

More information

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences. 1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis

More information

SPSS Guide How-to, Tips, Tricks & Statistical Techniques

SPSS Guide How-to, Tips, Tricks & Statistical Techniques SPSS Guide How-to, Tips, Tricks & Statistical Techniques Support for the course Research Methodology for IB Also useful for your BSc or MSc thesis March 2014 Dr. Marijke Leliveld Jacob Wiebenga, MSc CONTENT

More information

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone:

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

How To Test For Significance On A Data Set

How To Test For Significance On A Data Set Non-Parametric Univariate Tests: 1 Sample Sign Test 1 1 SAMPLE SIGN TEST A non-parametric equivalent of the 1 SAMPLE T-TEST. ASSUMPTIONS: Data is non-normally distributed, even after log transforming.

More information

Linear Models in STATA and ANOVA

Linear Models in STATA and ANOVA Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

More information

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Features Of The Chi-Square Statistic The chi-square test is non-parametric. That is, it makes no assumptions about the distribution

More information

LOGIT AND PROBIT ANALYSIS

LOGIT AND PROBIT ANALYSIS LOGIT AND PROBIT ANALYSIS A.K. Vasisht I.A.S.R.I., Library Avenue, New Delhi 110 012 amitvasisht@iasri.res.in In dummy regression variable models, it is assumed implicitly that the dependent variable Y

More information

Come scegliere un test statistico

Come scegliere un test statistico Come scegliere un test statistico Estratto dal Capitolo 37 of Intuitive Biostatistics (ISBN 0-19-508607-4) by Harvey Motulsky. Copyright 1995 by Oxfd University Press Inc. (disponibile in Iinternet) Table

More information

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Two-Sample T-Tests Assuming Equal Variance (Enter Means) Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

Data Analysis for Marketing Research - Using SPSS

Data Analysis for Marketing Research - Using SPSS North South University, School of Business MKT 63 Marketing Research Instructor: Mahmood Hussain, PhD Data Analysis for Marketing Research - Using SPSS Introduction In this part of the class, we will learn

More information

Directions for using SPSS

Directions for using SPSS Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...

More information

IBM SPSS Statistics 20 Part 1: Descriptive Statistics

IBM SPSS Statistics 20 Part 1: Descriptive Statistics CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 1: Descriptive Statistics Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the

More information

Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

More information

Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

More information

NAG C Library Chapter Introduction. g08 Nonparametric Statistics

NAG C Library Chapter Introduction. g08 Nonparametric Statistics g08 Nonparametric Statistics Introduction g08 NAG C Library Chapter Introduction g08 Nonparametric Statistics Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Parametric and Nonparametric

More information

Normality Testing in Excel

Normality Testing in Excel Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

More information

Chapter 13. Chi-Square. Crosstabs and Nonparametric Tests. Specifically, we demonstrate procedures for running two separate

Chapter 13. Chi-Square. Crosstabs and Nonparametric Tests. Specifically, we demonstrate procedures for running two separate 1 Chapter 13 Chi-Square This section covers the steps for running and interpreting chi-square analyses using the SPSS Crosstabs and Nonparametric Tests. Specifically, we demonstrate procedures for running

More information

Introduction to Statistics with SPSS (15.0) Version 2.3 (public)

Introduction to Statistics with SPSS (15.0) Version 2.3 (public) Babraham Bioinformatics Introduction to Statistics with SPSS (15.0) Version 2.3 (public) Introduction to Statistics with SPSS 2 Table of contents Introduction... 3 Chapter 1: Opening SPSS for the first

More information

NONPARAMETRIC STATISTICS 1. depend on assumptions about the underlying distribution of the data (or on the Central Limit Theorem)

NONPARAMETRIC STATISTICS 1. depend on assumptions about the underlying distribution of the data (or on the Central Limit Theorem) NONPARAMETRIC STATISTICS 1 PREVIOUSLY parametric statistics in estimation and hypothesis testing... construction of confidence intervals computing of p-values classical significance testing depend on assumptions

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

SPSS Introduction. Yi Li

SPSS Introduction. Yi Li SPSS Introduction Yi Li Note: The report is based on the websites below http://glimo.vub.ac.be/downloads/eng_spss_basic.pdf http://academic.udayton.edu/gregelvers/psy216/spss http://www.nursing.ucdenver.edu/pdf/factoranalysishowto.pdf

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information