Nonparametric Tests Using SPSS


Statistical Package for Social Sciences
Jinlin Fu, January 2016
Contact: Medical Research Consultancy Studio, Australia
Contents

1 INTRODUCTION
2 UNIVARIATE LOGISTIC REGRESSION
2.1 ASSUMPTIONS AND DATA REQUIREMENTS
2.2 TESTING FOR NORMALITY
3 MULTIPLE LOGISTIC REGRESSION
3.1 TESTING FOR INDEPENDENCE OF TWO CATEGORICAL VARIABLES
3.2 FISHER'S EXACT TEST FOR INDEPENDENCE OF TWO CATEGORICAL VARIABLES
3.3 TESTING A THEORETICAL MODEL (GOODNESS OF FIT) USING CHI-SQUARE
3.4 FURTHER ANALYSIS USING CHI-SQUARE
3.5 BINOMIAL TESTS USING CHI-SQUARE
4 RUNS TEST
4.1 ASSUMPTIONS AND DATA REQUIREMENTS
4.2 CONFIRMATION OF APPROPRIATE CUT POINTS
4.3 RUNS TEST
5 ONE-SAMPLE TEST
5.1 ASSUMPTIONS AND DATA REQUIREMENTS
5.2 ONE-SAMPLE KOLMOGOROV-SMIRNOV TEST
6 TWO-INDEPENDENT-SAMPLE TESTS
6.1 ASSUMPTIONS AND DATA REQUIREMENTS
6.2 TWO-INDEPENDENT-SAMPLE MANN-WHITNEY AND WILCOXON TESTS
6.3 TWO-INDEPENDENT-SAMPLE KOLMOGOROV-SMIRNOV TEST
7 MULTI-INDEPENDENT-SAMPLE TESTS
7.1 ASSUMPTIONS AND DATA REQUIREMENTS
7.2 KRUSKAL-WALLIS TEST
7.3 THE MEDIAN TEST
7.4 POST HOC TESTS
8 TWO-RELATED-SAMPLE TESTS
8.1 ASSUMPTIONS AND DATA REQUIREMENTS
8.2 WILCOXON SIGNED-RANKS TEST
8.3 THE SIGN TEST
8.4 MCNEMAR TEST
9 MULTIPLE-RELATED-SAMPLE TESTS
9.1 ASSUMPTIONS AND DATA REQUIREMENTS
9.2 FRIEDMAN TEST
9.3 KENDALL'S W TEST
9.4 COCHRAN'S Q TEST
10 NONPARAMETRIC CORRELATIONS
10.1 ASSUMPTIONS AND DATA REQUIREMENTS
10.2 SPEARMAN'S RANK CORRELATION
10.3 KENDALL'S TAU RANK CORRELATION
1 INTRODUCTION

Regression methods have become an integral component of any data analysis concerned with describing the relationship between a response (or outcome or dependent) variable and one or more explanatory (predictor or independent) variables, or covariates. It is often the case that the outcome variable is discrete, taking on two or more possible values. Over the last two decades, logistic regression has become, in many fields, the standard method of analysis in this situation.

Logistic regression allows one to predict a discrete outcome such as group membership from a set of variables that may be continuous, discrete, dichotomous, or a mix. Because of its popularity in the health sciences, the discrete outcome in logistic regression is often disease/no disease. For example, can presence or absence of hay fever be diagnosed from geographic area, season, degree of nasal stuffiness, and body temperature?

Logistic regression makes no assumptions about the distributions of the predictor variables; the predictors do not have to be normally distributed, linearly related, or of equal variance within each group. Unlike multiway frequency analysis, the predictors do not need to be discrete; they can be any mix of continuous, discrete and dichotomous variables. Unlike multiple regression analysis, which also has distributional requirements for predictors, logistic regression cannot produce negative predicted probabilities.

There may be two or more outcomes (groups) in logistic regression. If there are more than two outcomes, they may or may not have order (e.g., no hay fever, moderate hay fever, severe hay fever). Logistic regression emphasizes the probability of a particular outcome for each case. For example, it evaluates the probability that a given person has hay fever, given that person's pattern of responses to questions about geographic area, season, nasal stuffiness, and temperature.
Logistic regression analysis is especially useful when the distribution of responses on the dependent variable is expected to be nonlinear with one or more of the independent variables. Because the model produced by logistic regression is nonlinear, the equations used to describe the outcomes are slightly more complex than those for multiple regression. The outcome variable, $\hat{Y}$, is the probability of having one outcome or another based on a nonlinear function of the best linear combination of predictors; with two outcomes:

$$\hat{Y}_i = \frac{e^{u}}{1 + e^{u}}$$

where $\hat{Y}_i$ is the estimated probability that the ith case (i = 1, ..., n) is in one of the categories and u is the usual linear regression equation:

$$u = C + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$

with constant C, coefficients $\beta_j$, and predictors $X_j$ for k predictors (j = 1, 2, ..., k). This linear regression equation creates the logit or log of the odds:

$$\ln\left(\frac{\hat{Y}_i}{1 - \hat{Y}_i}\right) = C + \sum_{j=1}^{k} \beta_j X_j$$

That is, the linear regression equation is the natural log (ln) of the probability of being in one group divided by the probability of being in the other group. For example, it is the natural log of the probability of being in the disease group divided by the probability of being in the non-disease group.
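The relationship between the linear predictor u, the predicted probability and the logit can be checked numerically. The following is a minimal Python sketch; the constant, coefficients and predictor values are hypothetical and chosen only for illustration, and the function names are not part of SPSS.

```python
import math

def linear_predictor(constant, coefficients, predictors):
    """u = C + sum of beta_j * X_j over the k predictors."""
    return constant + sum(b * x for b, x in zip(coefficients, predictors))

def predicted_probability(u):
    """Y-hat = e^u / (1 + e^u), always strictly between 0 and 1."""
    return math.exp(u) / (1.0 + math.exp(u))

def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Hypothetical model with k = 2 predictors (illustration only).
C = -1.5
betas = [0.8, 0.3]
x_values = [1.0, 2.0]

u = linear_predictor(C, betas, x_values)
p = predicted_probability(u)

print("u     =", round(u, 4))
print("Y-hat =", round(p, 4))
print("logit =", round(logit(p), 4))  # recovers u
```

Taking the logit of the predicted probability returns the linear predictor, which is why the coefficients are interpreted on the log-odds scale.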
Logistic regression can also be used to fit and compare models. The simplest (and worst-fitting) model includes only the constant and none of the predictors. The most complex (and "best"-fitting) model includes the constant, all predictors, and, perhaps, interactions among predictors. Often, however, not all predictors (and interactions) are related to the outcome. The researcher uses goodness-of-fit tests to choose the model that does the best job of prediction with the fewest predictors.
2 UNIVARIATE LOGISTIC REGRESSION

Logistic regression is useful for situations in which you want to be able to predict the presence or absence of a characteristic or outcome based on values of a set of predictor variables. It is similar to a linear regression model but is suited to models where the dependent variable is dichotomous. Logistic regression coefficients can be used to estimate odds ratios for each of the independent variables in the model. Logistic regression is applicable to a broader range of research situations than discriminant analysis.

2.1 ASSUMPTIONS AND DATA REQUIREMENTS

ASSUMPTIONS: Logistic regression does not rely on distributional assumptions. However, the solution may be more stable if selected predictors have a multivariate normal distribution. Additionally, as with other forms of regression, multicollinearity among the predictors can lead to biased estimates and inflated standard errors. The procedure is most effective when group membership is a truly categorical variable; if group membership is based on values of a continuous variable (for example, "high IQ" versus "low IQ"), you should consider using linear regression to take advantage of the richer information offered by the continuous variable itself.

DATA: The dependent variable should be dichotomous. Independent variables can be interval level or categorical; if categorical, they should be dummy or indicator coded (there is an option in the procedure to recode categorical variables automatically).

The questions we want to answer are: What is the average level of weight in the study sample? To what extent does the weight measurement vary? We will now examine the variable Weight. From the toolbar menus, select Analyse > Descriptive Statistics > Explore. Add Weight to the Dependent List using the arrow button. Your screen should look like the one below:
Click OK. Your output screen should look like the output on the next page:

Interpreting the output

First table (Case Processing Summary): This provides the total number and percentage of observations and any missing values.

Second table (Descriptives): This table lists all statistics used to describe the variable Weight. We can see that the mean weight of patients is 60.9 kg, with a standard deviation of 14.4 kg and a 95% confidence interval of 56.8 to 65.0 kg. We also have a median weight of 59.6 kg with an IQR of 11.0 kg, a minimum of 43.0 kg, a maximum of 136.4 kg and a range of 93.4 kg. The second table also includes two statistics, Skewness and Kurtosis, which are used to examine the shape of the distribution curve. Their values are given in the second table of the output. Both statistics are far from 0. This is strong evidence that Weight is not normally distributed.
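The confidence interval and range reported in the Descriptives table can be reproduced from the other summary statistics. The sketch below is in Python and makes two assumptions that are not stated in the table itself: that all 50 patients have a Weight value (n = 50), and that the interval is the usual t-based interval, with the critical value for 49 degrees of freedom hard-coded as 2.0096.

```python
import math

# Summary statistics as reported in the Descriptives table.
n = 50            # assumption: no missing Weight values among the 50 patients
mean = 60.9       # kg
sd = 14.4         # kg
minimum = 43.0    # kg
maximum = 136.4   # kg

# Assumed constant: two-sided 95% critical value of t with n - 1 = 49 df.
T_CRIT_49 = 2.0096

standard_error = sd / math.sqrt(n)
ci_lower = mean - T_CRIT_49 * standard_error
ci_upper = mean + T_CRIT_49 * standard_error
value_range = maximum - minimum

print("Standard error:", round(standard_error, 2))
print("95% CI:", round(ci_lower, 1), "to", round(ci_upper, 1))
print("Range:", round(value_range, 1))
```

Under these assumptions the interval rounds to 56.8 to 65.0 kg and the range to 93.4 kg, in agreement with the table.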
Definition of Skewness: A measure of the asymmetry of a distribution. The normal distribution is symmetric and has a skewness value of 0. A distribution with a significant positive skewness has a long right tail. A distribution with a significant negative skewness has a long left tail. As a guideline, a skewness value more than twice its standard error is taken to indicate a departure from symmetry.

Definition of Kurtosis: A measure of the extent to which observations cluster around a central point. For a normal distribution, the value of the kurtosis statistic is zero. Positive kurtosis indicates that the observations cluster more and have longer tails than those in the normal distribution, and negative kurtosis indicates that the observations cluster less and have shorter tails.

2.2 TESTING FOR NORMALITY

It is a good habit to examine the distributions of the continuous variables of interest before we start analysing them. Doing so helps us choose the right statistical methods for the analyses we want to perform. Many statistical tests require that one or more variables are normally distributed. If a variable is normally distributed, then parametric tests apply. If it is not, it is inappropriate to employ parametric tests, because violating the assumption of normality may produce misleading results. So, if a variable is non-normally distributed, nonparametric tests apply; these are as valuable as parametric tests such as the t-test and ANOVA.

The question we want to answer is: Is the variable Weight normally distributed? To check this, select Analyse > Descriptive Statistics > Explore. Again move Weight into the Dependent List and then click on Plots. Check that Stem-and-leaf is not selected and that Histogram is selected, and then select Normality plots with tests.
Click Continue and OK. Your output screen should look like the output below:

Interpreting the output

First table (Descriptives): This table is exactly the same as the one in Section 2.1.

Second table (Tests of Normality): The table contains two formal tests for normality: the Kolmogorov-Smirnov and Shapiro-Wilk tests. The Kolmogorov-Smirnov test is only used for datasets with a large number of observations (i.e. > 5000). The Shapiro-Wilk significance level (p-value, labelled Sig.) for Weight is less than 0.001, which is significant, and the histogram (below) also looks non-normal. From the output we can see that the variable Weight is non-normally distributed.
First graph (Histogram): This graph is a visual summary of the distribution of values. The overlay of the normal curve helps you to assess the skewness and kurtosis. The histogram shows that the distribution is not symmetric but right skewed: the long tail is on the right, which agrees with a maximum (136.4 kg) that lies much further from the median (59.6 kg) than the minimum (43.0 kg) does.

Second graph (Box Plot): The second graph is a box plot. Outliers are identified with a star sign (*). Weight has two outlying values, labelled 27 and 42. The label refers to the row number in the Data Editor where that observation is found.
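The guideline given in Section 2.1 (a skewness value more than twice its standard error indicates a departure from symmetry) can be sketched in Python as follows. The standard-error formulas are the commonly used small-sample ones; that SPSS uses exactly these formulas is an assumption here, and the skewness value passed to the check is hypothetical because the actual value is not reproduced in this text.

```python
import math

def skewness_standard_error(n):
    """Commonly used standard error of the sample skewness for n cases."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def kurtosis_standard_error(n):
    """Commonly used standard error of the sample (excess) kurtosis for n cases."""
    ses = skewness_standard_error(n)
    return 2.0 * ses * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

def departs_from_symmetry(skewness, n):
    """Guideline: |skewness| greater than twice its standard error."""
    return abs(skewness) > 2.0 * skewness_standard_error(n)

n = 50                        # number of patients in the study sample
hypothetical_skewness = 2.5   # illustration only, not the value from the output

print("SE of skewness:", round(skewness_standard_error(n), 3))
print("SE of kurtosis:", round(kurtosis_standard_error(n), 3))
print("Departure from symmetry:", departs_from_symmetry(hypothetical_skewness, n))
```

For n = 50 the threshold for skewness is therefore about 0.67 in absolute value.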
3 ANALYSING CATEGORICAL DATA: HYPOTHESIS TESTING USING CHI-SQUARE

3.1 TESTING FOR INDEPENDENCE OF TWO CATEGORICAL VARIABLES

The Chi-square analysis can be used to determine whether there is a dependency between two categorical variables.

The question we want to answer is: Is the level of exposure independent of gender? Go to Analyse > Descriptive Statistics > Crosstabs. Put the variable Exposure into the Row box and Gender into the Column box. Click the Statistics button, select Chi-square, and then click Continue.
Click the Cells button and select Percentages: Row. Click Continue and OK. Your output screen should look like the one below:
The results of the Chi-square test do not depend on whether you place Gender or Exposure in the rows or columns; these can be switched around. However, your interpretation of the table tends to depend on which variables you have designated as the rows and the columns. By custom, the variable you are interested in is assigned to the rows. In this example, our interest is in the level of exposure and we are investigating level of exposure by gender, so Exposure is the row variable.

Interpreting the output

First table (Case Processing Summary): This provides the total number and percentage of observations and any missing values.

Second table (Exposure * Gender Crosstabulation): This is a cross-tabulation displaying the two variables of interest, in this case Exposure by Gender. Both the observed values and the row percentages are presented. When the two variables are dependent, this table helps to interpret what the dependency between them is.

Third table (Chi-Square Tests): This shows the results of the Pearson Chi-square test. The p-value that answers the question of independence is in the third column (Asymp. Sig. (2-sided)) in the top row. For this test it indicates that the two variables (Exposure and Gender) are independent.

Warning: The Chi-square test is not appropriate if the expected values are too small. SPSS will issue a warning below the third table if any of the cells have an expected value of less than 5. A guideline that is often used is that we should not have any cells with expected values less than 1 and at most one or two cells with expected values less than 5. Otherwise, given the number of rows and columns in the table, you do not have enough data to perform the Chi-square test reliably or to draw any reliable conclusions.
In these instances you either (i) increase your sample size, (ii) reduce the number of rows and/or columns, or (iii) use Fisher's exact test (described below) if there are only two categories for each variable. It is important to note that columns and/or rows can only be reduced if it is theoretically valid to do so. If you require a Fisher's exact test with more than two categories in either of the variables, please contact DM&A.

3.2 FISHER'S EXACT TEST FOR INDEPENDENCE OF TWO CATEGORICAL VARIABLES

SPSS outputs the results of Fisher's exact test within the Chi-square output (Section 3.1) when each variable has only two categories. If this is the case, and the expected values of some cells are too small, use the Fisher's exact p-value instead of the Pearson Chi-square p-value. In the example below, the Fisher's exact p-value (use the 2-sided value) indicates that the two variables (Insurance status and Gender) are independent.
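The calculations behind Sections 3.1 and 3.2 can be sketched for a 2 x 2 table in Python. The cell counts below are hypothetical (the actual Crosstabs output is not reproduced here), and the function names are illustrative rather than anything SPSS provides. The Chi-square p-value uses the closed form that is valid for 1 degree of freedom only, and the two-sided Fisher p-value sums the probabilities of all tables that are no more likely than the observed one, which is one common definition.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    """Pearson Chi-square statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

def chi_square_p_value_df1(statistic):
    """Upper-tail probability of Chi-square with 1 df (2 x 2 tables only)."""
    return math.erfc(math.sqrt(statistic / 2.0))

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2 x 2 table."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical counts: rows = insurance (public, private), columns = gender.
observed = [[2, 8],
            [18, 22]]

expected = expected_counts(observed)
statistic = pearson_chi_square(observed)
small_cells = sum(1 for row in expected for e in row if e < 5)

print("Expected counts:", expected)
print("Cells with expected count < 5:", small_cells)
print("Pearson Chi-square:", round(statistic, 3))
print("Asymptotic p-value (df = 1):", round(chi_square_p_value_df1(statistic), 3))
print("Fisher's exact p-value (2-sided):", round(fisher_exact_two_sided(observed), 3))
```

In this hypothetical table one expected count is below 5, which is the situation in which the text recommends reading the Fisher's exact p-value rather than the Pearson one.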
3.3 TESTING A THEORETICAL MODEL (GOODNESS OF FIT) USING CHI-SQUARE

The question we want to answer is: Are there equal numbers of males and females? To perform this test in SPSS, go to Analyse > Nonparametric Tests > Chi-Square. Put Gender in the right-hand Test Variable List box. Your screen should look like the one below:
In the Expected Values section, All categories equal is already selected. Click OK. Your output screen should look like the one below:
Interpreting the output

First table (Gender): Provides the values expected under the assumption of equal numbers by gender. The residual values are calculated as observed − expected.

Second table (Test Statistics): This provides the Pearson Chi-square statistic, the degrees of freedom and the p-value (labelled Asymp. Sig.). Here the p-value indicates that the numbers of males and females do not differ significantly from equal.

Now suppose that a claim has been made that there are twice as many male patients as female patients. We can use the same procedure as before, this time choosing our own expected values. If there are twice as many males as females, then the ratio of males to females is 2:1, so one-third of the patients are females and two-thirds are males. Given that there are 50 patients in total, we would then expect 50/3 = 16.7 females and 16.7 x 2 = 33.3 males. We enter these as expected values, entering 33.3 first since males correspond to the value 0.

Go to Analyse > Nonparametric Tests > Chi-Square and select Values. Type 33.3 and click Add, then type 16.7 and click Add. Note that the expected values must be in the same order as the categories. Your screen should look like the one below:

Click OK. Your output screen should look like the one below:
Interpreting the output

First table (Gender): Provides the values expected under the assumption of twice as many males as females. The residual values are calculated as observed − expected.

Second table (Test Statistics): This provides the Pearson Chi-square statistic, the degrees of freedom and the p-value (labelled Asymptotic Significance). Here the p-value indicates that the numbers of males and females do not match the expected values that we entered.

Note: You can run the goodness-of-fit test for several variables (e.g. Gender, Exposure and Group) in the same procedure by selecting all the variables of interest in the Test Variable List.
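The goodness-of-fit calculation for the 2:1 claim can be sketched in Python. The expected values follow the text (two-thirds and one-third of 50 patients, males first); the observed counts are hypothetical because the output table is not reproduced here. With two categories the test has 1 degree of freedom, so the p-value can be obtained from the complementary error function.

```python
import math

def goodness_of_fit(observed, expected):
    """Return residuals (observed - expected) and the Pearson Chi-square statistic."""
    residuals = [o - e for o, e in zip(observed, expected)]
    statistic = sum(r * r / e for r, e in zip(residuals, expected))
    return residuals, statistic

def p_value_two_categories(statistic):
    """Upper-tail probability of Chi-square with 1 df (two categories)."""
    return math.erfc(math.sqrt(statistic / 2.0))

total = 50
observed = [25, 25]  # hypothetical counts: males (value 0) first, then females

# Claim 1: equal numbers of males and females.
expected_equal = [total / 2, total / 2]
_, statistic_equal = goodness_of_fit(observed, expected_equal)

# Claim 2: twice as many males as females (2:1), i.e. 33.3 and 16.7.
expected_two_to_one = [total * 2 / 3, total / 3]
residuals, statistic_two_to_one = goodness_of_fit(observed, expected_two_to_one)

print("Equal numbers: Chi-square =", round(statistic_equal, 3),
      "p =", round(p_value_two_categories(statistic_equal), 3))
print("2:1 ratio: residuals =", [round(r, 1) for r in residuals])
print("2:1 ratio: Chi-square =", round(statistic_two_to_one, 3),
      "p =", round(p_value_two_categories(statistic_two_to_one), 3))
```

With these hypothetical counts the equal-numbers model fits perfectly, while the 2:1 model is rejected at the 5% level, which mirrors the pattern of conclusions described in the text.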
Click OK. Your output screen should look like the one below:

3.4 FURTHER ANALYSIS USING CHI-SQUARE

We already know how to test the independence of two categorical variables using Chi-square (Section 3.1). If we are asked, "Does insurance status change whether the level of exposure is independent of gender?", we have to cross-classify further by whether the patients had public or private insurance.

Go to Analyse > Descriptive Statistics > Crosstabs. Put the variable Exposure into the Row box and Gender into the Column box, and this time also put Insurance into the Layer 1 of 1 box. Make the other selections as you did in Section 3.1, and then click OK.
Your output screen should look like the one below:
Interpreting the output

First table (Exposure * Gender * Insurance Crosstabulation): This is a cross-tabulation displaying the two variables of interest, in this case Exposure by Gender, at each level of Insurance. Both the observed values and the row percentages are presented. When the two variables are dependent, this table helps to interpret what the dependency between them is at each level of Insurance.

Second table (Chi-Square Tests): This shows the results of the Pearson Chi-square test. The p-values that answer the question of independence are in the third column (Asymp. Sig. (2-sided)) in the top row, one for public insurance holders and one for private insurance holders. They indicate that the two variables (Exposure and Gender) are independent for public, but not for private, insurance holders.

Warning: Because 83.3% of cells have an expected count of less than 5, you either (i) increase your sample size, (ii) reduce the number of rows and/or columns (it is important to note that columns and/or rows can only be reduced if it is theoretically valid to do so), or (iii) use Fisher's exact test if there are only two categories for each variable, in case you are still using SPSS version 15 or earlier. If you require a Fisher's exact test with more than two categories in either of the variables, you can go to the IBM website for a solution (IBM has issued an SPSS Exact Tests program on its website), or please contact DM&A.

3.5 BINOMIAL TESTS USING SPSS

The Binomial Test procedure compares the observed frequencies of the two categories of a dichotomous variable to the frequencies that are expected under a binomial distribution with a specified probability parameter. A dichotomous variable is a variable that can take only two possible values: yes or no, true or false, 0 or 1, and so on. If the variables are not dichotomous, you must specify a cut point.
The cut point assigns cases with values that are greater than the cut point to one group and assigns the rest of the cases to another group.

In the general population, the prevalence of a disease is about 21%, and we are going to test whether our sample patients have a higher rate of the disease.

The question we want to answer is: Is the disease prevalence in the study sample the same as that in the general population? Go to Analyse > Nonparametric Tests > Binomial, and then put Disease into the Test Variable List box. By default, the probability parameter for both groups is 0.5, although this may be changed. To change the probability, you enter a Test proportion for the first group (for this exercise we enter 0.21). The probability for the second group is equal to 1 minus the probability for the first group (1 − 0.21 = 0.79). Click on Options and then tick Descriptive under Statistics. Click Continue and then OK to run the procedure. The output is then displayed in the Output window, as shown on the next page.
Interpreting the output

First table (Descriptive Statistics): This table displays the basic information about the variable Disease. It gives the number of cases (N = 50), the proportion of disease in the sample patients (Mean = 0.28), the standard deviation (Std. Deviation = 0.454), and the smallest and largest values of the variable (Minimum = 0 and Maximum = 1).

Second table (Binomial Test): This shows the results of the Binomial test. The first three columns show the category, number of cases and observed proportion at each level of Disease. The fourth column gives the test or reference proportion that you have entered. The p-value that answers the question of whether the prevalence of disease in the sample patients (0.28 or 28%) is different from that of the population (0.21 or 21%) is in the fifth column [Asymp. Sig. (1-tailed)] in the top row. For this test it indicates that there is no statistically significant difference in proportions between the sample patients and the population.

We can also use this test to check the proportions at each level of another variable. Say, for the above test, we want to know whether the result still holds for males and females separately. Go to Data > Split File and click Compare groups and Sort the file by grouping variables. Put Gender into the Groups Based on box, as shown on the next page, and then click OK.
Repeat the Binomial Test procedure as we did above. Your output screen should look like the one below:

Interpreting the output

First table (Descriptive Statistics): This table displays the basic information about Disease by Gender. The proportions of disease in male and female patients were 0.26 and 0.30, or 26% and 30%, respectively.

Second table (Binomial Test): This shows the results of the Binomial test by Gender. The p-values that answer the question of whether the prevalence of disease in the sample patients (0.26 in males and 0.30 in females) is different from that of the population (0.21 for both males and females) are in the fifth [Exact Sig. (1-tailed)] or sixth [Asymp. Sig. (1-tailed)] column in the top row, one for males and one for females. They indicate that there is no statistically significant difference in proportions between the sample patients and the population, whether for males or for females.
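The overall comparison (28% of 50 patients against a population prevalence of 21%) can be sketched as an exact one-tailed binomial calculation in Python. The count of 14 diseased patients is derived from the reported proportion (0.28 x 50). SPSS reports an asymptotic significance for this test, so the exact value computed here may differ slightly from the value in the output; the function name is illustrative.

```python
import math

def binomial_upper_tail(successes, trials, proportion):
    """Exact P(X >= successes) for X ~ Binomial(trials, proportion)."""
    return sum(math.comb(trials, k)
               * proportion ** k
               * (1.0 - proportion) ** (trials - k)
               for k in range(successes, trials + 1))

n_cases = 50
n_disease = 14          # 0.28 * 50, from the Descriptive Statistics table
test_proportion = 0.21  # prevalence in the general population

observed_proportion = n_disease / n_cases
p_one_tailed = binomial_upper_tail(n_disease, n_cases, test_proportion)

print("Observed proportion:", observed_proportion)
print("Exact 1-tailed p-value:", round(p_one_tailed, 3))
print("Significant at 0.05:", p_one_tailed < 0.05)
```

The p-value is above 0.05, in line with the conclusion that the sample prevalence does not differ significantly from the population prevalence.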
When the number of cases in one category of gender is less than 30, Exact Sig. (exact significance level) will be displayed instead of Asymp. Sig. for that category.

Often we also want to know whether the patients with disease are heavier, that is, whether they are more obese in terms of BMI than those without disease. So the question is whether those who had the disease tend to be above or below the cut-off value of 25.0 for overweight and obesity.

First, we split our data: go to Data > Split File and click Compare groups and Sort the file by grouping variables. Put Disease into the Groups Based on box and then click OK.

Next, we use the Binomial test to answer the above question. Go to Analyse > Nonparametric Tests > Binomial, and then put BMI into the Test Variable List box. Enter 24.99 in the Cut Point box and keep the Test proportion box at its default. (The reason for entering 24.99 as the cut point instead of 25 is explained later, in the output section.) Click Options to select Quartiles, and then click Continue. Click OK to run the test. Your output screen should look like the one below:
Interpreting the output

First table (Descriptive Statistics): This table displays the basic information about BMI by Disease. It gives the number of cases (N) and the percentiles [25th, 50th (median) and 75th] in each category of Disease (Yes and No).

Second table (Binomial Test): This shows the results of the Binomial test. BMI was classified into two subgroups based on the cut-off value (25.0), as shown in the first column: Group 1 contains all cases with BMI less than 25.0 and Group 2 contains all cases with BMI equal to or greater than 25.0. (If we enter 25 as the cut point in the Binomial Test window, Group 2 will exclude the value 25 itself. If we enter 24.99, the smallest value in Group 2 must be greater than 24.99, which means it is 25.0, because in the data set the value just below 25.0 is 24.9, which is smaller than 24.99. So all the BMI values in Group 2 will be equal to or greater than 25.0, as we intended.)

This table also shows the number of cases (N) and the proportions (Observed Prop.) in each group under each Disease status. The proportions of cases with BMI equal to or greater than 25.0 (Group 2) in the Disease (Yes) and non-disease (No) groups are 0.21 and 0.44, respectively, but the test does not tell us whether there is a statistically significant difference between these two proportions. The p-values are shown in the fifth [Asymp. Sig. (1-tailed)] and sixth [Exact Sig. (1-tailed)] columns. These p-values indicate whether the proportions of cases in the two BMI groups differ within each Disease status. The output shows no significant difference between the proportions of cases on either side of the BMI cut-off value of 25.0, either in diseased cases (p = 0.608) or in non-diseased cases (p = 0.057). That is, based on the data currently available, there is no evidence that the diseased are more overweight or obese.
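The effect of entering 24.99 rather than 25 as the cut point can be sketched in Python. The rule follows the description in Section 3.5 (values greater than the cut point go to one group, the rest to the other); the BMI values are hypothetical and recorded to one decimal place, as in the data set described above.

```python
def split_at_cut_point(values, cut_point):
    """Group 1: values <= cut point. Group 2: values > cut point."""
    group_1 = [v for v in values if v <= cut_point]
    group_2 = [v for v in values if v > cut_point]
    return group_1, group_2

# Hypothetical BMI values recorded to one decimal place.
bmi = [22.4, 24.9, 25.0, 25.3, 31.8]

below_25, above_25 = split_at_cut_point(bmi, 25)
below_2499, above_2499 = split_at_cut_point(bmi, 24.99)

print("Cut point 25:    group 2 =", above_25)     # 25.0 is left out of group 2
print("Cut point 24.99: group 2 =", above_2499)   # 25.0 is included in group 2
```

Because no recorded value lies between 24.9 and 25.0, a cut point of 24.99 places exactly the cases with BMI of 25.0 or more in Group 2.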
4 RUNS TEST

The Runs Test procedure tests whether the order of occurrence of two values of a variable is random. A run is a sequence of like observations. A sample with too many or too few runs suggests that the sample is not random.

4.1 ASSUMPTIONS AND DATA REQUIREMENTS

Assumptions: Nonparametric tests do not require assumptions about the shape of the underlying distribution. Use samples from continuous probability distributions.

Data: The variables must be numeric. To convert string variables to numeric variables, use the Automatic Recode procedure, which is available on the Transform menu.

Many statistical tests assume that the observations in a sample are independent; in other words, that the order in which the data were collected is irrelevant. If the order does matter, then the sample is not random, and you cannot draw accurate conclusions about the population from which the sample was drawn. Therefore, it is prudent to check the data for a violation of this important assumption.

You can use the Runs Test procedure to test whether the order of values of a variable is random. The procedure first classifies each value of the variable as falling above or below a cut point and then tests whether there is any order in the resulting sequence.

4.2 CONFIRMATION OF APPROPRIATE CUT POINTS

The cut point is based either on a measure of central tendency (mean, median, or mode) or on a custom value. You can obtain descriptive statistics and/or quartiles of the test variable. Go to Graphs > Chart Builder... and then, in the Choose from box, select the Histogram gallery and choose the first option, Simple Bar. Select the variable Status as the x axis and click OK. The bar chart of the test variable appears in the output window, as shown below.

The scale for Status theoretically ranges from 0 to 15, where 0 = highly in poor health and 20 = highly in good health. The actual range of scores is narrower, from a low of 6 to a high of 14.
The histogram shows that Status is non-normally distributed, so we choose the median as the cut point.
[Figure: histogram of Status (vertical axis: Frequency); Mean = 9.52, Std. Dev. = 2.332]

4.3 RUNS TEST

The question we want to answer is: Is the order of values of Status random? Go to Analyse > Nonparametric Tests > Runs. The median is selected by default, so keep it as it is. Put Status in the Test Variable List box and then click Options. Select Descriptive and Quartiles, and then click Continue. Back in the Runs Test dialog box, click OK to run the test. Your output screen should look like the one below:

Please note: before you run the procedure, you have to make sure that your records are sorted in descending order of study order (study ID, the order in which the participants actually entered the study).
Interpreting the output

First table (Descriptive Statistics): The statistics table helps you understand more about the distribution of Status by displaying its basic information. The default table is very wide, but you can easily pivot it to column format by following the steps below:

Double-click the table to activate it.
From the Viewer menus choose: Pivot > Transpose Rows and Columns

The table is then transposed from row format into column format, as shown below:
Second table (Runs Test): This shows the results of the Runs test. The test value is used as a cut point to dichotomize the sample. In this table, the cut point is the sample median. Of 50 patients, 21 scored below the median (Cases < Test Value). Think of them as the "negative" cases. The remaining 29 patients (Cases >= Test Value) scored at or above the median. Think of them as the "positive" cases. The next statistic is a count of the observed runs (Number of Runs) in the test variable. A run is defined as a sequence of cases on the same side of the cut point. If the order of Status is purely random with respect to the median value, you would expect about 25 runs across these 50 cases (2 x 21 x 29 / 50 + 1 = 25.36). Because fewer runs than expected were observed, the Z statistic is negative. The 2-tailed significance value [Asymp. Sig. (2-tailed)] is the probability of obtaining a Z statistic as extreme as or more extreme (in absolute value) than the obtained value, if the order of Status above and below the median is purely random. In other words, the 2-tailed significance value (here p = 0.201) does not allow you to reject the null hypothesis that the order of Status is random with respect to the median value of 9. That is to say, the order of Status is consistent with randomness at a cut point of 9.
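The quantities in the Runs Test table can be sketched in Python using the usual large-sample approximation (no continuity correction is applied here; whether SPSS applies one for this sample size is not confirmed by the text). The group sizes of 21 and 29 come from the table. The observed run count of 21 is an assumption made for illustration: it is a value that, under this approximation, gives a 2-tailed significance close to the reported 0.201. The short sequence at the top is hypothetical and only demonstrates how runs are counted.

```python
import math

def count_runs(sequence, cut_point):
    """Classify each value as below / at-or-above the cut point and count runs."""
    signs = [value >= cut_point for value in sequence]
    n_positive = sum(signs)
    n_negative = len(signs) - n_positive
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return n_negative, n_positive, runs

def runs_test_z(n_negative, n_positive, runs):
    """Expected runs, Z statistic and 2-tailed p-value (normal approximation)."""
    n = n_negative + n_positive
    product = 2.0 * n_negative * n_positive
    expected = product / n + 1.0
    variance = product * (product - n) / (n * n * (n - 1.0))
    z = (runs - expected) / math.sqrt(variance)
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))
    return expected, z, p_two_tailed

# Hypothetical short sequence, cut point 9: signs are - - + + + - +, i.e. 4 runs.
demo = count_runs([7, 8, 10, 11, 9, 6, 12], 9)
print("Demo (negative, positive, runs):", demo)

# Group sizes from the Runs Test table; the run count is an assumed value.
assumed_runs = 21
expected_runs, z_statistic, p_value = runs_test_z(21, 29, assumed_runs)
print("Expected runs:", round(expected_runs, 2))
print("Z:", round(z_statistic, 3))
print("2-tailed p-value:", round(p_value, 3))
```

A run count below the expected value gives a negative Z, as described in the interpretation above.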
5 ONE-SAMPLE TEST

The One-Sample Kolmogorov-Smirnov Test procedure compares the observed cumulative distribution function for a variable with a specified theoretical distribution, which may be normal, uniform, Poisson, or exponential. The Kolmogorov-Smirnov Z is computed from the largest difference (in absolute value) between the observed and theoretical cumulative distribution functions. This goodness-of-fit test tests whether the observations could reasonably have come from the specified distribution.

5.1 ASSUMPTIONS AND DATA REQUIREMENTS

Assumptions: The Kolmogorov-Smirnov test assumes that the parameters of the test distribution are specified in advance. This procedure estimates the parameters from the sample. The sample mean and sample standard deviation are the parameters for a normal distribution, the sample minimum and maximum values define the range of the uniform distribution, the sample mean is the parameter for the Poisson distribution, and the sample mean is the parameter for the exponential distribution. The power of the test to detect departures from the hypothesized distribution may be seriously diminished. For testing against a normal distribution with estimated parameters, consider the adjusted K-S Lilliefors test (available in the Explore procedure).

Data: Use quantitative variables (interval or ratio level of measurement).

5.2 ONE-SAMPLE KOLMOGOROV-SMIRNOV TEST

The question we want to answer is: Is the variable Status normally distributed, or does it follow another particular distribution? Let us take Status as an example. Go to Analyse > Nonparametric Tests > 1-Sample K-S, and then move Status into the Test Variable List box. Tick all four options in the Test Distribution box and then click Options. Tick Descriptive and Quartiles and then click Continue. Finally, click OK to run the procedure.
The output contains the tables interpreted below.

Interpreting the outputs

First table (Descriptive Statistics): This table helps you understand the distribution of Status by displaying its basic descriptive information: the number of cases (N), Mean, Standard Deviation, Minimum, Maximum, and Percentiles (25th, 50th and 75th).

Second table (One-Sample Kolmogorov-Smirnov Test): This is the default test, the test of normal distribution. The table shows that the Normal distribution is indexed by two parameters: the mean and the standard deviation. The sample mean is about 9.52 with an SD of 2.33.

The next three rows fall under the general category Most Extreme Differences. The differences referred to are the largest positive and negative points of divergence between the empirical and theoretical cumulative distribution functions (CDFs). The first difference value, labelled Absolute, is the absolute value of the larger of the two difference values printed directly below it. This value is required to calculate the test statistic. The Positive difference is the point at which the empirical CDF exceeds the theoretical CDF by the greatest amount.
At the opposite end of the continuum, the Negative difference is the point at which the theoretical CDF exceeds the empirical CDF by the greatest amount. The Z test statistic is the product of the square root of the sample size and the largest absolute difference between the empirical and theoretical CDFs. Unlike much statistical testing, a significant result here is bad news, because it means that the data depart from the hypothesized distribution.
The probability of the Z statistic is above 0.05, meaning that the Normal distribution with a mean of 9.52 and a standard deviation of 2.33 is a plausible fit for Status (the hypothesis of normality cannot be rejected).

Third and fourth tables (One-Sample Kolmogorov-Smirnov Test 2 and 3): These two tables convey a similar message to the second table, for the Uniform and Poisson distributions.

Fifth table (One-Sample Kolmogorov-Smirnov Test 4): This table shows that the probability of the Z statistic is below 0.05, meaning that the Exponential distribution with a parameter of 9.52 (the mean) is not a good fit for Status.

Taken together, the outputs indicate that Status is compatible with the Normal, Uniform and Poisson distributions, but not with the Exponential distribution.
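The Most Extreme Differences and the Z statistic described in this section can be sketched in a few lines. The example below is a minimal illustration, assuming a normal test distribution with parameters estimated from the sample and the classical Kolmogorov series for the asymptotic p-value; SPSS may use a slightly different approximation, and the function names and the Status values are hypothetical.

```python
import math
import statistics

def normal_cdf(x, mean, sd):
    """Theoretical CDF of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def most_extreme_differences(values, cdf):
    """Return (absolute, positive, negative) differences between the CDFs.

    Both directional differences are returned as magnitudes.
    """
    xs = sorted(values)
    n = len(xs)
    # Empirical CDF above the theoretical CDF, just after each data point.
    positive = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # Theoretical CDF above the empirical CDF, just before each data point.
    negative = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(positive, negative), positive, negative

def kolmogorov_p(z, terms=100):
    """Asymptotic two-tailed p-value for a Kolmogorov-Smirnov Z."""
    if z < 0.2:  # the p-value is indistinguishable from 1 in this range
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

def ks_normal(values):
    """K-S Z and p-value against a normal with sample mean and SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    absolute, _, _ = most_extreme_differences(
        values, lambda x: normal_cdf(x, mean, sd))
    z = math.sqrt(len(values)) * absolute
    return z, kolmogorov_p(z)

# Hypothetical Status scores, not the tutorial's data set.
status = [6, 7, 8, 9, 9, 10, 10, 11, 12, 13]
z, p = ks_normal(status)
print(f"Z = {z:.3f}, p = {p:.3f}")
```

Because the mean and SD are estimated from the same sample, this p-value tends to be too large, which is the loss of power mentioned in Section 5.1.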
6 TWO-INDEPENDENT-SAMPLE TESTS

The Two-Independent-Samples Tests procedure compares two groups of cases on one variable. The nonparametric tests for two independent samples are useful for determining whether or not the values of a particular variable differ between two groups, especially when the assumptions of the t test are not met.

Suppose we want to know whether height differs between private and public patients. In other words: does the categorical (independent) variable Insurance affect the continuous (dependent) variable Height?

6.1 ASSUMPTIONS AND DATA REQUIREMENTS

Assumptions: Use independent, random samples. The Mann-Whitney U test requires that the variable you are testing be at least ordinal and that its distribution be similar in shape in both groups.

Data: Use numeric variables that can be ordered. We will assume that the independence assumption is met by the design of the experiment.

6.2 TWO-INDEPENDENT-SAMPLE MANN-WHITNEY AND WILCOXON TESTS

The Mann-Whitney and Wilcoxon statistics can be used to test the null hypothesis that two independent samples come from the same population. Their advantage over the independent-samples t test is that they do not assume normality and can be used to test ordinal variables.

Go to Analyse > Nonparametric Tests > 2 Independent Samples. In the dialogue box, add the dependent variable Height to the Test Variable List box. Add the independent variable Insurance to the Grouping Variable box. Click on Define Groups, type 1 for Group 1 and type 2 for Group 2.
Click Continue and OK to produce the output.

Interpreting the output

First table (Ranks): presents figures used to calculate the p-value. First, each case is ranked without regard to group membership. Cases tied on a particular value receive the average rank for that value. After ranking the cases, the ranks are summed within groups (Sum of Ranks); the Mean Rank is the sum of ranks divided by the number of cases in the group.

Second table (Test Statistics): presents the results of the Mann-Whitney U test (otherwise known as the Mann-Whitney-Wilcoxon test or the Wilcoxon rank-sum test). The p-value [Asymp. Sig. (2-tailed)] is 0.041, which is below 0.05, implying that height differs significantly between the two insurance groups.

Note: you can run this test for several variables at once by adding them all to the same box as Height. The output will be displayed in separate panels by variable name.
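The ranking steps behind the Ranks and Test Statistics tables can be sketched as follows. This is a minimal illustration, assuming the asymptotic normal approximation with a tie correction and no continuity correction; the function names and the height values are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        avg = (start + end) / 2.0 + 1.0  # average of positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes

def mann_whitney(group1, group2):
    """Return (U, Z, two-tailed asymptotic p) for two independent samples."""
    n1, n2 = len(group1), len(group2)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(group1) + list(group2))
    rank_sum1 = sum(ranks[:n1])                # Sum of Ranks for group 1
    u1 = rank_sum1 - n1 * (n1 + 1) / 2.0
    u = min(u1, n1 * n2 - u1)                  # report the smaller U
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term)
    z = (u - n1 * n2 / 2.0) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, z, p

# Hypothetical heights (cm) for the two insurance groups.
private = [172, 168, 181, 175, 169, 178]
public = [165, 170, 162, 167, 171, 160]
u, z, p = mann_whitney(private, public)
print(f"U = {u}, Z = {z:.3f}, p = {p:.3f}")
```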
6.3 TWO-SAMPLE KOLMOGOROV-SMIRNOV TEST

The two-sample Kolmogorov-Smirnov test tests the null hypothesis that two samples have the same distribution. It is a very flexible test because no specific shape is assumed for the underlying distribution. Because it makes no such assumption, however, it is sensitive to any difference between the distributions, including differences in both location and scale. You may want to center the test variable if you are not interested in location differences; additionally, you may want to standardize the test variable to remove both location and scale.

Go to Analyse > Nonparametric Tests > 2 Independent Samples. In the dialogue box, add the dependent variable Height to the Test Variable List box. Add the independent variable Gender to the Grouping Variable box and define the groups as you did in Section 6.2. Deselect Mann-Whitney U, and select Kolmogorov-Smirnov Z. Then click OK to produce the output.

Interpreting the output

First table (Ranks): presents figures used to calculate the p-value.

Second table (Test Statistics): presents the results of the Kolmogorov-Smirnov test. The p-value [Asymp. Sig. (2-tailed)] is 0.208, well above 0.05, implying that the distributions of height in the two groups are not significantly different from each other by that standard.
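The effect of centering on the two-sample test can be illustrated with a short sketch. It assumes the usual definition of the two-sample statistic (largest gap between the two empirical CDFs, scaled by the sample sizes) and the classical Kolmogorov series for the p-value; the function names and data are hypothetical.

```python
import math
from bisect import bisect_right

def ks_two_sample(sample1, sample2):
    """Return (D, Z): largest gap between the empirical CDFs and its scaled form."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    d = max(abs(bisect_right(s1, x) / n1 - bisect_right(s2, x) / n2)
            for x in s1 + s2)
    z = math.sqrt(n1 * n2 / (n1 + n2)) * d
    return d, z

def kolmogorov_p(z, terms=100):
    """Asymptotic two-tailed p-value for a Kolmogorov-Smirnov Z."""
    if z < 0.2:  # the p-value is indistinguishable from 1 in this range
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

def center(sample):
    """Subtract the sample mean to remove location differences."""
    mean = sum(sample) / len(sample)
    return [x - mean for x in sample]

# Hypothetical heights (cm) for two groups that differ mainly in location.
group_a = [158, 162, 165, 167, 170, 173]
group_b = [168, 172, 175, 177, 180, 183]

d_raw, z_raw = ks_two_sample(group_a, group_b)
d_centered, z_centered = ks_two_sample(center(group_a), center(group_b))
print(f"raw:      D = {d_raw:.3f}, p = {kolmogorov_p(z_raw):.3f}")
print(f"centered: D = {d_centered:.3f}, p = {kolmogorov_p(z_centered):.3f}")
```

In this made-up example the second group is the first shifted by a constant, so centering removes the entire difference; with real data, any remaining difference after centering reflects scale or shape.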
7 MULTI-INDEPENDENT-SAMPLE TESTS

The Tests for Several Independent Samples procedure compares two or more groups of cases on one variable. The nonparametric tests for multiple independent samples are useful for determining whether or not the values of a particular variable differ between two or more groups, especially when the assumptions of ANOVA are not met.

Suppose we want to know whether BMI differs depending on which year patients were born in. In other words: does the categorical (independent) variable Year_born affect the continuous (dependent) variable BMI?

7.1 ASSUMPTIONS AND DATA REQUIREMENTS

Assumptions: Use independent, random samples. The Kruskal-Wallis H test requires that the tested samples be similar in shape.

Data: Use numeric variables that can be ordered. We will assume that the independence assumption is met by the design of the experiment.

7.2 KRUSKAL-WALLIS TEST

The Kruskal-Wallis test is a one-way analysis of variance by ranks. It tests the null hypothesis that multiple independent samples come from the same population. Unlike standard ANOVA, it does not assume normality, and it can be used to test ordinal variables.

Go to Analyse > Nonparametric Tests > K Independent Samples. In the dialogue box, add the dependent variable BMI to the Test Variable List box. Add the independent variable Year_born to the Grouping Variable box. Click on Define Range and type 1997 for Minimum and 2000 for Maximum.
Click Continue, keep the Test Type as it is (default = Kruskal-Wallis H) and click OK to produce the output.

Interpreting the output

First table (Ranks): presents figures used to calculate the p-value.

Second table (Test Statistics): presents the results of the Kruskal-Wallis test. The p-value (Asymp. Sig.) is 0.038, which is below 0.05, implying that BMI differs significantly among the years of birth.
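For readers who want to cross-check the SPSS result outside the dialogs, the same test is available in SciPy as `scipy.stats.kruskal`. The sketch below assumes SciPy is installed; the wrapper name `kruskal_wallis` and the BMI values are hypothetical and are not the tutorial's data.

```python
from scipy import stats

def kruskal_wallis(groups):
    """Return (H, degrees of freedom, asymptotic p) for a list of samples."""
    h, p = stats.kruskal(*groups)
    return h, len(groups) - 1, p

# Hypothetical BMI values grouped by year of birth.
bmi_by_year = {
    1997: [21.4, 23.0, 24.8, 22.1, 25.3],
    1998: [22.9, 26.1, 24.0, 27.5, 25.8],
    1999: [26.4, 24.9, 28.2, 27.0, 29.1],
    2000: [23.7, 28.8, 30.2, 26.6, 27.9],
}

h, df, p = kruskal_wallis(list(bmi_by_year.values()))
print(f"H = {h:.3f}, df = {df}, p = {p:.3f}")
```

The degrees of freedom equal the number of groups minus one, so the four years of birth used here give 3.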
7.3 THE MEDIAN TEST

The median test tests the null hypothesis that two or more independent samples have the same median. It assumes nothing about the distribution of the test variable, making it a good choice when you suspect that the distribution varies by group.

Go to Analyse > Nonparametric Tests > K Independent Samples. Do the same as you did in Section 7.2, but this time deselect Kruskal-Wallis H, and select Median as the test type. Click Options and select Quartiles in the Statistics group. Click Continue to go back to the Tests for Several Independent Samples dialog box and then click OK to run the analysis. The output is interpreted below.
Interpreting the output

First table (Descriptive Statistics): presents the number of cases for each variable of interest and their percentiles.

Second table (Frequencies): presents figures used to calculate the p-value, by Year_born and cut point (median) of BMI.

Third table (Test Statistics): presents the results of the Median test. The p-value (Asymp. Sig.) is 0.024, implying that the median BMI differs among the years of birth.

7.4 POST HOC TESTS

SPSS doesn't have a convenient tool for nonparametric post hoc testing. To find where the differences are, use multiple Mann-Whitney tests to compare each pair of categories. Because there are multiple comparisons, a p-value is no longer judged significant at 0.05, but rather at 0.05 / (number of possible pairs). In the example above, there are four years of birth and therefore six possible pairs, so the significance level should be 0.05 / 6 = 0.0083.
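The pairwise procedure described above can be sketched as a loop over all pairs of categories. This example assumes SciPy is installed and uses `scipy.stats.mannwhitneyu` for each comparison; the function name `pairwise_mann_whitney` and the BMI values are hypothetical.

```python
from itertools import combinations
from scipy import stats

def pairwise_mann_whitney(groups, alpha=0.05):
    """Compare every pair of groups and flag results below the adjusted level.

    groups maps a category label to its list of values. Returns the adjusted
    significance level and one (label_a, label_b, p, significant) per pair.
    """
    pairs = list(combinations(sorted(groups), 2))
    threshold = alpha / len(pairs)  # 0.05 divided by the number of pairs
    results = []
    for a, b in pairs:
        p = stats.mannwhitneyu(groups[a], groups[b],
                               alternative="two-sided").pvalue
        results.append((a, b, float(p), bool(p < threshold)))
    return threshold, results

# Hypothetical BMI values grouped by year of birth.
bmi_by_year = {
    1997: [21.4, 23.0, 24.8, 22.1, 25.3],
    1998: [22.9, 26.1, 24.0, 27.5, 25.8],
    1999: [26.4, 24.9, 28.2, 27.0, 29.1],
    2000: [23.7, 28.8, 30.2, 26.6, 27.9],
}

threshold, results = pairwise_mann_whitney(bmi_by_year)
print(f"adjusted significance level: {threshold:.4f}")
for a, b, p, significant in results:
    print(a, b, f"p = {p:.3f}", "significant" if significant else "not significant")
```

Dividing the significance level by the number of pairs is conservative; it keeps the overall chance of a false positive across all six comparisons at or below 0.05.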
8 TWO-RELATED-SAMPLE TESTS

The Two-Related-Samples Tests procedure compares the distributions of two variables. The nonparametric tests for two related samples allow you to test for differences between paired scores when you cannot (or would rather not) make the assumptions required by the paired-samples t test. Procedures are available for testing nominal, ordinal, or scale variables.

8.1 ASSUMPTIONS AND DATA REQUIREMENTS

Assumptions: Although no particular distributions are assumed for the two variables, the population distribution of the paired differences is assumed to be symmetric.

Data: Use numeric variables that can be ordered.

8.2 THE WILCOXON SIGNED-RANKS TEST

The Wilcoxon signed-ranks test is a nonparametric version of the paired-samples t test. It is used when one group of people is measured at two time points, or under two different conditions, and the data do not meet the assumptions of the paired t test. The two measures are dependent, and the test must account for this dependency.

The question we want to answer is: Is there a difference in BMI measurements before and after exposure?

Go to Analyse > Nonparametric Tests > 2 Related Samples. Click on BMI and BMI2 in the list on the left, then click the arrow button to add them to the box on the right (Paired Variables).
Click OK to produce the output.

Interpreting the output

First table (Ranks): presents figures used to calculate the p-value.
Second table (Test Statistics): The results of the nonparametric paired-samples test are displayed here. The p-value (Asymp. Sig.) is < 0.001, implying that there is a difference in BMI depending on exposure.

8.3 THE SIGN TEST

The sign test, like the Wilcoxon signed-ranks test, is a nonparametric statistic that can be used with a dependent variable measured on at least an ordinal scale when the independent variable has two levels and the participants have been matched or the samples are correlated. Thus, it is useful when a t test cannot be employed because its assumptions have been violated.

The sign test uses only directional information, while the Wilcoxon test uses both direction and magnitude information. Thus the Wilcoxon test is statistically more powerful than the sign test. However, the Wilcoxon test assumes that the differences between pairs of scores are ordinally scaled, and this assumption is difficult to test.

We repeat the test from Section 8.2 using the sign test. Go to Analyse > Nonparametric Tests > 2 Related Samples. Do the same as you did in Section 8.2, but this time deselect Wilcoxon, and select Sign as the test type. Click OK to proceed. The output is interpreted below.
Interpreting the output

First table (Frequencies): presents figures used to calculate the p-value.

Second table (Test Statistics): The results of the nonparametric paired-samples test are displayed here. The p-value (Asymp. Sig.) is 0.010, implying that there is a difference in BMI depending on exposure.

8.4 THE MCNEMAR TEST

The McNemar test tests the null hypothesis that binary responses are unchanged. As with the Wilcoxon test, the data may be from a single sample measured twice or from two matched samples. The McNemar test is particularly appropriate for dichotomous nominal or ordinal test variables.

The question we want to answer is: Is there a difference in readmission before and after intervention?

Go to Analyse > Nonparametric Tests > 2 Related Samples. Click on Before and After in the list on the left and add them as a pair. Deselect Wilcoxon, select McNemar as the test type and then click OK. The output is interpreted below.
Interpreting the output

First table (Readmission before intervention & Readmission after intervention): presents figures used to calculate the p-value.

Second table (Test Statistics): The results of the nonparametric paired-samples test are displayed here. The p-value (Asymp. Sig.) is 0.541, implying that there is no significant difference in readmission depending on intervention; that is, there is no evidence that readmission changed after the intervention.
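Both the sign test and the McNemar test come down to a binomial test on counts: the sign test counts positive and negative differences, and the McNemar test counts the cases that changed in each direction. The sketch below assumes the exact two-sided binomial form with a success probability of 0.5; SPSS may report an asymptotic p-value instead for larger samples. The function names and data are hypothetical.

```python
from math import comb

def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k outcomes of one kind in n fair trials."""
    if n == 0:
        return 1.0
    smaller = min(k, n - k)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

def sign_test(first, second):
    """Sign test on paired scores; pairs with no difference are dropped."""
    diffs = [b - a for a, b in zip(first, second) if b != a]
    negatives = sum(1 for d in diffs if d < 0)
    return binomial_two_sided_p(negatives, len(diffs))

def mcnemar_exact(changed_to_yes, changed_to_no):
    """Exact McNemar test using only the pairs whose response changed."""
    return binomial_two_sided_p(changed_to_yes, changed_to_yes + changed_to_no)

# Hypothetical BMI before and after exposure for eight patients.
bmi_before = [24.1, 26.3, 22.8, 27.5, 25.0, 23.4, 28.1, 26.0]
bmi_after = [25.0, 27.1, 22.8, 28.9, 25.6, 23.1, 29.4, 27.2]
print(f"sign test p = {sign_test(bmi_before, bmi_after):.3f}")

# Hypothetical readmission changes: 7 patients changed from readmitted to
# not readmitted, 4 changed the other way.
print(f"McNemar p = {mcnemar_exact(4, 7):.3f}")
```

The Wilcoxon signed-ranks test goes one step further than the sign test by ranking the absolute differences, which is why it uses magnitude as well as direction.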
9 MULTIPLE-RELATED-SAMPLE TESTS

The Tests for Several Related Samples procedure compares the distributions of two or more variables. The nonparametric tests for multiple related samples are useful alternatives to a repeated measures analysis of variance. They are especially appropriate for small samples and can be used with nominal or ordinal test variables.

9.1 ASSUMPTIONS AND DATA REQUIREMENTS

Assumptions: Nonparametric tests do not require assumptions about the shape of the underlying distribution. Use dependent, random samples.

Data: Use numeric variables that can be ordered.

9.2 FRIEDMAN TEST

The Friedman procedure tests the null hypothesis that multiple ordinal responses come from the same population. As with the Wilcoxon test for two related samples, the data may come from repeated measures of a single sample or from the same measure from multiple matched samples.

An insurance group is evaluating four health care plans for customers. Fifty patients are asked to rank the plans by how much they would prefer to accept them. The question we want to answer is: Is there a difference in preference for the four health care plans?

Go to Analyse > Nonparametric Tests > K Related Samples. Click on Plan 1-4 in the list on the left, and then click the arrow button to add them to the box on the right (Test Variables). Click OK. The output is interpreted below.
Interpreting the output

First table (Ranks): presents figures used to calculate the p-value.

Second table (Test Statistics): The results of the nonparametric Friedman test are displayed here. The p-value (Asymp. Sig.) is < 0.001, implying that there is a difference in preference for the health care plans; that is, the fifty patients do not have equal preference for all four health care plans.

9.3 KENDALL'S W TEST

Kendall's W is a normalization of the Friedman statistic. It is used to assess the degree of agreement among respondents. Kendall's W ranges from 0 to 1: a value of 1 means complete agreement among the raters, and a value of 0 means no agreement.

Go to Analyse > Nonparametric Tests > K Related Samples. Click on Plan 1-4 in the list on the left, and then click the arrow button to add them to the box on the right (Test Variables). This time deselect Friedman and select Kendall's W instead. Click OK to proceed. The output is interpreted below.
Interpreting the output

First table (Ranks): presents figures used to calculate the p-value.

Second table (Test Statistics): The results of the nonparametric Kendall's W test are displayed here. Kendall's coefficient of concordance (W) is 0.281, with a chi-square statistic on 3 degrees of freedom. The p-value (Asymp. Sig.) is < 0.001, so the test rejects the null hypothesis that there is no agreement among the patients' rankings. The agreement is statistically significant but weak (W = 0.281 on a scale from 0 to 1). Equivalently, the level of preference differs among the four health care plans across the 50 patients.

9.4 COCHRAN'S Q TEST

The Cochran's Q procedure tests the null hypothesis that multiple related proportions are the same. It is used when all the test variables are dichotomous and coded with the same values. The Cochran test is an extension of the McNemar test, which is used for two related samples, to more than two related samples.

Fifty patients are asked to perform five tasks on the site, all of which are designed to be equally easy. The question we want to answer is: Is there a difference in success rates of the five tasks?
Go to Analyse > Nonparametric Tests > K Related Samples. Click on Task1 through Task5 in the list on the left and then click the arrow button to add all five to the box on the right (Test Variables). Deselect Friedman, select Cochran's Q as the test type and then click Statistics. Tick Descriptive and then click Continue. Click OK to proceed. The output is interpreted below.
Interpreting the output

First table (Descriptive Statistics): presents basic statistics for the five tasks. The means here are the proportions of patients who succeeded at each task.

Second table (Frequencies): presents figures used to calculate the p-value.

Third table (Test Statistics): The results of the nonparametric Cochran's Q test are displayed here. Cochran's Q is 0.985, with 4 degrees of freedom. The p-value (Asymp. Sig.) is 0.912, so there is no evidence that the success rates differ; that is, to answer our question, there is no significant difference in the success rates among the five tasks completed by the fifty patients.
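The three statistics in this section are closely related, which a short sketch can make concrete. The example below assumes SciPy is installed; it uses `scipy.stats.friedmanchisquare` for the Friedman chi-square, derives Kendall's W from it as chi-square / (N x (k - 1)), and computes Cochran's Q from its textbook formula. The function names and data are hypothetical.

```python
from scipy import stats

def kendalls_w(columns):
    """Return (Friedman chi-square, Kendall's W, p) from one list per condition.

    Each list holds the scores of the same N respondents for one condition.
    """
    k = len(columns)
    n = len(columns[0])
    chi_square, p = stats.friedmanchisquare(*columns)
    return chi_square, chi_square / (n * (k - 1)), p

def cochrans_q(rows):
    """Return (Q, degrees of freedom, p) for a list of 0/1 outcomes per case.

    Assumes at least one case has mixed outcomes, so the denominator is not 0.
    """
    k = len(rows[0])
    col_totals = [sum(row[j] for row in rows) for j in range(k)]
    row_totals = [sum(row) for row in rows]
    total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - total ** 2)
    denominator = k * total - sum(r * r for r in row_totals)
    q = numerator / denominator
    return q, k - 1, stats.chi2.sf(q, k - 1)

# Hypothetical rankings of four plans by five patients in complete agreement.
plans = [[1] * 5, [2] * 5, [3] * 5, [4] * 5]
chi_square, w, p = kendalls_w(plans)
print(f"chi-square = {chi_square:.2f}, W = {w:.2f}")

# Hypothetical success (1) / failure (0) on three tasks for four patients.
tasks = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [1, 0, 0]]
q, df, p_q = cochrans_q(tasks)
print(f"Q = {q:.3f}, df = {df}, p = {p_q:.3f}")

# p-value for the Q and degrees of freedom quoted in the text.
print(round(stats.chi2.sf(0.985, 4), 3))
```

By the same relation, a W of 0.281 with 50 patients and four plans corresponds to a chi-square of roughly 0.281 x 50 x 3, or about 42, up to rounding of W. The last line confirms that a Q of 0.985 on 4 degrees of freedom gives the p-value of 0.912 reported above.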
More informationAnalysis of categorical data: Course quiz instructions for SPSS
Analysis of categorical data: Course quiz instructions for SPSS The dataset Please download the Online sales dataset from the Download pod in the Course quiz resources screen. The filename is smr_bus_acd_clo_quiz_online_250.xls.
More informationSPSS Notes (SPSS version 15.0)
SPSS Notes (SPSS version 15.0) Annie Herbert Salford Royal Hospitals NHS Trust July 2008 Contents Page Getting Started 1 1 Opening SPSS 1 2 Layout of SPSS 2 2.1 Windows 2 2.2 Saving Files 3 3 Creating
More informationSimple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More informationLAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.
More informationBill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1
Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce
More informationChapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS
Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple
More informationStatCrunch and Nonparametric Statistics
StatCrunch and Nonparametric Statistics You can use StatCrunch to calculate the values of nonparametric statistics. It may not be obvious how to enter the data in StatCrunch for various data sets that
More informationStatistics. Measurement. Scales of Measurement 7/18/2012
Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does
More informationPaired TTest. Chapter 208. Introduction. Technical Details. Research Questions
Chapter 208 Introduction This procedure provides several reports for making inference about the difference between two population means based on a paired sample. These reports include confidence intervals
More informationADDINS: ENHANCING EXCEL
CHAPTER 9 ADDINS: ENHANCING EXCEL This chapter discusses the following topics: WHAT CAN AN ADDIN DO? WHY USE AN ADDIN (AND NOT JUST EXCEL MACROS/PROGRAMS)? ADD INS INSTALLED WITH EXCEL OTHER ADDINS
More informationTable of Contents. Preface
Table of Contents Preface Chapter 1: Introduction 11 Opening an SPSS Data File... 2 12 Viewing the SPSS Screens... 3 o Data View o Variable View o Output View 13 Reading NonSPSS Files... 6 o Convert
More informationOverview of NonParametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS
Overview of NonParametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS About Omega Statistics Private practice consultancy based in Southern California, Medical and Clinical
More informationOnce saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.
1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis
More informationSPSS Guide Howto, Tips, Tricks & Statistical Techniques
SPSS Guide Howto, Tips, Tricks & Statistical Techniques Support for the course Research Methodology for IB Also useful for your BSc or MSc thesis March 2014 Dr. Marijke Leliveld Jacob Wiebenga, MSc CONTENT
More informationTesting Group Differences using Ttests, ANOVA, and Nonparametric Measures
Testing Group Differences using Ttests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 354870348 Phone:
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationHow To Test For Significance On A Data Set
NonParametric Univariate Tests: 1 Sample Sign Test 1 1 SAMPLE SIGN TEST A nonparametric equivalent of the 1 SAMPLE TTEST. ASSUMPTIONS: Data is nonnormally distributed, even after log transforming.
More informationLinear Models in STATA and ANOVA
Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 42 A Note on NonLinear Relationships 44 Multiple Linear Regression 45 Removal of Variables 48 Independent Samples
More informationBivariate Statistics Session 2: Measuring Associations ChiSquare Test
Bivariate Statistics Session 2: Measuring Associations ChiSquare Test Features Of The ChiSquare Statistic The chisquare test is nonparametric. That is, it makes no assumptions about the distribution
More informationLOGIT AND PROBIT ANALYSIS
LOGIT AND PROBIT ANALYSIS A.K. Vasisht I.A.S.R.I., Library Avenue, New Delhi 110 012 amitvasisht@iasri.res.in In dummy regression variable models, it is assumed implicitly that the dependent variable Y
More informationCome scegliere un test statistico
Come scegliere un test statistico Estratto dal Capitolo 37 of Intuitive Biostatistics (ISBN 0195086074) by Harvey Motulsky. Copyright 1995 by Oxfd University Press Inc. (disponibile in Iinternet) Table
More informationTwoSample TTests Assuming Equal Variance (Enter Means)
Chapter 4 TwoSample TTests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one or twosided twosample ttests when the variances of
More informationStudy Guide for the Final Exam
Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make
More informationDescriptive Statistics
Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web
More informationData Analysis for Marketing Research  Using SPSS
North South University, School of Business MKT 63 Marketing Research Instructor: Mahmood Hussain, PhD Data Analysis for Marketing Research  Using SPSS Introduction In this part of the class, we will learn
More informationDirections for using SPSS
Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...
More informationIBM SPSS Statistics 20 Part 1: Descriptive Statistics
CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 1: Descriptive Statistics Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the
More informationPermutation Tests for Comparing Two Populations
Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. JaeWan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of
More informationTutorial 5: Hypothesis Testing
Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrclmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................
More informationNAG C Library Chapter Introduction. g08 Nonparametric Statistics
g08 Nonparametric Statistics Introduction g08 NAG C Library Chapter Introduction g08 Nonparametric Statistics Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Parametric and Nonparametric
More informationNormality Testing in Excel
Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com
More informationChapter 13. ChiSquare. Crosstabs and Nonparametric Tests. Specifically, we demonstrate procedures for running two separate
1 Chapter 13 ChiSquare This section covers the steps for running and interpreting chisquare analyses using the SPSS Crosstabs and Nonparametric Tests. Specifically, we demonstrate procedures for running
More informationIntroduction to Statistics with SPSS (15.0) Version 2.3 (public)
Babraham Bioinformatics Introduction to Statistics with SPSS (15.0) Version 2.3 (public) Introduction to Statistics with SPSS 2 Table of contents Introduction... 3 Chapter 1: Opening SPSS for the first
More informationNONPARAMETRIC STATISTICS 1. depend on assumptions about the underlying distribution of the data (or on the Central Limit Theorem)
NONPARAMETRIC STATISTICS 1 PREVIOUSLY parametric statistics in estimation and hypothesis testing... construction of confidence intervals computing of pvalues classical significance testing depend on assumptions
More informationModule 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
More informationIBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
More informationWhy Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
More informationSPSS Introduction. Yi Li
SPSS Introduction Yi Li Note: The report is based on the websites below http://glimo.vub.ac.be/downloads/eng_spss_basic.pdf http://academic.udayton.edu/gregelvers/psy216/spss http://www.nursing.ucdenver.edu/pdf/factoranalysishowto.pdf
More information11. Analysis of Casecontrol Studies Logistic Regression
Research methods II 113 11. Analysis of Casecontrol Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More information