
Table of Contents

Preface

Chapter 1: Introduction
1-1 Opening an SPSS Data File ... 2
1-2 Viewing the SPSS Screens ... 3
   o Data View
   o Variable View
   o Output View
1-3 Reading Non-SPSS Files ... 6
   o Convert From Excel to SPSS
   o Convert From Text to SPSS
1-4 Data View in SPSS ... 9
1-5 Variable View in SPSS ... 10

Chapter 2
2-1 Tables (1.1) ... 16
2-2 One Proportion (1.5) ... 18
2-3 Summarizing Variables (3.1) ... 22
2-4 Cross-Tabulation Tables (4.1) ... 25
2-5 Descriptive Statistics (6.1) ... 27
   o Descriptives
   o Descriptives with Percentiles, Interquartile Ranges, and Confidence Intervals
2-6 Independent T-Tests (6.3) ... 33

Chapter 3
3-1 Paired Samples T-Test (7.2) ... 36
3-2 One Sample T-Test (7.3) ... 38
3-3 Chi-Square Test of Association (8.2) ... 40
3-4 ANOVA (9.3) ... 43
3-5 Scatterplot (10.1) ... 47
   o Line of Best Fit
   o Correlation
3-6 Linear Regression ... 51

Chapter 4: Appendix
4-1 Bar Chart ... 56
4-2 Boxplot ... 58
4-3 Histogram ... 60
4-4 Pie Chart ... 62
4-5 Side-By-Side Boxplots ... 65
4-6 Chi Squared Goodness of Fit Test ... 67
4-7 Chi Squared Tests ... 70
   o Linear Trend Tests
   o McNemar Test
   o Relative Risk

Preface

Welcome to your SPSS Statistics 21 Manual, which is a companion to Introduction to Statistical Investigations. The manual is designed to give the simple commands for an operation first, then an example, and then the output. The examples use databases in SPSS (.sav) format; your professor will tell you where to find these files. The names of the databases have been italicized. To simplify the text, we have also italicized all buttons that should be clicked, and we have put quotations around all labels. We hope you find the manual user friendly. Be aware of the special notes (see the icon), which are sure to be useful. We have included helpful screen shots. Use the Table of Contents to navigate the manual. Chapter sections are arranged alphabetically (where possible) for your convenience. The manual provides help for basic graphics and hypothesis testing, but is not all-inclusive. Feel free to explore SPSS on your own to find other features. Last Revised: September 2013

Chapter 1: Introduction to SPSS
1-1 Opening an SPSS Data File ... 2
1-2 Viewing the SPSS Screens ... 3
   o Data View
   o Variable View
   o Output View
1-3 Reading Non-SPSS Files ... 6
   o Convert From Excel to SPSS
   o Convert From Text to SPSS
1-4 Data View in SPSS ... 9
1-5 Variable View in SPSS ... 10

2 Chapter 1: Introduction to SPSS

1-1 Opening an SPSS Data File
Use if you want to open an SPSS data file (*.sav).
Commands:
1. Open SPSS Statistics 21.
2. In the SPSS Statistics 21 window, check the bubble next to Open an existing data source.
3. Select OK.
4. Find and double click the file you wish to open.
Note: SPSS data files are *.sav files. If your data is not in a *.sav file, see 1-3 Reading Non-SPSS Files.
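Note: The same operation can also be run from a syntax window (File>New>Syntax). The sketch below is only a rough equivalent of the dialog; the file path is a placeholder that you should replace with the location of your own data file.

* Open an existing SPSS data file (path is a placeholder).
GET FILE='C:\Data\Animal Sleep Data.sav'.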

Chapter 1: Introduction to SPSS 3

1-2 Viewing the SPSS Screens
Use if you are unfamiliar with the SPSS screens.
Data View
After opening an SPSS data file, there are two possible ways to view the data (Data View & Variable View). In the bottom left corner of the screen there is a tab labeled Data View. Highlight the Data View tab by clicking on it. Data View displays the data in a spreadsheet. For more information see 1-4 Data View in SPSS. Note: Each row is a different subject or individual, while each column is a different variable. Example: Look at row 5 in Data View using Animal Sleep Data.sav. Looking at row 5, this subject is a mountain beaver who has a body weight of 1.35 kg, brain weight of 8.10 grams, non-dreaming sleep of 8.4 hours, and a dreaming sleep of 2.8 hours. Notice the columns are the different variables (species, body weight, brain weight, non-dreaming sleep, and dreaming sleep).

4 Chapter 1: Introduction to SPSS

Variable View
At the bottom of the screen there is also a tab labeled Variable View (next to Data View). Highlight the Variable View tab by clicking on it. Variable View allows for editing and labeling of the variables. For more information see 1-5 Variable View in SPSS.
Output View
After running a test or creating a graph, an *Output1 [Document1]-SPSS Statistics Viewer window will appear and will show your results. The column of output titles on the left side of the window acts like a table of contents for your results. Simply double click the desired title and it will show up in the Output screen. (See figure at right.) To save the Output, select File>Save. Next to File name: type an appropriate name for the output file. Note: Make sure to save both the Output window and the SPSS data file separately because they are separate files. To copy the output into Microsoft Word, right click the item of interest and select Copy. Open Microsoft Word and paste the item into the document.

Chapter 1: Introduction to SPSS 5 This window appears after running descriptive statistics. Notice the Output on the left side of the box.

6 Chapter 1: Introduction to SPSS

1-3 Reading Non-SPSS Files
Convert from Excel (.xls) to SPSS (.sav)
Use if you want to convert a file from Excel (.xls) to SPSS (.sav).
Commands:
1. Make sure your desired Excel file is saved and closed. Note: Some Excel files also contain explanatory sentences about the dataset. Make sure these sentences are deleted (not including the variable names or string values). You want the variable names to be in the first row of the spreadsheet.
2. Open SPSS Statistics 21.
3. In the SPSS Statistics 21 window, select Cancel at the bottom.
4. Click File>Read Text Data in the top left corner.
5. At the bottom of the Open Data window, click the down arrow for Files of type.
6. Select Excel (*.xls, *.xlsx, *.xlsm).
7. Find the Excel file you saved earlier and double-click it.
8. In the Opening Excel Data Source window, make sure Read variable names from the first row of data is checked.

Chapter 1: Introduction to SPSS 7

9. Select OK.
10. Your results should now be in SPSS. Don't forget to save it as an SPSS (*.sav) file by going to File>Save As.

Convert from Text (.txt) to SPSS (.sav)
Use if you want to convert a file from .txt to SPSS (.sav).

8 Chapter 1: Introduction to SPSS

Commands:
1. Make sure your desired .txt file is saved and closed. Note: Some .txt files also contain explanatory sentences about the dataset. Make sure these sentences are deleted (not including the variable names or string values). You want the variable names to be at the top of the document.
2. Open SPSS Statistics 21.
3. In the SPSS Statistics 21 window, select Cancel at the bottom.
4. Click File>Read Text Data in the top left corner.
5. At the bottom of the Open Data window, click the down arrow for Files of type.
6. Select Text (*.txt, *.dat).
7. Find the .txt file you saved earlier and double-click it.
8. In the Text Import Wizard - Step 1 of 6, check the desired bubbles (the default is usually correct; most .txt files have No checked for Does your text file match a predefined format?). Select Next >.
Note: In the following steps there will be a Text file or Data Preview that shows what the data will look like once it is converted to SPSS.
9. In the Text Import Wizard - Step 2 of 6, check the desired bubbles (the default is usually correct; most .txt files are Delimited, and variable names are at the top of the file). Select Next >.
10. In the Text Import Wizard - Step 3 of 6, check the desired bubbles (the default is usually correct; most .txt files have the first case of data beginning at line 2, each line represents a case, and you will want to import all of the cases). Select Next >.
11. In the Text Import Wizard - Step 4 of 6, check the desired bubbles (the default is usually correct; most .txt files have a Tab between variables, and the text qualifier is None). Select Next >.
12. In the Text Import Wizard - Step 5 of 6, change the desired information about the variables. Select Next >.
13. In the Text Import Wizard - Step 6 of 6, check the desired bubbles (the default is usually correct; for most .txt files you will not save the file format for future use and will not paste the syntax). Select Finish.
14. Your results should now be in SPSS. Don't forget to save it as an SPSS (*.sav) file by going to File>Save As.
Note: When converting a file from Fathom to SPSS, you need to save it as a .txt file. Go through the following steps:
1. Open your Fathom work.
2. Click (once) your brown Collection box so that the border is highlighted.
3. Select File>Export Collection in the top left corner.
4. Next to File name, save the file as a text file (.txt).
5. Then follow the above instructions to get the .txt file into SPSS.
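Note: Every import dialog and wizard also offers a Paste button that writes out the exact syntax for the choices you made, which is the most reliable way to see the command SPSS actually runs. As a rough sketch only, an Excel import looks roughly like the command below; the file path and sheet name are placeholders, and the XLSX type keyword may not be available on older releases (those use XLS).

* Read an Excel sheet; path and sheet name are placeholders.
GET DATA
  /TYPE=XLSX
  /FILE='C:\Data\MyData.xlsx'
  /SHEET=NAME 'Sheet1'
  /READNAMES=ON.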

Chapter 1: Introduction to SPSS 9 1-4 Data View in SPSS Use if you have raw data to enter into SPSS. Data View Commands: 1. Open SPSS Statistics 21. 2. In the SPSS Statistics 21 window, check the bubble next to Type in data. 3. Select OK. 4. In the bottom left corner of the screen make sure the Data View tab is highlighted. 5. Type the data into the cells. Note: Each row is a different subject or individual, while each column is a different variable.

10 Chapter 1: Introduction to SPSS

1-5 Variable View in SPSS
Use if you have variables to change in SPSS Variable View.
Commands:
1. Select Variable View at the bottom of the screen.
2. To change the type of the variable, select Numeric under the Type column. Select the button next to Numeric. Pick the appropriate variable type. Select OK. Note: The most popular variable types are numeric (numbers) and string (words). However, string variables cannot be used for many tests and graphs. A solution to this problem is to code your categorical variable numerically and connect each number with a value label. This is done for the variable dog type below. The process is described more completely later in this section.
3. To change the number of characters allowed in a cell in Data View, select the number under Width. Then click the up and down arrows to get the desired number.
4. To change the number of decimal places in a cell in Data View for numeric variables, select the number under Decimal. Then click the up and down arrows to get the desired number.
5. To label the variables, type your preferred variable name in the Label cell. This label will appear when running tests and descriptive statistics instead of the actual variable name as shown in Data View. Note: Labeling variables can be extremely important. When using datasets with many similarly named variables (e.g. Research_Question_1, Research_Question_2, Research_Question_3), running tests will display only Research_Que, Research_Que, Research_Que, which is difficult to understand. Labels will make this much easier to comprehend in tests without changing the actual name of the variable. Also, labels can contain spaces, whereas variable names cannot.
6. To add a Value to a variable, select the button under Values. Enter the assigned number next to Value:. Enter the assigned label next to Label:. Select Add. Continue to add labels until all the values are assigned. Select OK. Note: Assigning values to a variable is very common when using SPSS 21. This feature allows for string values to appear in tests and descriptive statistics, but for numeric values to appear in Data View. See the example below.
7. To increase the horizontal length of a variable's column, select the number under Columns. Then click the up and down arrows to get the desired number.

Chapter 1: Introduction to SPSS 11

8. To align the information to the left, right, or center of the cell, select the word under Align. Then select the drop down arrow and select the desired alignment.
9. To change the measure of the variable (scale, nominal, ordinal), select the word under Measure. Then select the drop down arrow and select the desired measure.
Example: Enter data into SPSS (using both Data View & Variable View) using the following information: A dog owner has 4 dogs, which include 2 Chihuahuas and 2 Great Danes. The owner wants to look at how much the dogs weigh (in pounds) and the amount of food the dogs consume per day (in cups). The first Chihuahua weighs 4 pounds and eats .5 cups of food per day. The second Chihuahua weighs 2 pounds and eats .25 cups of food per day. The first Great Dane weighs 120 pounds and eats 2.5 cups of food per day. The second Great Dane weighs 200 pounds and eats 4 cups of food per day. Also, label Chihuahua as 1 and Great Dane as 2 (so the variables are not string). This is Data View. Notice in the figure above, Chihuahua has a value of 1, and Great Dane has a value of 2.

12 Chapter 1: Introduction to SPSS The figure above shows how to change the values of the Dog_Type variable. Notice also that the labels have been changed for the Dog_Type and Dog_Weight variables. The effects of these changes are shown in the two figures below. Both of these images come from running descriptive statistics. Notice instead of 1 or 2, "Chihuahua and Great Dane are shown, and instead of Dog_Type or Dog_Weight, Type of Dog and Weight of Dog are shown.

Chapter 1: Introduction to SPSS 13 Computing a New Variable Using Mathematical Operations Use if you want to create a new variable as a result of mathematical or conditional operations on existing variables. Commands: 1. Transform>Compute Variable. 2. Type the name of the new variable under Target Variable: (make sure the new name has no spaces in it). 3. Select and drag the variable(s) of interest under Numeric Expression:. Also indicate the mathematical relationship between the target variable and the existing variable(s) using the calculator pad or the computer keyboard. 4. Select OK. Example: Calculate the average golf score of the four rounds using GolfScores.sav. This shows the process. Notice under Numeric Expression: there is an equation to get the average score of the four rounds, which was created by the above calculator pad and the computer keyboard.

14 Chapter 1: Introduction to SPSS

This shows that the average scores for the four rounds are now recorded in a new column (AverageScore). For example, Tiger had an average score of 68 = (67+68+70+67)/4.
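Note: The Compute Variable dialog can also be pasted as syntax. A minimal sketch for the golf example is shown below; the round variable names (Round1 through Round4) are placeholders for whatever the columns are actually called in GolfScores.sav.

* Average of the four rounds; Round1-Round4 are placeholder names.
COMPUTE AverageScore = (Round1 + Round2 + Round3 + Round4) / 4.
EXECUTE.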

Chapter 2: Descriptive Statistics 15

Chapter 2
2-1 Tables (1.1) ... 16
2-2 One Proportion (1.5) ... 18
2-3 Summarizing Variables (3.1) ... 22
2-4 Cross-Tabulation Tables (4.1) ... 25
2-5 Comparing Descriptive Statistics (6.1) ... 27
   o Descriptives
   o Descriptives with Percentiles, Interquartile Ranges, and Confidence Intervals
2-6 Independent T-Tests (6.3) ... 33

16 Chapter 2: Descriptive Statistics 2-1 Tables Use if variable is categorical. Commands: 1. Analyze>Descriptive Statistics>Frequencies. 2. Select and drag the variable of interest into the Variable(s): box. 3. Make sure Display frequency tables is checked. 4. Select OK. Example: Explore the variable Shop from the data set Coffee Data.sav. We will create a table of frequencies (counts) and percentages of students who frequent the different coffee shops. Output: This box shows there are 177 students in our sample and there is no missing data.

Chapter 2: Descriptive Statistics 17 This box shows the desired table. The Frequency column shows the number of students who preferred a certain coffee shop. The Percent column shows the percent of students who chose a certain shop out of the total number of students in the sample (including the missing data). The Valid Percent column shows the percent of students who chose a certain shop out of the number of students in the sample (without the missing data). If there is no missing data then the Percent and Valid Percent will be exactly the same. The Cumulative Percent column shows the percent of students who chose that shop and the shop(s) already listed from the Valid Percent column. For example, 41.8 + 31.6 = 73.4 percent of students chose Lemonjello s or JP s.
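Note: If you prefer syntax, the Frequencies dialog above corresponds roughly to the one-line command below (pasting from the dialog gives the exact version for your file); Shop is the variable from Coffee Data.sav.

* Frequency table for a categorical variable.
FREQUENCIES VARIABLES=Shop.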

18 Chapter 2: Descriptive Statistics 2-2 One Proportion Use if variable is categorical and binary. Note: To use this method, you will need to create a new document with two variables. Both variables need to be numeric. The first variable will have the two values for the categorical variable. In Variable View under Values, add value labels for these two values. This will make your output more readable. The second variable will be the observed count of each category. Commands: 1. Data>Weight Cases 2. Select Weight cases by and send the frequency/count to the Frequency Variable: box. OK. 3. Analyze>Nonparametric Tests>Chi-Square 4. The Test Variable List: is the categorical variable. 5. In the Expected Values box, select All categories equal if the population proportion is 50%. If the population proportion is not 50%, select Values. Type in one value at a time in the Value box, starting with the hypothesized proportion of the first group, then adding the corresponding proportion for the second group. (The first group is the group with the smallest numeric value in Variable View. These two proportions must have a sum of 100.) 6. Select OK.

Chapter 2: Descriptive Statistics 19 Example: Analyze if the proportion of females at Hope College is different from the national average of 57% female students in college. Use the data set Hope Student Survey 2008.sav. You will need to find the counts of males and females to create your new spreadsheet. The null and alternative hypotheses you will test are as follows. H 0 : The proportion of Hope College females is the same as the national average of 57%. H a : The proportion of Hope College females is different from the national average of 57%. First, create a new SPSS Data Sheet where the first variable is gender with values 0 and 1. In variable view under Values, enter value labels of female and male for the two values.

20 Chapter 2: Descriptive Statistics

The second variable is the observed count. In the Hope Student Survey 2008 data set, you will need to find the sum of each gender (Analyze>Descriptive Statistics>Frequencies, under Statistics check Sum; the first box, Statistics, of the output gives the observed counts for each gender in the Sum row.) In the new SPSS Data Sheet, enter each count value, as shown below. Next, follow the aforementioned Commands to display the desired Output shown below. Output: This table displays the observed counts of females and males in our sample of Hope College students in the first column. The second column shows the counts we would have expected to see if the proportion of females at Hope were the same as the national average. The residual is the distance between the observed count and the expected count. This table displays the p-value of our test of significance (see the circle). The footnote a under the table tells you that all of the cells have expected cell counts of at least 5. The Chi-Square can have at most 20% of the cells in the table with an Expected Count less than 5.

Chapter 2: Descriptive Statistics 21 P-value: p =.695 >.05, fail to reject H 0. Conclusion: We do not have enough evidence to show that the proportion of Hope College females is significantly different from the national average of 57% females enrolled in college. Note: SPSS will always give the two-sided p-value. If you would like to get the one-sided p- value, divide the SPSS two-sided p-value in half (assuming that the observed proportion is in the direction of the alternative hypothesis).
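Note: A rough syntax sketch of the same steps is shown below. It assumes the two columns in the new data sheet are named gender (coded 0 = female, 1 = male) and count; both names are placeholders. The expected values are listed in ascending order of the category codes, so 57 goes with the group coded 0.

* Weight the categories by their observed counts.
WEIGHT BY count.
* Chi-square test against hypothesized proportions of 57% and 43%.
NPAR TESTS
  /CHISQUARE=gender
  /EXPECTED=57 43.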

22 Chapter 2: Descriptive Statistics

2-3 Summarizing Quantitative Variables
Use if variable is quantitative.
Commands:
1. Analyze>Descriptive Statistics>Frequencies.
2. Drag the appropriate quantitative variable from the list to the Variable(s): box.
3. Select Statistics.
4. Check the desired numeric summaries. To get the Quartiles or Percentiles, check the desired boxes under Percentile Values. When checking Percentile(s):, enter the value in the box, then select Add. To get the Mean, Median, Mode, or Sum, check the desired boxes under Central Tendency. To get the Standard Deviation, Variance, Range, Maximum, Minimum, or S.E. Mean, check the desired boxes under Dispersion. To get the Skewness or Kurtosis, check the desired boxes under Distribution.
5. Select Continue.
6. To create a histogram, select Charts and make sure Histograms: is selected. Check With normal curve if desired.
7. Select OK.
Note: This histogram does not offer all of the options for editing or modifying the chart. See Appendix: 4-3 Histograms on page 60 for more chart options.

Chapter 2: Descriptive Statistics 23 Example: Explore the variable egg length from the data set Cuckoo Eggs.sav. We will find the mean, median, mode, standard deviation, minimum, maximum, and the 20 th percentile. We will also create a histogram with a normal curve superimposed for the egg lengths. Output: This box gives the mean (22.5567), median (22.65), mode (23.05), standard deviation (1.15095), minimum (19.85), maximum (25.05), and the 20 th percentile (21.4900) for the egg lengths.

24 Chapter 2: Descriptive Statistics The box above gives information about individual egg lengths. The Valid column includes all the different egg lengths in the dataset. The Frequency column shows the number of eggs that have a certain egg length. The Percent column shows the percent of eggs that have a certain egg length out of the total number of eggs in the sample (including the missing data). The Valid Percent column shows the percent of eggs that have a certain egg length out of the number of eggs in the sample (without the missing data). If there is no missing data then the Percent and Valid Percent will be exactly the same. The Cumulative Percent column shows the percent of eggs that have a certain length or any length(s) less than it from the Valid Percent column. For example 1.3 + 1.3 + 1.3 = 4.0 percent of eggs have a length of 20.25 or less. Histogram of the egg lengths with a normal curve superimposed.
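Note: A roughly equivalent syntax sketch for this example follows; EggLength is a placeholder for the actual egg length variable name in Cuckoo Eggs.sav.

* Numeric summaries, the 20th percentile, and a histogram with a normal curve.
FREQUENCIES VARIABLES=EggLength
  /PERCENTILES=20
  /STATISTICS=MEAN MEDIAN MODE STDDEV MINIMUM MAXIMUM
  /HISTOGRAM NORMAL.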

Chapter 2: Descriptive Statistics 25 2-4 Cross-Tabulation Table Use if both independent and dependent variables are categorical. Commands: 1. Analyze>Descriptive Statistics>Crosstabs. 2. Drag the independent (categorical, explanatory) variable into the Row(s): box. 3. Drag the dependent (categorical, response) variable into the Column(s): box. 4. Under Cells Check Row in the Percentages box. 5. Select Continue. 6. Select OK. Example: Explore the relationship between the variables gender and penalties from the data set Penalties Data.sav. We will create a cross-tabulation table of gender predicting penalties. Output: This box shows there are 100 students in our sample and there is no missing data.

26 Chapter 2: Descriptive Statistics This box shows the desired cross-tabulation table. Notice the circle shows that 16 males received a penalty. The percent of males who received a penalty is 16 out of the total 46 males, which is 34.8%.
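Note: The Crosstabs dialog pastes syntax much like the sketch below; gender and penalty are placeholders for the variable names in Penalties Data.sav.

* Cross-tabulation with row percentages.
CROSSTABS
  /TABLES=gender BY penalty
  /CELLS=COUNT ROW.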

Chapter 2: Descriptive Statistics 27 Descriptives 2-5 Comparing Descriptive Statistics Use if the dependent variable is quantitative or ordinal and the independent variable is categorical. Commands: 1. Analyze>Compare Means>Means. 2. Under Dependent List: enter the dependent (quantitative, response) variable. 3. Under Independent List: enter the independent (categorical, explanatory) variable. 4. Under Options drag any of the desired actions (mean, standard error of mean, median, grouped median, sum, minimum, maximum, range, number of cases, first, last, standard deviation, variance, kurtosis, standard error of kurtosis, harmonic mean, geometric mean, percent of total sum, percent of total n, etc.) from under Statistics: to under Cell Statistics: 5. Select Continue. 6. Select OK in the Means window. Example: Explore the variable time, which measures time spent in a bathroom, from the data set Restrooms & Gender.sav. We will compare the mean, range, and standard deviation for males and females (variable: gender).

28 Chapter 2: Descriptive Statistics Output: This box shows there are 97 participants that do not have missing data. There are 5 participants that have missing time data with a total of 102 participants. This box shows the desired descriptive statistics for both males and females so comparisons can be performed. The mean bathroom time for females is 133.7826 seconds, while the mean bathroom time for males is 106.1569 seconds. The range bathroom time for females is 265 seconds, while the range bathroom time for males is 387 seconds. The standard deviation bathroom time for females is 60.65894 seconds, while the standard deviation bathroom time for males is 78.23180.
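Note: A rough syntax equivalent of the Compare Means dialog for this example is shown below; time and gender are the variables described in Restrooms & Gender.sav, though their exact spelling in the file may differ.

* Mean, count, standard deviation, and range of time by gender.
MEANS TABLES=time BY gender
  /CELLS=MEAN COUNT STDDEV RANGE.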

Chapter 2: Descriptive Statistics 29 Descriptives with Percentiles, Interquartile Range, & Confidence Intervals Use if the dependent variable is quantitative or ordinal and the independent variable is categorical. Note: This method has a similar output as the Descriptives method, but this method will give percentiles, interquartile range, and confidence intervals. Commands: 1. Analyze>Descriptive Statistics>Explore. 2. Under Dependent List: enter the dependent (quantitative, response) variable. 3. Under Factor List: enter the independent (categorical, explanatory) variable. 4. Select Statistics Check Descriptives to get the mean, 95% confidence interval for mean (upper and lower bound), 5% trimmed mean, median, variance, standard deviation, minimum, maximum, range, interquartile range, skewness, and kurtosis. Check Percentiles to get the 5 th, 10 th, 25 th, 50 th, 75 th, 90 th, and 95 th percentile. 5. Select Continue. 6. Select OK. Example: Once again we will explore the variable time from the data set Restrooms & Gender.sav. We will compare the mean, range, standard deviation, and the 50 th percentile for males and females.

30 Chapter 2: Descriptive Statistics Output: This box shows there are 46 female and 51 male participants. There are 5 females that have missing time data with a total of 51 male and 51 female participants. This box shows the descriptive statistics for both males and females so comparisons can be performed. The mean bathroom times for females (133.7826 seconds) and males (106.1569 seconds) are given along with 95% confidence intervals for these values. Many other descriptive statistics are provided as well.

Chapter 2: Descriptive Statistics 31 This box shows different percentiles of male and female time in the bathroom. The two rows (Weighted Average and Tukey s Hinges) calculate percentiles slightly differently, but either one can be used. The 50 th percentile for females is 121.5 seconds in the bathroom, while the 50 th percentile for males is 82 seconds in the bathroom. This is the female stem-and-leaf plot. Notice the stem width is 100. This is the male stem-and-leaf plot. Notice the stem width is 100.

32 Chapter 2: Descriptive Statistics Shown here is a side-by-side boxplot of the amount of time spent in the bathroom for males and females. The open circles represent potential outliers using the 1.5 IQR rule, and the asterisks represent outliers using the 3 IQR rule.
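Note: The Explore dialog corresponds roughly to the syntax below (variable names again assumed to be time and gender); pasting from the dialog gives the exact command for your file.

* Descriptives, percentiles, boxplots, and stem-and-leaf plots by group.
EXAMINE VARIABLES=time BY gender
  /PLOT BOXPLOT STEMLEAF
  /STATISTICS DESCRIPTIVES
  /PERCENTILES(5,10,25,50,75,90,95).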

Chapter 2: Descriptive Statistics 33

Independent Samples T Test
2-6 Independent T Tests
Use if there is a binary categorical explanatory variable and a quantitative or ordinal response variable and the two groups being compared are independent (i.e. compare the means of two independent groups).
Commands:
1. Analyze>Compare Means>Independent-Samples T Test
2. The Test Variable is the dependent (quantitative, response) variable.
3. The Grouping Variable is the independent (categorical, explanatory) variable.
4. Define Groups to indicate to SPSS which two groups are being compared. In the box Group 1, type in the variable value (as it appears in Data View) for the first group (i.e. 0). In the box Group 2, type in the variable value (as it appears in Data View) for the second group (i.e. 1). Select Continue.
5. Select OK.
Example: Use the variables sleep hours and gender from the data set Hope Student Survey 2008.sav to analyze whether or not the mean hours of sleep per night for females (µ F ) is significantly different from the mean hours of sleep per night for males (µ M ). We will test the following hypotheses:
H 0 : The mean sleep hours of males is the same as the mean sleep hours of females. (µ M = µ F )
H a : The mean sleep hours of males is different from the mean sleep hours of females. (µ M ≠ µ F )

34 Chapter 2: Descriptive Statistics Output: The above table gives the sample size ( N ), mean, standard deviation, and standard error mean for the variable sleep hours for each gender. It shows that there is not much of a difference in the average hours of sleep per night between females and males. The p-value (.677) for the test (see the red circle) is in the Sig. (2-tailed) column. Sig. (short for significance) is what SPSS calls the p-value. Also notice the Equal variances not assumed row. The p-value will not always be the same in both rows (though it is in this case). Always use the equal variances not assumed row as it is generally accepted in statistical practice that it is safer not to assume equal variances. Notice the confidence interval (see the blue circle). The confidence interval contains zero, meaning zero is a possible difference in mean sleep hours. This makes sense because our difference in means is not significant. The confidence interval gives similar information as the p-value: because the confidence interval contains zero, the p-value is non-significant. P-value: p =.677 >.05, fail to reject H 0. Conclusion: We do not have enough evidence to show there is a significant difference in average amount of hours of sleep per night obtained by males and females. Note: SPSS will always give the two-sided p-value. To get the one-sided p-value (assuming that the sample mean of Group 1 is larger than Group 2 and the alternative hypothesis is such that the population mean of Group 1 is larger than Group 2), divide the SPSS two-sided p-value in half.
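Note: A rough syntax sketch of this independent-samples test is below; it assumes the sleep variable is named SleepHrs and that gender is coded 0 and 1, both of which are placeholders for the actual names and coding in Hope Student Survey 2008.sav.

* Independent-samples t test comparing two groups.
T-TEST GROUPS=gender(0 1)
  /VARIABLES=SleepHrs
  /CRITERIA=CI(.95).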

Chapter 3: Tests of Significance 35

Chapter 3
3-1 Paired Samples T-Test (7.2) ... 36
3-2 One Sample T-Test (7.3) ... 38
3-3 Chi-Square Test of Association (8.2) ... 40
3-4 ANOVA (9.3) ... 43
3-5 Scatterplot (10.1) ... 47
   o Line of Best Fit
   o Correlation
3-6 Linear Regression (10.4) ... 51

36 Chapter 3: Tests of Significance

3-1 Paired Samples T-Test
Use if there is a quantitative or ordinal response variable and a binary categorical explanatory variable that has a 1-1 correspondence between the two groups (i.e. compare the means where the two groups in the sample are paired, such as married couples and comparing the mean of the men to the mean of the women).
Commands:
1. Analyze>Compare Means>Paired-Samples T Test
2. Put one of the variables into the Variable1 column and the other into the Variable2 column in the first row.
3. Select OK.
Note: The test will calculate Variable1 - Variable2.
Note: If you have a 1-1 correspondence between the two groups, you may need to restructure your data so that each row corresponds to a pair of individuals instead of a single individual.
Example: Analyze if there is a significant difference on average between the variables pulse sitting and pulse standing in the data set Pulse Data.sav. Since these pulse rates come from the same individual, there is dependency between the two pulse rate measures. We will test the following hypotheses to answer the question: On average, is there a significant difference between one's sitting pulse rate (µ Sit ) and one's standing pulse rate (µ Stand )?
H 0 : The mean pulse rate sitting is the same as the mean pulse rate standing. (µ Sit = µ Stand )
H a : The mean pulse rate sitting is different from the mean pulse rate standing. (µ Sit ≠ µ Stand )

Chapter 3: Tests of Significance 37

Output: This table gives mean pulse rate, sample size N (which should be the same because the sample is the same for each variable), standard deviation, and standard error mean for each group. This box gives the correlation (.866, a strong correlation) between pulse rate sitting and pulse rate standing. The correlation is highly significant (p <.001) (see the circle). Although correlation is not the main purpose of the paired samples t test, it can give helpful information. The difference in the means is -6 (Pulse Sitting - Pulse Standing). The p-value for the difference in means is .005 (see the blue circle). Notice the confidence interval (see the red circle). The confidence interval does not contain zero, meaning zero is not a possible difference in mean pulse rates. This corresponds to the p-value from our test of significance. Our p-value leads us to reject the null hypothesis of no difference between the mean pulse rates. P-value: p = .005, reject H 0. Conclusion: We have enough evidence to show that the mean pulse rate while sitting is significantly different from the mean pulse rate while standing. On average, the standing pulse rate is 6 beats per minute more than the sitting pulse rate. Note: SPSS will always give the two-sided p-value. To get the one-sided p-value (assuming that the sample mean of Group 1 is larger than Group 2 and the alternative hypothesis is such that the population mean of Group 1 is larger than Group 2), divide the SPSS two-sided p-value by two. Note: To perform the paired samples t-test you can also create a new variable as the difference between your two quantitative variables (see 1-5 Variable View in SPSS, section entitled Computing a New Variable Using Mathematical Operations) and perform a one-sample t-test on the newly created difference (see 3-2 One Sample T-Test).
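Note: A minimal syntax sketch for the paired test follows; PulseSitting and PulseStanding are placeholders for the two pulse variables in Pulse Data.sav.

* Paired-samples t test (computes Variable1 - Variable2).
T-TEST PAIRS=PulseSitting WITH PulseStanding (PAIRED).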

38 Chapter 3: Tests of Significance

3-2 One Sample T-Test
Use if there is a quantitative or ordinal variable and a given population parameter (i.e. compare a mean to a population mean).
Commands:
1. Analyze>Compare Means>One-Sample T Test
2. The Test Variable is the variable of interest.
3. Type in the Test Value (the population mean that you wish to compare your sample to).
4. Select OK.
Example: Use the variable StudyHrs from the data set Hope Student Survey 2008.sav to analyze whether or not the mean study hours per week (µ) of Hope College students is the same as or different from the recommended 30 hours per week for all full-time college students. We will test the following hypotheses:
H 0 : The average study hours per week of Hope students is 30 hours. (µ = 30)
H a : The average study hours per week of Hope students is not 30 hours. (µ ≠ 30)
Output: The above table gives the sample size ( N ), mean, standard deviation, and standard error mean of study hours for Hope students.

Chapter 3: Tests of Significance 39 The p-value (<.001) for the test (see the blue circle) is in the Sig. (2-tailed) column. Sig. (short for significance) is what SPSS calls the p-value. Notice the confidence interval (see the red circle). It is the confidence interval for the difference of means (Sample Mean Test Value). If you want the confidence interval for the true mean, add the test value to the lower and upper ends of the confidence interval. Also notice that because the confidence interval does not contain zero (zero is not a possible difference in means), the p-value is significant. If it had contained zero (zero is a possible difference in means), the p-value would have been non-significant. For this example, the confidence interval would have a lower bound of 30+(-13.178)=16.822 and an upper bound of 30+(-10.661)=19.339. P-value: Since p < 0.001 <.05, reject H 0. Conclusion: We have enough evidence to show that the average study hours per week of Hope students are significantly different from the recommended 30 hours per week. Note: SPSS will always give the two-sided p-value. If you would like to get the one-sided p- value, divide the SPSS two-sided p-value in half (assuming that the sample mean is in the direction of the alternative hypothesis).
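Note: In syntax, the one-sample test above looks roughly like this (StudyHrs is the variable named in the example; check its exact spelling in the data file):

* One-sample t test against a test value of 30 hours.
T-TEST
  /TESTVAL=30
  /VARIABLES=StudyHrs.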

40 Chapter 3: Tests of Significance 3-3 Chi Square Test of Association Use if both independent and dependent variables are categorical and the groups being compared are independent. Commands: 1. Analyze>Descriptive Statistics>Crosstabs 2. The Row(s) is the independent (explanatory) variable. 3. The Column(s) is the dependent (response) variable. 4. Under Statistics check Chi-square. Select Continue. 5. Under Cells check Expected in the Counts box. Check Row in the Percentage box. Select Continue. 6. Select OK. Example: Analyze the association between the variables gender and in Greek life from the data set Hope Student Survey 2008.sav. We will test whether or not the proportion of males involved in Greek life is different from the proportion of females involved in Greek life at Hope College. Our null and alternative hypotheses are as follows. H 0 : The proportion of males in Greek life is the same as the proportion of females in Greek life. H a : The proportion of males in Greek life is different from the proportion of females in Greek life.

Chapter 3: Tests of Significance 41 Output: This box gives the variables and the sample size (N) of the valid, missing, and total data. This box gives the cross-tabulation table. We see that 35 females ( Count ) are in Greek life. Because Row percentages were selected, we know that 19.1% of Hope College females are in Greek life (see the blue circle). Also, 25 males ( Count ) are in Greek life, which is 18.9% of Hope College males (see the red circle). The Expected Count is the number of each gender we would expect to see involved in Greek life if the proportions of students in Greek life were the same for both genders.

42 Chapter 3: Tests of Significance This box gives us the p-value of.967 (see the circle). The p-value is in the Pearson Chi- Square row and Asymp. Sig. (2-sided) column. The footnote a under the table tells you that all of the cells have expected cell counts of at least 5. The Chi-Square can have at most 20% of the cells in the table with an Expected Count less than 5. (The Expected Count can be seen in the cross-tabulation table.) P-value: p =.967 >.05, fail to reject H 0. Conclusion: We do not have enough evidence to show that the proportion of males in Greek life (18.9%) is significantly different from the proportion of females in Greek life (19.1%).
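Note: A rough syntax sketch for the chi-square test of association is below; gender and greek are placeholders for the variable names in Hope Student Survey 2008.sav.

* Chi-square test with expected counts and row percentages.
CROSSTABS
  /TABLES=gender BY greek
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED ROW.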

Chapter 3: Tests of Significance 43

3-4 ANOVA
Use if there is a multiple category explanatory variable and a quantitative or ordinal response variable and the multiple groups being compared are independent (i.e. compare the means of multiple independent groups).
Commands:
1. Analyze>Compare Means>One-Way ANOVA
2. The Dependent List is the dependent (response) variable.
3. The Factor is the independent (explanatory) variable.
4. Under Post-Hoc you can check either Tukey or Tamhane's T2 to do post-hoc comparisons of the group means. Select Continue. Note: Tukey can only be used when the standard deviations are within a factor of two of each other.
5. Under Options you can check Descriptive to get means and standard deviations within each group. Select Continue.
6. Select OK.
Example: Analyze the variable egg length by host bird from the data set Cuckoo Eggs.sav. We will compare the means of cuckoo egg lengths between the five different host birds. Our null and alternative hypotheses are as follows.
H 0 : The mean egg length for each host bird is equal.
H a : At least one host bird has a different mean egg length from the other host birds.

44 Chapter 3: Tests of Significance Output: This box gives the sample sizes (N), the means, the standard deviations, and the confidence intervals of the means of each group. (In this example, the group is the type of host bird.) Because the p-value (Sig) is <.001 (see the circle), we have evidence that at least one of the group means is different from the others. Since there is evidence of a difference and the standard deviations (from the first table) are within a factor of two, (check if largest and smallest are within a factor of 2, if so then all are within a factor of two: 23.1214/21.13 = 1.0942 < 2) we can use the Tukey post-hoc comparison. The mean squares are also given, along with the F statistic: 10.268/.814 = 12.619. Note: If the p-value is not significant, then a post-hoc comparison is unnecessary. If the standard deviations are not within a factor of two of each other, but the p-value is significant, we should use the Tamhane s T2 post-hoc comparison. P-value: p <.001 <.05, reject H 0. Conclusion: We have evidence that at least one host bird has a different mean egg length from the other host birds.

Chapter 3: Tests of Significance 45 This box shows the results of the Tukey post-hoc analysis. It shows the difference in means between every possible group. The statistically significant differences are starred. From this, we see the Wren has a significantly different egg length than the Tree Pipit (p <.001), Hedge Sparrow (p <.001), Robin (p <.001), and the Pied Wagtail (p <.001) (see the red circle). However, none of the other groups are significantly different from each other. The confidence interval is given for the difference in means between the two groups. Thus, in cases where it does not include zero there is evidence of a statistically significant difference in means (see the blue circle), which coincide with the p-value. If the confidence interval contains zero (meaning zero is a possible difference in means), the p-value is non-significant.

46 Chapter 3: Tests of Significance

This box summarizes the post-hoc test by identifying homogeneous subsets of means. The different subsets are grouped so that each column contains a set of means that does not differ significantly from the other means in the column and does differ significantly from the means outside of its column. In this example, the mean egg length in the Wren nest is significantly different from the mean egg lengths of all the other host birds; and the Robin, Pied Wagtail, Tree Pipit, and Hedge Sparrow do not have significantly different mean egg lengths from each other.
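Note: The One-Way ANOVA dialog corresponds roughly to the syntax sketch below; EggLength and HostBird are placeholders for the variable names in Cuckoo Eggs.sav.

* One-way ANOVA with descriptives and Tukey post-hoc comparisons.
ONEWAY EggLength BY HostBird
  /STATISTICS DESCRIPTIVES
  /POSTHOC=TUKEY ALPHA(0.05).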

Chapter 3: Tests of Significance 47 3-5 Scatterplot Use if both variables are quantitative. Commands: 1. Graphs>Chart Builder. 2. From the Gallery tab, select Scatter/Dot. 3. Drag the first icon (Simple Scatter) into the white space labeled Drag a Gallery chart here... 4. Drag the appropriate quantitative variables from the Variables: list into the boxes for the X-Axis? and Y-Axis? 5. To change the range of the axes use the Element Properties dialog box and look at the Edit properties of:. Select X-Axis1 (Point1) or Y-Axis1 (Point1). Then under Scale Range change the Minimum or Maximum by unchecking the Automatic box and typing in your own Custom values. Select Apply. 6. Select OK. Note: Make sure the quantitative variables for both the X-axis and Y-axis are correctly identified by SPSS as scale variables or you will not be able to create a line of best fit on your scatterplot. To check this, right click on each of the variable names in the Chart Builder window under Variables:. Make sure there is a dot next to Scale.

48 Chapter 3: Tests of Significance Line of Best Fit Commands: 1. After creating the scatterplot, double click on the scatterplot graph in the Output window to get the Chart Editor window. 2. Elements>Fit Line at Total. 3. Make sure Linear is checked under Fit Method in the Properties window. Select Close. 4. File>Close in the Chart Editor window. Example: Explore the relationship between the variables weight and height from the data set Body Fat Data.sav. We will create a scatterplot of these variables with the line of best fit. Output: The scatterplot shows a moderately strong positive relationship between weight and height of individuals. The correlation coefficient, r, is 0.514. *Residual Plot: For instructions on how to do a residual plot see page 51.
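Note: A legacy-syntax sketch that produces the same basic scatterplot is shown below (the fit line is still added in the Chart Editor, as described above); height and weight are placeholders for the variable names in Body Fat Data.sav.

* Simple scatterplot of weight against height.
GRAPH /SCATTERPLOT(BIVAR)=height WITH weight.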

Chapter 3: Tests of Significance 49 Correlation Use if both variables are quantitative. Commands: 1. Analyze>Correlate>Bivariate 2. Select the two variables of interest. 3. Make sure Pearson is checked in the Correlation Coefficients box. 4. Select OK. *For instructions on how to do a residual plot see page 51. For instructions on how to do a scatter plot see page 47.

50 Chapter 3: Tests of Significance Example: Analyze the correlation (ρ) between the variables body length and body width, measurements taken from a sample of Dover sole, from the data set Fish.sav. H 0 : There is no relationship between the body length and the body width of Dover sole. (ρ = 0) H a : There is a relationship between the body length and the body width of Dover sole. (ρ 0) Output: The correlation between length and width is 0.831 (see blue circles), a strong correlation, for this sample size (N) of 100. The p-value is < 0.001 (see red circles). This shows that the correlation is significantly different from 0. Note: The p-value for testing correlation is the same as testing for slope in a simple (one explanatory variable) linear regression model. P-value: p <.001 <.05, reject H 0. Conclusion: We have enough evidence to show a significant relationship between the body length and the body width of Dover sole. The correlation was.831.
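Note: The Bivariate Correlations dialog pastes syntax roughly like the sketch below; length and width are placeholders for the variable names in Fish.sav.

* Pearson correlation with two-tailed significance.
CORRELATIONS
  /VARIABLES=length width
  /PRINT=TWOTAIL.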

Chapter 3: Tests of Significance 51

3-6 Linear Regression
Use if both independent and dependent variables are quantitative or ordinal.
Commands:
1. Analyze>Regression>Linear
2. Select the Dependent (response) and Independent(s) (explanatory) variables.
3. Under Statistics you can check Confidence intervals to have confidence intervals put on the beta coefficient estimates. Select Continue.
4. Under Save you can check the box for Unstandardized under Residuals in order to be able to create a residual plot later. Select Continue.
5. Select OK.
Example: Analyze the linear relationship between heart rate and body temperature from the data set Body Temp & Heart Rate.sav. We will test the following hypotheses to answer the question: Is heart rate a good predictor of body temperature?
H 0 : There is no linear relationship between heart rate and body temperature. (β = 0)
H a : There is a linear relationship between heart rate and body temperature. (β ≠ 0)
Output: This box simply shows the variables used in the model: Variables Entered is the independent variable, and the dependent variable is shown at the bottom.

52 Chapter 3: Tests of Significance

This box indicates that the correlation coefficient, r, (column R) is .254, and the r² value (column R Square) is .064. This box gives the p-value (see the circle) for the significance of the correlation coefficient, r. When there is only one independent variable, the p-value for testing the correlation coefficient, r, will be the same as the p-value for testing the slope of the independent variable (see below). In the B column of the Unstandardized Coefficients section, we find the y-intercept of 96.307 and the slope of 0.026 for the least squares regression line (see the red circle). Thus, Body Temperature = 96.307 + .026*HeartRate. To use this equation, plug in the heart rate to estimate/predict the body temperature. For example, to estimate/predict the body temperature of a person with a heart rate of 70 beats per minute, plug in 70 for HeartRate: Body Temperature = 96.307 + .026*70 = 98.127 °F. The p-value for the test of the intercept being significantly different from 0 is p <.001, which is a fairly useless test. The p-value for the test of the slope β (and correlation) being significantly different from 0 is p =.004 (see the black circle). Note: Recall that the p-value for testing correlation is the same as the p-value for testing slope in a simple (one explanatory variable) linear regression model.

Chapter 3: Tests of Significance 53

The 95% upper and lower confidence bounds are given for both the slope and the intercept estimates (see the blue circle). Neither contains zero, meaning zero is not a possible value for the intercept or for the slope, coinciding with the p-values both being significant. If the confidence intervals had included zero, meaning zero is a possible value for the intercept or for the slope, the p-value would have been non-significant. P-value: Since p =.004 <.05, reject H 0. Conclusion: We have enough evidence to show a significant linear relationship between heart rate and body temperature. (Evidence of β ≠ 0)

Creating a Residual Plot
Creating a residual plot is an additional step in SPSS. You should have requested Unstandardized Residuals be Saved (see Command 4 above) when running your analysis. If you look at the SPSS spreadsheet (Data View) you will see a new column (variable) called Res_1. This column gives the residual for each individual in your dataset. You can then create a residual plot by creating a scatter plot (Section 3-5 of this manual) of the residuals (Res_1) vs. your explanatory variable.

54 Chapter 3: Tests of Significance Note: Other more sophisticated residual plots are available by using the Plots dialog when running the Linear Regression.
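Note: A rough syntax sketch of the regression in this section, including the saved unstandardized residuals (RES_1), is shown below; BodyTemp and HeartRate are placeholders for the variable names in Body Temp & Heart Rate.sav.

* Simple linear regression with 95% CIs for the coefficients and saved residuals.
REGRESSION
  /STATISTICS COEFF CI(95) R ANOVA
  /DEPENDENT BodyTemp
  /METHOD=ENTER HeartRate
  /SAVE RESID.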

Chapter 4: Appendix 55

Appendix
4-1 Bar Chart ... 56
4-2 Boxplot ... 58
4-3 Histogram ... 60
4-4 Pie Chart ... 62
4-5 Side-By-Side Boxplots ... 65
4-6 Chi Squared Goodness of Fit Test ... 67
4-7 Chi Squared Tests ... 70
   o Linear Trend Test
   o McNemar Test
   o Relative Risk

56 Chapter 4: Appendix 4-1 Bar Chart Use if variable is categorical. Commands: 1. Graphs>Chart Builder. 2. From the Gallery tab, select Bar. 3. Drag the first icon (Simple Bar) into the white space that says Drag a Gallery chart here... 4. Drag the appropriate categorical variable from the Variables: list into the X-Axis? box. 5. To change the y-axis to percentages instead of counts, use the Element Properties dialog box, shown on the left side of the image to the right. Under Statistics use the pull down arrow to select Percentage (?). Select Apply. 6. Select OK. 7. To add labels, right click the bar chart in the Output window. Select Edit Content>In Separate Window. In the Chart Editor window, right click a bar in the bar chart. Select Show Data Labels. In the Properties box, drag the desired labels from the Not Displayed: box to the Displayed: box. Select Apply then Close in the Properties box. Select File>Close in the Chart Editor window. 8. To move the data labels on the bar chart, right click the bar chart in the Output window. Select Edit Content>In Separate Window. Click and drag the labels to the appropriate spots. Select File>Close in the Chart Editor window.

Chapter 4: Appendix 57

Example: Explore the variable Shop from the data set Coffee Data.sav. We will create a bar chart of the percent of students who frequent the different coffee shops.
Output: This bar chart shows that roughly 41% of students frequent Lemonjello's, 32% JP's, 23% Starbucks, and 4% some other coffee shop.
Note: If you want to create a bar graph with two categorical variables (for example, shop and gender), drag the Clustered Bar instead of the Simple Bar. Then drag shop to the X-axis and drag gender to the Cluster of X: set color box. (See the graph below.)
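Note: A shorter legacy-syntax route to the same percentage bar chart is sketched below (Chart Builder itself pastes much longer GGRAPH syntax); Shop is the variable from Coffee Data.sav.

* Simple bar chart of percentages by coffee shop.
GRAPH /BAR(SIMPLE)=PCT BY Shop.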

58 Chapter 4: Appendix

4-2 Boxplot
Use if variable is quantitative.
Commands:
1. Graphs>Chart Builder.
2. From the Gallery tab, select Boxplot.
3. Drag the last icon (1-D Boxplot) into the white space that says Drag a Gallery chart here...
4. Drag the appropriate quantitative variable from the Variables: list into the X-Axis? box.
5. Select OK.
Note: Make sure the quantitative variable is correctly identified by SPSS as a scale variable or you will not be able to create a boxplot. To check this, right click on the variable name in the Chart Builder window under Variables. Make sure there is a dot next to Scale.

Example: Explore the variable weight from the data set Body Fat Data.sav. We will create a boxplot of weight.

Output: This boxplot shows that individual number 40 in the data set has a weight that is classified as a potential outlier. (This individual's weight is 262.75.) Potential outliers are flagged by the 1.5 × IQR rule and are denoted with an open circle. Extreme outliers are flagged by the 3 × IQR rule and are denoted with an asterisk. From this boxplot we can roughly read off the 5-number summary.

Minimum: 120
Q1: 155
Median: 175
Q3: 195
Maximum (not an outlier): 250
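As a quick check of the 1.5 × IQR rule using these approximate quartiles: IQR ≈ 195 − 155 = 40, so the upper fence is about 195 + 1.5 × 40 = 255. The weight of 262.75 lies above 255 but below the extreme cutoff of about 195 + 3 × 40 = 315, which is why it appears as an open circle rather than an asterisk.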

4-3 Histogram

Use if variable is quantitative.

Commands:
1. Graphs>Chart Builder.
2. From the Gallery tab, select Histogram.
3. Drag the first icon (Simple Histogram) into the white space that says Drag a Gallery chart here...
4. Drag the appropriate quantitative variable from the Variables: list into the X-Axis? box.
5. To change the number of bars, use the Element Properties dialog box and select Set Parameters. Select the Custom circle under Bin Sizes and type the number of bars desired next to Number of intervals:.
6. Select OK.
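A minimal syntax sketch for the same plot (legacy GRAPH command; weight comes from the example below). Adjusting the number of bins from syntax is less direct, so the Element Properties or Chart Editor route described above is usually easier:

   * Simple histogram of a quantitative variable.
   GRAPH
     /HISTOGRAM=weight.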

Note: Make sure the quantitative variable used for the X-axis is correctly identified by SPSS as a scale variable or you will not be able to create a histogram. To check this, right click on the variable name in the Chart Builder window under Variables:. Make sure there is a dot next to Scale.

Example: Explore the variable weight from the data set Body Fat Data.sav. We will create a histogram of weight.

Output: This histogram shows that the variable weight is, for the most part, bell-shaped and centered at about 175 pounds, with a slight skew to the right.

4-4 Pie Chart

Use if variable is categorical.

Commands:
1. Graphs>Chart Builder.
2. From the Gallery tab, select Pie/Polar.
3. Drag the icon (Pie Chart) into the white space that says Drag a Gallery chart here...
4. Drag the appropriate categorical variable from the Variables: list into the Slice by? box.
5. Select OK.

6. To add labels, right click the pie chart in the Output window. Select Edit Content>In Separate Window. In the Chart Editor window, right click the pie chart. Select Show Data Labels. In the Properties box, drag the desired labels from the Not Displayed: box to the Displayed: box. Select Apply, then Close in the Properties box. To move the data labels on the pie chart, click and drag the labels to the appropriate spot in the Chart Editor window. Select File>Close in the Chart Editor window.

Note: Make sure the categorical variable is correctly identified by SPSS as a nominal variable or an ordinal variable or you will not be able to create a pie chart. To check this, right click on the variable name in the Chart Builder window under Variables:. Make sure there is a dot next to Nominal or Ordinal.

Example: Explore the variable shop from the data set Coffee Data.sav. We will create a pie chart of the percentages of students who frequent the different coffee shops. We will put the percent of students frequenting each shop and the shop name on each slice.
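If you prefer syntax, a minimal sketch for this pie chart (legacy GRAPH command; shop is the example variable, and PCT summarizes the slices as percentages):

   * Pie chart of the percentage of students frequenting each shop.
   GRAPH
     /PIE=PCT BY shop.

Slice labels are still added through the Chart Editor as described in step 6.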

Output: The pie chart shows that most of the students frequent Lemonjello's. The next most frequented coffee shop is JP's, then Starbucks, and finally coffee shops other than the three already listed.

4-5 Side-By-Side Boxplots

Use if the dependent variable is quantitative or ordinal and the independent variable is categorical.

Commands:
1. Graphs>Chart Builder.
2. From the Gallery tab, select Boxplot.
3. Drag the first icon (Simple Boxplot) into the white space that says Drag a Gallery chart here...
4. Drag the appropriate independent (categorical, explanatory) variable from the Variables: list into the X-Axis? box and the appropriate dependent (quantitative, response) variable into the Y-Axis? box.
5. Select OK.
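A syntax sketch that should produce equivalent side-by-side boxplots; attendance and day are placeholder names based on the example below:

   * One boxplot of attendance per day; /NOTOTAL suppresses the overall boxplot.
   EXAMINE VARIABLES=attendance BY day
     /PLOT=BOXPLOT
     /STATISTICS=NONE
     /NOTOTAL.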

Example: Explore the variable class attendance by the variable day in the data set Attendance Data.sav. We will create side-by-side boxplots of class attendance for each day of the week (Monday through Friday).

Output: These side-by-side boxplots enable us to compare class attendance for different days of the week. We can see that Friday seems to have lower class attendance than the other days of the week.

4-6 Chi Squared Goodness of Fit Test

Use if the variable is categorical and a known (hypothesized) model for the category proportions exists.

Note: To use this test, a new SPSS data file will need to be created with two variables. Both variables need to be numeric. The first variable will have the values 1 to k, where k is the number of categories for your categorical variable. In Variable View under Values, add value labels for these k values. This will make your output more readable. The second variable will be the observed count of each category.

Commands:
1. Data>Weight Cases
2. Select Weight cases by and send the frequency/count variable to the Frequency Variable: box.
3. Select OK.
4. Analyze>Nonparametric Tests>Chi-Square
5. The Test Variable List: is the categorical variable.
6. In the Expected Values box, select All categories equal if the given model has equal proportions in each group (such as 50-50 or 20-20-20-20-20). If the given model has unequal proportions, select Values. Type in one value at a time in the Value box, starting with the frequency of the first group, then adding the second group, then the third group, etc. (The first group is the group with the smallest numeric value in Variable View.)
7. Select OK.
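The same test can be run from syntax. A minimal sketch, assuming the two variables created above are named category and count (both names are placeholders):

   * Weight the category codes by their observed counts.
   WEIGHT BY count.
   * Chi-square goodness of fit; with no /EXPECTED subcommand all categories are assumed equal.
   NPAR TESTS
     /CHISQUARE=category.

For an unequal model, add an /EXPECTED subcommand listing the hypothesized proportions in ascending order of the category codes (an example follows in the M&M's illustration below).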

Example: The old M&M's model for plain M&M's is 30% brown, 15% red, 25% yellow, and 10% for each of the remaining three colors: blue, green, and orange. We will use the data set M&M's to create our new SPSS data set. Our null and alternative hypotheses are as follows.

H0: The M&M's model for color distribution in plain M&M's is correct.
Ha: The M&M's model for color distribution in plain M&M's is NOT correct.

First, create a new SPSS data sheet where the first variable is color with values 1 through 6. In Variable View under Values, enter value labels of brown, red, yellow, blue, green, and orange for the six numbers. The second variable is the observed count. In the M&M's data set, you will need to find the sum of all the M&M's of each color (Analyze>Descriptive Statistics>Frequencies; under Statistics check Sum. The first box, Statistics, of the output gives the observed counts for each color of M&M's in the Sum row.)
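For this example, the observed counts can be summed and the test run with syntax roughly like the following. The color-count variable names (brown, red, yellow, blue, green, orange) and the new file's variable names color and count are assumptions about how the data are stored:

   * In the M&M data set: column sums give the observed count of each color.
   FREQUENCIES VARIABLES=brown red yellow blue green orange
     /FORMAT=NOTABLE
     /STATISTICS=SUM.

   * In the new data set: weight by the observed counts and test the claimed model.
   * Expected values are listed in ascending order of the color codes 1-6 (brown, red, yellow, blue, green, orange).
   WEIGHT BY count.
   NPAR TESTS
     /CHISQUARE=color
     /EXPECTED=30 15 25 10 10 10.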

Output: This table displays the observed counts of M&M's for each color and the counts we would have expected to see if the prescribed model for color distribution were true. The residual is the difference between the observed count and the expected count.

This table displays the p-value of our test of significance (see the circle). The footnote a under the table tells you that all of the cells have expected counts of at least 5. The Chi-Square test requires that at most 20% of the cells in the table have an Expected Count less than 5.

P-value: p = .012 < .05, reject H0.

Conclusion: We have enough evidence to show that the color distribution of M&M Mars plain M&M's does not follow the claimed model.

Note: The Chi-Square GOF test does not have one-sided and two-sided p-values. Larger values of the Chi-Square statistic indicate observed counts farther from the expected counts (the null hypothesis model).

4-7 Chi Squared Tests

Linear Trend Test

Use if both independent and dependent variables are categorical and the groups being compared are independent. Any variable with more than two categories must be ordinal.

Commands:
1. Analyze>Descriptive Statistics>Crosstabs
2. The Row(s) is the independent (explanatory) variable.
3. The Column(s) is the dependent (response) variable.
4. Under Statistics check Chi-square. Select Continue.
5. Under Cells check Expected in the Counts box. Check Row in the Percentage box. Select Continue.
6. Select OK.

Example: Analyze the association between the variables attending the Gathering and being in Greek life from the data set Hope Student Survey 2008.sav. We will test to see if there is a linear trend between Gathering attendance and being involved in Greek life on Hope's campus. Our null and alternative hypotheses are as follows.

H0: There is no linear trend between attending the Gathering and being in Greek life for Hope students.
Ha: There is a linear trend between attending the Gathering and being in Greek life for Hope students.
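A syntax sketch for this crosstab; the variable names gathering and greek are placeholders for however the variables are named in the survey file:

   * Crosstab with chi-square statistics (includes the Linear-by-Linear Association test).
   * Expected counts and row percentages are requested to match the dialog settings above.
   CROSSTABS
     /TABLES=gathering BY greek
     /STATISTICS=CHISQ
     /CELLS=COUNT EXPECTED ROW.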

Output: This box gives the variables and the sample size (N) of the valid, missing, and total data. It appears that three students did not answer one of the questions about Greek life or Gathering attendance.

This box gives the cross-tabulation table (see the circles). We see that 33 students ("Count") are in Greek life and never attend the Gathering, which is 22.3% of students who never attend the Gathering. Also, 16 students ("Count") are in Greek life and occasionally attend the Gathering, which is 20.3% of students who occasionally attend the Gathering. Furthermore, 11 students ("Count") are in Greek life and regularly attend the Gathering, which is 12.9% of students who regularly attend the Gathering. The Expected Count is the number of students at each Gathering attendance level we would expect to see involved in Greek life if the proportions of students in Greek life were the same for all three Gathering attendance levels (Never, Occasionally, and Regularly).

Because the independent variable (Gathering attendance) is ordinal and the dependent variable (Greek life) only has two categories (Yes or No), we tested for a linear relationship. The p-value (.092) for the linear trend test is in the Linear-by-Linear Association row and the Asymp. Sig. (2-sided) column (see the circle). The footnote a under the table tells you that all of the cells have expected counts of at least 5. The Chi-Square test requires that at most 20% of the cells in the table have an Expected Count less than 5. (The Expected Counts can be seen in the cross-tabulation table.)

P-value: p = .092 > .05, fail to reject H0.

Conclusion: We do not have enough evidence to show there is a linear trend between attending the Gathering and being in Greek life for Hope students.

McNemar

Use if both variables are binary (two-category) categorical variables and the groups being compared are dependent (i.e., proportions are compared within the same sample).

Commands:
1. Analyze>Descriptive Statistics>Crosstabs
2. Put one variable in the Row(s) and the other in the Column(s). (Order does not matter.)
3. Under Statistics check McNemar. Select Continue.
4. Under Cells check Total in the Percentages box. Select Continue.
5. Select OK.

Example: Analyze the association between the variables in a small group and in Greek life from the data set Hope Student Survey 2008.sav. We will test whether or not the proportion of students involved in Greek life is different from the proportion of students involved in a small group at Hope College. Our null and alternative hypotheses are as follows.

H0: The proportion of Hope students in Greek life is the same as the proportion of Hope students in a small group.
Ha: The proportion of Hope students in Greek life is different from the proportion of Hope students in a small group.
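A syntax sketch for this McNemar comparison; greek and smallgroup are placeholder names for the two survey variables:

   * McNemar test for two dependent (paired) binary variables, with total percentages.
   CROSSTABS
     /TABLES=greek BY smallgroup
     /STATISTICS=MCNEMAR
     /CELLS=COUNT TOTAL.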

Output: This box gives the variables and the sample size (N) of the valid, missing, and total data. It appears that two students did not answer one of the questions about Greek life or a small group.

This box gives the cross-tabulation table. We see that 60 students ("Count") are in Greek life. Because Total (percentages) was selected, we know that 19.2% of Hope students are in Greek life (see the circles). Also, 86 students ("Count") are in a small group, which is 27.5% of Hope students.

This box gives us the p-value (.020). Because the sample for each group was the same (Hope students), we used the McNemar Test. Therefore, the p-value is in the McNemar Test row and the Exact Sig. (2-sided) column (see the circle).

P-value: p = .020 < .05, reject H0.

Conclusion: We do have enough evidence to show that the proportion of Hope students in Greek life (19.2%) is significantly different from the proportion of Hope students in a small group (27.5%).

Relative Risk

Use if both independent and dependent variables are binary (two-category) categorical variables and the groups being compared are independent.

Commands:
1. Analyze>Descriptive Statistics>Crosstabs
2. The Row(s) is the independent (explanatory) variable.
3. The Column(s) is the dependent (response) variable.
4. Under Statistics check Risk. Select Continue.
5. Select OK.

Example: Analyze the association between the variables gender and in Greek life from the data set Hope Student Survey 2008.sav. We will calculate the relative risk of being in Greek life (or not) between the genders.
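A syntax sketch for this analysis; gender and greek are placeholder names for the survey variables, and row percentages are added here only to match the percentages discussed in the output below:

   * Crosstab with risk estimates (relative risks and odds ratio) for two binary variables.
   CROSSTABS
     /TABLES=gender BY greek
     /STATISTICS=RISK
     /CELLS=COUNT ROW.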

Output: This box gives the variables and the sample size (N) of the valid, missing, and total data.

This box gives the cross-tabulation table. We see that 35 females ("Count") are in Greek life. Also, 25 males ("Count") are in Greek life (see the circles).

This box gives the relative risk estimates. Notice it tells you that the comparison is female/male. The row For cohort InGreekLife = No is the relative risk of females to males for not being in Greek life (see the blue circle). Thus, we can say that females are 0.998 times as likely as males to not be in Greek life. Relative risk is usually reported as a number greater than one, though, so we calculate 1/0.998 = 1.002 and say that males are 1.002 times as likely as females to not be in Greek life. The relative risk for No is not usually reported; rather, the relative risk for Yes is reported (see below).

The row For cohort InGreekLife = Yes is the relative risk of females to males for being in Greek life (see the red circle). Thus, we can say that females are 1.010 times as likely as males to be in Greek life. This coincides with the percentages from the cross-tabulation table: 19.1% of females and 18.9% of males are in Greek life. This is the number that is most often reported.