The Not-Even-Remotely Close to Being a Complete Guide to SPSS / PASW Syntax. (For SPSS / PASW v.18+)


Dr. Bryan R. Burnham
Department of Psychology
University of Scranton

Table of Contents

What is SPSS / PASW?
    Where is it Available?
    Finding and Opening PASW
    Three Types of PASW Files
    Data Files (Data View)
    Defining and Adjusting Variables in Data Files (Variable View)
    Basic Structure of Output Files
    Data Files Associated with this Guide
The Syntax Editor
    Why Syntax? Because it's Better!
    Some Syntax Basics...It's Easy?
    Opening .sav Files with Syntax
    Opening Microsoft Excel (.xls) Files with Syntax
    Opening Text (.txt) Files with Syntax
Syntax for Basic Statistical Needs
    Variable Labels
    Value Labels
    Frequencies
    Descriptive Statistics
    SORT CASES
    SPLIT FILE
Correlation & Regression
    Pearson Correlations (Bivariate)
    Pearson Correlations (Partial)
    Univariate Regression (one regressor)
t-tests
    One-Sample t-test
    Independent Groups t-tests
    Correlated Samples (Paired Samples) t-tests
Analysis of Variance
    Oneway Analysis of Variance (via GLM)
    Between Subjects Factorial ANOVA (via GLM)
    Repeated Measures ANOVA (via GLM)
Chi Square
    Cross-Tabulation Procedure (Factorial Chi-Square)
    Oneway Chi-Square Goodness of Fit Test
    Alternative Method for Goodness of Fit Test

1. What is SPSS / PASW?

The Statistical Package for the Social Sciences (SPSS) is a software tool for analyzing sets of data. I have absolutely no idea what the acronym PASW stands for. I wish it was PAWS, because it would be easier to say. Anyway, PASW is just the newest version of SPSS (currently version 18). SPSS/PASW operates like a spreadsheet program, such as Microsoft Excel, and its data files look a lot like Excel spreadsheets. Unlike Excel, PASW/SPSS is designed specifically for manipulating and analyzing data.

As part of your course requirements, you will gain a basic understanding of how to use PASW. Indeed, most statistical analyses are performed with PASW or some other software. So why do we teach you this stuff by hand; why not just use PASW? Simply put, it's because without conceptual knowledge of where the results of an analysis done with PASW come from, they're just a bunch of numbers in a computer file! Thus, we teach you what the variance of a set of data is and where it comes from by showing you how it's calculated. This way, variance should make sense when using PASW. If my logic doesn't make sense, drop out of the course and preferably out of college. :-)

1.1 Where is it Available?

At the University of Scranton, SPSS / PASW is available in the Weinberg Memorial Library (WML) on the 1st floor and in group study rooms, Brennan Hall (BRN) rooms 102 and 201, McGurrin Hall (MGH) room 110, Hyland (HYL) Café and room 102 (where statistics classes are held), and Alumni Memorial Hall (AMH) rooms 214 and 202. It may be available in the PT/OT lab in the basement of Leahy Hall and in the Nursing Lab and the Stout Lab in McGurrin Hall. (Thanks to Dr. Barry Kuhle, University of Scranton, for compiling this list.)

1.2 Finding and Opening PASW

From the Start Menu, select All Programs > SPSS Inc. > PASW Statistics 18 > PASW Statistics 18 (the red icon with the gray sigma symbol, Σ).

1.3 Three Types of PASW Files

There are three main files associated with PASW (and SPSS):

1. Data Files contain data to be analyzed, and have the extension '.sav'. Data files look a lot like a Microsoft Excel spreadsheet, with columns, rows and cells. Columns represent variables, with an abbreviated name of the variable at the top of each column. Rows represent cases, or research subjects. That is, each row/case could be the data associated with an individual, or a sample. The cells and values within the file are the data. (See Figures 1 & 2)

2. Syntax Files are used to request that PASW conduct an analysis, and have the extension '.sps'. Hence, syntax files are command files that tell PASW what to do with data. I admit that most analyses and procedures in PASW can be obtained through the pull-down menus in the data file; but syntax is better, for reasons given later. A syntax file works like a text editor: you type in text-based commands for PASW to interpret and, hopefully, PASW runs your requested analyses on the data. (See Figure 5)

3. Output Files are generated in response to PASW running an analysis on a set of data, and have the extension '.spv' (in SPSS the extension is '.spo'). Importantly, if something was written incorrectly in the syntax file, PASW will produce a Warning, usually with no additional output. Most of an output file is in table format, with the exception of graphs and charts. (See Figure 3)

1.4 Data Files (Data View)

There are two different 'views' of a PASW data file:

1. Data View, where your data can be entered by hand, and where you can view the actual values of the working data file.
2. Variable View, where you can define parameters of your variables, such as how many decimals are showing, whether the variable is a string, a date, or a numeric variable, etc.

The figure below is a screen shot of the Data View in a blank PASW data file:

Figure 1: Data View of a blank PASW data file.

You can toggle between the Data View and the Variable View by clicking on the appropriate tab at the bottom left-hand corner of any data file. You can also toggle back and forth by double-clicking on any variable name; that is, by double-clicking a column heading in Data View or a row in Variable View. I will assume that you can figure out how to insert values into a data file, so I will not cover that here.

1.5 Defining and Adjusting Variables in Data Files (Variable View)

If necessary, it is good to define the parameters of your variables first, so that when you run an analysis the output of any tables and graphs will be complete and understandable. Below is a screen shot of the Variable View in a blank PASW data file:

Figure 2: Variable View of a blank PASW data file.

Below, I've listed each of the parameters that can be seen at the top of each column in Variable View, with a brief description of what each parameter does:

NAME: The short variable name that you enter; it must begin with a letter.
TYPE: Indicates whether a variable is numeric, a string, a date, etc. Clicking TYPE opens a dialogue box, in which you can specify the type of data contained in a variable.
WIDTH: The maximum number of digits or characters allowed for a value of the variable.
DECIMAL: The number of decimal places displayed for numeric variables.
LABEL: Allows you to assign a longer, descriptive label to an abbreviated variable name in the data file. That is, you could 'name' a variable STAI, but 'label' the variable State Trait Anxiety Inventory at time 1. The abbreviated name appears under NAME, and the longer LABEL will appear on any tables or graphs in the output.

VALUES: Allows you to assign labels to the dummy-codes of a variable. For example, if your data file contains the variable Sex, a 0 could refer to males and a 1 could refer to females. But 0's and 1's are arbitrary unless they are defined. This packet will show you how to assign labels using syntax.
MISSING: Refers to what PASW should do with missing data entries.
COLUMNS: Refers to how wide the variable's column appears in the Data View. Normally this is set to eight.
ALIGN: Allows you to have the values in each column left-justified, right-justified, or centered.
MEASURE: Relevant to numeric variables. Indicates the measurement scale of a variable. It allows three levels: nominal, ordinal, and scale (scale covers both interval and ratio data).

Most of these parameters are irrelevant for the time being. Later, you'll learn how to assign longer, more descriptive labels to a variable name, as well as how to label the dummy-codes of a variable.

1.6 Basic Structure of Output Files

After you have opened a data file, written syntax commands to request an analysis, and then run that analysis, PASW will produce an output file, like the one below:

Figure 3: Example output file.

The output file is what we are trying to get PASW to provide us. It presents, in table or graph form, the descriptive and/or inferential statistics requested. As you can see in Figure 3, the output contains a single table with a listing of several descriptive statistics (N, Minimum, Maximum, Mean, Standard Deviation) for two different variables (SAT_CR and SAT_M). Don't worry about the variable names right now; trust me, you'll know what they are in a bit. Later, when you have PASW run an analysis on a set of data, I will not include whole screen shots of the output. Rather, I'll simply paste the output tables into the document. (Gotta conserve megabytes!)

1.7 Data Files Associated with this Guide

The data file that will be used throughout most of this packet is 'GRE Therapy Data File.sav', and is available on my statistics course website, on the course files page. There are actually three data files with the same name ('GRE Therapy Data File.sav', 'GRE Therapy Data File.xls', and 'GRE Therapy Data File.txt'). I'll show you how to open each of these types of data files using syntax, so download each file. Here's a screen shot of a portion of the data file:

Figure 4: A portion of the data file used in this packet.

The file contains a set of data from a fictitious study that examined the influence of a new Study Drug and different Types of Tutoring on student scores on the Graduate Record Examination (GRE). The GREs are a set of standardized examinations, like the Scholastic Aptitude Tests (SATs). Most graduate school programs require applicants to report their GRE scores. The GREs contain three sections, like the SATs: (1) quantitative reasoning, (2) verbal reasoning, and (3) analytical writing.

In this fictitious study, researchers investigated whether two independent variables (Study Drug and Type of Tutoring) improved scores on each section of the GREs. For the independent variable Study Drug, subjects were given nothing (control group), a placebo (placebo group), or one of two different dosages of the drug (100 mg/day or 200 mg/day). For the independent variable Type of Tutoring, subjects were not tutored (control group), were tutored with other students in small groups (Group Tutoring), or were tutored one-on-one (Individual Tutoring). Subjects were tested at the beginning of the study during a pretest phase (before the independent variables were administered), and were tested several months later during a posttest phase (after the independent variables should have had an influence).

In addition to scores on each of the three sections of the GREs, there are a number of other variables included in the data set. Each subject's SAT scores were collected, their heights and weights were measured, and each subject was measured on their level of Trait Anxiety (enduring level of anxiety) and State Anxiety (temporary, situational anxiety). Trait and State anxieties were assessed using the State Trait Anxiety Inventory (STAI), during both the pretest and posttest phase. The table below lists the abbreviated NAME for each variable, along with a brief description of each variable.

Variable NAME   Description of Variable
ID              Identification number assigned to each subject.
Sex             Each subject's biological sex; dummy-coded, where 1 = male and 2 = female.
Coll_Class      Each subject's current year in college; dummy-coded, where 1 = Freshmen, 2 = Sophomore, 3 = Junior, and 4 = Senior.
Coll_Maj        Each subject's primary major; dummy-coded, where 1 = Psychology, 2 = History, 3 = Biology, 4 = Communications, 5 = English, and 6 = Mathematics.
Height_cm       Each subject's height, measured to the nearest 0.1 cm.
Weight_kg       Each subject's weight, measured to the nearest 0.1 kg.
SAT_CR          Each subject's score on the Critical Reading (CR) section of the SATs.
SAT_M           Each subject's score on the Mathematics (M) section of the SATs.
SAT_V           Each subject's score on the Verbal (V) section of the SATs.
SAT_Tot         Each subject's summed SAT score (SAT_CR + SAT_M + SAT_V).
GPA             Each subject's current cumulative GPA.
Drug_Group      Level of the independent variable Drug Group to which the subject was assigned; dummy-coded, where 1 = Control Group (no drug given), 2 = Placebo Group, 3 = 100 mg of Drug/Day, and 4 = 200 mg of Drug/Day.
Tutor_Group     Level of the independent variable Tutor Group to which the subject was assigned; dummy-coded, where 1 = Control Group (no tutoring), 2 = Group Tutoring, and 3 = Individual Tutoring.
Pre_STAIt       Each subject's trait anxiety (t) during the pretest phase; measured using the State Trait Anxiety Inventory (STAI).
Pre_STAIs       Each subject's state anxiety (s) during the pretest phase; measured using the State Trait Anxiety Inventory (STAI).
Pre_GREv        Each subject's score on the Verbal Reasoning (v) section of the GREs, during the pretest phase.
Pre_GREq        Each subject's score on the Quantitative Reasoning (q) section of the GREs, during the pretest phase.
Pre_GREa        Each subject's score on the Analytical Writing (a) section of the GREs, during the pretest phase.
Post_STAIt      Each subject's trait anxiety (t) during the posttest phase; measured using the State Trait Anxiety Inventory (STAI).
Post_STAIs      Each subject's state anxiety (s) during the posttest phase; measured using the State Trait Anxiety Inventory (STAI).
Post_GREv       Each subject's score on the Verbal Reasoning (v) section of the GREs, during the posttest phase.
Post_GREq       Each subject's score on the Quantitative Reasoning (q) section of the GREs, during the posttest phase.
Post_GREa       Each subject's score on the Analytical Writing (a) section of the GREs, during the posttest phase.

Table 1: Variable NAMES and brief descriptions.

2. The Syntax Editor

The Syntax Editor looks and works like a text editor (Text Pad, Note Pad, Word Pad). You type in what you want PASW to do, in the correct sequence and using PASW's language, and PASW does what you asked it to do (hopefully). If you have ever done a little computer programming (C, C++, Matlab, etc.), then this is just like writing code, albeit much simpler code! PASW syntax files have the file extension *.sps. Here's an example of what the syntax editor looks like:

Figure 5: Example PASW syntax editor.

Note, if you use SPSS, then you won't have the various colors and the numbers for each line. The inclusion of different colors for different syntax statements in the PASW structure is a huge improvement over SPSS.

From here on out, I won't be pasting in screen shots of the syntax that we'll be using. Rather, I'll just be writing the syntax that you need to include in order to run a specific analysis or procedure. For example, rather than including a screen shot like Figure 5, I'll type out the syntax (with the appropriate colors and line numbers). Note that you do not have to type out the line numbers. Thus, the syntax in Figure 5 will appear as:

    1  GET DATA
    2    /TYPE=XLS
    3    /FILE='C:\Documents and Settings\burnhamb2\My Documents\Class Materials\PSYC 210'+
    4    'Statistics\SPSS Assignments\SPSS-PASW Packet\GRE Therapy Data File.xls'
    5    /SHEET=name 'Sheet1'
    6    /CELLRANGE=full
    7    /READNAMES=on
    8    /ASSUMEDSTRWIDTH=
    9  DATASET NAME DataSet2 WINDOW=FRONT.

Don't worry about what all of this means right now; it will make sense in a little while. :-)

2.1 Why Syntax? Because it's Better!

There are two methods that can be used to have PASW do stuff: (1) using pull-down menus, and (2) telling PASW what to do by writing syntax commands. (I'll refer to these as the wrong-way and right-way, respectively.) Is the syntax-method easier? No, but it's much more useful, for a variety of reasons.

First, you can do more within one syntax file, and in a shorter time, than with the pull-down menu method. Specifically, you can plan out all of the stuff you need PASW to do, write the appropriate syntax for everything, and then run it all at once. In contrast, with pull-down menus you have to do one thing at a time.

Second, you can do more with syntax. There are certain procedures that are simply not possible with the pull-down menus, but that are possible with syntax.

Third (and certainly not finally), if you go to grad school, especially in the sciences, you'll need to learn programming. I'm giving you a head start. You're welcome!

2.2 Some Syntax Basics...It's Easy?

PASW syntax is not case-sensitive: commands, sub-commands, and variable names can be typed in uppercase, lowercase, or any mixture. (PASW preserves the case of a variable name only for display; text inside quotes, such as a file path or a label, is kept exactly as typed.) Even so, I suggest typing each variable name just as it appears in the data file, and writing commands and sub-commands in CAPS to help distinguish between commands and variables. This will allow you to parse the syntax quickly, especially if you write variable names in a mixture of lowercase and uppercase.

Commands and sub-commands are usually entered on separate lines. Not every syntax line has to end with a period (.); only the overall procedure does. That is, if you look at the syntax in Figure 5, there is a period only on Line 8. This is because lines 1-8 are, collectively, asking PASW to retrieve a data file; hence, these eight lines encompass one whole procedure.

Sub-commands within a command procedure, and parts of a command that appear on different lines, must start with a forward slash (/), not a backward slash. PASW will not know what to do with such sub-commands if the forward slash is not entered. For example, if you look at Figure 5, you can see a forward slash beginning lines 2, 3, 5, 6, 7, and 8 (there is no slash in line 4, because line 4 is a continuation of line 3).

It is good to enter 'EXECUTE.' at the end of a command procedure. Some commands will not run without this terminator command. Unfortunately, I have never figured out which commands will and will not run without this ending statement.

Once your syntax is written, you need to run it in order to generate an output file. Highlight the syntax that you want to run and press Ctrl+R to run the procedures. Or, instead of pressing Ctrl+R, click the Run Button on the toolbar. The Run Button is the green rightward-pointing arrow in the middle.

2.3 Opening .sav Files with Syntax

I admit that if you have a PASW data file already created, you can really just locate that file and double-click to open it. Nonetheless, here's how to open a PASW data file using syntax (notes follow):

    1  GET
    2    FILE='C:\Documents and Settings\burnhamb2\Desktop\GRE Therapy Data File.sav'.
    3  DATASET NAME DataSet1 WINDOW=FRONT.

The file directory address in line 2 will differ, depending on where the file is placed on your hard drive. In this case, I placed the file on the Desktop for easy access. Note that the directory address for the file must be enclosed in quotes (single quotes are used here). DATASET NAME on line 3 should just be set to DataSet1 as listed.

An output file will be generated when you run any syntax. When opening a data set, the output file will contain only the commands that led to the opening of the file. You can delete that output file.

2.4 Opening Microsoft Excel (.xls) Files with Syntax

Below is an example of the syntax needed to open a data file saved as a Microsoft Excel spreadsheet:

    1  GET DATA
    2    /TYPE=XLS
    3    /FILE='C:\Documents and Settings\burnhamb2\Desktop\GRE Therapy Data File.xls'
    4    /SHEET=name 'Sheet1'
    5    /CELLRANGE=full
    6    /READNAMES=on
    7    /ASSUMEDSTRWIDTH=
    8  DATASET NAME DataSet1 WINDOW=FRONT.

Notice that Line 1 here (GET DATA) is similar to Line 1 for opening a PASW data file (GET). You can think of this statement as the 'major command' that you are asking PASW to perform; all of the additional lines are sub-commands. When opening an Excel spreadsheet, special care must be taken that you are asking PASW to open the correct sheet within the workbook (usually Sheet1), that you are asking for the correct cells in the worksheet, and that you have asked PASW to read in any variable names in the spreadsheet.

The sub-command on Line 2 (/TYPE) lists XLS, which is the file extension for Microsoft Excel files.

On Line 4 (/SHEET=name), the name between the single quotes ('Sheet1') is the name of the worksheet within the Excel workbook where the data are located. If the data sheet in the workbook has a different name or number, this needs to be changed here.

Line 5 (/CELLRANGE=full) refers to which cells within the named worksheet are to be imported into PASW. If all of the cells with data are to be imported, just use 'full'; but if only some of the cells are to be imported, this should be indicated here (e.g., A1:B200).

Line 6 (/READNAMES=on) tells PASW that the first row of the Excel sheet contains the names of the variables, and that these should be treated as variable names. If the Excel workbook does not include variable names, then 'off' should be substituted for 'on'.

2.5 Opening Text (.txt) Files with Syntax

Below is an example of the syntax necessary to open a data file that is saved as a text file:

    1  GET DATA
    2    /TYPE=TXT
    3    /FILE="C:\Documents and Settings\burnhamb2\Desktop\GRE Therapy Data File.txt"
    4    /DELCASE=LINE
    5    /DELIMITERS="\t"
    6    /ARRANGEMENT=DELIMITED
    7    /FIRSTCASE=2
    8    /IMPORTCASE=ALL
    9    /VARIABLES=
    10   ID F
    11   Sex F
    12   Coll_Class F
    13   Coll_Maj F
    14   Height_cm F
    15   Weight_kg F
    16   SAT_CR F
    17   SAT_M F
    18   SAT_V F
    19   SAT_Tot F
    20   GPA F
    21   Drug_Group F
    22   Tutor_Group F
    23   Pre_STAIt F
    24   Pre_STAIs F
    25   Pre_GREv F
    26   Pre_GREq F
    27   Pre_GREa F
    28   Post_STAIt F
    29   Post_STAIs F
    30   Post_GREv F
    31   Post_GREq F
    32   Post_GREa F
    33 CACHE.
    34 EXECUTE.
    35 DATASET NAME DataSet4 WINDOW=FRONT.

First thing, I have no idea why the lines are not colored; I was surprised myself. This set of syntax is a bit longer, mainly because you need to tell PASW to read in each variable name from the text file (Lines 10-32). Like the PASW syntax for importing data from an Excel spreadsheet, you need to be careful to include certain commands.

On Line 2 (/TYPE=TXT), TXT is the file extension for text files.

Line 4 (/DELCASE=LINE) tells PASW that each new case (i.e., each subject) is on a different line (row) within the text file.

On Line 5 (/DELIMITERS="\t"), 'delimiters' define the boundaries between adjacent entries, that is, between data points in a data file. The \t tells PASW that the boundaries are defined by TABS.

Line 7 (/FIRSTCASE=2) tells PASW that the data in the text file actually begin on line 2; that is, the first case (subject) is on line 2 of the data file.

Line 8 (/IMPORTCASE=ALL) tells PASW to import all of the data. This can be changed if you only want to import some of the data file.

Lines 10-32 list the name of each variable in the data set. These variable names actually appear on line 1 of the data set.

Once you have opened a data set, you should save it as a PASW data file to be used in the future. Then, you can just double-click it to open it.

Throughout the remainder of this packet, when I am providing syntax examples or the output of a procedure, I am not going to provide too much commentary. I'd rather you explore the output and the syntax on your own to get a feel for everything.
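To make the roles of the delimiter and the header line concrete, here is a minimal Python sketch (not PASW syntax) of what a tab-delimited import does. The three-variable excerpt and its values are made up for illustration, and the function name read_tab_delimited is hypothetical; only the ideas of a tab delimiter, names on line 1, and data starting on line 2 come from the notes above.

```python
import csv
import io

# Hypothetical excerpt standing in for 'GRE Therapy Data File.txt';
# the three variables and their values are made up for illustration.
sample_text = "ID\tSex\tGPA\n1\t1\t3.20\n2\t2\t3.75\n"


def read_tab_delimited(handle):
    """Return (variable_names, cases) from a tab-delimited text stream."""
    reader = csv.reader(handle, delimiter="\t")  # same idea as /DELIMITERS="\t"
    names = next(reader)                         # line 1 holds the variable names
    cases = [
        dict(zip(names, (float(value) for value in row)))
        for row in reader                        # data start on line 2, as with /FIRSTCASE=2
        if row                                   # skip blank lines
    ]
    return names, cases


names, cases = read_tab_delimited(io.StringIO(sample_text))
print(names)
print(cases[0])
```

Each case becomes one row of values keyed by variable name, which is the same layout PASW shows in the Data View.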

3. Syntax for Basic Statistical Needs

3.1 Variable Labels

In the data file, the NAME given to each variable is a short acronym. For example, 'ID' stands for 'Identification Number', 'Coll_Maj' stands for 'College Major', 'SAT_CR' stands for 'Critical Reading Score on the SATs', etc. So that you do not have to memorize each of these acronyms, it's a good idea to assign a LABEL to each variable. These VARIABLE LABELS do not show up in the Data View of the data file, but will show up in an output file. Here is how to use the VARIABLE LABELS syntax to assign the label 'SAT Critical Reading Score' to SAT_CR (remember, you do not type the number at the beginning):

    1  VARIABLE LABELS SAT_CR 'SAT Critical Reading Score'.

All that you need to do is list the variable NAME (SAT_CR) followed by the LABEL you wish to assign (SAT Critical Reading Score). Be sure that the label is in single quotes. You can also assign labels to more than one variable at a time:

    1  VARIABLE LABELS SAT_CR 'SAT Critical Reading Score' SAT_M 'SAT Math Score'.

3.2 Value Labels

For independent variables that have several levels/groups, it is best to dummy-code those groups in the data file. That is, in the data file, male subjects and female subjects will not be called 'male' and 'female'; rather, they will be assigned arbitrary numbers. In the data file for this packet, for the variable 'Sex', males are assigned 1 and females are assigned 2. The numbers can be anything, as long as all males have the same number, and all females have the same number. The reason is that if you want to compare levels/groups of an independent variable, PASW requires that they have numeric codes. The downside is that if you run an analysis that involves those groups/levels, only the arbitrary numbers will appear in the output. You'd have to memorize what the code 1 means for the variable Sex, versus what the code 1 means for another independent variable. But you can assign LABELS to the dummy-code VALUES assigned to groups. These VALUE LABELS will not show in the data file, but do show in output. Here is an example of how to use the VALUE LABELS syntax to assign labels to the dummy-coded males and females for the variable Sex:

    1  VALUE LABELS Sex 1 'Males' 2 'Females'.

If you want to assign labels to more than one independent variable at a time, it is best to use several individual commands:

    1  VALUE LABELS Sex 1 'Males' 2 'Females'.
    2  VALUE LABELS Coll_Class 1 'Freshmen' 2 'Sophomore' 3 'Junior' 4 'Senior'.
    3  VALUE LABELS Coll_Maj 1 'Psychology' 2 'History' 3 'Biology' 4 'Communications' 5 'English' 6
    4    'Mathematics'.
    5  VALUE LABELS Drug_Group 1 'Control Group (no drug)' 2 'Placebo Group'
    6    3 '100 mg/day Group' 4 '200 mg/day Group'.
    7  VALUE LABELS Tutor_Group 1 'Control Group (no tutoring)' 2 'Group Tutoring'
    8    3 'Individual Tutoring'.

In the data file, I have assigned VALUE LABELS to each independent variable. Hence, when output is presented later in this packet, the groups will not show dummy-codes; they will show the labels assigned by the syntax above.

3.3 Frequencies

The FREQUENCIES command is used to obtain a frequency table for a variable. The syntax below asks PASW to determine the frequency for each group within the variables Sex and Coll_Class. Note that the variable names should be entered just as they appear at the top of the columns in the data file. Also, note that you can request frequencies for several variables at once. This is typical for most PASW commands: you can request a procedure for several variables simultaneously:

    1  FREQUENCIES VARIABLES=Sex Coll_Class
    2    /ORDER=ANALYSIS.

The syntax above provides the following output (comments were added by me):

[Output table: Statistics, with columns Sex, Coll_Class, and Coll_Maj and rows N Valid and N Missing.]
Comment: How many cases (subjects) contribute to each of the three variables.

[Output table: Frequency Table for Sex, with rows Males, Females, and Total and columns Frequency, Percent, Valid Percent, and Cumulative Percent.]
Comment: Each group that contributes to each variable is listed to the left.

[Output table: Frequency Table for Coll_Class, with rows Freshmen, Sophomore, Junior, Senior, and Total and columns Frequency, Percent, Valid Percent, and Cumulative Percent.]
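The columns of a frequency table can be reproduced by hand. The Python sketch below (not PASW syntax) counts each dummy-code, converts the counts to percentages, and accumulates them; the function name frequency_table and the eight made-up Sex codes are assumptions for illustration, not values from the data file. With no missing data, Valid Percent equals Percent, so only one percentage column is computed.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):              # groups in ascending code order
        percent = 100.0 * counts[value] / total
        cumulative += percent                 # running total of the percentages
        rows.append((value, counts[value], percent, cumulative))
    return rows


# Made-up dummy-codes for Sex (1 = Males, 2 = Females).
sex_codes = [1, 2, 2, 1, 2, 2, 1, 2]
table = frequency_table(sex_codes)
for value, frequency, percent, cumulative in table:
    print(value, frequency, percent, cumulative)
```

The last row's cumulative percent always reaches 100, which is a quick check that every case was counted once.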

3.4 Descriptive Statistics

Although descriptive statistics can be requested as a sub-command within many PASW commands, there is a specific DESCRIPTIVES command. Like the FREQUENCIES command, you can request descriptive statistics for several variables at the same time. In the syntax below, I asked PASW to compute descriptive statistics on the variables Height_cm and Weight_kg:

    1  DESCRIPTIVES VARIABLES=Height_cm Weight_kg
    2    /STATISTICS=MEAN SUM STDDEV VARIANCE RANGE MIN MAX SEMEAN KURTOSIS
    3    SKEWNESS.

You can request a variety of descriptive statistics. On Lines 2 and 3, I listed each descriptive statistic that can be requested; most should be self-explanatory, except for SEMEAN, which stands for standard error of the mean, and KURTOSIS and SKEWNESS, which refer to the peakedness of a distribution and the skewness of a distribution, respectively. In the output that follows, I did not request the KURTOSIS and the SKEWNESS statistics:

[Output table: Descriptive Statistics for Height_cm and Weight_kg, with columns N, Range, Minimum, Maximum, Sum, Mean (Statistic and Std. Error), Std. Deviation, and Variance; Valid N (listwise) = 240.]
Comments: Each requested statistic is listed in a different column. Each variable is listed in the far left column.

3.5 SORT CASES

If you want to sort all of the cases in the data file in ascending or descending order, based on a certain variable, use the SORT CASES command. The syntax below asks PASW to arrange the data file in ascending order (A) based on the variable Coll_Class. In the data file, freshmen will appear first, then sophomores, followed by juniors, and finally seniors. If you want to sort in descending order, use (D) in place of (A). (There is no output for this syntax command.)

    1  SORT CASES BY Coll_Class(A).
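What sorting does to the rows of a data file can be sketched in a few lines of Python (not PASW syntax). The four cases below are made up, and the function name sort_cases is hypothetical; the point is only that whole cases move together, and that ascending and descending order correspond to (A) and (D).

```python
# Made-up cases; each dictionary is one row (subject) of a data file.
cases = [
    {"ID": 1, "Coll_Class": 3},
    {"ID": 2, "Coll_Class": 1},
    {"ID": 3, "Coll_Class": 4},
    {"ID": 4, "Coll_Class": 1},
]


def sort_cases(cases, variable, descending=False):
    """Return the cases ordered by one variable; whole rows move together."""
    return sorted(cases, key=lambda case: case[variable], reverse=descending)


ascending = sort_cases(cases, "Coll_Class")                    # like Coll_Class(A)
descending = sort_cases(cases, "Coll_Class", descending=True)  # like Coll_Class(D)
print([case["ID"] for case in ascending])
print([case["ID"] for case in descending])
```

Python's sort keeps tied cases in their original order; this sketch makes no claim about how PASW orders ties.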

3.6 SPLIT FILE

In Section 3.4 above, where PASW was asked to calculate descriptive statistics, each statistic was based on all n = 240 subjects in the data file. There is nothing wrong with this, but what if you wanted to look at the means and descriptive statistics for different groups? For example, you may want to look at students' mean weights and mean heights for each college class. But the output in Section 3.4 includes data combined from across all four college classes. Luckily, PASW has a SPLIT FILE command that asks PASW to calculate statistics separately for the different groups within some independent variable. For example, say you wanted to examine the descriptive statistics by college class. First, you need to use the following syntax to 'split' the output file into different groups:

    1  SORT CASES BY Coll_Class.
    2  SPLIT FILE SEPARATE BY Coll_Class.

The variable by which you want the output 'split' into different groups (here, Coll_Class) is listed after BY. Next, run the same DESCRIPTIVES syntax from Section 3.4:

    1  DESCRIPTIVES VARIABLES=Height_cm Weight_kg
    2    /STATISTICS=MEAN SUM STDDEV VARIANCE RANGE MIN MAX SEMEAN KURTOSIS
    3    SKEWNESS.

You will get the following output, which is the descriptive statistics computed for each group within the variable Coll_Class:

[Output tables: one Descriptive Statistics table for Height_cm and Weight_kg per group, with columns N, Range, Minimum, Maximum, Sum, Mean (Statistic and Std. Error), Std. Deviation, and Variance.]
Coll_Class = Freshmen: Valid N (listwise) = 57
Coll_Class = Sophomore: Valid N (listwise) not recoverable
Coll_Class = Junior: Valid N (listwise) = 63
Coll_Class = Senior: Valid N (listwise) = 55

When you're done using the SPLIT FILE command, don't forget to turn it off, or else all of your later output will be separated into different groups:

    1  SPLIT FILE OFF.
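Conceptually, splitting a file just means computing the same statistics once per group. The Python sketch below (not PASW syntax) groups made-up heights by college class and reports N, the mean, the standard deviation, and the standard error of the mean for each group. The function name split_descriptives and all data values are assumptions for illustration; the standard deviation uses the n - 1 denominator.

```python
import math
import statistics
from collections import defaultdict


def split_descriptives(cases, group_var, target_var):
    """Compute N, mean, SD, and SE of the mean separately for each group."""
    groups = defaultdict(list)
    for case in cases:
        groups[case[group_var]].append(case[target_var])
    results = {}
    for group, values in sorted(groups.items()):
        sd = statistics.stdev(values)              # sample SD (n - 1 denominator)
        results[group] = {
            "N": len(values),
            "MEAN": statistics.mean(values),
            "STDDEV": sd,
            "SEMEAN": sd / math.sqrt(len(values)),  # standard error of the mean
        }
    return results


# Made-up heights (cm) for two college classes (1 = Freshmen, 2 = Sophomore).
cases = [
    {"Coll_Class": 1, "Height_cm": 160.0},
    {"Coll_Class": 2, "Height_cm": 172.0},
    {"Coll_Class": 1, "Height_cm": 170.0},
    {"Coll_Class": 2, "Height_cm": 176.0},
    {"Coll_Class": 2, "Height_cm": 180.0},
]
by_class = split_descriptives(cases, "Coll_Class", "Height_cm")
for group, stats in by_class.items():
    print(group, stats)
```

The group sizes add up to the total number of cases, just as the Valid N values in the split output add up to the full sample.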

4. Correlation & Regression

4.1 Pearson Correlations (Bivariate)

PASW can measure the statistical association between two variables in a variety of ways (e.g., Pearson correlation, Spearman correlation, Chi-Square, gamma coefficients). For the data in our file, we'll focus on how PASW calculates the Pearson correlation between two variables.

The CORRELATIONS syntax below asks PASW to calculate the Pearson correlation between the variables SAT_CR (SAT Critical Writing Score) and SAT_M (SAT Math Score). All you need to do is list, on Line 2, the variables between which you want the Pearson correlation measured:

1  CORRELATIONS
2    /VARIABLES= SAT_CR SAT_M
3    /PRINT=TWOTAIL NOSIG
4    /MISSING=PAIRWISE.

On Line 3, the TWOTAIL keyword tells PASW to run the inferential test on the Pearson correlation as a non-directional, two-tailed test. NOSIG asks PASW to flag statistically significant correlations with an asterisk (*). On Line 4, the /MISSING=PAIRWISE sub-command tells PASW what to do with missing data points. (In this data file, there are no missing data.) If a subject is missing a data point, PASW must know what to do with that subject's data. You have two options: handle missing data PAIRWISE or LISTWISE. If you choose LISTWISE, any subject who is missing a data point on any listed variable is excluded from all correlations. If you choose PAIRWISE, a subject is excluded only from those correlations that involve a variable on which the subject is missing a data point.

When you run the syntax above, you get the following output:

[Output table "Correlations": rows and columns SAT_CR and SAT_M, each cell reporting Pearson Correlation, Sig. (2-tailed), and N; Sig. (2-tailed) = .340 for SAT_CR with SAT_M. Other cell values not shown.]

Each variable is listed in its own column and its own row. To find the Pearson correlation between two variables, cross-reference one variable in the columns with the other variable in the rows. The Sig. (2-tailed) value under the Pearson correlation is the p-value for that correlation. It is the exact probability of obtaining a correlation at least that extreme (r = -.062) with that sample size (n = 240) if there were truly no relationship between the two variables. To interpret a p-value: if the listed p-value is less than your chosen alpha-level, which is generally α = .05 or less, then the correlation is significant. In this case, the Pearson correlation is not significant, because the p-value (p = .340) is greater than .05.
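As a small variation on the syntax above, the sketch below swaps in the two options mentioned in the text: LISTWISE handling of missing data and a one-tailed test. The ONETAIL keyword is the directional counterpart of TWOTAIL on the /PRINT sub-command; I have not run this against the guide's data file, where there are no missing data, so LISTWISE and PAIRWISE would give the same result:

```spss
* Same correlation as above, with a one-tailed test and listwise deletion.
CORRELATIONS
  /VARIABLES=SAT_CR SAT_M
  /PRINT=ONETAIL NOSIG
  /MISSING=LISTWISE.
```

With only two variables the missing-data choice cannot matter; it starts to matter once three or more variables are listed and some subjects are missing a score on only one of them.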

It is also possible to calculate several Pearson correlations at the same time. The more variables you list on the /VARIABLES sub-command line, the more correlations are calculated. For example, in the syntax below, I have listed three variables (SAT_CR, SAT_M, and SAT_V). When I run this syntax, PASW will generate the Pearson correlation between each pair of variables:

1  CORRELATIONS
2    /VARIABLES= SAT_CR SAT_M SAT_V
3    /PRINT=TWOTAIL NOSIG
4    /MISSING=PAIRWISE.

[Output table "Correlations": rows and columns SAT_CR, SAT_M, and SAT_V, each cell reporting Pearson Correlation, Sig. (2-tailed), and N. The surviving entry is a flagged correlation in the SAT_V row (.481**). Other cell values not shown.]

You can see in the output above that, in addition to the correlation between SAT_CR and SAT_M that was calculated earlier, PASW also calculated the correlation between SAT_CR and SAT_V (r = .541) and between SAT_M and SAT_V (r = -0.48).

PASW also has a sub-command that lets you request descriptive statistics for each variable, as well as the sums of squares, variances, sums of cross products, and covariances. On Line 4 of the syntax below, the DESCRIPTIVES keyword requests the mean and standard deviation for each variable, and the XPROD keyword requests the variability and covariability measures:

1  CORRELATIONS
2    /VARIABLES=SAT_CR SAT_M SAT_V
3    /PRINT=TWOTAIL NOSIG
4    /STATISTICS DESCRIPTIVES XPROD
5    /MISSING=PAIRWISE.
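For reference, the quantities that XPROD adds to the output are tied to the Pearson correlation by the standard textbook formulas below. The symbols (SS, SP, N) are my own shorthand for sum of squares, sum of cross products, and number of subjects; they are not labels that PASW prints:

```latex
\begin{aligned}
SS_X &= \sum_{i=1}^{N} (X_i - \bar{X})^2, &
SP_{XY} &= \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y}),\\
s_X^2 &= \frac{SS_X}{N-1}, &
\operatorname{cov}(X,Y) &= \frac{SP_{XY}}{N-1},\\
r_{XY} &= \frac{SP_{XY}}{\sqrt{SS_X \, SS_Y}}
        = \frac{\operatorname{cov}(X,Y)}{s_X \, s_Y}.
\end{aligned}
```

In other words, dividing a sum of squares or a sum of cross products by N - 1 gives the corresponding variance or covariance, and the correlation is the covariance rescaled by the two standard deviations.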

Here is the output from the last set of syntax. The first table includes the descriptive statistics for each variable, and the second table includes the Pearson correlations, measures of variability, and measures of co-variability:

[Output table "Descriptive Statistics": Mean, Std. Deviation, and N for SAT_CR, SAT_M, and SAT_V. Cell values not shown.]

[Output table "Correlations": rows and columns SAT_CR, SAT_M, and SAT_V, each cell reporting Pearson Correlation, Sig. (2-tailed), Sum of Squares and Cross-products, Covariance, and N. The surviving entry is a flagged correlation in the SAT_V row (.481**). Other cell values not shown.]

In the row labeled Sum of Squares and Cross-products, the value between two different variables is the sum of cross products, and the value between a variable and itself is the sum of squares. In the row labeled Covariance, the value between two different variables is the covariance, and the value between a variable and itself is the variance.

4.2 Pearson Correlations (Partial)

Having PASW calculate the partial correlation between two variables (the correlation between two variables with the influence of other variables factored out from both variables) is not much different from asking PASW to calculate a raw (zero-order) correlation. For example, say you want to calculate the partial correlation between GPA and Pre_GREv scores (Pretest GRE Verbal Reasoning Scores), while factoring out the SAT_CR scores (Critical Reasoning Scores on the SAT) from both variables.

In the syntax below, on the /VARIABLES sub-command line, the two variables listed before BY (GPA and Pre_GREv) are the variables between which we want to calculate a partial correlation. The variable that comes after BY (SAT_CR) is the variable we want factored out of the other variables. Please note that you can ask PASW to factor out more than one variable:

1  PARTIAL CORR
2    /VARIABLES=GPA Pre_GREv BY SAT_CR
3    /SIGNIFICANCE=TWOTAIL
4    /STATISTICS=DESCRIPTIVES CORR
5    /MISSING=LISTWISE.

On Line 3, /SIGNIFICANCE=TWOTAIL asks PASW to run the inferential test on the partial correlation as a non-directional, two-tailed test. You have the option of selecting a ONETAIL test as well. On Line 4, the /STATISTICS sub-command asks PASW to calculate the descriptive statistics (DESCRIPTIVES) for each variable. The CORR keyword asks PASW to provide the raw Pearson correlations between each pair of variables, in addition to the partial correlation between GPA and Pre_GREv.

Here is the output from the syntax above. The first table reports the descriptive statistics, and the second table reports the correlations and partial correlations. The rows under the control variable -none- (shaded yellow in the original output) are the raw Pearson correlations, and the rows under the control variable SAT_CR (shaded green) are the partial correlations:

[Output table "Descriptive Statistics": Mean, Std. Deviation, and N for GPA, Pre_GREv, and SAT_CR. Cell values not shown.]

[Output table "Correlations": under Control Variables = -none-, the Correlation, Significance (2-tailed), and df for each pair among GPA, Pre_GREv, and SAT_CR; under Control Variables = SAT_CR, the same three rows for GPA with Pre_GREv, where Significance (2-tailed) = .074. Other cell values not shown.]
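To illustrate the note that more than one variable can be factored out, here is a minimal sketch that lists two control variables after BY. Using SAT_M as the second control variable is my choice for illustration only; it is not an analysis from this guide, and I have not run it against the data file:

```spss
* Partial correlation between GPA and Pre_GREv, controlling for SAT_CR and SAT_M.
PARTIAL CORR
  /VARIABLES=GPA Pre_GREv BY SAT_CR SAT_M
  /SIGNIFICANCE=TWOTAIL
  /MISSING=LISTWISE.
```

Every variable listed after BY is removed from both GPA and Pre_GREv before the correlation between them is calculated.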

4.3 Univariate Regression (one regressor)

There is a mountain of stuff that you can do with PASW's REGRESSION procedure, including controlling how a regression analysis is performed and which statistics are requested. Below, I am performing a 'barebones' REGRESSION analysis to keep things simple. The analysis will regress GPA on the Summed SAT Scores (SAT_Tot); that is, it will predict GPA from SAT_Tot. Hence, GPA is the dependent variable (Y) and SAT_Tot is the predictor variable (X).

In the syntax below, PASW is being asked to regress GPA on SAT_Tot. The DEPENDENT (predicted, or regressed) variable is listed on Line 6. The predictor (independent, or regressor) variable is listed on Line 7 after the /METHOD sub-command. A few notes on Line 7: First, if you have more than one predictor, each predictor would be entered here. In this example we have only one predictor (SAT_Tot). Second, there are a number of methods that you can use to have PASW conduct the analysis (ENTER, STEPWISE, etc.), but this is beyond the scope of this packet. Just use METHOD=ENTER:

1  REGRESSION
2    /MISSING LISTWISE
3    /STATISTICS COEFF OUTS R ANOVA
4    /CRITERIA=PIN(.05) POUT(.10)
5    /NOORIGIN
6    /DEPENDENT GPA
7    /METHOD=ENTER SAT_Tot.

The /STATISTICS sub-command on Line 3 is where you can ask PASW to provide various statistics and inferential tests as part of the regression analysis. COEFF requests the slope and intercept coefficients in the regression model. OUTS asks PASW to list any predictors that were offered to the regression model but were not included because they did not meet the criteria specified on Line 4. R asks for the R and R² values of the regression model. ANOVA asks for the analysis of variance to be conducted on the overall regression model. On Line 4, /CRITERIA=PIN(.05) POUT(.10) sets the inclusion and exclusion criteria for each regressor coefficient that is initially entered into the model. Basically, if a regressor coefficient does not meet these criteria, which are based on the t-tests for the coefficients, it is not included in the final regression model. These values can be adjusted, but .05 and .10 are used by default.

When you run the syntax above, you get the following output:

[Output table "Variables Entered/Removed": Model 1; Variables Entered = SAT_Tot; Variables Removed = none; Method = Enter.] This table simply lists the predictor variables that are being entered into the regression analysis.

[Output table "Model Summary": R, R Square, Adjusted R Square, and Std. Error of the Estimate for Model 1. Cell values not shown.] This table provides the R and R² values. R² is the proportion of explained variance.

[Output table "ANOVA": rows Regression, Residual, and Total for Model 1; columns Sum of Squares, df, Mean Square, F, and Sig. Cell values not shown.] The ANOVA is the overall analysis of the regression model.

[Output table "Coefficients": rows (Constant) and SAT_Tot for Model 1; columns Unstandardized Coefficients (B, Std. Error), Standardized Coefficients (Beta), t, and Sig. Cell values not shown.] This table provides the values of the coefficients in the regression equation, as well as t-tests on each coefficient.

You can also ask PASW to report descriptive statistics for each variable, correlations between variables, and a host of other information. In the syntax below, I added a /DESCRIPTIVES sub-command on Line 2 that asks for the mean (MEAN) and standard deviation (STDDEV) of each variable, the Pearson correlation (CORR) between each pair of variables, a significance test (SIG) on each correlation, and the number of subjects (N) contributing to each variable and to each correlation:

1  REGRESSION
2    /DESCRIPTIVES MEAN STDDEV CORR SIG N
3    /MISSING LISTWISE
4    /STATISTICS COEFF OUTS R ANOVA ZPP
5    /CRITERIA=PIN(.05) POUT(.10)
6    /NOORIGIN
7    /DEPENDENT GPA
8    /METHOD=ENTER SAT_Tot.

I also added ZPP to the /STATISTICS sub-command on Line 4. This asks PASW to calculate the zero-order, partial, and semi-partial (part) correlations between each predictor and the dependent variable. In this case, because there is only one predictor, no variable is being factored out of the relationship between GPA and SAT_Tot, so all three correlations will be the same. The output from this syntax appears below:

[Output table "Descriptive Statistics": Mean, Std. Deviation, and N for GPA and SAT_Tot. Cell values not shown.] These are the requested descriptive statistics for each variable.

[Output table "Correlations": Pearson Correlation, Sig. (1-tailed), and N for GPA and SAT_Tot; Sig. (1-tailed) = .000 for GPA with SAT_Tot. Other cell values not shown.] This table lists the requested correlations and p-values.

[Output table "Variables Entered/Removed": Model 1; Variables Entered = SAT_Tot; Variables Removed = none; Method = Enter.]

[Output table "Model Summary": R, R Square, Adjusted R Square, and Std. Error of the Estimate for Model 1. Cell values not shown.]

[Output table "ANOVA": rows Regression, Residual, and Total for Model 1; columns Sum of Squares, df, Mean Square, F, and Sig. Cell values not shown.]

[Output table "Coefficients": rows (Constant) and SAT_Tot for Model 1; columns Unstandardized Coefficients (B, Std. Error), Standardized Coefficients (Beta), t, Sig., and Correlations (Zero-order, Partial, Part). Cell values not shown.] The Correlations columns contain the requested zero-order, partial, and semi-partial correlations.
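The notes on the /METHOD line say that additional predictors are simply listed after ENTER. As a minimal sketch of what that looks like, the syntax below predicts GPA from two of the SAT sub-scores instead of the summed score. The choice of SAT_CR and SAT_M as predictors is mine, purely for illustration, and the output has not been checked against the guide's data file:

```spss
* Regress GPA on two predictors entered together in a single step.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT GPA
  /METHOD=ENTER SAT_CR SAT_M.
```

The Coefficients table would then have one row per predictor in addition to the (Constant) row.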

5. t-tests

5.1 One-Sample t-test

There are three t-tests PASW can perform on a set of data: the one-sample t-test, the independent-groups t-test (independent-samples t-test), and the correlated-samples t-test (paired-samples t-test). However, the statistics that can be requested and the test parameters that you can control are very limited.

The syntax below asks PASW to run a one-sample t-test. The dependent variable is GPA, which is entered on the /VARIABLES sub-command on Line 4:

1  T-TEST
2    /TESTVAL=3
3    /MISSING=ANALYSIS
4    /VARIABLES=GPA
5    /CRITERIA=CI(.95).

Importantly, for the one-sample t-test, you must state a value to which the mean of the dependent variable is compared. This value is entered after the /TESTVAL sub-command on Line 2. In this case, PASW is being asked to compare the mean GPA to a value of 3, which coincides with a grade of 'B'. The /CRITERIA sub-command on Line 5 is pretty much all you have control over, besides /TESTVAL on Line 2. The CI value tells PASW what size confidence interval to report. In this case, .95 corresponds to the 95% confidence interval, which goes with an alpha level of .05.

If you run the syntax above, you get the following in the output file:

[Output table "One-Sample Statistics": N, Mean, Std. Deviation, and Std. Error Mean for GPA. Cell values not shown.] This table presents the descriptive statistics for the dependent variable.

[Output table "One-Sample Test" (Test Value = 3): t, df, Sig. (2-tailed), Mean Difference, and the Lower and Upper bounds of the 95% Confidence Interval of the Difference for GPA. Cell values not shown.] This table presents the results of the inferential, one-sample t-test.

In the One-Sample Test table above, the Sig. (2-tailed) value is the p-value used as the basis for determining statistical significance. If it is less than your chosen alpha level (α = .05, or less), then the difference between the mean and the test value (3) is significant. In this case, the difference is not significant, because .803 > .05. The values underneath the heading 95% Confidence Interval of the Difference are the lower and upper boundaries of the 95% confidence interval around the difference between the mean and the test value.

As another example, the syntax below asks PASW to compare the mean pretest score on the Analytical Writing section of the GREs (Pre_GREa) to a test value of 4.9. This test value of 4.9 is actually the national mean score on that section of the GREs:

1  T-TEST
2    /TESTVAL=4.9
3    /MISSING=ANALYSIS
4    /VARIABLES=Pre_GREa
5    /CRITERIA=CI(.95).

Running this syntax, we get the following in the output file:

[Output table "One-Sample Statistics": N, Mean, Std. Deviation, and Std. Error Mean for Pre_GREa. Cell values not shown.]

[Output table "One-Sample Test" (Test Value = 4.9): t, df, Sig. (2-tailed), Mean Difference, and the Lower and Upper bounds of the 95% Confidence Interval of the Difference for Pre_GREa. Cell values not shown.]

In this case, the One-Sample Test indicates that the mean difference (-.785) is statistically significant, because the p-value in the Sig. (2-tailed) column is less than the conventional alpha-level of α = .05.

5.2 Independent Groups t-tests

The syntax below illustrates how to conduct an independent groups t-test. Note that when comparing two different groups or levels within a between-subjects independent variable, you must be sure that the groups/levels of that independent variable have been dummy-coded; that is, assigned numeric values in the data file. PASW will not run the independent groups t-test if the groups have been assigned descriptive (string) labels in the data file.

Say that we want to compare the mean posttest score on the Verbal Reasoning Section of the GREs between different levels of the independent variable Tutor_Group.
Specifically, we want to compare mean performance between the group of subjects who did not receive tutoring (Control Group) and the group of subjects who received individual tutoring (Individual Tutoring Group). Recall that, within the independent variable Tutor_Group, the group that did not receive tutoring was dummy-coded with 1 and the group that received individual tutoring was dummy-coded with 3 (the group that received group tutoring was dummy-coded with 2).

In the syntax below, after the T-TEST command, the GROUPS sub-command is listed. In the parentheses, 1 and 3 are the values that were assigned to the no tutoring group and the individual tutoring group, respectively. The dependent variable (Post_GREv) is listed after the /VARIABLES sub-command on Line 3:

1  T-TEST GROUPS=Tutor_Group(1 3)
2    /MISSING=ANALYSIS
3    /VARIABLES=Post_GREv
4    /CRITERIA=CI(.95).

When you run this syntax, you get the following output:

[Output table "Group Statistics": N, Mean, Std. Deviation, and Std. Error Mean of Post_GREv for the Control Group (no tutoring) and the Individual Tutoring group. Cell values not shown.] This table presents the descriptive statistics on the dependent variable for each group within the independent variable.

[Output table "Independent Samples Test": for Post_GREv, rows Equal variances assumed and Equal variances not assumed; columns Levene's Test for Equality of Variances (F, Sig.) and t-test for Equality of Means (t, df, Sig. (2-tailed), Mean Difference, Std. Error Difference, and the Lower and Upper bounds of the 95% Confidence Interval of the Difference). Cell values not shown.] This table presents the results of the independent groups t-test that compares the means in the table above.

The table Independent Samples Test lists a lot of information, some of which is relevant and some of which is less relevant. First, you will almost always assume equal variances, so be sure to use the information from that row. Second, Levene's Test for Equality of Variances is a test of whether the variances of the groups being compared are statistically equivalent. If Levene's Test is not significant, which is the case here, then we can assume that the variances are indeed equal.

The information under the heading t-test for Equality of Means is relevant to the independent groups t-test on the data, and most of the terms should be self-explanatory. Importantly, the Sig. (2-tailed) value is the p-value used for determining statistical significance. If it is less than a chosen alpha level (α = .05, or less), then the mean difference is significant, which is the case here. Please note that the mean difference is negative because of how the groups were entered into the t-test in the syntax. That is, the no tutoring group was entered first in the syntax and the individual tutoring group was entered second. This means that PASW will subtract the individual tutoring mean from the no tutoring mean. Thus, this value is negative only because of how the groups are being entered; it has nothing to do with any hypotheses.

A nice feature of the PASW independent groups t-test procedure is that you can run several t-tests that compare performance between the same two groups. For example, let's say we also want to compare the mean posttest score on the Analytical Writing Section of the GREs between the no tutoring group and the individual tutoring group. All that you have to do is add this dependent variable to the /VARIABLES sub-command on Line 3:

1  T-TEST GROUPS=Tutor_Group(1 3)
2    /MISSING=ANALYSIS
3    /VARIABLES=Post_GREv Post_GREa
4    /CRITERIA=CI(.95).

Running this syntax, we get the following output:

[Output table "Group Statistics": N, Mean, Std. Deviation, and Std. Error Mean of Post_GREv and Post_GREa for the Control Group (no tutoring) and the Individual Tutoring group. Cell values not shown.]

[Output table "Independent Samples Test": for Post_GREv and Post_GREa, rows Equal variances assumed and Equal variances not assumed; columns Levene's Test for Equality of Variances (F, Sig.) and t-test for Equality of Means (t, df, Sig. (2-tailed), Mean Difference, Std. Error Difference, and the Lower and Upper bounds of the 95% Confidence Interval of the Difference). Cell values not shown.]
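The two values in the parentheses after GROUPS= decide which groups are compared and in which order they are subtracted. As a minimal sketch, assuming the same dummy codes described above (1 = no tutoring, 2 = group tutoring), the syntax below compares the no tutoring group with the group tutoring group instead; it is an illustration, not an analysis reported in this guide:

```spss
* Compare the control group (coded 1) with the group tutoring group (coded 2).
T-TEST GROUPS=Tutor_Group(1 2)
  /MISSING=ANALYSIS
  /VARIABLES=Post_GREv
  /CRITERIA=CI(.95).
```

Because group 1 is listed first, the reported mean difference would be the group 1 mean minus the group 2 mean; writing Tutor_Group(2 1) should only flip its sign.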

5.3 Correlated Samples (Paired Samples) t-tests

Recall that the correlated samples t-test is used to compare performance on some dependent variable across levels of a within-subjects independent variable. In PASW, the correlated samples t-test is called the paired samples t-test. As was the case for the one-sample t-test and independent groups t-tests, there is not much control over what you can request for the paired samples t-test.

Say we want to compare trait-anxiety levels between the pretest and posttest periods. Recall that, in the hypothetical study, the researchers measured each subject's state anxiety and trait anxiety using the State-Trait Anxiety Inventory (STAI), and these types of anxiety were measured during the pretest and posttest periods. If we are interested specifically in the change in trait anxiety between the pretest and posttest periods, we're going to want to compare the Pre_STAIt mean with the Post_STAIt mean.

The syntax below asks PASW to compare Pre_STAIt with Post_STAIt scores. On the T-TEST command line, the PAIRS sub-command tells PASW to run a paired samples t-test. The levels of the variable being compared come before and after WITH. Thus, Line 1 is basically telling PASW to compare Pre_STAIt scores WITH Post_STAIt scores using a PAIRED samples t-test:

1  T-TEST PAIRS=Pre_STAIt WITH Post_STAIt (PAIRED)
2    /CRITERIA=CI(.95)
3    /MISSING=ANALYSIS.

Running this syntax, you get the following output:

[Output table "Paired Samples Statistics": Mean, N, Std. Deviation, and Std. Error Mean for Pre_STAIt and Post_STAIt (Pair 1). Cell values not shown.]

[Output table "Paired Samples Correlations": N, Correlation, and Sig. for Pair 1 (Pre_STAIt & Post_STAIt). Cell values not shown.]

[Output table "Paired Samples Test": for Pair 1 (Pre_STAIt - Post_STAIt), the Paired Differences (Mean, Std. Deviation, Std. Error Mean, and the Lower and Upper bounds of the 95% Confidence Interval of the Difference), followed by t, df, and Sig. (2-tailed). Cell values not shown.]

In the table Paired Samples Test, most of the statistics should be familiar and straightforward. Mean is the mean difference in the dependent variable between the levels of the independent variable. Note that this value is positive because of how PASW entered the levels of the independent variable into the t-test. In the syntax, Pre_STAIt was entered before WITH and Post_STAIt was entered after WITH; hence, the Post_STAIt mean was subtracted from the Pre_STAIt mean. Thus, it is positive only because of how the levels were entered. In this example, the mean difference (.175) is not statistically significant, because the p-value (.163) is greater than .05.

You can request several paired samples t-tests at the same time. For example, in addition to comparing the Pre_STAIt mean with the Post_STAIt mean, say we also want to compare the pretest and posttest scores from the Quantitative Reasoning section of the GREs (Pre_GREq compared to Post_GREq). In the syntax below, two variables are listed before WITH (Pre_STAIt and Pre_GREq) and two variables are listed after WITH (Post_STAIt and Post_GREq):

1  T-TEST PAIRS=Pre_STAIt Pre_GREq WITH Post_STAIt Post_GREq (PAIRED)
2    /CRITERIA=CI(.95)
3    /MISSING=ANALYSIS.

When the syntax is run, PASW will compare the mean of the first variable before WITH (Pre_STAIt) with the mean of the first variable after WITH (Post_STAIt), and it will compare the mean of the second variable before WITH (Pre_GREq) with the mean of the second variable after WITH (Post_GREq). Thus, it is critical to enter the variables on each side of WITH in the appropriate order when running several paired samples t-tests. Running this syntax provides the following output:

[Output table "Paired Samples Statistics": Mean, N, Std. Deviation, and Std. Error Mean for Pair 1 (Pre_STAIt, Post_STAIt) and Pair 2 (Pre_GREq, Post_GREq). Cell values not shown.]

[Output table "Paired Samples Correlations": N, Correlation, and Sig. for Pair 1 (Pre_STAIt & Post_STAIt) and Pair 2 (Pre_GREq & Post_GREq). Cell values not shown.]

[Output table "Paired Samples Test": for Pair 1 (Pre_STAIt - Post_STAIt) and Pair 2 (Pre_GREq - Post_GREq), the Paired Differences (Mean, Std. Deviation, Std. Error Mean, and the Lower and Upper bounds of the 95% Confidence Interval of the Difference), followed by t, df, and Sig. (2-tailed). Cell values not shown.]
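As a sketch of how the ordering rule scales up, the syntax below requests three paired comparisons at once. It assumes that the pretest and posttest GRE variables used elsewhere in this guide (Pre_GREv, Pre_GREq, Pre_GREa and their Post_ counterparts) are all in the data file; I have not run it against the guide's data:

```spss
* Three pre/post comparisons; each pretest variable sits in the same position as its posttest partner.
T-TEST PAIRS=Pre_GREv Pre_GREq Pre_GREa WITH Post_GREv Post_GREq Post_GREa (PAIRED)
  /CRITERIA=CI(.95)
  /MISSING=ANALYSIS.
```

Because of (PAIRED), the first variable before WITH is paired with the first variable after it, the second with the second, and so on. As I understand the T-TEST documentation, leaving (PAIRED) off would instead compare every variable before WITH with every variable after it, which is rarely what you want in a pretest/posttest design.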

6. Analysis of Variance

6.1 Oneway Analysis of Variance (via GLM)

Analysis of Variance (ANOVA) is used, among other reasons, to compare performance on a dependent variable across two or more levels of one or more independent variables. Oh the things I could say about ANOVA and experimental design! Alas, we do not have time. The PASW procedure for ANOVA is the General Linear Model (GLM). Don't worry about what it means; just know that it calculates F-tests for single-factor and factorial designs. ANOVA can be used when the levels of an independent variable are manipulated (experimental design) or naturally occurring (quasi-experimental design).

Critically, setting up an ANOVA in PASW requires you to think about the design: Is there one independent variable, or more? How many levels of each independent variable are there? Do the levels of the independent variables differ between-subjects or within-subjects? I don't want to get technical, so I'll be as simple as possible.

From the data set, say we want to compare the Posttest GRE Verbal Reasoning Scores (Post_GREv) across the four groups within the independent variable Drug_Group. Thus, we have a oneway ANOVA; that is, one independent variable and one dependent variable. The syntax below presents the minimal set of sub-commands needed to run a oneway ANOVA. This syntax is used only if the independent variable is between-subjects (within-subjects variables require a repeated measures GLM):

1  UNIANOVA Post_GREv BY Drug_Group
2    /METHOD=SSTYPE(3)
3    /INTERCEPT=INCLUDE
4    /CRITERIA=ALPHA(.05)
5    /DESIGN=Drug_Group.

The variable before BY (Post_GREv) is always the dependent variable, and the variable after BY (Drug_Group) is always the independent variable. If you have a factorial design, the additional independent variables would be entered here. On Line 2, the /METHOD sub-command tells PASW how the sums of squares should be calculated (SSTYPE), which is usually set to 3. On Line 4, the /CRITERIA sub-command tells PASW what alpha level to use. Finally, on Line 5, the /DESIGN sub-command is where you build the effects to be examined in the ANOVA. In the case of a oneway design, there is only one independent variable to influence the dependent variable; hence, you list that independent variable. When you are using ANOVA to analyze a factorial design, additional factors can be included.

Running the syntax above gives you the following:

[Output table "Between-Subjects Factors": Drug_Group levels 1 = Control Group (no drug), 2 = Placebo Group, 3 = 100 mg/day Group, 4 = 200 mg/day Group, with the N for each level; N = 60 for the Control Group. Other N values not shown.] This table lists each level of the independent variable, as well as the number of subjects (N) contributing to each level.

[Output table "Tests of Between-Subjects Effects" (Dependent Variable: Post_GREv): rows Corrected Model, Intercept, Drug_Group, Error, Total, and Corrected Total; columns Type III Sum of Squares, df, Mean Square, F, and Sig. The Total row shows a sum of squares of 4.586E7 with df = 240. Other cell values not shown.] This table is the ANOVA summary table. The sums of squares, degrees of freedom, mean squares, F-tests, and p-values are listed here.

The ANOVA summary table (Tests of Between-Subjects Effects) contains a lot of information, some of it unnecessary for our present purpose. I have highlighted relevant portions of the table in yellow. The terms associated with between-group variance (variability due to the independent variable) are in the row labeled Drug_Group, which is the independent variable. The terms associated with within-group variance are in the row labeled Error. Most values in each column should be straightforward: sums of squares for each source of variance are in the second column, degrees of freedom are in the third column, mean squares come next, followed by the F-test, and finally the p-values. In this case, the F-test on the independent variable is not statistically significant, because the p-value (.075) is greater than the chosen alpha-level (.05). Let's assume the test was significant, so we can do post-hoc tests.

If you have a statistically significant F-test, you need to know between which levels of the independent variable there is a significant difference in the dependent variable: we need post-hoc tests. The syntax below includes additional sub-commands. First, the /POSTHOC sub-command on Line 4 asks PASW to compare levels of the independent variable Drug_Group using Fisher's Least Significant Difference test (LSD). You have several options for which post-hoc test to use (e.g., TUKEY, BONFERRONI), but we'll stick with LSD for now. On Lines 5 and 6, the /EMMEANS sub-commands ask PASW to calculate the estimated mean of the dependent variable at each level of the independent variable. Specifically, Line 5 asks for the grand mean (OVERALL), and Line 6 asks for the estimated mean for each level of Drug_Group. Finally, the /PRINT sub-command on Line 7 asks PASW to include additional items in the output. Specifically, ETASQ requests the partial eta-squared measure of effect size, and DESCRIPTIVE asks for the descriptive statistics. There are many additional items that you can ask PASW to 'print' in the output, but we'll stick with these.

1  UNIANOVA Post_GREv BY Drug_Group
2    /METHOD=SSTYPE(3)
3    /INTERCEPT=INCLUDE
4    /POSTHOC=Drug_Group(LSD)
5    /EMMEANS=TABLES(OVERALL)
6    /EMMEANS=TABLES(Drug_Group)
7    /PRINT=ETASQ DESCRIPTIVE
8    /CRITERIA=ALPHA(.05)
9    /DESIGN=Drug_Group.

When you run the syntax, you get the following output:

[Output table "Between-Subjects Factors": Drug_Group levels 1 = Control Group (no drug), 2 = Placebo Group, 3 = 100 mg/day Group, 4 = 200 mg/day Group, with the N for each level; N = 60 for the Control Group and for the 200 mg/day Group. Other N values not shown.]

[Output table "Descriptive Statistics" (Dependent Variable: Post_GREv): Mean, Std. Deviation, and N for the Control Group (no drug), Placebo Group, 100 mg/day Group, 200 mg/day Group, and Total. Cell values not shown.] This table comes from requesting DESCRIPTIVE as part of the /PRINT sub-command.

[Output table "Tests of Between-Subjects Effects" (Dependent Variable: Post_GREv): rows Corrected Model, Intercept, Drug_Group, Error, Total, and Corrected Total; columns Type III Sum of Squares, df, Mean Square, F, Sig., and Partial Eta Squared. The Total row shows a sum of squares of 4.586E7 with df = 240. Other cell values not shown.]

Estimated Marginal Means

Estimated marginal means come from the /EMMEANS sub-commands. Table 1 comes from the OVERALL request on Line 5 of the syntax, and Table 2 comes from Line 6.

[Output table "1. Grand Mean" (Dependent Variable: Post_GREv): Mean, Std. Error, and the Lower Bound and Upper Bound of the 95% Confidence Interval. Cell values not shown.]

[Output table "2. Drug_Group" (Dependent Variable: Post_GREv): Mean, Std. Error, and the Lower Bound and Upper Bound of the 95% Confidence Interval for the Control Group (no drug), Placebo Group, 100 mg/day Group, and 200 mg/day Group. Cell values not shown.]

Post Hoc Tests

Drug_Group

[Output table "Multiple Comparisons" (LSD; Dependent Variable: Post_GREv): for each level listed under (I) Drug_Group, one row per other level listed under (J) Drug_Group; columns Mean Difference (I-J), Std. Error, Sig., and the Lower Bound and Upper Bound of the 95% Confidence Interval. Asterisks (*) flag the comparisons of the 200 mg/day Group with the Control Group (no drug) and with the Placebo Group. Cell values not shown.] This table presents all of the pairwise comparisons between levels of the independent variable; that is, all of the post hoc comparisons.

In the output above, the estimated marginal means and the descriptive statistics table provide more or less the same information: the means of each level of the independent variable Drug_Group. The table under Post Hoc Tests tells you which differences between levels of the independent variable are statistically significant.

To read the Post Hoc Tests (Multiple Comparisons) table: There are two columns (I and J), both of which are labeled with the independent variable (Drug_Group). Under column I, one level of the independent variable is listed, and in column J each of the other three levels of that independent variable is listed in a separate row. For example, the first level of the independent variable listed in column I is Control Group (no drug), and each of the other three levels is listed under column J: Placebo Group, 100 mg/day Group, and 200 mg/day Group. You should see a mean difference next to each of the groups in column J. This is the mean of the dependent variable for the level listed in column I minus the mean for the level listed in column J (I - J). Thus, the mean difference in Posttest Verbal Reasoning GRE scores between the Placebo Group and the Control Group is [value not shown]. The mean difference in Posttest Verbal Reasoning GRE scores between the 100 mg/day Group and the Control Group is [value not shown]. (Note that they are negative only because of the direction in which PASW is subtracting.)

To determine whether a mean difference is statistically significant, look at the column labeled Sig. This column lists the p-value that can be used to determine whether the mean difference is significant. If the p-value is less than a chosen alpha level (α = .05, or less), then the mean difference is significant. In this data set, the only statistically significant mean differences are between the Control Group and the 200 mg/day Group (-26.00, p = .027) and between the Placebo Group and the 200 mg/day Group (-27.00, p = .021). But it should be noted that, because the F-test was not significant, these post-hoc pairwise comparisons are meaningless.
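The text mentions TUKEY and BONFERRONI as alternatives to LSD. As a minimal sketch, the syntax below requests both of them in one run by listing the keywords together inside the parentheses on /POSTHOC; the /EMMEANS lines are left out only to keep the example short, and I have not run this against the guide's data file:

```spss
* Oneway ANOVA with Tukey and Bonferroni post-hoc tests instead of LSD.
UNIANOVA Post_GREv BY Drug_Group
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /POSTHOC=Drug_Group(TUKEY BONFERRONI)
  /PRINT=ETASQ DESCRIPTIVE
  /CRITERIA=ALPHA(.05)
  /DESIGN=Drug_Group.
```

The Multiple Comparisons table would then contain one block of pairwise comparisons per requested test, read in the same way as the LSD block described above.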

36 6.2 Between Subjects Factorial ANOVA (via GLM) Factorial designs examine the influence of two or more independent variables on a dependent variable, and several possible effects can be significant (or not) in a factorial ANOVA: main effects and interactions. (I assume you know what these are.) The PASW procedure for requesting a factorial ANOVA is not very different from requesting a oneway ANOVA. In the syntax the follows, we will cover how to request factorial ANOVA in PASW with two between-subjects independent variables. Say that we want to examine the influence of the independent variables Drug_Group and Tutor_Group on Posttest GRE Verbal Reasoning Scores (Post_GREv). Recall that Drug_Group has four levels (Control, Placebo, 100 mg/day, and 200 mg/day), and Tutor_Group has three levels (Control, Group Tutoring, and Individual Tutoring). Thus, we have a 4 (Drug_Group) x 3 (Tutor_Group) factorial design. The set of syntax, below, which we not actually run, includes minimum sub-commands needed to have PASW run a factorial ANOVA. On the UNIANOVA command line (Line 1), before the BY, the dependent variable (Post_GREv) is listed. After the BY, both independent variables are listed (Drug_Group and Tutor_Group). The inclusion of the second independent variable is one difference from the oneway ANOVA in Section 6.1. Lines 2 4 are exactly the same at the oneway ANOVA performed in Section 6.1, and need no additional commentary. 1 UNIANOVA Post_GREq BY Drug_Group Tutor_Group 2 /METHOD=SSTYPE(3) 3 /INTERCEPT=INCLUDE 4 /CRITERIA=ALPHA(.05) 5 /DESIGN=Drug_Group Tutor_Group Drug_Group*Tutor_Group. The /DESIGN sub-command on Line 5 is where you request effects to be included in the overall ANOVA design. Remember, in factorial designs there is the potential of a main effect of each independent variable, and the potential for interactions between independent variables. Thus, each main effect and interaction that should be included in the analysis should be listed here. 
To include a main effect in the design, list the name of that independent variable. In the syntax above, the inclusion of Drug_Group and Tutor_Group on Line 5 asks PASW to conduct F-Tests for those main effects. To include an interaction, list the independent variables that are part of the desired interaction and include an asterisk (*) between them. In the syntax above, the inclusion of Drug_Group*Tutor_Group asks PASW to conduct an F-Test on that interaction. Because we have only two independent variables, this is the only possible interaction. With three or more independent variables, additional interactions could be listed here.

So that's it! Again, we won't run this syntax; I'll include some more stuff before presenting any output. Below, I have listed the minimum syntax for a oneway ANOVA alongside the minimum syntax for a factorial ANOVA, for comparison:

    Line  Oneway ANOVA                       Factorial ANOVA
    1     UNIANOVA Post_GREv BY Drug_Group   UNIANOVA Post_GREq BY Drug_Group Tutor_Group
    2     /METHOD=SSTYPE(3)                  /METHOD=SSTYPE(3)
    3     /INTERCEPT=INCLUDE                 /INTERCEPT=INCLUDE
    4     /CRITERIA=ALPHA(.05)               /CRITERIA=ALPHA(.05)
    5     /DESIGN=Drug_Group.                /DESIGN=Drug_Group Tutor_Group Drug_Group*Tutor_Group.
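The rule for building a full factorial /DESIGN line (every factor on its own, plus every combination of factors joined by an asterisk) can be sketched outside PASW. The Python below is only an illustration of that rule, not part of PASW; the helper name design_terms and the third factor Gender are hypothetical and do not come from the data file used in this guide.

```python
from itertools import combinations


def design_terms(factors):
    """Return every main effect and interaction term for a full factorial design."""
    terms = []
    # size 1 gives the main effects, size 2 the two-way interactions, and so on
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            terms.append("*".join(combo))
    return terms


# The two factors used in this section
two_way = design_terms(["Drug_Group", "Tutor_Group"])
print("/DESIGN=" + " ".join(two_way) + ".")

# "Gender" is a hypothetical third factor, used only to show how the list grows
three_way = design_terms(["Drug_Group", "Tutor_Group", "Gender"])
print(len(three_way), "terms:", " ".join(three_way))
```

With two factors the sketch reproduces the three terms on Line 5 above; with three factors there are three main effects, three two-way interactions, and one three-way interaction.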

The syntax below (output follows) builds on the syntax above. The /POSTHOC sub-command on Line 4 requests Fisher's LSD tests for the main effects of Drug_Group and Tutor_Group. Post hoc tests for interactions are usually done by way of a simple main effects analysis, or t-tests, but this is beyond the scope of this packet for now. The /EMMEANS sub-commands on Lines 5-8 ask PASW to calculate the grand mean (OVERALL, Line 5), the mean for each level of Drug_Group (Line 6), the mean for each level of Tutor_Group (Line 7), and the mean for each cell in the Drug_Group by Tutor_Group design (Line 8). Finally, the /PRINT sub-command on Line 9 asks PASW to provide descriptive statistics (DESCRIPTIVE) and the eta-squared measure of effect size (ETASQ) for each F-Test:

    1  UNIANOVA Post_GREq BY Drug_Group Tutor_Group
    2  /METHOD=SSTYPE(3)
    3  /INTERCEPT=INCLUDE
    4  /POSTHOC=Drug_Group Tutor_Group(LSD)
    5  /EMMEANS=TABLES(OVERALL)
    6  /EMMEANS=TABLES(Drug_Group)
    7  /EMMEANS=TABLES(Tutor_Group)
    8  /EMMEANS=TABLES(Drug_Group*Tutor_Group)
    9  /PRINT=ETASQ DESCRIPTIVE
    10 /CRITERIA=ALPHA(.05)
    11 /DESIGN=Drug_Group Tutor_Group Drug_Group*Tutor_Group.

Running the syntax above, you get the following output:

Between-Subjects Factors

This table lists each independent variable (far left) and each level of each independent variable (under Value Label), along with the number of subjects at each level.

                     Value Label                   N
    Drug_Group   1   Control Group (no drug)       60
                 2   Placebo Group
                 3   100 mg/day Group
                 4   200 mg/day Group              60
    Tutor_Group  1   Control Group (no tutoring)   80
                 2   Group Tutoring                80
                 3   Individual Tutoring           80

Descriptive Statistics (Dependent Variable: Post_GREq)

This table lists the descriptive statistics (mean and std. deviation) for each level of each independent variable, as well as for each combination of the levels of the independent variables.

    Drug_Group               Tutor_Group                  Mean  Std. Deviation  N
    Control Group (no drug)  Control Group (no tutoring)
                             Group Tutoring
                             Individual Tutoring
                             Total
    Placebo Group            Control Group (no tutoring)
                             Group Tutoring
                             Individual Tutoring
                             Total

Descriptive Statistics (Dependent Variable: Post_GREq), continued

    Drug_Group        Tutor_Group                  Mean  Std. Deviation  N
    100 mg/day Group  Control Group (no tutoring)
                      Group Tutoring
                      Individual Tutoring
                      Total
    200 mg/day Group  Control Group (no tutoring)
                      Group Tutoring
                      Individual Tutoring
                      Total
    Total             Control Group (no tutoring)
                      Group Tutoring
                      Individual Tutoring
                      Total

Tests of Between-Subjects Effects (Dependent Variable: Post_GREq)

This table is the ANOVA summary table. The highlighted sections are relevant for the F-Tests. In this case, only the main effect of Tutor_Group was significant.

    Source                    Type III Sum of Squares  df   Mean Square  F  Sig.  Partial Eta Squared
    Corrected Model (a)
    Intercept
    Drug_Group
    Tutor_Group
    Drug_Group * Tutor_Group
    Error
    Total                     8.538E7                  240
    Corrected Total

Estimated Marginal Means

1. Grand Mean (Dependent Variable: Post_GREq)

    Mean  Std. Error  95% Confidence Interval (Lower Bound, Upper Bound)

2. Drug_Group Estimates (Dependent Variable: Post_GREq)

    Drug_Group               Mean  Std. Error  95% Confidence Interval (Lower Bound, Upper Bound)
    Control Group (no drug)
    Placebo Group
    100 mg/day Group
    200 mg/day Group

3. Tutor_Group Estimates (Dependent Variable: Post_GREq)

    Tutor_Group                  Mean  Std. Error  95% Confidence Interval (Lower Bound, Upper Bound)
    Control Group (no tutoring)
    Group Tutoring
    Individual Tutoring

4. Drug_Group * Tutor_Group (Dependent Variable: Post_GREq)

    Drug_Group               Tutor_Group                  Mean  Std. Error  95% Confidence Interval (Lower Bound, Upper Bound)
    Control Group (no drug)  Control Group (no tutoring)
                             Group Tutoring
                             Individual Tutoring
    Placebo Group            Control Group (no tutoring)
                             Group Tutoring
                             Individual Tutoring
    100 mg/day Group         Control Group (no tutoring)
                             Group Tutoring
                             Individual Tutoring
    200 mg/day Group         Control Group (no tutoring)
                             Group Tutoring
                             Individual Tutoring

Post Hoc Tests

Drug_Group

Multiple Comparisons (Dependent Variable: Post_GREq; LSD)

    (I) Drug_Group           (J) Drug_Group           Mean Difference (I-J)  Std. Error  Sig.  95% Confidence Interval (Lower Bound, Upper Bound)
    Control Group (no drug)  Placebo Group
                             100 mg/day Group
                             200 mg/day Group
    Placebo Group            Control Group (no drug)
                             100 mg/day Group
                             200 mg/day Group
    100 mg/day Group         Control Group (no drug)
                             Placebo Group
                             200 mg/day Group
    200 mg/day Group         Control Group (no drug)
                             Placebo Group
                             100 mg/day Group

Tutor_Group

Multiple Comparisons (Dependent Variable: Post_GREq; LSD)

    (I) Tutor_Group              (J) Tutor_Group              Mean Difference (I-J)  Std. Error  Sig.  95% Confidence Interval (Lower Bound, Upper Bound)
    Control Group (no tutoring)  Group Tutoring
                                 Individual Tutoring          *
    Group Tutoring               Control Group (no tutoring)
                                 Individual Tutoring
    Individual Tutoring          Control Group (no tutoring)  *
                                 Group Tutoring

The majority of the output above is not all that different from the output of the oneway ANOVA performed in Section 6.1, and needs no elaboration. As stated in the comment on the ANOVA summary table, only the main effect of Tutor_Group was significant (p = .016). Exploring this main effect, you can see from the Multiple Comparisons table for Tutor_Group, which contains the post hoc tests between the levels of that independent variable, that the only statistically significant mean difference is between the Control Group (no tutoring) and the Individual Tutoring group (p = .004). The mean difference between the Group Tutoring group and the Individual Tutoring group was nearly significant (p = .091). I encourage the reader to explore the output more thoroughly.

6.3 Repeated Measures ANOVA (via GLM)

Sections 6.1 and 6.2 showed how to request ANOVAs when the levels of an independent variable differed between subjects. In this section, I briefly introduce how to request an ANOVA when the levels of an independent variable differ within subjects (repeated measures ANOVA). This section covers only how to request a oneway repeated measures ANOVA, because the data file includes only a single independent variable that can be considered to differ 'within subjects': the pretest versus posttest period. The PASW GLM procedure for within-subjects variables is referred to as the 'repeated measures GLM'.

Let's say that we want to compare the mean score on the Verbal Reasoning section of the GRE between the pretest and posttest periods (Pre_GREv vs. Post_GREv).
Thus, we have one independent variable (Pretest vs. Posttest) with two levels. The syntax below lists the minimum set of sub-commands needed to perform this oneway repeated measures ANOVA:

    1 GLM Pre_GREv Post_GREv
    2 /WSFACTOR=Pretest_Posttest 2 Difference
    3 /METHOD=SSTYPE(3)
    4 /CRITERIA=ALPHA(.05)
    5 /WSDESIGN=Pretest_Posttest.

On the GLM command line (Line 1), the levels of the within-subjects variable are listed (Pre_GREv and Post_GREv). If there were three or more levels of the independent variable, they would be listed here as well. The order in which the levels are entered is critically important for repeated measures factorial ANOVAs, but is not a concern for oneway repeated measures ANOVAs.

On the /WSFACTOR sub-command (Line 2), the independent variable is listed. This independent variable does not actually appear in the data set; rather, it is a name that you give to the independent variable. In this example, because we are comparing GRE Verbal Reasoning scores between the pretest and posttest periods, I have called the independent variable Pretest_Posttest (PASW does not allow spaces in the name). On this line, the 2 indicates how many levels that independent variable has. Finally, the 'Difference' request tells PASW how to compare the levels of that independent variable, which is akin to requesting a post hoc test. A 'Difference' contrast compares each level (except the first) with the mean of the levels that precede it; with only two levels, this is simply a comparison of Level 2 with Level 1.

Lines 3 and 4 should be familiar from the between-subjects ANOVAs performed in Sections 6.1 and 6.2. The last line, /WSDESIGN, lists each within-subjects factor that should be included in the analysis. In this case, because we have only one independent variable, it is the only factor listed.

The syntax above will output the results of only the F-Test; it will not provide any descriptive information. The syntax below includes the /PRINT sub-command on Line 4, which asks PASW to provide the DESCRIPTIVE statistics as well as the eta-squared measure of effect size (ETASQ). Descriptive statistics can also be requested through the /EMMEANS sub-command:

    1 GLM Pre_GREv Post_GREv
    2 /WSFACTOR=Pretest_Posttest 2 Difference
    3 /METHOD=SSTYPE(3)
    4 /PRINT=DESCRIPTIVE ETASQ
    5 /CRITERIA=ALPHA(.05)
    6 /WSDESIGN=Pretest_Posttest.

If you run the syntax above, you get the following output:

Within-Subjects Factors (Measure: MEASURE_1)

    Pretest_Posttest  Dependent Variable
    1                 Pre_GREv
    2                 Post_GREv

Descriptive Statistics

               Mean  Std. Deviation  N
    Pre_GREv
    Post_GREv

Multivariate Tests (b)

This table is not relevant for our purposes.

    Effect                                 Value  F  Hypothesis df  Error df  Sig.  Partial Eta Squared
    Pretest_Posttest  Pillai's Trace
                      Wilks' Lambda
                      Hotelling's Trace
                      Roy's Largest Root

Mauchly's Test of Sphericity (b) (Measure: MEASURE_1)

This table lists the outcome of a 'sphericity' test, which is similar to homogeneity of variance. If sphericity is violated, it can be an issue.

    Within Subjects Effect  Mauchly's W  Approx. Chi-Square  df  Sig.  Epsilon (a): Greenhouse-Geisser  Huynh-Feldt  Lower-bound
    Pretest_Posttest

Tests of Within-Subjects Effects (Measure: MEASURE_1)

This is the ANOVA summary table.

    Source                                       Type III Sum of Squares  df  Mean Square  F  Sig.  Partial Eta Squared
    Pretest_Posttest         Sphericity Assumed
                             Greenhouse-Geisser
                             Huynh-Feldt
                             Lower-bound
    Error(Pretest_Posttest)  Sphericity Assumed
                             Greenhouse-Geisser
                             Huynh-Feldt
                             Lower-bound

Tests of Within-Subjects Contrasts (Measure: MEASURE_1)

This table lists the results of the post hoc tests between levels of the independent variable.

    Source                   Pretest_Posttest     Type III Sum of Squares  df  Mean Square  F  Sig.
    Pretest_Posttest         Level 2 vs. Level 1
    Error(Pretest_Posttest)  Level 2 vs. Level 1

Tests of Between-Subjects Effects (Measure: MEASURE_1; Transformed Variable: Average)

This table lists ANOVA results for any between-subjects factors, which we did not have in this analysis.

    Source     Type III Sum of Squares  df  Mean Square  F  Sig.  Partial Eta Squared
    Intercept
    Error

The table Tests of Within-Subjects Effects is the ANOVA summary table. I have highlighted the relevant portions of the table in yellow. The terms associated with the effect of the independent variable (between group variability) are in the rows headed by Pretest_Posttest. The terms associated with the error variance are in the rows headed by Error(Pretest_Posttest). The information in most of the columns should be self-explanatory. To determine whether the influence of the independent variable is statistically significant, look at the column labeled Sig. This is the p-value. If this value is less than your chosen alpha level (α = .05 or less), then the independent variable had a statistically significant influence on the dependent variable, which is the case here (p < .001). Because the influence of the independent variable is statistically significant, you can conclude that the mean difference between Pre_GREv and Post_GREv is statistically significant. You can find the mean for each level of the independent variable in the Descriptive Statistics table.

The table Tests of Within-Subjects Contrasts lists the post hoc test results for each comparison between levels of the independent variable. Because there are only two levels of the independent variable, there is only one possible comparison (Level 2 vs. Level 1).
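With only two levels, a oneway repeated measures ANOVA is equivalent to a paired-samples t-test: the F-ratio equals the square of the paired t statistic. The Python sketch below checks that relationship by computing both by hand. The scores are made-up numbers used purely for illustration (they are not the values in the data file), and the variable names are my own.

```python
import math
from statistics import mean, stdev

# Hypothetical pretest and posttest scores for six subjects (NOT the guide's data)
pre = [500, 520, 480, 610, 550, 590]
post = [520, 530, 500, 640, 545, 620]

n = len(pre)   # number of subjects
k = 2          # number of levels of the within-subjects factor

# Paired-samples t: mean difference divided by its standard error
diffs = [b - a for a, b in zip(pre, post)]
t = mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Oneway repeated measures ANOVA, partitioning the sums of squares by hand
scores = pre + post
grand = mean(scores)
ss_total = sum((x - grand) ** 2 for x in scores)
ss_factor = n * ((mean(pre) - grand) ** 2 + (mean(post) - grand) ** 2)
subject_means = [(a + b) / 2 for a, b in zip(pre, post)]
ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
ss_error = ss_total - ss_factor - ss_subjects

df_factor = k - 1
df_error = (n - 1) * (k - 1)
F = (ss_factor / df_factor) / (ss_error / df_error)

print(round(t, 3), round(t ** 2, 3), round(F, 3))
```

The last two printed values agree, which is why the single contrast in the output above tells the same story as the overall F-Test.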

7. Chi Square

7.1 Cross-Tabulation Procedure (Factorial Chi-Square)

A factorial chi-square analysis (a chi-square analysis with two or more independent variables in the design) is requested by way of PASW's CROSSTABS (cross-tabulation) procedure. Cross-tabulation is the process of creating a contingency table from two or more independent variables. You can also have PASW create a contingency table for several independent variables without actually conducting the chi-square analysis.

From the data set, let's say that we want to know whether the n = 240 subjects in the study are equally distributed across the levels of the independent variables college class (Coll_Class) and college major (Coll_Maj). The syntax below presents the basic set of sub-commands needed to have PASW carry out a chi-square analysis through the cross-tabulation procedure:

    1 CROSSTABS
    2 /TABLES=Coll_Class BY Coll_Maj
    3 /FORMAT=AVALUE TABLES
    4 /STATISTICS=CHISQ PHI
    5 /CELLS=COUNT EXPECTED.

The /TABLES sub-command on Line 2 lists the independent variables being set up in the contingency table. One independent variable has to go before the BY and the other goes after the BY, but it is not terribly important which one goes where. The /FORMAT sub-command on Line 3 tells PASW in what format the output should be presented: in this case, in tabled form (TABLES), with the entries in the table appearing in ascending order (AVALUE). The /STATISTICS sub-command on Line 4 is where you request the chi-square analysis (CHISQ); if you do not include this sub-command, PASW will not perform the analysis. I have also included the PHI request, which has PASW calculate Cramér's V and the Phi coefficient as measures of effect size. The /CELLS sub-command on Line 5 tells PASW what information to include in each cell of the cross-tabulation table. In this case, PASW is being told to include the observed frequency (COUNT) and the expected frequency (EXPECTED).
If you run the syntax above, you get the following output:

Case Processing Summary

This table tells you the total number of subjects/cases included (240), and whether any appear to be missing.

                           Valid            Missing          Total
                           N     Percent    N     Percent    N     Percent
    Coll_Class * Coll_Maj  240   100.0%     0     0.0%       240   100.0%

Coll_Class * Coll_Maj Crosstabulation

This is the cross-tabulation table, with college majors listed in columns and college classes in rows. The values in each cell are the observed and expected frequencies.

                                Coll_Maj
    Coll_Class                  Psychology  History  Biology  Communications  English  Mathematics  Total
    Freshmen   Count
               Expected Count
    Sophomore  Count
               Expected Count
    Junior     Count
               Expected Count
    Senior     Count
               Expected Count
    Total      Count
               Expected Count

Chi-Square Tests

This table reports the results of the chi-square analysis. You use the terms in the Pearson Chi-Square row.

                                  Value       df  Asymp. Sig. (2-sided)
    Pearson Chi-Square            16.617 (a)  15  .342
    Likelihood Ratio
    Linear-by-Linear Association
    N of Valid Cases              240
    a. 0 cells (.0%) have expected count less than 5.

Symmetric Measures

This table reports the Phi coefficient and Cramér's V measures of effect size.

                                    Value  Approx. Sig.
    Nominal by Nominal  Phi
                        Cramer's V
    N of Valid Cases    240

The Chi-Square Tests table above includes the information needed to determine whether there is a significant difference between the observed and expected frequencies. Use the information in the row labeled Pearson Chi-Square. The number in the Value column (16.617) is the chi-square statistic. The value under the df column (15) is the degrees of freedom in the cross-tabulation table. The value under the Asymp. Sig. (2-sided) column (.342) is the p-value used to determine significance. If this value is less than your chosen alpha level (α = .05 or less), then there is a significant difference between the observed and the expected frequencies. In this case, because .342 > .05, there is not a significant difference between the observed and the expected frequencies.
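The degrees of freedom and the expected counts in a cross-tabulation follow from simple arithmetic: df = (rows - 1) x (columns - 1), and each expected count is (row total x column total) / N. The Python sketch below shows both. The function names are my own, and the 2 x 2 table of counts is a made-up example, not the Coll_Class by Coll_Maj data.

```python
def crosstab_df(n_rows, n_cols):
    """Degrees of freedom for a chi-square test on an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def expected_counts(observed):
    """Expected count per cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


def pearson_chi_square(observed):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Four college classes by six college majors, as in the table above
df = crosstab_df(4, 6)
print(df)

# Hypothetical 2 x 2 table of observed counts, for illustration only
toy = [[10, 20], [30, 40]]
toy_expected = expected_counts(toy)
toy_chi_square = pearson_chi_square(toy)
print(toy_expected, round(toy_chi_square, 3))
```

For the 4 x 6 design in this section the first line printed is 15, which matches the df reported in the Pearson Chi-Square row.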

7.2 Oneway Chi-Square

If you have only one independent variable and want to know whether the observed frequencies across the levels of that independent variable differ from the frequencies that are expected, you do not use the CROSSTABS procedure from Section 7.1. The CROSSTABS procedure is used only when there are two or more independent variables. There is a separate chi-square procedure within PASW's non-parametric tests (NPAR TESTS) for dealing with one independent variable.

Say that we want to determine whether the observed frequencies across the four college classes differ from a set of frequencies expected by chance. The syntax below lists the sub-commands needed to run a oneway chi-square test to determine whether the observed frequency in each college class differs from the frequency expected for that college class:

    1 NPAR TESTS
    2 /CHISQUARE=Coll_Class
    3 /EXPECTED=EQUAL
    4 /MISSING ANALYSIS.

The /CHISQUARE sub-command on Line 2 tells PASW to perform a chi-square test across the levels of the independent variable listed after the equal sign (Coll_Class). The /EXPECTED sub-command on Line 3 tells PASW how to calculate the expected frequencies. In this case, the choice of EQUAL asks PASW to assume that the expected frequency is equal for each college class. Hence, with 240 students and four college classes, the expected frequency for each college class is 240/4 = 60. Finally, the /MISSING sub-command on Line 4 tells PASW how to handle missing data, which is usually set to ANALYSIS or LISTWISE.

When you run the syntax above, you get the following output:

Coll_Class

This table lists each of the levels of the independent variable, the observed frequencies, the expected frequencies, and the difference between them.

               Observed N  Expected N  Residual
    Freshmen   57          60.0        -3.0
    Sophomore  65          60.0        5.0
    Junior     63          60.0        3.0
    Senior     55          60.0        -5.0
    Total      240

Test Statistics

This table lists the outcome of the chi-square test between the expected and observed frequencies.

                 Coll_Class
    Chi-square   1.133 (a)
    df           3
    Asymp. Sig.  .769
    a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 60.0.
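The equal-frequency expectation and the resulting chi-square statistic can be checked by hand. The Python sketch below is only a cross-check of the arithmetic, not PASW's own routine. It uses the class counts given in Section 7.4 (57, 65, 63, 55), and the function names are my own. The p-value uses a closed-form expression that holds only for 3 degrees of freedom, which is the case here (four classes minus one).

```python
import math


def chi_square_equal(observed):
    """Chi-square statistic when every level has the same expected frequency."""
    expected = sum(observed) / len(observed)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return expected, statistic


def chi_square_p_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Freshmen, sophomores, juniors, seniors (counts as stated in Section 7.4)
observed = [57, 65, 63, 55]
expected, statistic = chi_square_equal(observed)
p_value = chi_square_p_df3(statistic)

print(expected, round(statistic, 3), round(p_value, 3))
```

The expected frequency is 240/4 = 60 for every class, and the p-value rounds to the .769 shown in the Asymp. Sig. row.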

From the Test Statistics table you can determine whether the observed frequencies significantly differ from the expected frequencies by examining the p-value in the Asymp. Sig. row. If this value is less than your chosen alpha level (generally α = .05 or less), then there is a significant difference between the observed and the expected frequencies. In this case, because .769 > .05, there is not a significant difference between the observed and the expected frequencies.

7.3 Goodness of Fit Test

Requesting a goodness of fit test is virtually identical to requesting a chi-square analysis for one independent variable. Assume that in most research studies performed using college students, freshmen are most likely to participate, sophomores are second-most likely, juniors are third-most likely, and seniors are least likely. Thus, we may expect that 50% (.5) of the subjects in a study are freshmen, 25% (.25) are sophomores, 15% (.15) are juniors, and 10% (.1) are seniors. We want to run a goodness of fit test to determine whether the frequencies observed in each college class are consistent with these expected percentages. The syntax below lists the sub-commands needed to run this goodness of fit test:

    1 NPAR TESTS
    2 /CHISQUARE=Coll_Class
    3 /EXPECTED=.5 .25 .15 .1
    4 /MISSING ANALYSIS.

Notice that Lines 1, 2, and 4 are identical to the oneway chi-square conducted in Section 7.2; the only difference is the /EXPECTED sub-command on Line 3. The numbers after the equal sign are the expected proportions of freshmen (.5), sophomores (.25), juniors (.15), and seniors (.1) from above. For the goodness of fit test, you can use proportions, as done here, or expected frequencies, which would need to be determined. Importantly, the order of the proportions must coincide with the dummy-codes assigned to the levels of the independent variable.
That is, whichever level was dummy-coded as 1 would have its expected proportion listed first, whichever level was dummy-coded as 2 would have its expected proportion listed second, etc. In the data file, freshmen were coded 1, sophomores were coded 2, etc.

When you run the syntax above, you get the following output:

Coll_Class

This table lists each of the levels of the independent variable, the observed frequencies, the expected frequencies, and the difference between them. The expected frequencies are obtained by multiplying the sample size (240) by the expected proportion of each level.

               Observed N  Expected N  Residual
    Freshmen   57          120.0       -63.0
    Sophomore  65          60.0        5.0
    Junior     63          36.0        27.0
    Senior     55          24.0        31.0
    Total      240
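Because the expected proportions are matched to the levels purely by position, it can help to write the pairing out explicitly. The Python sketch below keys both the proportions and the observed counts by dummy-code, then multiplies each proportion by the sample size. It is a hand check of the arithmetic only; the dictionary layout and names are my own, and the observed counts are the ones stated in Section 7.4.

```python
# Expected proportions keyed by dummy-code: 1 = freshmen ... 4 = seniors
proportions = {1: 0.50, 2: 0.25, 3: 0.15, 4: 0.10}

# Observed counts keyed the same way (as stated in Section 7.4)
observed = {1: 57, 2: 65, 3: 63, 4: 55}

n = sum(observed.values())

# Walk the codes in ascending order, the same order /EXPECTED assumes
expected = {code: n * proportions[code] for code in sorted(proportions)}

residuals = {code: observed[code] - expected[code] for code in expected}
statistic = sum(residuals[code] ** 2 / expected[code] for code in expected)

for code in expected:
    print(code, observed[code], round(expected[code], 1), round(residuals[code], 1))
print(round(statistic, 3))
```

Swapping two proportions on the /EXPECTED line would silently assign the wrong expectation to two classes, which is why the ordering matters.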

Test Statistics

This table lists the outcome of the chi-square test between the expected and observed frequencies.

                 Coll_Class
    Chi-square   93.783 (a)
    df           3
    Asymp. Sig.  .000
    a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 24.0.

From the Test Statistics table you can determine whether the observed frequencies significantly differ from the expected frequencies by examining the p-value in the Asymp. Sig. row. If this value is less than your chosen alpha level (generally α = .05 or less), then there is a significant difference between the observed and the expected frequencies. In this case, because p < .001, there is a significant difference between the observed and the expected frequencies.

7.4 Alternative Method for Goodness of Fit Test

There is an alternative procedure for requesting a goodness of fit test. Say that you know the numbers of freshmen (57), sophomores (65), juniors (63), and seniors (55) in the study, and want to run a goodness of fit test on those numbers, but have not set up an entire data file with all 240 subject cases. The screen shot below shows a data file with the independent variable Coll_Class (1 = freshmen; 2 = sophomores; 3 = juniors; 4 = seniors) and the Observed frequencies in each class. We can also run a goodness of fit test (or a oneway chi-square) when data are set up in this manner.

Figure 6: Observed frequencies in each college class.

The syntax below shows you how to conduct a chi-square goodness of fit test when the frequency data are set up in the manner shown in Figure 6:

    1 WEIGHT BY Observed.
    2
    3 NPAR TESTS
    4 /CHISQUARE=Coll_Class
    5 /EXPECTED=.5 .25 .15 .1
    6 /MISSING ANALYSIS.
    7
    8 WEIGHT OFF.

The WEIGHT command on Line 1 is critical, as it tells PASW to weight each case (each level of the independent variable) by the amount listed in the Observed variable. 'Observed' holds the frequency data in the data file pictured in Figure 6. Thus, the WEIGHT BY command tells PASW to weight the freshmen by 57, the sophomores by 65, the juniors by 63, and the seniors by 55. This is basically telling PASW that each college class contains that many subjects. Lines 3-6 are identical to the goodness of fit procedure discussed in Section 7.3, and I will not elaborate on them here. The WEIGHT OFF command on Line 8 turns off the weighting after the analysis has been run.

When you run the syntax above, you get the following output:

Coll_Class

               Observed N  Expected N  Residual
    Freshmen   57          120.0       -63.0
    Sophomore  65          60.0        5.0
    Junior     63          36.0        27.0
    Senior     55          24.0        31.0
    Total      240

Test Statistics

                 Coll_Class
    Chi-square   93.783 (a)
    df           3
    Asymp. Sig.  .000
    a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 24.0.

Notice that the output is exactly the same as in Section 7.3.
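The reason the weighted four-row file gives the same output as the full 240-case file can be seen by expanding the weights back into individual cases. The Python sketch below illustrates that idea only; it is not how PASW implements WEIGHT BY, and the variable names are my own.

```python
from collections import Counter

# The four-row data file from Figure 6: (Coll_Class code, Observed frequency)
aggregated = [(1, 57), (2, 65), (3, 63), (4, 55)]

# Weighted tally: each row counts as many times as its weight says
weighted_counts = Counter()
for code, weight in aggregated:
    weighted_counts[code] += weight

# Expanded file: one row per subject, as in the full data set
cases = [code for code, weight in aggregated for _ in range(weight)]
case_counts = Counter(cases)

print(len(cases))
print(weighted_counts == case_counts)
```

Both tallies give the same frequency for every class, so any test based on those frequencies, including the goodness of fit test above, comes out the same either way.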