SPSS MULTIPLE IMPUTATION

Similar documents
MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

IBM SPSS Missing Values 22

Imputing Missing Data using SAS

Statistical Analysis Using SPSS for Windows Getting Started (Ver. 2014/11/6) The numbers of figures in the SPSS_screenshot.pptx are shown in red.

SPSS Tutorial, Feb. 7, 2003 Prof. Scott Allard

Stephen du Toit Mathilda du Toit Gerhard Mels Yan Cheng. LISREL for Windows: PRELIS User s Guide

January 26, 2009 The Faculty Center for Teaching and Learning

Chapter 7 Factor Analysis SPSS

IBM SPSS Missing Values 20

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

SPSS Introduction. Yi Li

Directions for using SPSS

SPSS Guide: Regression Analysis

IBM SPSS Data Preparation 22

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

There are six different windows that can be opened when using SPSS. The following will give a description of each of them.

Craig K. Enders Arizona State University Department of Psychology

A Brief Introduction to SPSS Factor Analysis

An Introduction to SPSS. Workshop Session conducted by: Dr. Cyndi Garvan Grace-Anne Jackman

IBM SPSS Direct Marketing 23

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Package dsmodellingclient

IBM SPSS Direct Marketing 22

What s New in IBM SPSS Statistics 20

Data analysis process

Moderation. Moderation

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone:

IBM SPSS Bootstrapping 22

Using SPSS, Chapter 2: Descriptive Statistics

STATISTICA Formula Guide: Logistic Regression. Table of Contents

This chapter reviews the general issues involving data analysis and introduces

Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:

1/27/2013. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

Data Mining III: Numeric Estimation

Section 6: Data Analysis Guide Overview

ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node

Data Analysis in SPSS. February 21, If you wish to cite the contents of this document, the APA reference for them would be

Additional sources Compilation of sources:

Logistic Regression.

CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA

Please follow these guidelines when preparing your answers:

R2MLwiN Using the multilevel modelling software package MLwiN from R

8. Time Series and Prediction

Handling attrition and non-response in longitudinal data

Handling missing data in Stata a whirlwind tour

Data Analysis. Using Excel. Jeffrey L. Rummel. BBA Seminar. Data in Excel. Excel Calculations of Descriptive Statistics. Single Variable Graphs

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

ANALYSIS OF TREND CHAPTER 5

Data exploration with Microsoft Excel: analysing more than one variable

APPLIED MISSING DATA ANALYSIS

IBM SPSS Neural Networks 22

An introduction to using Microsoft Excel for quantitative data analysis

SPSS Explore procedure

IBM SPSS Statistics for Beginners for Windows

SPSS: Getting Started. For Windows

Analysing Questionnaires using Minitab (for SPSS queries contact -)

Multiple Imputation for Missing Data: A Cautionary Tale

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

SPSS INSTRUCTION CHAPTER 1

SPSS (Statistical Package for the Social Sciences)

A Basic Introduction to Missing Data

Gamma Distribution Fitting

SPSS-Applications (Data Analysis)

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

One-Way ANOVA using SPSS SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate

Introduction Course in SPSS - Evening 1

SAS Software to Fit the Generalized Linear Model

C:\Users\<your_user_name>\AppData\Roaming\IEA\IDBAnalyzerV3

Forecasting in STATA: Tools and Tricks

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Formula for linear models. Prediction, extrapolation, significance test against zero slope.

Data Analysis Tools. Tools for Summarizing Data

Assignments Analysis of Longitudinal data: a multilevel approach

Using SAS to Create Sales Expectations for Everyday and Seasonal Products Ellebracht, Netherton, Gentry, Hallmark Cards, Inc.

SAS Analyst for Windows Tutorial

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

IBM SPSS Direct Marketing 19

IBM SPSS Statistics 20 Part 1: Descriptive Statistics

Chapter 23. Inferences for Regression

7 Generalized Estimating Equations

11. Analysis of Case-control Studies Logistic Regression

Weight of Evidence Module

CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

eq5d: A command to calculate index values for the EQ-5D quality-of-life instrument

WESTMORELAND COUNTY PUBLIC SCHOOLS Integrated Instructional Pacing Guide and Checklist Computer Math

Introduction to SPSS 16.0

Minitab Tutorials for Design and Analysis of Experiments. Table of Contents

Scatter Plot, Correlation, and Regression on the TI-83/84

Data Mining for Model Creation. Presentation by Paul Below, EDS 2500 NE Plunkett Lane Poulsbo, WA USA

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DISCRIMINANT FUNCTION ANALYSIS (DA)

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

Simple Linear Regression, Scatterplots, and Bivariate Correlation

How to set the main menu of STATA to default factory settings standards

Spreadsheets and Laboratory Data Analysis: Excel 2003 Version (Excel 2007 is only slightly different)


Simple linear regression

Data Preparation in SPSS

Transcription:

SPSS MULTIPLE IMPUTATION IMPUTATION ALGORITHM The SPSS uses an MCMC algorithm known as fully conditional specification (FCS) or chained equations imputation The basic idea is to impute incomplete variables one at a time, using the filled-in variable from one step as a predictor in all subsequent steps SPSS uses linear regression for continuous variables and logistic regression for categorical variables

EATING DISORDER RISK DATA Questionnaire data from a study of eating disorder risk in a sample of 500 college-aged women Variables: Body mass index (BMI), 7 questionnaire items measuring body dissatisfaction, 6 questionnaire items measuring eating disorder risk, binary indicator of past sexual abuse history (0 = no abuse history, 1= abuse history) All questionnaire items measured on a 7-point Likert scale ANALYSIS MODEL Multiple regression model that predicts eating disorder risk (sum of 6 items) from abuse history and body dissatisfaction (sum of 7 items) Abuse EDR Scale ε BD Scale edrscale = B0 + B1(abuse) + B2(bdscale) + ε

DEFINING VARIABLES Incomplete variables must be defined as nominal or scale (i.e., continuous) prior to imputation SPSS applies linear imputation to scale variables and logistic (or multinomial logistic) regression to categorical variables Define variables in the Variable View tab or with syntax CATEGORICAL VARIABLES A multinomial logistic regression model for a Likert outcome has many parameters Imputation can be exceedingly slow (the ordinal imputation model in Mplus is much faster) Treating ordinal scales as continuous is often fine, but rounding imputed values to the nearest integer can introduce bias

VARIABLE VIEW TAB EXPLORATORY ANALYSIS Prior to imputing the data, run an exploratory analysis to assess MCMC convergence SPSS provides limited diagnostic information To implement my diagnostic macro program (described later), specify imputations = 2 and iterations = 1000

MULTIPLE IMPUTATION COMMAND VARIABLES TAB Select variables, specify the number of imputed data sets, and specify a file name for the imputed data To use the diagnostic macro, specify imputations = 2

METHOD TAB Select the algorithm (fully conditional specification), and specify the number of betweenimputation iterations To use the diagnostic macro, specify iterations = 1000 OUTPUT TAB Checking the Create iteration history button saves P-step means and standard deviations To use the diagnostic macro, save the parameters to a file named parameters.sav

SPSS SYNTAX */ OPEN DATA */. get file = 'c:\spss ex\eating risk data.sav'. dataset name rawdata window = front. */ DEFINE MEASUREMENT SCALE */. variable level bmi bds1 to bds7 edr1 to edr6 (scale) /abuse (nominal). */ PERFORM EXPLORATORY ANALYSIS */. dataset activate rawdata. multiple imputation bmi bds1 to bds7 edr1 to edr6 abuse /impute method = fcs maxiter = 1000 nimputations = 2 /outfile imputations = temp fcsiterations = 'c:\spss ex\parameters.sav'. DIAGNOSTIC INFORMATION (OR LACK THEREOF) SPSS provides limited diagnostic information The program saves P-step means and standard deviations to a file but does not produce diagnostic information I wrote an SPSS macro that creates trace plots and computes the potential scale reduction factor

DIAGNOSTIC MACRO!let!folder specifies the folder that contains parameters.sav!let!vars specifies the variables in parameters.sav define diagnosticmacro () */!FOLDER = FILE PATH TO THE FOLDER CONTAINING THE PARAMETERS.SAV FILE */. */!VARS = VARIABLES IN THE PARAMETERS.SAV FILE */.!let!folder = 'c:\spss ex'!let!vars = 'bmi bds1 bds2 bds4 bds7 edr2 edr3 edr5' TRACE PLOT OF THE BMI MEAN

TRACE PLOT OF THE BMI STANDARD DEVIATION POTENTIAL SCALE REDUCTION The macro computes the PSR for the P-step means and standard deviations after every 100 iterations These parameters often converge quickly than covariances (which SPSS does not provide) Be conservative when choosing the between-imputation interval based on the PSR The macro displays the maximum PSR for all parameters in a table and in a line graph

LINE GRAPH OF MAXIMUM PSR VALUES FINAL IMPUTATION RUN After establishing convergence, run MCMC a second time to generate the imputed data sets Imputation details for this example: 20 imputed data sets 200 iterations separating each imputed data set

VARIABLES TAB METHOD TAB

OUTPUT TAB IMPUTATION OUTPUT SPSS stacks the imputed data sets into a single file A variable named IMPUTATION_ differentiates the data sets The stacked file format is convenient because data manipulation tasks (e.g., computing new variables, recoding, etc.) need only be executed once The IMPUTATION_ variable plays an important role in the subsequent analyses

IMPUTED DATA COMPUTING SCALE SCORES Use the COMPUTE command once for each scale score Transformations are automatically applied to every data set because the imputed files are stacked

SPLIT FILE COMMAND Splitting the file by the IMPUTATION_ variable invokes the analysis and pooling procedures, if available ANALYZING MULTIPLY IMPUTED DATA Analyze the data as usual SPSS pools estimates for many common analyses, but not all The program is idiosyncratic in its application of the pooling formulas (e.g., in a regression analysis, SPSS pools the unstandardized coefficients but not the beta weights) The icon denotes a procedure that can accommodate multiply imputed data

SUPPORTED PROCEDURES SPSS SYNTAX */ PERFORM MULTIPLE IMPUTATION ANALYSIS */. dataset activate rawdata. multiple imputation bmi bds1 to bds7 edr1 to edr6 abuse /impute method = fcs maxiter = 200 nimputations = 20 /outfile imputations = imputed. */ COMPUTE SCALE SCORES WITHIN EACH DATA SET */ dataset activate imputed. compute bodydis = sum(bds1 to bds7). compute eatrisk = sum(edr1 to edr6). exe. */ SPLIT THE FILE BY VARIABLE IMPUTATION_ */ sort cases by imputation_. split file layered by imputation_.

SPSS OUTPUT SPSS reports the analysis results separately for each imputed data set even when it does not pool the estimates The pooled estimates and standard errors appear at the bottom of each table, when available Some estimates get pooled, some do not DESCRIPTIVES

CORRELATIONS REGRESSION