Craig K. Enders Arizona State University Department of Psychology
Topics
Missing Data Patterns And Missing Data Mechanisms
Traditional Missing Data Techniques
Maximum Likelihood Estimation And Missing Data Handling
Maximum Likelihood Missing Data Handling In Mplus
Data Analysis Example: Means, Standard Deviations, and Correlations
Data Analysis Example: ANCOVA
Data Analysis Example: Repeated Measures ANOVA
Dealing With Nonnormal Missing Data
Data Analysis Example: Confirmatory Factor Analysis
Incorporating Auxiliary Variables Into Maximum Likelihood Analyses
Data Analysis Example: Scale Score Analysis With Auxiliary Variables
Multiple Imputation: The Imputation Phase
Assessing The Convergence Of MCMC
Multiple Imputation: The Analysis and Pooling Phase
Multiple Imputation In Mplus
Data Analysis Example: Means, Standard Deviations, and Correlations
Data Analysis Example: ANCOVA
Data Analysis Example: Confirmatory Factor Analysis
Data Analysis Example: Scale Score Analysis
Data Analysis Example: Multilevel Model
Multiple Imputation In SPSS
Multiple Imputation In SAS
Planned Missing Data Designs
Discuss missing data theory and assumptions.
Briefly review traditional analysis approaches.
Introduce modern missing data handling methods: maximum likelihood (ML) estimation and multiple imputation (MI).
Illustrate software applications of ML and MI.
Introduce planned missing data designs.
"Routine implementation of these new methods of addressing missing data [maximum likelihood and multiple imputation] will be one of the major changes in research over the next decade." Steve West, former Editor of Psychological Methods, quoted in APA's Monitor On Psychology (2002, Vol. 33, p. 70).
The number of ML and MI applications in behavioral science journals has increased dramatically in recent years.
See Applied Missing Data Analysis (Guilford Press) for additional information. All data sets and analysis examples from the book are available for download.
The missing data pattern describes the configuration of observed and missing values in a data set. The pattern describes the location of the holes in the data but says nothing about why the data are missing.
The missing data mechanism describes how an individual's propensity for missing data is related to other variables, if at all. The likelihood of a missing value on Y may be related to other variables in the data or to the would-be values of Y; it is also possible that the propensity for missing data is unrelated to other variables.
A general pattern occurs when missing values are haphazardly dispersed throughout the data matrix. Despite the seemingly random pattern, missingness may be systematic. ML and MI are flexible approaches that can handle a general missing data pattern.
[Figure: general missing data pattern for Y1, Y2, Y3, and Y4]
Planned missing data designs introduce intentional missing values (e.g., to reduce respondent burden and maximize resources), for example, a four-wave longitudinal study where each case provides data at three of the four waves.
[Figure: planned missing data pattern for Y1, Y2, Y3, and Y4]
The theoretical foundations of modern missing data analyses were described by Rubin (1976), who defined three missing data mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Mechanisms describe how the propensity for a missing value on a variable Y relates to the data, if at all.
For a variable Y, there are potentially two scores: the value of Y, and a binary variable R that denotes whether Y is observed (e.g., R = 0 if Y is observed, R = 1 if Y is missing).
Similarly, there are two sets of parameters: the parameters of substantive interest (e.g., means, standard deviations, correlations), and nuisance parameters that describe the propensity for missing data (e.g., logistic regression coefficients). Sometimes we only need to estimate the substantive parameters; other times we must estimate both sets. The missing data mechanism dictates which.
Missing completely at random (MCAR): The probability of missing data on Y is unrelated to other measured variables and is unrelated to the values of Y itself. This is what researchers think of as haphazard or "flip of a coin" missingness. MCAR is strict because it says that the probability of missing data is unrelated to anything in the data; the observed data are a simple random sample of the hypothetically complete data set.
Example: Employees complete an IQ test during a job interview, and supervisors rate job performance after 6 months. Performance ratings are missing for no particular reason (e.g., maternity leave, a spouse relocates, the supervisor quits).
[Table: IQ scores, hypothetical complete job ratings, and job ratings under MCAR]
MCAR is the only testable mechanism, in the sense that the data can provide evidence against it. If MCAR holds, cases with missing values should not differ from the cases with complete data, on average. To test, form an indicator variable that denotes missingness (e.g., 0 = complete, 1 = missing) and compare the indicator groups on other variables, e.g., using an independent t test or Cohen's d effect size. Multivariate extensions of this approach exist (e.g., Little's MCAR test).
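The indicator comparison described above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the author's procedure and not Little's multivariate test: the function name `compare_missingness_groups` and the data are hypothetical, missing values are coded as `None`, the t statistic is Welch's version, and d uses the pooled standard deviation.

```python
from math import sqrt
from statistics import mean, variance

def compare_missingness_groups(x, y):
    """Compare a complete variable x between cases with observed and missing y.

    y codes missing values as None. Returns (mean of x for complete cases,
    mean of x for missing cases, Welch t statistic, Cohen's d).
    """
    # Missingness indicator: 0 = y observed (complete), 1 = y missing
    r = [0 if v is not None else 1 for v in y]
    x_complete = [xi for xi, ri in zip(x, r) if ri == 0]
    x_missing = [xi for xi, ri in zip(x, r) if ri == 1]
    n0, n1 = len(x_complete), len(x_missing)
    m0, m1 = mean(x_complete), mean(x_missing)
    v0, v1 = variance(x_complete), variance(x_missing)
    # Welch t statistic (does not assume equal group variances)
    t = (m0 - m1) / sqrt(v0 / n0 + v1 / n1)
    # Cohen's d based on the pooled within-group standard deviation
    pooled_sd = sqrt(((n0 - 1) * v0 + (n1 - 1) * v1) / (n0 + n1 - 2))
    d = (m0 - m1) / pooled_sd
    return m0, m1, t, d

# Hypothetical data: job ratings are missing for the three lowest IQ scores
iq = [78, 84, 88, 91, 95, 99, 102, 105, 110, 118]
ratings = [None, None, None, 8, 10, 9, 11, 12, 11, 14]
m_complete, m_missing, t, d = compare_missingness_groups(iq, ratings)
print(f"complete = {m_complete:.1f}, missing = {m_missing:.1f}, t = {t:.2f}, d = {d:.2f}")
```

In this hypothetical example the complete cases have a much higher IQ mean than the incomplete cases, which is evidence against MCAR; it cannot tell MAR apart from MNAR.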
Comparing the IQ means of the complete and missing cases: small differences between the two groups suggest haphazard missingness, so MCAR is plausible.
[Table: IQ scores, job ratings under MCAR, and the missingness indicator]
Missing at random (MAR): The terminology is confusing because the missingness is systematic. MAR means that the probability of missing data on Y is related to some other measured variable but is unrelated to the would-be values of Y itself. After controlling for other variables, there is no association between the propensity for missing data on Y and the would-be values of Y. ML and MI require the MAR assumption.
Example: Prospective employees complete an IQ test during a job interview. IQ is a selection measure, and the company does not hire applicants in the lower quartile. Supervisors rate job performance after 6 months.
[Table: IQ scores, hypothetical complete job ratings, and job ratings under MAR]
Missing not at random (MNAR): The probability of missing data on Y is related to the values of Y itself. This is the most problematic mechanism and can cause substantial bias; it requires specialized analysis procedures (e.g., selection models, pattern mixture models). MNAR can also occur when the cause of missingness is a measured variable that is omitted from the analysis.
Example: Employees complete an IQ test during a job interview, and supervisors rate job performance after 6 months. The company terminates employees for poor performance prior to their evaluation.
[Table: IQ scores, hypothetical complete job ratings, and job ratings under MNAR]
[Figure: MCAR, MAR, and MNAR diagrams relating IQ, job performance (JP), Z, and the missingness indicator R]
It is impossible to empirically differentiate between MAR and MNAR, because proving that the probability of missingness on Y is related (or unrelated) to Y requires the would-be values of Y. It is only possible to provide evidence against MCAR: mean differences between the complete and the incomplete cases could be due to MAR or MNAR. MAR is ultimately an untestable assumption.
In the MAR example, large differences between the IQ means of the complete and missing cases suggest systematic missingness: MCAR is not plausible, and the mechanism is MAR or MNAR.
[Table: IQ scores, job ratings under MAR, and the missingness indicator]
In the MNAR example, large differences between the IQ means of the complete and missing cases also suggest systematic missingness: MCAR is not plausible, and the mechanism is MAR or MNAR.
[Table: IQ scores, job ratings under MNAR, and the missingness indicator]
Mechanisms serve as assumptions for a missing data analysis, and ML and MI assume MAR. When MAR (or MCAR) holds, we can estimate the parameters of substantive interest without worrying about the parameters that dictate missingness. MNAR analyses (e.g., selection models, pattern mixture models) require a submodel that explains why the data are missing; they are difficult and do not necessarily perform better than MAR-based analyses.
ML and MI require the MAR assumption, but MAR is not automatically satisfied just because the causes or correlates of missing data are measured variables. MAR is satisfied on an analysis-by-analysis basis: the correlates of missingness must be part of the statistical analysis or part of the imputation routine.
Example: Researchers use simple regression to examine the association between self-esteem and risky sexual behavior in teens. Only participants 16 years of age or older fill out the sexual behavior questionnaire. MAR is satisfied only if age is included in the regression model. Excluding age can produce an MNAR mechanism and biased parameter estimates; the correlation between age and sexual behavior dictates the size of this bias.
Researchers rarely know why data are missing; at best, measured variables may be correlates of the true reasons for missingness. An inclusive analysis strategy incorporates auxiliary variables that are (a) correlates of the incomplete variable or (b) correlates of missingness. An inclusive strategy improves the chances of satisfying the MAR assumption.
[Figure: standard regression model (MNAR) with esteem predicting risky sex, and auxiliary variable regression model (MAR) that adds age]
Common traditional approaches, all of which can lead to substantial bias: listwise deletion, pairwise deletion, mean imputation, and regression imputation.
Example: 20 prospective employees take an IQ test during a job interview. The company uses IQ as a selection measure and hires applicants who score above the median. A supervisor rates job performance following a 6-month probationary period. Performance ratings are missing for the employees who were never hired.
[Figure: job performance plotted against IQ for the true data and for the MAR data]
Listwise deletion eliminates all cases with missing values, resulting in a complete data set. Pairwise deletion eliminates cases on an analysis-by-analysis basis. Discarding data reduces power, and deletion methods assume an MCAR mechanism; they will yield bias under MAR or MNAR.
[Figure: listwise deletion, job performance plotted against IQ with the excluded cases marked]
Mean imputation replaces (i.e., imputes) missing values on Y with the average of the available Y scores. Replacement values pile up at the mean, restricting variability. Variances and correlations are severely biased under any missing data mechanism, including MCAR; this is the worst possible option.
[Figure: mean imputation, job performance plotted against IQ; the filled-in cases have R² = 0]
Regression imputation uses regression equations to predict the incomplete variables from the complete variables. Substituting a case's observed scores for the X variables in the regression equation generates predicted values for its missing scores. Imputed values fall directly on a regression surface, reducing variation in the data, so measures of variation and association are biased.
[Figure: regression imputation, job performance plotted against IQ; the filled-in cases have R² = 1 and fall on the line $\widehat{JobPerf} = B_0 + B_1(IQ)$]
Stochastic regression imputation is the same as regression imputation, but it adds a normally distributed residual term to each predicted value. Stochastic regression is the only traditional approach that assumes MAR. This is the best of the bunch, but its standard errors are too small. This method is equivalent to multiple imputation with a single filled-in data set.
[Figure: residual distributions around the predicted scores from $\widehat{JobPerf} = B_0 + B_1(IQ)$]
[Figure: stochastic regression imputation for the 20 cases, with random residual terms added to the filled-in values]
Analysis Method          IQ M (SD)      Performance M (SD)   IQ-Perf Correlation
Complete Data            100 (14.13)    -- (2.68)            .54
Listwise Deletion        -- (9.70)      -- (2.71)            .44
Mean Imputation          100 (14.13)    -- (1.87)            .21
Regression Imputation    100 (14.13)    -- (2.43)            .72
Stochastic Regression    100 (14.13)    -- (2.74)            --
Simulation: 1000 random samples of N = 250 from a bivariate normal distribution with 50% missing data on Y.
[Tables: true values and average parameter estimates (µX, µY, σ²X, σ²Y, σXY, rXY) from the 1000 data sets for listwise deletion (LD), arithmetic mean imputation (AMI), regression imputation (RI), and stochastic regression imputation (SRI)]
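A small Python simulation can reproduce the qualitative pattern behind these results. It is a sketch under assumptions that do not come from the slides: a single large sample (N = 20,000) instead of 1000 replications of N = 250, standard normal X and Y with a correlation of .5, and an MAR mechanism in which Y is missing whenever X falls below its median (mirroring the hiring example). The function names `simulate` and `corr` are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, median, variance

def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / sqrt(saa * sbb)

def simulate(n=20000, rho=0.5, seed=1):
    """Return {method: (mean of Y, variance of Y, correlation of X and Y)}."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rho * xi + sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]
    # MAR: Y is missing whenever X is below its median (like hiring on IQ)
    cut = median(x)
    observed = [xi >= cut for xi in x]
    xo = [xi for xi, o in zip(x, observed) if o]
    yo = [yi for yi, o in zip(y, observed) if o]

    # Complete-case regression of Y on X, used by both regression methods
    mx, my = mean(xo), mean(yo)
    b1 = sum((a - mx) * (b - my) for a, b in zip(xo, yo)) / sum((a - mx) ** 2 for a in xo)
    b0 = my - b1 * mx
    resid_sd = sqrt(sum((b - b0 - b1 * a) ** 2 for a, b in zip(xo, yo)) / (len(xo) - 2))

    filled = {
        "mean": [yi if o else my for yi, o in zip(y, observed)],
        "regression": [yi if o else b0 + b1 * xi for xi, yi, o in zip(x, y, observed)],
        "stochastic": [yi if o else b0 + b1 * xi + rng.gauss(0, resid_sd)
                       for xi, yi, o in zip(x, y, observed)],
    }
    results = {"listwise": (mean(yo), variance(yo), corr(xo, yo))}
    for method, ys in filled.items():
        results[method] = (mean(ys), variance(ys), corr(x, ys))
    return results

for method, (m, v, r) in simulate().items():
    print(f"{method:<11} mean_y = {m:6.3f}  var_y = {v:5.3f}  r_xy = {r:5.3f}")
```

Under these assumptions listwise deletion and mean imputation overestimate the Y mean, mean imputation shrinks the variance, regression imputation inflates the correlation, and stochastic regression imputation recovers the population values (0, 1, and .5) within sampling error.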
Maximum likelihood (ML) identifies the population parameter values that are most consistent with the raw data. A likelihood (or log likelihood) function quantifies the discrepancy or fit of the data to the parameters. The multivariate normal distribution is the starting point for ML estimation with continuous variables. The height of the multivariate normal distribution is
$$L_i = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} e^{-.5(\mathbf{y}_i - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{y}_i - \boldsymbol{\mu})}$$
where p is the number of variables. The key is the term in the exponent, $(\mathbf{y}_i - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{y}_i - \boldsymbol{\mu})$. Plugging a case's scores into $\mathbf{y}_i$ yields a likelihood, $L_i$: the relative probability of that individual's Y values, given the parameter estimates in µ and Σ.
The likelihood value is largely driven by the squared z score in the exponent:
$$L_i = [\text{scaling factor}] \times e^{-.5(z^2)}$$
Smaller z scores reflect a better fit to the parameters, in the sense that the score is closer to the mean: small z scores = high likelihood = high probability = good fit.
Scores for two cases: Case 1: Y1 = 0, Y2 = -.5; Case 2: Y1 = -1, Y2 = -1.5.
With µ = 0, Case 1 has the higher likelihood value: this case is closer to the parameter values and thus fits µ = 0 better.
[Figure: bivariate normal distribution with µ = 0, marking L1 and L2]
With µ = -1, Case 2 now has the higher likelihood value: this case is closer to the parameter values and fits µ = -1 better.
[Figure: bivariate normal distribution with µ = -1, marking L1 and L2]
Likelihoods are very small numbers; taking the natural log makes the math more tractable:
$$\log L_i = -\frac{p}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(\mathbf{y}_i - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{y}_i - \boldsymbol{\mu})$$
The log likelihood still quantifies the relative probability of a set of scores, but on a logarithmic scale.
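A short Python sketch of the bivariate normal log likelihood makes the two-case comparison concrete. The slides do not state the covariance matrix behind these figures; the example assumes unit variances and a covariance of .6, values under which the likelihoods come out at roughly .164 for Case 1 and .064 for Case 2 when both means are 0. The function name `bvn_loglik` is hypothetical.

```python
from math import exp, log, pi

def bvn_loglik(y1, y2, mu1, mu2, var1, var2, cov12):
    """Log likelihood of one case's two scores under a bivariate normal distribution."""
    det = var1 * var2 - cov12 ** 2  # |Sigma|
    d1, d2 = y1 - mu1, y2 - mu2
    # Squared z score (Mahalanobis distance), using the closed-form 2 x 2 inverse of Sigma
    z2 = (var2 * d1 ** 2 - 2 * cov12 * d1 * d2 + var1 * d2 ** 2) / det
    # -(p/2) log(2*pi) with p = 2 simplifies to -log(2*pi)
    return -log(2 * pi) - 0.5 * log(det) - 0.5 * z2

# Assumed covariance matrix: unit variances, covariance .6
for mu in (0.0, -1.0):
    ll1 = bvn_loglik(0.0, -0.5, mu, mu, 1.0, 1.0, 0.6)   # Case 1
    ll2 = bvn_loglik(-1.0, -1.5, mu, mu, 1.0, 1.0, 0.6)  # Case 2
    print(f"mu = {mu:4.1f}: L1 = {exp(ll1):.3f} (log {ll1:.2f}), L2 = {exp(ll2):.3f} (log {ll2:.2f})")
```

With µ = 0 the sketch gives log likelihoods of about -1.81 (Case 1) and -2.75 (Case 2); with µ = -1 the ordering reverses (L1 ≈ .120, L2 ≈ .164).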
Case 1 has the higher log likelihood (i.e., relative probability) value: L1 = .164 for Case 1 versus L2 = .064 for Case 2.
[Figure: bivariate normal distribution marking L1, L2, log L1, and log L2]
The log likelihood for an entire sample is the sum of the individual log likelihoods:
$$\log L = \sum_{i=1}^{N} \log L_i$$
The log likelihood summarizes the fit of a sample to a normal distribution with a particular mean vector and covariance matrix. ML uses the log likelihood to audition and choose among different plausible parameter values.
Example: A sample of IQ scores from 20 job applicants. Use ML to estimate the population mean.
Estimation strategy: Compute the sample log likelihood for different values of µ, then identify the mean value that produces the highest log likelihood (i.e., the best fit to the data, or the highest probability of producing the sample data).
[Table: IQ scores and individual log likelihoods for the 20 applicants (applicant 1 has IQ = 78), with the sample log likelihood for a candidate value of µ]
[Tables: IQ scores, individual log likelihoods, and sample log likelihoods for several candidate values of µ, followed by a summary of the sample log likelihood for each candidate mean]
µ = 100 produced the highest log likelihood (i.e., relative probability), so µ = 100 has the highest probability of producing this sample of 20 cases. µ = 100 is the maximum likelihood estimate (MLE).
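The auditioning strategy can be sketched as a grid search in Python. The 20 IQ scores below are hypothetical (they are not the slide's data, although they also average 100), and the variance is held fixed at a hypothetical 225 (SD = 15) so that only µ is auditioned; the names `sample_loglik` and `VAR` are illustrative.

```python
from math import log, pi

def sample_loglik(scores, mu, var):
    """Sample log likelihood: the sum of each score's normal log likelihood."""
    return sum(-0.5 * log(2 * pi * var) - 0.5 * (y - mu) ** 2 / var for y in scores)

# Hypothetical IQ scores for 20 applicants (mean = 100)
iq = [78, 84, 85, 88, 91, 93, 95, 96, 98, 100,
      101, 103, 104, 106, 108, 110, 112, 114, 116, 118]
VAR = 225.0  # hypothetical variance, held fixed while auditioning mu

# Audition candidate means between 90 and 110 and keep the best-fitting one
curve = {mu: sample_loglik(iq, mu, VAR) for mu in range(90, 111)}
best_mu = max(curve, key=curve.get)
print(best_mu, round(curve[best_mu], 2))
```

For a normal distribution the winning value is the sample mean, which is why the grid search lands on 100; ML software typically replaces the grid with an iterative optimization algorithm.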
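The log likelihood function describes how the sample log likelihood changes across values of µ between 90 and 110.
[Figure: log likelihood function, plotting the sample log likelihood against the population mean; µ = 100 maximizes the log likelihood]
The complete data log likelihood:
$$\log L_i = -\frac{p}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(\mathbf{y}_i - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{y}_i - \boldsymbol{\mu})$$
The missing data log likelihood (also called FIML, for full information maximum likelihood):
$$\log L_i = -\frac{p_i}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma_i| - \frac{1}{2}(\mathbf{y}_i - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{y}_i - \boldsymbol{\mu}_i)$$
The missing data log likelihood has an i subscript on the parameter matrices, µ and Σ (and on p, the number of observed variables). The size and content of these matrices can vary across cases depending on which variables are observed and missing. The squared z score that determines each case's likelihood (i.e., fit) is computed using only the parameters for which a case has data. Consequently, ML uses all available data to estimate parameters; some cases contribute more information than others.
Example: An analysis with three variables, Y1, Y2, and Y3. The squared z score for the complete cases is based on all parameters:
$$\log L_i = K_i - \frac{1}{2}\log\left|\begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 \end{bmatrix}\right| - \frac{1}{2}\left(\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} - \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}\right)^T \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 \end{bmatrix}^{-1} \left(\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} - \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}\right)$$
where $K_i = -\frac{p_i}{2}\log(2\pi)$ is a constant and the final quadratic form is the squared z score.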
Cases with missing values on Y2 have the following log likelihood:
$$\log L_i = K_i - \frac{1}{2}\log\left|\begin{bmatrix} \sigma_1^2 & \sigma_{13} \\ \sigma_{31} & \sigma_3^2 \end{bmatrix}\right| - \frac{1}{2}\left(\begin{bmatrix} y_1 \\ y_3 \end{bmatrix} - \begin{bmatrix} \mu_1 \\ \mu_3 \end{bmatrix}\right)^T \begin{bmatrix} \sigma_1^2 & \sigma_{13} \\ \sigma_{31} & \sigma_3^2 \end{bmatrix}^{-1} \left(\begin{bmatrix} y_1 \\ y_3 \end{bmatrix} - \begin{bmatrix} \mu_1 \\ \mu_3 \end{bmatrix}\right)$$
The squared z score is based on the observed data and the corresponding parameter estimates for Y1 and Y3.
ML does not fill in the missing values. Instead, ML uses the observed data to search for the parameters that yield the highest log likelihood (i.e., the best fit to the observed data). Including the incomplete cases steers estimation toward a more accurate answer: ML effectively borrows information from the observed data to estimate the parameters of the incomplete variables.
[Table: IQ and job performance scores for the 20 applicants]
Example: Estimate the mean job performance rating. Deleting the incomplete cases produced an average above the true value of µ = 10.35. Including the IQ scores from the five incomplete cases should improve estimation. The normal distribution is the key to understanding how ML missing data handling works.
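The casewise (FIML) log likelihood can be sketched in Python by subsetting µ and Σ to each case's observed variables. This is a minimal standard-library illustration, assuming missing scores are coded as `None`; the helper names, parameters, and data are hypothetical, and unlike real ML software the sketch only evaluates the log likelihood rather than maximizing it over the parameters.

```python
from math import log, pi, sqrt

def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = sqrt(s) if i == j else s / L[j][j]
    return L

def fiml_case_loglik(y, mu, sigma):
    """Log likelihood for one case, using only its observed variables (None = missing)."""
    obs = [j for j, v in enumerate(y) if v is not None]
    p_i = len(obs)
    if p_i == 0:
        return 0.0  # a case with no observed data contributes nothing
    # The i subscript: keep only the elements of mu and Sigma for observed variables
    mu_i = [mu[j] for j in obs]
    sigma_i = [[sigma[r][c] for c in obs] for r in obs]
    dev = [y[j] - m for j, m in zip(obs, mu_i)]
    L = cholesky(sigma_i)
    # Forward substitution solves L z = dev, so z.z is the squared z score
    z = []
    for r in range(p_i):
        z.append((dev[r] - sum(L[r][k] * z[k] for k in range(r))) / L[r][r])
    sq_z = sum(v * v for v in z)
    log_det = 2 * sum(log(L[r][r]) for r in range(p_i))  # log |Sigma_i|
    return -0.5 * p_i * log(2 * pi) - 0.5 * log_det - 0.5 * sq_z

def fiml_loglik(data, mu, sigma):
    """Sample log likelihood: the sum of the casewise log likelihoods."""
    return sum(fiml_case_loglik(row, mu, sigma) for row in data)

# Hypothetical parameters for Y1, Y2, Y3 and three cases with different patterns
mu = [0.0, 0.0, 0.0]
sigma = [[1.0, 0.3, 0.5], [0.3, 1.0, 0.2], [0.5, 0.2, 1.0]]
data = [[0.2, -0.4, 0.1], [1.0, None, 0.5], [None, 2.0, None]]
print(fiml_loglik(data, mu, sigma))
```

The second case contributes exactly the two-variable (Y1, Y3) log likelihood shown above, and the third case contributes a univariate normal log likelihood for Y2.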
ML assumes that IQ and job performance ratings are normally distributed. The normal distribution effectively constrains the plausible range of missing values: for a given IQ value, some job performance ratings are more plausible than others. Consider the incomplete cases with IQ = 85 and IQ = 78; the IQ scores provide information about the missing performance ratings.
[Figure: bivariate normal distribution of job performance and IQ; the most likely performance rating is about 9, given that IQ = 85]
A case with an IQ score of 85 would likely have a performance rating of approximately 9, lower than the average of the 15 complete cases. Based on this information, the job performance mean would be adjusted downward to account for the plausible (but missing) performance rating. This adjustment is based solely on the observed IQ value; no imputation is necessary.
[Figure: bivariate normal distribution of job performance and IQ; the most likely performance rating is about 8.2, given that IQ = 78]
A case with an IQ score of 78 would likely have a performance rating of approximately 8.2. Based on this information, the job performance mean would again be adjusted downward to account for the plausible (but missing) performance rating. Again, this adjustment is based solely on the observed IQ value; no imputation is necessary.
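The downward adjustment can be computed directly in the special case shown here, where IQ is complete and only job performance has missing values. For this bivariate pattern the ML estimate of the job performance mean has a closed form (the factored likelihood result): regress Y on X in the complete cases, then evaluate the regression line at the mean of all X scores. This is a swapped-in shortcut for illustration, not how general ML software estimates the model (it maximizes the casewise log likelihood iteratively); the function name and data are hypothetical.

```python
from statistics import mean

def ml_mean_monotone(x, y):
    """ML estimate of the mean of Y when X is complete and Y has missing values (None)."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xc = [a for a, _ in pairs]
    yc = [b for _, b in pairs]
    mx_c, my_c = mean(xc), mean(yc)
    # Complete-case regression slope of Y on X
    b1 = sum((a - mx_c) * (b - my_c) for a, b in pairs) / sum((a - mx_c) ** 2 for a in xc)
    # Shift the complete-case mean by the slope times the difference in X means
    return my_c + b1 * (mean(x) - mx_c)

# Hypothetical data: ratings are missing for the three lowest IQ scores
iq = [78, 84, 88, 91, 95, 99, 102, 105, 110, 118]
ratings = [None, None, None, 8, 10, 9, 11, 12, 11, 14]
print(mean(r for r in ratings if r is not None), ml_mean_monotone(iq, ratings))
```

Because the incomplete cases have low IQ scores and the slope is positive, the ML estimate falls below the complete-case mean, which is the downward adjustment described above.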
Including the five incomplete cases adjusts the parameter values in a way that closely resembles the complete-data results:
Analysis Method    IQ M (SD)      Job Perf M (SD)   IQ-Perf Correlation
Complete Data      100 (14.13)    -- (2.68)         .54
ML Missing Data    100 (13.77)    -- (2.87)         --
Simulation: 1000 random samples of N = 250 from a bivariate normal distribution with 50% missing data on Y.
[Tables: true values and average parameter estimates (µX, µY, σ²X, σ²Y, σXY) from the 1000 data sets for listwise deletion (LD) and ML]
ML standard errors were much smaller than listwise deletion standard errors.
Example data set: scores from 480 employees on eight work-related variables.
Variables: age, gender, job tenure, IQ, psychological well-being, job satisfaction, job performance, and turnover intentions.
33% of the cases have missing well-being scores, and 33% have missing satisfaction scores. The mechanism is MCAR because the data are missing by design: to reduce costs, 33% of the well-being and job satisfaction scores were intentionally never collected.
[Figure: planned missing data pattern, where X contains the complete variables (e.g., gender, IQ) and well-being and job satisfaction have planned missing values]
Analysis: a multiple regression model that predicts job performance from psychological well-being and job satisfaction:
$$jobperf = B_0 + B_1(wbeing) + B_2(jobsat) + \varepsilon$$
[Figure: path diagram with well-being and job satisfaction predicting job performance, with residual ε]
The basic Mplus commands are TITLE, DATA, VARIABLE, ANALYSIS, MODEL, MODEL TEST, and OUTPUT.
Variable names must be 8 characters or less.
! denotes a comment line that the program ignores.
Command names end with a colon (:), and subcommands end with a semicolon (;).
Command lines must be less than 80 characters in length; wrap commands to the next line as needed.
Capitalization doesn't matter.
The TITLE command (optional) prints a title on the output file:
    TITLE:
    ! The title command is optional;
    mplus multiple regression program;
The DATA command points Mplus to the location of the text data on the local drive. Free format text files end in .dat or .txt and should include a placeholder for missing values.
    DATA:
    ! Location of the data file;
    file = "c:\amda Data\employee.dat";
Omit the file path when the data file and the Mplus syntax file are located in the same folder.
The VARIABLE command lists the order of the variables, selects variables for analysis, and gives the missing value code.
    DATA:
    ! Location of the data file;
    file = employee.dat;
    VARIABLE:
    ! Information about the contents of the data file;
    names = id age tenure female wbeing jobsat jobperf turnover iq;
    usevariables = wbeing jobsat jobperf;
    missing = all (-99);
ANALYSIS specifies the estimator and other estimation details.
    ANALYSIS:
    ! Specify the estimator;
    estimator = ml;
The MODEL command specifies the analysis. With complete data, you can use a bare-bones specification; Mplus automatically estimates most of the necessary parameters (e.g., variances, means).
    MODEL:
    ! Regression model; on means regressed on;
    jobperf on wbeing jobsat;
With missing data on the predictor variables, it is necessary to specify the variances and covariances of the IVs. This ensures that cases with missing predictor scores are included in the analysis.
    MODEL:
    jobperf on wbeing jobsat;   ! Regression;
    wbeing jobsat;              ! Variances of IVs;
    wbeing with jobsat;         ! Covariance between IVs;
In ML analyses, Wald chi-square statistics are routinely used to test a set of parameters for significance. For a single parameter,
$$\chi^2_W = \frac{(\hat{\theta} - \theta_0)^2}{SE^2}$$
where $\theta_0$ is the hypothesized value (usually 0). The Wald test is the ML analog of an F statistic in OLS regression. With multiple parameters, the Wald test is expressed in matrices.
To perform the Wald omnibus test, attach labels to the parameters of interest in the MODEL command. Among other things, MODEL TEST generates a Wald test for the specified hypotheses.
    MODEL:
    ! (b1) and (b2) are labels needed for Wald test;
    jobperf on wbeing (b1);
    jobperf on jobsat (b2);
    MODEL TEST:
    ! Wald test that both coefficients = 0;
    ! b1 and b2 are user-supplied labels from MODEL;
    b1 = 0;
    b2 = 0;
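The Wald statistic is easy to compute once the estimates and their sampling covariance matrix are available. The Python sketch below handles one or two parameters, which is enough for the b1 = 0, b2 = 0 test; for two parameters it uses the matrix form $\hat{\boldsymbol{\theta}}^T \hat{\mathbf{V}}^{-1} \hat{\boldsymbol{\theta}}$. The function name and the numbers in the example are hypothetical, not Mplus output.

```python
from math import erfc, exp, sqrt

def wald_test(est, cov):
    """Wald chi-square test that one or two parameters equal 0.

    est: parameter estimates; cov: their sampling covariance matrix.
    Returns (chi_square, df, p_value).
    """
    if len(est) == 1:
        w = est[0] ** 2 / cov[0][0]  # (theta_hat - 0)^2 / SE^2
        p = erfc(sqrt(w / 2))        # upper-tail chi-square probability, df = 1
    elif len(est) == 2:
        a, c, b = cov[0][0], cov[0][1], cov[1][1]
        det = a * b - c * c
        # theta' V^-1 theta, using the closed-form 2 x 2 inverse of V
        w = (b * est[0] ** 2 - 2 * c * est[0] * est[1] + a * est[1] ** 2) / det
        p = exp(-w / 2)              # upper-tail chi-square probability, df = 2
    else:
        raise ValueError("this sketch handles one or two parameters only")
    return w, len(est), p

# Hypothetical slopes for wbeing (b1) and jobsat (b2) and their covariance matrix
print(wald_test([0.45, 0.30], [[0.010, 0.002], [0.002, 0.008]]))
```

For more than two parameters the same quadratic form applies, but the p-value requires a general chi-square distribution function, such as one from a statistics library.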
The OUTPUT command specifies additional information that appears in the Mplus output file.
    OUTPUT:
    ! standardized gives beta weights and R-square;
    ! sampstat gives ML descriptives;
    ! patterns prints missing data patterns;
    standardized sampstat patterns;
The complete program:
    DATA:
    file = employee.dat;
    VARIABLE:
    names = id age tenure female wbeing jobsat jobperf turnover iq;
    usevariables = wbeing jobsat jobperf;
    missing = all (-99);
    ANALYSIS:
    estimator = ml;
    MODEL:
    jobperf on wbeing (b1);
    jobperf on jobsat (b2);
    wbeing jobsat;
    wbeing with jobsat;
    MODEL TEST:
    b1 = 0;
    b2 = 0;
    OUTPUT:
    standardized sampstat patterns;
[Mplus output: SUMMARY OF MISSING DATA PATTERNS (x = not missing) and MISSING DATA PATTERN FREQUENCIES for JOBPERF, WBEING, and JOBSAT]
The covariance coverage matrix gives the proportion of cases with observed data on each variable or variable pair.
[Mplus output: PROPORTION OF DATA PRESENT, covariance coverage for JOBPERF, WBEING, and JOBSAT]
[Mplus output: ESTIMATED SAMPLE STATISTICS, the ML means, covariances, and correlations for JOBPERF, WBEING, and JOBSAT]
The Wald statistic (a chi-square with 2 degrees of freedom) is akin to the omnibus F test in OLS regression.
[Mplus output: Wald Test of Parameter Constraints, with the value, degrees of freedom = 2, and p-value]
Considered as a set, the two predictors explain significant variation in the dependent variable.
[Mplus output: MODEL RESULTS, unstandardized estimates with standard errors (S.E.), z tests (Est./S.E.), and two-tailed p-values for JOBPERF ON WBEING and JOBSAT, WBEING WITH JOBSAT, the means and variances of WBEING and JOBSAT, the JOBPERF intercept, and the JOBPERF residual variance]
[Mplus output: STANDARDIZED MODEL RESULTS (STDYX standardization), beta weights for JOBPERF ON WBEING and JOBSAT, and R-SQUARE for JOBPERF]
Example data set: scores from 480 employees on eight work-related variables.
Variables: age, gender, job tenure, IQ, psychological well-being, job satisfaction, job performance, and turnover intentions.
Analysis: Obtain ML descriptive statistics for all quantitative variables (gender and turnover intentions are dummy codes).
    DATA:
    file = employee.dat;
    VARIABLE:
    names = id age tenure female wbeing jobsat jobperf turnover iq;
    usevariables = age tenure wbeing jobsat jobperf iq;
    missing = all (-99);
    ANALYSIS:
    estimator = ml;
    MODEL:
    [age tenure wbeing jobsat jobperf iq];   ! Means;
    age tenure wbeing jobsat jobperf iq;     ! Variances;
    age tenure wbeing jobsat jobperf iq with
      age tenure wbeing jobsat jobperf iq;   ! Covariances;
    OUTPUT:
    standardized;
[Mplus output: MODEL RESULTS, ML covariances, means, and variances for AGE, TENURE, WBEING, JOBSAT, JOBPERF, and IQ, with standard errors, z tests, and two-tailed p-values]
[Mplus output: STANDARDIZED MODEL RESULTS (STDYX standardization), the ML correlations among AGE, TENURE, WBEING, JOBSAT, JOBPERF, and IQ]
Example data set: scores from 480 employees on eight work-related variables.
Variables: age, gender, job tenure, IQ, psychological well-being, job satisfaction, job performance, and turnover intentions.
Analysis: Compare job performance means between employees who do and do not intend to quit in the next six months (the TURNOVER variable), while controlling for well-being, job satisfaction, and job tenure.
Multiple regression provides a straightforward mechanism for estimating ANOVA models from between-group designs. TURNOVER is dummy coded (0 = intend to stay, 1 = intend to quit in the next 6 months).
$$jobperf = B_0 + B_1(wbeing) + B_2(jobsat) + B_3(tenure) + B_4(turnover) + \varepsilon$$
Consistent with ANCOVA models, the three covariates are centered at their grand means.
    DATA:
    file = employee.dat;
    VARIABLE:
    names = id age tenure female wbeing jobsat jobperf turnover iq;
    usevariables = jobperf tenure wbeing jobsat turnover;
    missing = all (-99);
    centering = grandmean(tenure wbeing jobsat);
    ANALYSIS:
    estimator = ml;
    MODEL:
    jobperf on tenure wbeing jobsat turnover;
    wbeing jobsat;                         ! Incomplete predictors;
    tenure wbeing jobsat turnover with
      tenure wbeing jobsat turnover;       ! Covariances among IVs;
    OUTPUT:
    standardized sampstat;
[Mplus output: MODEL RESULTS, unstandardized estimates with standard errors, z tests, and two-tailed p-values for JOBPERF ON TENURE, WBEING, JOBSAT, and TURNOVER, and the JOBPERF intercept]
Because the covariates are centered at their means, the intercept estimate (B0 = 6.217) represents the adjusted mean for the group of employees who intend to stay on the job (TURNOVER = 0). The employees who intend to quit in the next six months (TURNOVER = 1) have a significantly lower job performance mean (B4 = -.645, p < .001).
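Why the intercept equals an adjusted mean can be checked with a complete-data sketch. The Python function below uses one covariate instead of three and ordinary least squares rather than ML with missing data, so it illustrates only the centering logic; the function name and data are hypothetical. With a 0/1 dummy in the model, the covariate slope equals the pooled within-group slope, and each group's adjusted mean is its raw mean moved to the grand covariate mean.

```python
from statistics import mean

def ancova_one_covariate(x, y, group):
    """Regression ANCOVA with one grand-mean-centered covariate and a 0/1 dummy.

    Returns (b0, b_group, b_x): b0 is the adjusted mean of group 0,
    b0 + b_group is the adjusted mean of group 1, b_x is the covariate slope.
    """
    grand_x = mean(x)
    stats = {}
    for g in (0, 1):
        xs = [xi for xi, gi in zip(x, group) if gi == g]
        ys = [yi for yi, gi in zip(y, group) if gi == g]
        stats[g] = (xs, ys, mean(xs), mean(ys))
    # Pooled within-group slope = OLS covariate coefficient when the dummy is in the model
    sxy = sum((a - mx) * (b - my)
              for xs, ys, mx, my in stats.values() for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for xs, _, mx, _ in stats.values() for a in xs)
    b_x = sxy / sxx
    # Adjusted means: each group's Y mean evaluated at the grand covariate mean
    adj0 = stats[0][3] - b_x * (stats[0][2] - grand_x)
    adj1 = stats[1][3] - b_x * (stats[1][2] - grand_x)
    return adj0, adj1 - adj0, b_x

# Hypothetical data: group 1 has higher covariate scores than group 0
print(ancova_one_covariate([1, 2, 3, 3, 4, 5], [2, 4, 6, 3, 5, 7], [0, 0, 0, 1, 1, 1]))
# (6.0, -3.0, 2.0)
```

Fitting $y = b_0 + b_x(x - \bar{x}) + b_{group}(group)$ by OLS to the same data gives the same three coefficients, so the intercept is the adjusted mean of the reference group.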
The STDY section standardizes only the dependent variable, so the estimate for the dummy variable predictor can be interpreted as a Cohen's d effect size (i.e., the adjusted means differ by .24 of a standard deviation unit).
[Mplus output: STDY Standardization estimates for JOBPERF ON TENURE, WBEING, JOBSAT, and TURNOVER, and R-SQUARE for JOBPERF]
Example data set: repeated measures data containing six yearly assessments of antisocial behavior from 2000 children.
Variables: gender (0 = male, 1 = female) and six antisocial behavior scores.
Analysis: Compare change in the antisocial behavior averages across the six assessments.
The Wald chi-square statistic can serve the same purpose as the omnibus F test from ANOVA. The hypothesis for the Wald test specifies that the means are equal across time (i.e., the null hypothesis in ANOVA), and the MODEL TEST command can implement the equality constraints on the means. Unlike a standard repeated measures ANOVA, the following analysis does not impose a covariance structure on the data (e.g., compound symmetry, sphericity).
    DATA:
    file = antisocial.dat;
    VARIABLE:
    names = female anti1 anti2 anti3 anti4 anti5 anti6;
    usevariables = anti1 - anti6;
    missing = all (-99);
    ANALYSIS:
    estimator = ml;
    MODEL:
    [anti1 - anti6] (ybar1-ybar6);    ! Means with labels;
    anti1 - anti6 (var1-var6);        ! Variances with labels;
    anti1 - anti6 with anti1 - anti6; ! Covariances;
    MODEL TEST:
    ! All means set equal;
    ybar1 = ybar2;
    ybar2 = ybar3;
    ybar3 = ybar4;
    ybar4 = ybar5;
    ybar5 = ybar6;
    OUTPUT:
    sampstat patterns;
[Mplus output: SUMMARY OF MISSING DATA PATTERNS (x = not missing) and MISSING DATA PATTERN FREQUENCIES for ANTI1 to ANTI6]
The covariance coverage matrix gives the proportion of cases with observed data on each variable or variable pair.
[Mplus output: PROPORTION OF DATA PRESENT, covariance coverage for ANTI1 to ANTI6]
The Wald statistic (a chi-square with 5 degrees of freedom) is akin to the omnibus F test in ANOVA.
[Mplus output: Wald Test of Parameter Constraints, with the value, degrees of freedom = 5, and p-value]
The significant Wald chi-square indicates that the null hypothesis of equal means is not supported.
[Mplus output: MODEL RESULTS, ML means and variances for ANTI1 to ANTI6 with standard errors, z tests, and two-tailed p-values]
Among other things, the MODEL CONSTRAINT command can compute new parameters from existing estimates. For example, the command can compute a standardized mean difference effect size (e.g., Cohen's d). Use the parameter labels to program the following equation:
d = (ybar1 - ybar6) / sqrt(var1)
Syntax:
    MODEL CONSTRAINT:
    new (d);
    d = (ybar1 - ybar6)/sqrt(var1);
[Mplus output: New/Additional Parameters, with the estimate, standard error, z test, and two-tailed p-value for D]
Consequences of nonnormal data for ML: Point estimates are relatively accurate. As kurtosis increases relative to the normal curve, standard errors become too small and likelihood ratio tests become too large (i.e., inflated Type I error rates). As kurtosis decreases relative to the normal curve, standard errors become too large and likelihood ratio tests become too small (i.e., Type II errors).
Corrective procedures for standard errors: robust (i.e., sandwich estimator) standard errors and naïve bootstrapping.
Corrective procedures for the likelihood ratio test: a rescaled test statistic (i.e., the Satorra-Bentler chi-square) and the Bollen-Stine bootstrap.
These procedures are available in some SEM programs.
Robust standard errors rescale the standard errors (up or down) according to the degree of nonnormality in the data. The usual ML standard error is multiplied by a correction term that accounts for outlier scores (or lack thereof). Robust standard errors are useful because they can be used to correct the Wald test. Implementing robust standard errors has no impact on the estimation routine or the resulting parameter estimates.
Naïve bootstrap: Treat the sample data as a miniature population and draw B samples (e.g., 1000) of size N with replacement. Perform the statistical analysis on each bootstrap sample and save the parameter estimates from each analysis. Treat the parameter estimates as data points and compute the standard deviation of each parameter; the standard deviation of the estimates is the bootstrap standard error:
$$SE = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{\theta}_b - \bar{\hat{\theta}}\right)^2}$$
[Figure: the sample data (N by k) and bootstrap samples 1 through 1000 (N by k) with example ID, X, and Y values; each bootstrap sample yields a parameter estimate, and the 1000 estimates form an empirical sampling distribution]
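The bootstrap steps translate almost line for line into code. The Python sketch below computes a naïve bootstrap standard error for any statistic of a list of cases; it is a complete-data illustration with a hypothetical function name and data, whereas the procedure described here refits the full ML missing data analysis to every bootstrap sample.

```python
import random
from statistics import mean, stdev

def bootstrap_se(cases, statistic, n_boot=1000, seed=0):
    """Naive bootstrap standard error of statistic(cases)."""
    rng = random.Random(seed)
    n = len(cases)
    estimates = []
    for _ in range(n_boot):
        # Treat the sample as a miniature population: draw N cases with replacement
        resample = rng.choices(cases, k=n)
        estimates.append(statistic(resample))
    # The standard deviation of the B estimates is the bootstrap standard error
    return stdev(estimates)

# Hypothetical scores; compare with the usual formula SD / sqrt(N)
scores = [float(v) for v in range(1, 51)]
print(bootstrap_se(scores, mean, n_boot=2000), stdev(scores) / len(scores) ** 0.5)
```

For a sample mean the two numbers should be close, which is a quick sanity check before applying the same loop to statistics that have no simple standard error formula.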
Example: bivariate data analysis, N = 500
X has skewness = 0 and kurtosis = -1
Y has skewness = 0 and kurtosis = 4
A normal distribution has skewness and kurtosis = 0
Robust standard errors:
ANALYSIS: estimator = mlr; ! MLR specifies robust ML
Naïve bootstrap standard errors:
ANALYSIS: bootstrap = 2000 (standard); ! 2000 samples; (standard) = naïve bootstrap
As is often the case, the bootstrap and robust procedures produced similar standard errors
[Table: standard, robust, and bootstrap standard errors for the parameters µX, µY, σ²X, σXY, and σ²Y]
The likelihood ratio test does not follow a chi-square distribution when the data are nonnormal
Two solutions:
  Rescale the test statistic up or down so that it approximates the correct sampling distribution (i.e., Satorra-Bentler)
  Leave the sample statistic intact, but use the bootstrap procedure to generate a new sampling distribution, from which a p-value is generated
The likelihood ratio bootstrap (i.e., Bollen-Stine bootstrap) is a bit different from the naïve bootstrap: before resampling, the data are transformed so that the hypothesized model fits perfectly, so the bootstrap distribution reflects the null hypothesis
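The Bollen-Stine logic (transform the data so the null hypothesis holds exactly, resample, and compare the observed statistic to the bootstrap distribution) can be illustrated with a one-parameter analog. The sketch below tests H0: µ = µ0 by shifting the data so the sample mean equals µ0; this is a simplified stand-in for the covariance-structure transformation Bollen and Stine use, and the function names and data are hypothetical:

```python
import random


def chi_square_stat(xs, mu0):
    """Chi-square(1)-type statistic N * (mean - mu0)**2 / ML variance."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / n
    return n * (m - mu0) ** 2 / v


def null_bootstrap_p(data, mu0, n_boot=2000, seed=1):
    """Bootstrap p-value after transforming the data so H0 fits exactly.

    Shift the data so the sample mean equals mu0, resample the shifted data,
    and return (observed statistic, proportion of bootstrap statistics at
    least as large as the observed statistic).
    """
    rng = random.Random(seed)
    n = len(data)
    mean = sum(data) / n
    shifted = [x - mean + mu0 for x in data]  # null model now fits perfectly
    observed = chi_square_stat(data, mu0)
    exceed = 0
    for _ in range(n_boot):
        resample = [shifted[rng.randrange(n)] for _ in range(n)]
        if chi_square_stat(resample, mu0) >= observed:
            exceed += 1
    return observed, exceed / n_boot


# Hypothetical data whose true mean (1.0) differs from the hypothesized 0.0
gen = random.Random(5)
scores = [gen.gauss(1.0, 1.0) for _ in range(100)]
print(null_bootstrap_p(scores, mu0=0.0, n_boot=500))
```

As with the Bollen-Stine bootstrap, the observed statistic is left intact; only its reference distribution comes from resampling.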
Example: questionnaire data from a study of eating disorder risk in a sample of 500 college-aged women
Variables: body mass index (BMI), 7 questionnaire items measuring body dissatisfaction, 6 questionnaire items measuring eating disorder risk, and a binary indicator of past sexual abuse history (0 = no abuse history, 1 = abuse history)
All questionnaire items were measured on a 7-point Likert scale
Analysis: fit a one-factor CFA model to the six eating disorder risk items
By definition, Likert scales violate the ML normality assumption (a normal distribution requires continuous variables)
The questionnaire items have asymmetric distributions, with positive skewness (S ranging between .50 and 1.00) and kurtosis (K ranging between .20 and 1.00)
An appropriate analysis should implement corrective procedures for nonnormal data (e.g., robust standard errors, the bootstrap)
[Figure: one-factor CFA model in which Eating Disorder Risk has indicators EDR1 to EDR6, each with a residual term e1 to e6]
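Skewness and kurtosis values like those quoted above are moment ratios. A minimal Python sketch, assuming the simple moment-based formulas (SPSS, SAS, and Mplus may apply small-sample adjustments, so their values can differ slightly); kurtosis is reported as excess kurtosis so that a normal curve gives 0, matching the convention used in these notes. The item responses are hypothetical:

```python
def skew_kurtosis(xs):
    """Moment-based skewness and excess kurtosis (both 0 for a normal curve)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    if m2 == 0:
        raise ValueError("variable has no variance")
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Hypothetical 7-point Likert responses with a long right tail
item = [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 7]
print(skew_kurtosis(item))
```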
TITLE: CFA with first factor loading constrained to 1;
DATA: file = eatingrisk.dat;
VARIABLE: names = abuse bmi bds1 - bds7 edr1 - edr6;
usevariables = edr1 - edr6;
missing = all (-99);
ANALYSIS: estimator = mlr; ! mlr = robust maximum likelihood
MODEL: edrisk by edr1 - edr6;
OUTPUT: sampstat standardized patterns;
By default, Mplus constrains the factor loading of the first indicator (EDR1) to a value of 1 for identification
It is also possible to estimate all loadings and constrain the latent factor's variance to 1
Placing @1 next to the factor name constrains its variance to 1
Placing an * after the loading instructs Mplus to estimate the loading
TITLE: CFA with factor variance constrained to 1;
DATA: file = eatingrisk.dat;
VARIABLE: names = abuse bmi bds1 - bds7 edr1 - edr6;
usevariables = edr1 - edr6;
missing = all (-99);
ANALYSIS: estimator = mlr; ! mlr = robust maximum likelihood
MODEL: edrisk by edr1 - edr6*; ! * = estimate all loadings
edrisk@1; ! constrain factor variance to 1
OUTPUT: sampstat standardized patterns;
[Mplus output: SUMMARY OF MISSING DATA PATTERNS (x = not missing) for EDR1 to EDR6, followed by MISSING DATA PATTERN FREQUENCIES]
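The pattern summary that the PATTERNS option prints amounts to tabulating which variables are observed in each row. A minimal Python sketch, assuming rows of numeric scores with -99 as the missing-value code (as in missing = all (-99);) and hypothetical data for three items; like Mplus, it marks observed values with x:

```python
from collections import Counter

MISSING_CODE = -99  # matches missing = all (-99); in the Mplus syntax


def missing_patterns(rows):
    """Count missing data patterns; 'x' = observed, '.' = missing."""
    return Counter(
        "".join("." if value == MISSING_CODE else "x" for value in row)
        for row in rows
    )


# Hypothetical scores on three items (not the eating disorder data)
rows = [
    [4, 2, 5],
    [3, -99, 6],
    [5, 1, 2],
    [2, -99, -99],
]
for pattern, frequency in missing_patterns(rows).most_common():
    print(pattern, frequency)
```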
The covariance coverage matrix gives the proportion of cases with complete data on each variable or variable pair
[Mplus output: PROPORTION OF DATA PRESENT, the covariance coverage matrix for EDR1 to EDR6]
[Mplus output: MODEL RESULTS, unstandardized estimates with robust standard errors (S.E.), z tests (Est./S.E.), and two-tailed p-values for the EDRISK BY EDR1 to EDR6 loadings, the Intercepts (indicator means) of EDR1 to EDR6, the Variance of EDRISK, and the Residual Variances of EDR1 to EDR6]
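Covariance coverage can be computed directly from the raw data. A minimal Python sketch, again assuming -99 as the missing-value code and hypothetical rows; the diagonal entries give the proportion of cases observed on each variable, and the off-diagonal entries the proportion observed on both variables of a pair:

```python
from itertools import combinations_with_replacement

MISSING_CODE = -99  # matches missing = all (-99); in the Mplus syntax


def covariance_coverage(rows, names):
    """Proportion of cases observed on each variable (diagonal) or pair."""
    n = len(rows)
    coverage = {}
    for i, j in combinations_with_replacement(range(len(names)), 2):
        observed = sum(
            1 for row in rows if row[i] != MISSING_CODE and row[j] != MISSING_CODE
        )
        coverage[(names[i], names[j])] = observed / n
    return coverage


# Hypothetical scores on three items (not the eating disorder data)
rows = [
    [4, 2, 5],
    [3, -99, 6],
    [5, 1, 2],
    [2, -99, -99],
]
names = ["item1", "item2", "item3"]
for (a, b), proportion in covariance_coverage(rows, names).items():
    print(f"{a:>6} {b:>6} {proportion:.3f}")
```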
[Mplus output: STANDARDIZED MODEL RESULTS (STDYX Standardization) for the EDRISK BY EDR1 to EDR6 loadings]
All loadings are positive and statistically significant, with all standardized values exceeding .60
Measurement intercepts are a byproduct of missing data handling
Mplus constrains the latent mean to 0, making the measurement intercepts equivalent to the variable means
The MLR estimator gives the Satorra-Bentler rescaled chi-square:
Chi-Square Test of Model Fit: Value = 9.440*, Degrees of Freedom = 9 (a P-Value and the Scaling Correction Factor for MLR are also reported)
* The chi-square value for MLM, MLMV, MLR, ULSMV, WLSM and WLSMV cannot be used for chi-square difference testing in the regular way. MLM, MLR and WLSM chi-square difference testing is described on the Mplus website. MLMV, WLSMV, and ULSMV difference testing is done using the DIFFTEST option.
Bollen-Stine bootstrap syntax:
TITLE: CFA with first factor loading constrained to 1;
DATA: file = eatingrisk.dat;
VARIABLE: names = abuse bmi bds1 - bds7 edr1 - edr6;
usevariables = edr1 - edr6;
missing = all (-99);
ANALYSIS: estimator = ml;
bootstrap = 2000 (residual); ! (residual) gives Bollen-Stine bootstrap
MODEL: edrisk by edr1 - edr6;
OUTPUT: sampstat standardized patterns;
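Because MLR chi-square values cannot be subtracted directly, nested models are compared with a scaled difference test. The sketch below follows the Satorra-Bentler scaled difference formula as it is described on the Mplus website, cd = (d0*c0 - d1*c1)/(d0 - d1) and TRd = (T0*c0 - T1*c1)/cd, where model 0 is the more restrictive model, T is the MLR chi-square, c the scaling correction factor, and d the degrees of freedom. The function name and input values are hypothetical, and the formula should be checked against the current Mplus documentation before use:

```python
def scaled_chi_square_difference(t0, c0, d0, t1, c1, d1):
    """Scaled chi-square difference test for MLR (or MLM) chi-square values.

    t0, c0, d0: MLR chi-square, scaling correction factor, and df for the
    more restrictive (nested) model; t1, c1, d1: the same for the less
    restrictive model. Returns (scaled difference statistic, df).
    """
    if d0 <= d1:
        raise ValueError("the nested model must have more degrees of freedom")
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)  # scaling correction for the difference
    trd = (t0 * c0 - t1 * c1) / cd  # t * c recovers each unscaled ML chi-square
    return trd, d0 - d1


# Hypothetical output values for two nested models
trd, df = scaled_chi_square_difference(t0=20.0, c0=1.2, d0=10, t1=12.0, c1=1.1, d1=8)
print(round(trd, 2), df)  # 6.75 2
```

The resulting TRd is referred to a chi-square distribution with d0 - d1 degrees of freedom.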
[Mplus output: Estimator ML; Information matrix OBSERVED; Maximum number of iterations 1000; Convergence criterion 0.500D-04; Maximum number of steepest descent iterations 20; Number of bootstrap draws requested 2000, completed 2000]
[Mplus output: MODEL RESULTS, unstandardized estimates with bootstrap standard errors (S.E.), z tests (Est./S.E.), and two-tailed p-values for the EDRISK BY EDR1 to EDR6 loadings]
The Bollen-Stine bootstrap gives the standard chi-square test and p-value along with a bootstrap p-value from an empirical bootstrap sampling distribution
[Mplus output: Chi-Square Test of Model Fit with Value, Degrees of Freedom = 9, P-Value, and Bootstrap P-Value]
The positive kurtosis should cause normal-theory standard errors to be too small
Both the robust and bootstrap standard errors corrected this downward bias (point estimates are identical in all analyses)
[Table: standard ML, robust, and bootstrap standard errors for the factor loadings of EDR1 to EDR6 (N/A for EDR1, whose loading is fixed to 1)]
ML assumes an MAR mechanism where the propensity for missing data on Y is related to other variables, but not to the would-be values of Y itself
MAR is not automatically satisfied just because the causes/correlates of missingness are measured variables
The correlates of missingness must be part of the statistical analyses, even if they are not of substantive interest
Auxiliary variables (AVs) are ancillary variables that are not of substantive interest
The variables are included in the analysis for the purposes of reducing bias and/or improving power
Good AVs are correlates of missingness (i.e., potential causes of missing data) or correlates of the incomplete analysis variables
Example: consider an educational study that examines the change in self-report behavioral problems
AVs that correlate with reasons for missingness:
  Socioeconomic status
  Student mobility (e.g., a survey question asking how likely the participant is to move)
  Standardized test scores
AVs that correlate with self-report behavior scores:
  Disciplinary referrals, absenteeism, juvenile justice incidents
  Parental supervision
  Quality of home environment
Example: a study examines a number of health-related behaviors (e.g., smoking, drinking, sexual activity) in teens
The risky sexual behavior questionnaire is only administered to participants above the age of 15
The substantive analysis is a regression model that uses self-esteem to predict risky sexual behavior
To satisfy MAR, age must be in the model, even though it is not of substantive interest
The model below satisfies MAR, but it is an undesirable solution
[Figure: regression model in which Esteem and Age both predict Risk]
The interpretation of the esteem slope becomes a partial regression coefficient
Including AVs should not affect the substantive interpretation of the model parameters
Graham (2003) outlined two approaches for incorporating auxiliary variables into an ML analysis
The saturated correlates model transmits information from the auxiliary variables to the analysis variables via a series of correlations
Importantly, the model does not alter the substantive interpretation of the parameter estimates
The saturated correlates model is easy to implement in SEM
Rules for specifying the saturated correlates model with manifest (i.e., measured or observed) variables; correlate an AV with:
  a) Manifest predictor variables
  b) Other auxiliary variables
  c) The residual terms of any outcome variables
[Figure: saturated correlates model for a regression of Y on X, with AV1 and AV2 correlated with each other, with X, and with the residual of Y]
The AUXILIARY subcommand automatically implements the saturated correlates model
The MODEL section does not mention the AVs
VARIABLE: names = x y av1 av2;
usevariables = x y;
missing = all (-99);
auxiliary = (m) av1 av2;
Alternatively, implement AVs manually via the MODEL commands
Omit the AUXILIARY command with manual specification, and list the AVs on the USEVARIABLES line
MODEL:
! Regression model parameters
y on x;
x y;
[x y];
! AV model correlations
av1 with av2;
x y with av1 av2;
Rules for specifying the saturated correlates model with latent variables; correlate an AV with:
  a) Manifest predictor variables
  b) Other auxiliary variables
  c) The residual terms of any manifest indicator variables
The AVs should never correlate with a latent variable or its residual term
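[Figure: latent variable regression of Y on X (indicators x1 to x3 and y1 to y3, first loadings fixed to 1), with AV1 and AV2 correlated with each other and with the indicator residuals]
VARIABLE: names = x1 x2 x3 y1 y2 y3 av1 av2;
usevariables = x1 - y3;
missing = all (-99);
auxiliary = (m) av1 av2;
MODEL:
! Latent variable regression model parameters
x by x1 x2 x3;
y by y1 y2 y3;
y on x;
Manual specification:
VARIABLE: names = x1 x2 x3 y1 y2 y3 av1 av2;
usevariables = x1 - y3 av1 av2;
missing = all (-99);
MODEL:
! Latent variable regression model parameters
x by x1 x2 x3;
y by y1 y2 y3;
y on x;
! AV model correlations
av1 with av2;
x1 x2 x3 y1 y2 y3 with av1 av2;
[Figure: path model in which X1 predicts Y1 and Y1 predicts Y2, with AV1 and AV2 correlated with each other, with X1, and with the residuals of Y1 and Y2]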
VARIABLE: names = x1 y1 y2 av1 av2;
usevariables = x1 - y2;
missing = all (-99);
auxiliary = (m) av1 av2;
MODEL:
! Path model parameters
y1 on x1;
y2 on y1;
x1;
Manual specification:
VARIABLE: names = x1 y1 y2 av1 av2;
usevariables = x1 - y2 av1 av2;
missing = all (-99);
MODEL:
! Path model parameters
y1 on x1;
y2 on y1;
x1;
! AV model correlations
av1 with av2;
x1 y1 y2 with av1 av2;
[Figure: two-factor CFA model (factors X and Y with indicators x1 to x3 and y1 to y3, first loadings fixed to 1), with AV1 and AV2 correlated with each other and with the indicator residuals]
VARIABLE: names = x1 x2 x3 y1 y2 y3 av1 av2;
usevariables = x1 - y3;
missing = all (-99);
auxiliary = (m) av1 av2;
MODEL:
! Factor model parameters
x by x1 x2 x3;
y by y1 y2 y3;
x with y;
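The rules for the saturated correlates model are mechanical enough to generate the manual MODEL lines automatically. A minimal Python sketch (the helper is hypothetical, not part of Mplus): it correlates every pair of AVs and then every AV with each listed manifest variable; Mplus interprets a WITH statement involving an outcome variable as a residual covariance. Latent variable names can be passed to enforce the rule that AVs never correlate with latent variables:

```python
def saturated_correlates_syntax(aux, correlates, latent=()):
    """Return Mplus MODEL lines that correlate auxiliary variables (AVs).

    aux: AV names.
    correlates: manifest predictors, outcome variables, or manifest
        indicators that every AV should correlate with.
    latent: optional latent variable names, used only to reject
        specifications that would correlate an AV with a latent variable.
    """
    forbidden = set(correlates) & set(latent)
    if forbidden:
        raise ValueError(f"AVs must not correlate with latent variables: {sorted(forbidden)}")
    lines = ["! AV model correlations"]
    # Every pair of AVs is correlated
    for i in range(len(aux) - 1):
        lines.append(f"{aux[i]} with {' '.join(aux[i + 1:])};")
    # Every AV is correlated with every listed manifest variable (or its residual)
    lines.append(f"{' '.join(correlates)} with {' '.join(aux)};")
    return "\n".join(lines)


print(saturated_correlates_syntax(["av1", "av2"], ["x1", "y1", "y2"]))
```

The call shown reproduces the AV lines of the manual path model example above.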
More informationMultiple Regression: What Is It?
Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in
More informationChapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
More informationStatistics in Retail Finance. Chapter 2: Statistical models of default
Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision
More informationRunning head: SCHOOL COMPUTER USE AND ACADEMIC PERFORMANCE. Using the U.S. PISA results to investigate the relationship between
Computer Use and Academic Performance- PISA 1 Running head: SCHOOL COMPUTER USE AND ACADEMIC PERFORMANCE Using the U.S. PISA results to investigate the relationship between school computer use and student
More informationThe Dummy s Guide to Data Analysis Using SPSS
The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests
More informationOutline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test
The t-test Outline Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test - Dependent (related) groups t-test - Independent (unrelated) groups t-test Comparing means Correlation
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationStepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection
Chapter 311 Introduction Often, theory and experience give only general direction as to which of a pool of candidate variables (including transformed variables) should be included in the regression model.
More informationMultivariate Analysis of Variance. The general purpose of multivariate analysis of variance (MANOVA) is to determine
2 - Manova 4.3.05 25 Multivariate Analysis of Variance What Multivariate Analysis of Variance is The general purpose of multivariate analysis of variance (MANOVA) is to determine whether multiple levels
More informationStatistiek II. John Nerbonne. October 1, 2010. Dept of Information Science j.nerbonne@rug.nl
Dept of Information Science j.nerbonne@rug.nl October 1, 2010 Course outline 1 One-way ANOVA. 2 Factorial ANOVA. 3 Repeated measures ANOVA. 4 Correlation and regression. 5 Multiple regression. 6 Logistic
More informationTechnical report. in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE
Linear mixedeffects modeling in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE Table of contents Introduction................................................................3 Data preparation for MIXED...................................................3
More informationDirections for using SPSS
Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...
More informationStatistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova
More informationBusiness Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing
More information[This document contains corrections to a few typos that were found on the version available through the journal s web page]
Online supplement to Hayes, A. F., & Preacher, K. J. (2014). Statistical mediation analysis with a multicategorical independent variable. British Journal of Mathematical and Statistical Psychology, 67,
More informationUNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA)
UNDERSTANDING ANALYSIS OF COVARIANCE () In general, research is conducted for the purpose of explaining the effects of the independent variable on the dependent variable, and the purpose of research design
More information