Craig K. Enders, Arizona State University, Department of Psychology (craig.enders@asu.edu)


Topic
Missing Data Patterns and Missing Data Mechanisms
Traditional Missing Data Techniques
Maximum Likelihood Estimation and Missing Data Handling
Maximum Likelihood Missing Data Handling in Mplus
Data Analysis Example: Means, Standard Deviations, and Correlations
Data Analysis Example: ANCOVA
Data Analysis Example: Repeated Measures ANOVA
Dealing With Nonnormal Missing Data
Data Analysis Example: Confirmatory Factor Analysis
Incorporating Auxiliary Variables Into Maximum Likelihood Analyses
Data Analysis Example: Scale Score Analysis With Auxiliary Variables

Topic (continued)
Multiple Imputation: The Imputation Phase
Assessing the Convergence of MCMC
Multiple Imputation: The Analysis and Pooling Phase
Multiple Imputation in Mplus
Data Analysis Example: Means, Standard Deviations, and Correlations
Data Analysis Example: ANCOVA
Data Analysis Example: Confirmatory Factor Analysis
Data Analysis Example: Scale Score Analysis
Data Analysis Example: Multilevel Model
Multiple Imputation in SPSS
Multiple Imputation in SAS
Planned Missing Data Designs

Discuss missing data theory and assumptions
Briefly review traditional analysis approaches
Introduce modern missing data handling methods: maximum likelihood (ML) estimation and multiple imputation (MI)
Illustrate software applications of ML and MI
Introduce planned missing data designs

"Routine implementation of these new methods of addressing missing data [maximum likelihood and multiple imputation] will be one of the major changes in research over the next decade."
Steve West, former Editor of Psychological Methods, quoted in APA's Monitor on Psychology (2002, Vol. 33, p. 70)

The number of ML and MI applications in behavioral science journals has increased dramatically in recent years

Applied Missing Data Analysis (Guilford Press) provides additional information
All data sets and analysis examples from the book are available for download

The missing data pattern describes the configuration of observed and missing values in a data set
The pattern describes the location of the holes in the data but says nothing about why the data are missing
The missing data mechanism describes how an individual's propensity for missing data is related to other variables, if at all
The likelihood of a missing value on Y may be related to other variables in the data or to the would-be values of Y; it is also possible that the propensity for missing data is unrelated to other variables

A general pattern occurs when missing values are haphazardly dispersed throughout the data matrix
Despite the seemingly random pattern, missingness may be systematic
ML and MI are flexible approaches that can handle a general missing data pattern
[Figure: general missing data pattern for variables Y1 to Y4]

Planned missing data designs introduce intentional missing values (e.g., to reduce respondent burden and maximize resources)
For example, a four-wave longitudinal study where each case provides data at three of the four waves
[Figure: planned missing data pattern for variables Y1 to Y4]

The theoretical foundations of modern missing data analyses were described by Rubin (1976)
Missing data mechanisms:
  Missing completely at random (MCAR)
  Missing at random (MAR)
  Missing not at random (MNAR)
Mechanisms describe how the propensity for a missing value on a variable Y relates to the data, if at all

For a variable Y, there are potentially two scores:
  The value of Y
  A binary variable R that denotes whether Y is observed (e.g., R = 0 if Y is observed, R = 1 if Y is missing)
Similarly, there are two sets of parameter estimates:
  The parameters of substantive interest (e.g., means, standard deviations, correlations)
  Nuisance parameters that describe the propensity for missing data (e.g., logistic regression coefficients)
Sometimes we only need to estimate the substantive parameters; other times we must estimate both the substantive and the nuisance parameters. The missing data mechanism dictates this

MCAR: the probability of missing data on Y is unrelated to other measured variables and is unrelated to the values of Y itself
This is what researchers think of as haphazard or "flip of a coin" missingness
MCAR is strict because it says that the probability of missing data is unrelated to anything in the data
The observed data are a simple random sample of the hypothetically complete data set

Example (MCAR):
Employees complete an IQ test during a job interview
Supervisors rate job performance after 6 months
Performance ratings are missing for no particular reason (e.g., maternity leave, spouse relocates, supervisor quits)
[Table: IQ scores, hypothetical complete job ratings, and job ratings with MCAR missingness; values lost]

MCAR is the only testable mechanism
If MCAR holds, cases with missing values should be no different from the cases with complete data, on average
To test, form an indicator variable that denotes missingness (e.g., 0 = complete, 1 = missing)
Compare the indicator groups on other variables, e.g., using an independent t test or Cohen's d effect size
Multivariate extensions of this approach exist (e.g., Rubin's MCAR test)

Complete cases have an IQ mean of [value lost]; missing cases have an IQ mean of [value lost]
Small differences between the two groups suggest haphazard missingness
MCAR is plausible
[Table: IQ scores, job ratings (MCAR), and missingness indicator; values lost]

MAR: confusing terminology, because missingness is systematic
MAR means that the probability of missing data on Y is related to some other measured variable but is unrelated to the would-be values of Y itself
After controlling for other variables, there is no association between the propensity for missing data on Y and the would-be values of Y
ML and MI require the MAR assumption

Example (MAR):
Prospective employees complete an IQ test during a job interview
IQ is a selection measure; the company does not hire applicants in the lower quartile
Supervisors rate job performance after 6 months
[Table: IQ scores, hypothetical complete job ratings, and job ratings with MAR missingness; values lost]

MNAR: the probability of missing data on Y is related to the values of Y itself
This is the most problematic mechanism and can cause substantial bias
It requires specialized analysis procedures (e.g., selection models, pattern mixture models)
MNAR can also occur when the cause of missingness is a measured variable that is omitted from the analysis

Example (MNAR):
Employees complete an IQ test during a job interview
Supervisors rate job performance after 6 months
The company terminates employees for poor performance prior to their evaluation
[Table: IQ scores, hypothetical complete job ratings, and job ratings with MNAR missingness; values lost]

[Figure: three path diagrams, one each for MCAR, MAR, and MNAR, relating IQ, job performance (JP), a variable Z, and the missingness indicator R]

It is impossible to empirically differentiate MAR from MNAR
Proving that the probability of missingness on Y is related (or unrelated) to Y requires the would-be values of Y
It is only possible to provide evidence against MCAR
Mean differences between the complete and the incomplete cases could reflect MAR or MNAR
MAR is ultimately an untestable assumption

Complete cases have an IQ mean of [value lost]; missing cases have an IQ mean of [value lost]
Large differences between the two groups suggest systematic missingness
MCAR is not plausible; the mechanism is MAR or MNAR
[Table: IQ scores, job ratings (MAR), and missingness indicator; values lost]
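The indicator-variable check described above can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not the data from the slides: the sample size, the 25% missing data rate, and the function names `cohens_d` and `welch_t` are all assumptions made for the example.

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference based on the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def welch_t(a, b):
    """Independent-samples t statistic without the equal-variance assumption."""
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se

rng = np.random.default_rng(1)
n = 2000
iq = rng.normal(100, 15, n)  # simulated IQ scores (hypothetical)

# Missingness indicators for the job ratings: 0 = complete, 1 = missing
indicators = {
    "MCAR": (rng.random(n) < 0.25).astype(int),         # coin-flip missingness
    "MAR": (iq < np.quantile(iq, 0.25)).astype(int),    # lower IQ quartile never rated
}

results = {}
for label, r in indicators.items():
    complete, missing = iq[r == 0], iq[r == 1]
    results[label] = (cohens_d(complete, missing), welch_t(complete, missing))
    print(f"{label}: d = {results[label][0]:.2f}, t = {results[label][1]:.2f}")
```

A small d is consistent with MCAR, while a large d is evidence against it. As the slides note, a large difference cannot distinguish MAR from MNAR, because that would require the would-be job ratings.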

Complete cases have an IQ mean of [value lost]; missing cases have an IQ mean of [value lost]
Large differences between the two groups suggest systematic missingness
MCAR is not plausible; the mechanism is MAR or MNAR
[Table: IQ scores, job ratings (MNAR), and missingness indicator; values lost]

Mechanisms serve as assumptions for a missing data analysis
ML and MI assume MAR
When MAR (or MCAR) holds, we can estimate the parameters of substantive interest without worrying about the parameters that dictate missingness
MNAR analyses (e.g., selection models, pattern mixture models) require a submodel that explains why the data are missing
MNAR analyses are difficult and do not necessarily perform better than MAR-based analyses

ML and MI require the MAR assumption
MAR is not automatically satisfied just because the causes or correlates of missing data are measured variables
MAR is satisfied on an analysis-by-analysis basis
The correlates of missingness must be part of the statistical analysis or part of the imputation routine

Example:
Researchers use simple regression to examine the association between self-esteem and risky sexual behavior in teens
Only participants 16 years of age or older fill out the sexual behavior questionnaire
MAR is only satisfied if age is included in the regression model
Excluding age can produce an MNAR mechanism and biased parameter estimates
The correlation between age and sexual behavior dictates the size of this bias

Researchers rarely know why data are missing
At best, measured variables may be correlates of the true reasons for missingness
An inclusive analysis strategy incorporates auxiliary variables that are (a) correlates of the incomplete variable or (b) correlates of missingness
An inclusive strategy improves the chances of satisfying the MAR assumption
[Figure: standard regression model (MNAR), with Esteem predicting Risky Sex; auxiliary variable regression model (MAR), with Age added as an auxiliary variable]

Common traditional approaches, all of which can lead to substantial bias:
  Listwise deletion
  Pairwise deletion
  Mean imputation
  Regression imputation

20 prospective employees take an IQ test during a job interview
The company uses IQ as a selection measure and hires applicants who score above the median
A supervisor rates job performance following a 6-month probationary period
Performance ratings are missing for the employees who were never hired
[Table: IQ, true job performance, and job performance with MAR missingness for the 20 cases; values lost]

Listwise deletion eliminates all cases with missing values, resulting in a complete data set
Pairwise deletion eliminates cases on an analysis-by-analysis basis
Discarding data reduces power
Deletion methods assume an MCAR mechanism and will yield bias under MAR or MNAR

[Figure: scatterplot of job performance against IQ, with the excluded cases marked]

Mean imputation replaces (i.e., imputes) missing values on Y with the average of the available Y scores
Replacement values pile up at the mean, restricting variability
Estimates are severely biased under any missing data mechanism
This is the worst possible option
[Figure: scatterplot of job performance against IQ; filled-in cases have R² = 0]

Regression imputation uses regression equations to predict the incomplete variables from the complete variables
Substituting observed data as X variables in the regression equation generates predicted values for the missing scores
Imputed values fall directly on a regression surface, reducing variation in the data
Measures of variation and association are biased

[Figure: scatterplot of job performance against IQ; filled-in cases have R² = 1 and fall on the line $\widehat{JobPerf} = B_0 + B_1(IQ)$]

Stochastic regression imputation is the same as regression imputation, but adds a normal residual term to each predicted value
Stochastic regression is the only traditional approach that assumes MAR
This is the best of the bunch, but standard errors are too small
This method is equivalent to multiple imputation with a single filled-in data set
[Figure: residual distributions around the predicted scores on the regression line $\widehat{JobPerf} = B_0 + B_1(IQ)$]

[Figure: scatterplot of job performance against IQ; the filled-in cases include random residual terms]

Analysis Method          IQ M (SD)        Performance M (SD)   IQ-Perf Correlation
Complete Data            100 (14.13)      [lost] (2.68)        .54
Listwise Deletion        [lost] (9.70)    [lost] (2.71)        .44
Mean Imputation          100 (14.13)      [lost] (1.87)        .21
Regression Imputation    100 (14.13)      [lost] (2.43)        .72
Stochastic Regression    100 (14.13)      [lost] (2.74)        [lost]

Simulation: 1000 random samples of N = 250 from a bivariate normal distribution with 50% missing data on Y
Average parameter estimates from the 1000 data sets
[Table: true value and average estimates under listwise deletion (LD), arithmetic mean imputation (AMI), regression imputation (RI), and stochastic regression imputation (SRI) for $\mu_X$, $\mu_Y$, $\sigma^2_X$, $\sigma^2_Y$, $\sigma_{XY}$, and $r_{XY}$; values lost]
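The qualitative pattern in the comparison above (deletion biases the mean, mean imputation shrinks the variance and correlation, regression imputation inflates the correlation, stochastic regression imputation roughly recovers all three) can be reproduced with a small simulation. The sketch below is not the simulation from the slides: the population values, the sample size, and the median-split selection rule are assumptions chosen to resemble the IQ and job performance example.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 5000
x = rng.normal(100, 15, n)                          # IQ (complete)
y = 10 + 0.1 * (x - 100) + rng.normal(0, 2.4, n)    # job performance

missing = x < np.median(x)    # MAR: applicants below the IQ median are never rated
obs = ~missing

# Regression of Y on X in the complete cases (polyfit returns slope first)
b1, b0 = np.polyfit(x[obs], y[obs], 1)
resid_sd = np.std(y[obs] - (b0 + b1 * x[obs]), ddof=2)

filled = {
    "mean imputation": np.where(missing, y[obs].mean(), y),
    "regression imputation": np.where(missing, b0 + b1 * x, y),
    "stochastic regression": np.where(
        missing, b0 + b1 * x + rng.normal(0, resid_sd, n), y),
}

def summarize(xv, yv):
    """Mean of Y, SD of Y, and the X-Y correlation."""
    return yv.mean(), yv.std(ddof=1), np.corrcoef(xv, yv)[0, 1]

summary = {"complete data": summarize(x, y),
           "listwise deletion": summarize(x[obs], y[obs])}
summary.update({name: summarize(x, yv) for name, yv in filled.items()})

for name, (m, sd, r) in summary.items():
    print(f"{name:22s} mean = {m:6.2f}  SD = {sd:5.2f}  r = {r:4.2f}")
```

Note that the stochastic regression estimates look good here, but the analysis still treats the filled-in values as real data, which is why the slides warn that its standard errors are too small.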

Maximum likelihood (ML) identifies the population parameter values that are most consistent with the raw data
A likelihood (or log likelihood) function quantifies the discrepancy or fit of the data to the parameters
The multivariate normal distribution is the starting point for ML estimation with continuous variables

The height of the multivariate normal distribution:

$$L_i = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\; e^{-.5\,(y_i - \mu)^T \Sigma^{-1} (y_i - \mu)}$$

The term in the exponent is key
Plugging scores into $y_i$ yields a likelihood, $L_i$
$L_i$ is the relative probability of an individual's Y values, given the parameter estimates in $\mu$ and $\Sigma$

The likelihood value is largely driven by the squared z score in the exponent

$$L_i = [\text{scaling factor}]\; e^{-.5(z^2)}$$

Smaller z scores reflect a better fit to the parameters, in the sense that the score is closer to the mean
Small z scores = high likelihood = high probability = good fit

Scores for two cases:
  Case 1: Y1 = 0, Y2 = -.5
  Case 2: Y1 = -1, Y2 = -1.5
Case 1 has the higher likelihood value
This case is closer to the parameter values and thus has better fit to µ = 0
[Figure: the two cases plotted on the bivariate normal distribution centered at µ = 0, with likelihood values L1 and L2]

Scores for two cases:
  Case 1: Y1 = 0, Y2 = -.5
  Case 2: Y1 = -1, Y2 = -1.5
Case 2 now has the higher likelihood value
This case is closer to the parameter values and has better fit to µ = -1
[Figure: the two cases plotted on the bivariate normal distribution centered at µ = -1, with likelihood values L1 and L2]

Likelihoods are very small numbers; taking the natural log makes the math a bit more tractable

$$\log L_i = -\frac{p}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(y_i - \mu)^T \Sigma^{-1} (y_i - \mu)$$

The log likelihood still quantifies the relative probability of a set of scores, but on a logarithmic scale
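The two-case comparison can be checked numerically. The sketch below evaluates the multivariate normal height for both cases under each mean vector. The covariance matrix is an assumption: the slides do not state it in the surviving text, but unit variances with a correlation of .60 reproduce the likelihood values of roughly .164 and .064 that the slides report for these cases when µ = 0. Taking µ = -1 to mean that both means equal -1 is also an assumption.

```python
import numpy as np

def mvn_likelihood(y, mu, sigma):
    """Height of the multivariate normal density at the score vector y."""
    p = len(y)
    dev = y - mu
    z2 = dev @ np.linalg.solve(sigma, dev)    # squared z score (Mahalanobis distance)
    scale = 1.0 / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(sigma)))
    return scale * np.exp(-0.5 * z2)

sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])                # assumed covariance matrix
case1 = np.array([0.0, -0.5])
case2 = np.array([-1.0, -1.5])

mu_zero = np.array([0.0, 0.0])
mu_minus_one = np.array([-1.0, -1.0])         # assumed reading of "µ = -1"

L1_zero = mvn_likelihood(case1, mu_zero, sigma)
L2_zero = mvn_likelihood(case2, mu_zero, sigma)
L1_minus = mvn_likelihood(case1, mu_minus_one, sigma)
L2_minus = mvn_likelihood(case2, mu_minus_one, sigma)

print(f"mu = 0:  L1 = {L1_zero:.3f}, L2 = {L2_zero:.3f}")
print(f"mu = -1: L1 = {L1_minus:.3f}, L2 = {L2_minus:.3f}")
print(f"log L1 = {np.log(L1_zero):.3f}, log L2 = {np.log(L2_zero):.3f}")
```

Under µ = 0 the first case has the larger likelihood, and under µ = -1 the ordering reverses, which is the sense in which the likelihood measures fit of a score vector to a particular set of parameter values.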

Scores for two cases:
  Case 1: Y1 = 0, Y2 = -.5
  Case 2: Y1 = -1, Y2 = -1.5
Case 1 has the higher log likelihood (i.e., relative probability) value
[Figure: the two cases on the bivariate normal distribution, with L1 = .164 and L2 = .064; the log likelihood values were lost]

The log likelihood for an entire sample is the sum of the individual log likelihoods

$$\log L = \sum_{i=1}^{N} \log L_i$$

The log likelihood summarizes the fit of a sample to a normal distribution with a particular mean vector and covariance matrix
ML uses the log likelihood to audition and choose among different plausible parameter values

A sample of IQ scores from 20 job applicants
Use ML to estimate the population mean
Estimation strategy:
  Compute the sample log likelihood for different values of µ
  Identify the mean value that produces the highest log likelihood (i.e., best fit to the data; highest probability of producing the sample data)
[Table: ID, IQ, and individual log likelihood for the 20 cases, with the sample log likelihood; values lost]

[Tables: ID, IQ, and individual log likelihood for the 20 cases at further candidate values of µ, each with its sample log likelihood; values lost]
[Table: candidate mean values and their sample log likelihoods; values lost]

µ = 100 produced the highest log likelihood (i.e., relative probability)
µ = 100 has the highest probability of producing this sample of 20 cases
µ = 100 is the maximum likelihood estimate (MLE)
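The audition procedure is easy to sketch as a grid search. The 20 IQ scores below are illustrative stand-ins, since the table in the slides did not survive; they were chosen so that their mean (100) and standard deviation (14.13) match the values reported for the complete data. Holding the variance fixed at its sample ML value while the mean varies is also an assumption of this sketch.

```python
import numpy as np

def sample_loglik(y, mu, sigma2):
    """Sample log likelihood: the sum of the individual normal log likelihoods."""
    z2 = (y - mu) ** 2 / sigma2
    return float(np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * np.log(sigma2) - 0.5 * z2))

# Illustrative IQ scores for 20 applicants (assumed, not recovered from the slides)
iq = np.array([78, 84, 84, 85, 87, 91, 92, 94, 94, 96,
               99, 105, 105, 106, 108, 112, 113, 115, 118, 134], dtype=float)

sigma2 = iq.var()                        # variance held fixed at its ML estimate
candidates = np.linspace(90, 110, 41)    # audition means from 90 to 110 in steps of .5
logliks = np.array([sample_loglik(iq, m, sigma2) for m in candidates])

mle = candidates[np.argmax(logliks)]
print(f"highest log likelihood {logliks.max():.3f} at mu = {mle}")
```

For a normal model the winning value coincides with the sample mean, so the grid search is only a teaching device; software uses iterative algorithms to find the maximum when no closed-form solution exists.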

The log likelihood function describes how the sample log likelihood changes between values of µ = 90 and 110
[Figure: log likelihood plotted against the population mean; µ = 100 maximizes the log likelihood]

The complete data log likelihood

$$\log L_i = -\frac{p}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(y_i - \mu)^T \Sigma^{-1} (y_i - \mu)$$

The missing data log likelihood (also called FIML, for full information maximum likelihood)

$$\log L_i = -\frac{p_i}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma_i| - \frac{1}{2}(y_i - \mu_i)^T \Sigma_i^{-1} (y_i - \mu_i)$$

The missing data log likelihood has an i subscript on the parameter matrices, $\mu$ and $\Sigma$
The size and content of these matrices can vary across cases depending on which variables are observed and missing
The squared z score that determines each case's likelihood (i.e., fit) is computed using the parameters for which a case has data
Consequently, ML uses all available data to estimate parameters; some cases contribute more information than others

An analysis with three variables: Y1, Y2, and Y3
The squared z score for the complete cases is based on all parameters

$$\log L_i = K_i - \frac{1}{2}\log\begin{vmatrix}\sigma_1^2 & \sigma_{12} & \sigma_{13}\\ \sigma_{21} & \sigma_2^2 & \sigma_{23}\\ \sigma_{31} & \sigma_{32} & \sigma_3^2\end{vmatrix} - \frac{1}{2}\left(\begin{bmatrix}y_1\\ y_2\\ y_3\end{bmatrix} - \begin{bmatrix}\mu_1\\ \mu_2\\ \mu_3\end{bmatrix}\right)^T \begin{bmatrix}\sigma_1^2 & \sigma_{12} & \sigma_{13}\\ \sigma_{21} & \sigma_2^2 & \sigma_{23}\\ \sigma_{31} & \sigma_{32} & \sigma_3^2\end{bmatrix}^{-1} \left(\begin{bmatrix}y_1\\ y_2\\ y_3\end{bmatrix} - \begin{bmatrix}\mu_1\\ \mu_2\\ \mu_3\end{bmatrix}\right)$$

The final term is the squared z score

Cases with missing values on Y2 would have the following log likelihood

$$\log L_i = K_i - \frac{1}{2}\log\begin{vmatrix}\sigma_1^2 & \sigma_{13}\\ \sigma_{31} & \sigma_3^2\end{vmatrix} - \frac{1}{2}\left(\begin{bmatrix}y_1\\ y_3\end{bmatrix} - \begin{bmatrix}\mu_1\\ \mu_3\end{bmatrix}\right)^T \begin{bmatrix}\sigma_1^2 & \sigma_{13}\\ \sigma_{31} & \sigma_3^2\end{bmatrix}^{-1} \left(\begin{bmatrix}y_1\\ y_3\end{bmatrix} - \begin{bmatrix}\mu_1\\ \mu_3\end{bmatrix}\right)$$

The final term is the squared z score
The squared z score is based on the observed data and the corresponding parameter estimates for Y1 and Y3

ML does not fill in the missing values
ML uses the observed data to search for the parameters that yield the highest log likelihood (i.e., best fit to the observed data)
Including the incomplete cases steers estimation toward a more accurate answer
ML effectively borrows information from the observed data to estimate the parameters of the incomplete variables

[Table: IQ and job performance scores for the 20 cases; values lost]
Estimate the mean job performance rating
Deleting the incomplete cases produced an average of µ = [value lost] (the true value is µ = 10.35)
Including the IQ scores from the five incomplete cases should improve estimation
The normal distribution is the key to understanding how ML missing data handling works
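The case-by-case subsetting of the parameter matrices can be sketched directly from the missing data log likelihood. The function below is a minimal illustration, assuming missing scores are coded as `np.nan`; the mean vector, covariance matrix, and scores are hypothetical values, not estimates from the slides.

```python
import numpy as np

def casewise_loglik(y, mu, sigma):
    """Log likelihood for one case, using only the variables that case has observed."""
    obs = ~np.isnan(y)
    p_i = int(obs.sum())
    dev = y[obs] - mu[obs]                    # deviations for the observed variables
    sigma_i = sigma[np.ix_(obs, obs)]         # rows and columns of the observed variables
    _, logdet = np.linalg.slogdet(sigma_i)
    z2 = dev @ np.linalg.solve(sigma_i, dev)  # squared z score
    return -0.5 * p_i * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * z2

# Hypothetical parameter values for Y1, Y2, Y3
mu = np.array([0.0, 0.0, 0.0])
sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])

data = np.array([[1.0, 0.2, -1.0],       # complete case
                 [1.0, np.nan, -1.0],    # Y2 missing: only Y1 and Y3 parameters are used
                 [0.5, np.nan, np.nan]]) # only Y1 observed

case_logliks = np.array([casewise_loglik(row, mu, sigma) for row in data])
sample_loglik = case_logliks.sum()       # the sample log likelihood is the sum over cases
print(case_logliks, sample_loglik)
```

An estimation routine would repeatedly evaluate this sum while adjusting the mean vector and covariance matrix; the sketch only shows how each case contributes according to the data it has.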

ML assumes that IQ and job performance ratings are normally distributed
The normal distribution effectively constrains the plausible range of missing values
For a given IQ value, some job performance ratings are more plausible than others
Consider the incomplete cases with IQ = 85 and IQ = 78
The IQ scores provide information about the missing performance ratings
[Figure: scatterplots of job performance against IQ; the most likely performance rating is approximately 9, given that IQ = 85]

The 15 complete cases produced an average of µ = [value lost]
A case with an IQ score of 85 would likely have a performance rating of approximately 9
Based on this information, the job performance mean would be adjusted downward to account for the plausible (but missing) performance rating
This adjustment is based solely on the observed IQ value; no imputation is necessary
[Figure: scatterplot of job performance against IQ; the most likely performance rating is approximately 8.2, given that IQ = 78]

The 15 complete cases produced an average of µ = [value lost]
A case with an IQ score of 78 would likely have a performance rating of approximately 8.2
Based on this information, the job performance mean would be adjusted downward to account for the plausible (but missing) performance rating
Again, this adjustment is based solely on the observed IQ value; no imputation is necessary
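For the special case of two normally distributed variables with missing values only on Y, the downward adjustment has a well-known closed form: the ML estimate of the Y mean equals the complete-case Y mean plus the complete-case regression slope times the difference between the full-sample and complete-case X means. The sketch below applies that formula to simulated data; the population values and the median-split hiring rule are assumptions, not the 20-case data set from the slides.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
iq = rng.normal(100, 15, n)                               # complete variable
perf = 10 + 0.1 * (iq - 100) + rng.normal(0, 2.4, n)      # true mean is 10
hired = iq >= np.median(iq)                               # ratings exist only for hired cases

# Listwise deletion: average of the observed ratings
listwise_mean = perf[hired].mean()

# ML estimate: adjust the complete-case mean using every observed IQ score
slope = np.cov(iq[hired], perf[hired])[0, 1] / iq[hired].var(ddof=1)
ml_mean = listwise_mean + slope * (iq.mean() - iq[hired].mean())

print(f"listwise mean = {listwise_mean:.2f}, ML mean = {ml_mean:.2f}")
```

Because the unrated applicants have lower IQ scores than the rated ones, the adjustment term is negative and pulls the estimate toward the true mean without filling in a single rating.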

Including the five incomplete cases adjusts the parameter values in a way that closely resembles the complete-data results

Analysis Method     IQ M (SD)         Job Perf M (SD)   IQ-Perf Correlation
Complete Data       100 (14.13)       [lost] (2.68)     .54
ML Missing Data     [lost] (13.77)    [lost] (2.87)     [lost]

Simulation: 1000 random samples of N = 250 from a bivariate normal distribution with 50% missing data on Y
Average parameter estimates from the 1000 data sets
[Table: true value and average estimates under listwise deletion (LD) and ML for $\mu_X$, $\mu_Y$, $\sigma^2_X$, $\sigma^2_Y$, and $\sigma_{XY}$; values lost]
ML standard errors were much smaller

Data set containing scores from 480 employees on eight work-related variables
Variables: age, gender, job tenure, IQ, psychological well-being, job satisfaction, job performance, and turnover intentions
33% of the cases have missing well-being scores, and 33% have missing satisfaction scores
The mechanism is MCAR because the data are missing by design

X contains complete variables (e.g., gender, IQ, etc.)
To reduce costs, 33% of the well-being and job satisfaction scores were intentionally never collected
[Figure: planned missing data pattern for X, Well-Being, and Job Sat]

Multiple regression model that predicts job performance from psychological well-being and job satisfaction

$$jobperf = B_0 + B_1(wbeing) + B_2(jobsat) + \varepsilon$$

[Figure: path diagram with Well-Being and Job Satisfaction predicting Job Performance, with residual $\varepsilon$]

The basic Mplus commands: TITLE, DATA, VARIABLE, ANALYSIS, MODEL, MODEL TEST, OUTPUT
Variable names must be 8 characters or less
! denotes a comment line that the program ignores
Commands end with a colon (:)
Subcommands end with a semicolon (;)
Command lines must be less than 80 characters in length; wrap commands to the next line as needed
Capitalization doesn't matter

The TITLE command (optional) prints a title on the output file

    TITLE:
    ! The title command is optional;
    mplus multiple regression program;

The DATA command points Mplus to the location of the text data on the local drive
Free format text files end in .dat or .txt and should include a placeholder for missing values

    DATA:
    ! Location of the data file;
    file = c:\amda Data\employee.dat;

Omit the file path when the data file and the Mplus syntax file are located in the same folder
The VARIABLE command lists the order of the variables, selects variables for analysis, and gives the missing value code

    DATA:
    ! Location of the data file;
    file = employee.dat;
    VARIABLE:
    ! Information about the contents of the data file;
    names = id age tenure female wbeing jobsat jobperf turnover iq;
    usevariables = wbeing jobsat jobperf;
    missing = all (-99);

The ANALYSIS command specifies the estimator and other estimation details

    ANALYSIS:
    ! Specify the estimator;
    estimator = ml;

The MODEL command specifies the analysis
With complete data, you can use a bare-bones specification
Mplus automatically estimates most of the necessary parameters (e.g., variances, means)

    MODEL:
    ! Regression model; "on" means "regressed on";
    jobperf on wbeing jobsat;

With missing data on the predictor variables, it is necessary to specify the variances and covariances of the IVs
This ensures that cases with missing predictor scores are included in the analysis

    MODEL:
    jobperf on wbeing jobsat;   ! Regression;
    wbeing jobsat;              ! Variances of IVs;
    wbeing with jobsat;         ! Covariance between IVs;

In ML analyses, Wald chi-square statistics are routinely used to test a set of parameters for significance

$$\text{Wald} = \frac{(\hat{\theta} - 0)^2}{SE^2}$$

The Wald test is the ML analog of an F statistic in OLS regression
With multiple parameters, the Wald test is expressed in matrices

To perform the Wald omnibus test, attach labels to the parameters of interest in the MODEL command
Among other things, MODEL TEST generates a Wald test for the specified hypotheses

    MODEL:
    ! (b1) and (b2) are labels needed for Wald test;
    jobperf on wbeing (b1);
    jobperf on jobsat (b2);
    MODEL TEST:
    ! Wald test that both coefficients = 0;
    ! b1 and b2 are user-supplied labels from MODEL;
    b1 = 0;
    b2 = 0;
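The arithmetic behind the Wald statistic can be sketched in Python. The single-parameter version follows the formula above; the two-parameter version uses the standard matrix form, in which the estimates are weighted by the inverse of their covariance matrix. All numeric inputs below are hypothetical, and the function names are illustrative rather than part of any package. The p-values use exact tail formulas for chi-square distributions with 1 and 2 degrees of freedom.

```python
import math
import numpy as np

def wald_single(estimate, se, null_value=0.0):
    """Wald chi-square (1 df) for one parameter, with its p-value."""
    w = (estimate - null_value) ** 2 / se ** 2
    p_value = math.erfc(math.sqrt(w / 2))    # upper tail of chi-square with 1 df
    return w, p_value

def wald_two(estimates, cov_of_estimates):
    """Wald chi-square (2 df) for the joint hypothesis that two parameters are zero."""
    b = np.asarray(estimates, dtype=float)
    v = np.asarray(cov_of_estimates, dtype=float)
    w = float(b @ np.linalg.solve(v, b))     # b' V^{-1} b
    p_value = math.exp(-w / 2)               # upper tail of chi-square with 2 df
    return w, p_value

# Hypothetical regression slopes and their sampling covariance matrix
slopes = [0.45, 0.03]
slope_cov = [[0.0032, -0.0008],
             [-0.0008, 0.0036]]

w1, p1 = wald_single(0.45, math.sqrt(0.0032))
w2, p2 = wald_two(slopes, slope_cov)
print(f"single parameter: Wald = {w1:.2f}, p = {p1:.4f}")
print(f"omnibus (2 df):   Wald = {w2:.2f}, p = {p2:.4f}")
```

The omnibus version accounts for the correlation between the two estimates, which is why it is not simply the sum of the two single-parameter statistics.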

The OUTPUT command specifies additional information that appears in the Mplus output file

    OUTPUT:
    ! standardized gives beta weights and R-square;
    ! sampstat gives ML descriptives;
    ! patterns prints missing data patterns;
    standardized sampstat patterns;

The complete program:

    DATA:
    file = employee.dat;
    VARIABLE:
    names = id age tenure female wbeing jobsat jobperf turnover iq;
    usevariables = wbeing jobsat jobperf;
    missing = all (-99);
    ANALYSIS:
    estimator = ml;
    MODEL:
    jobperf on wbeing (b1);
    jobperf on jobsat (b2);
    wbeing jobsat;
    wbeing with jobsat;
    MODEL TEST:
    b1 = 0;
    b2 = 0;
    OUTPUT:
    standardized sampstat patterns;

SUMMARY OF MISSING DATA PATTERNS
MISSING DATA PATTERNS (x = not missing; pattern columns not recoverable)
    JOBPERF  x x x
    WBEING   x x
    JOBSAT   x x
MISSING DATA PATTERN FREQUENCIES
    [Pattern frequencies lost]

The covariance coverage matrix gives the proportion of complete cases on each variable or variable pair
PROPORTION OF DATA PRESENT
    [Covariance coverage matrix for JOBPERF, WBEING, JOBSAT; values lost]

[Mplus output: ESTIMATED SAMPLE STATISTICS, with means, covariances, and correlations for JOBPERF, WBEING, and JOBSAT; values lost]

The Wald statistic (a chi-square with 2 degrees of freedom) is akin to the omnibus F test in OLS regression
[Mplus output: Wald Test of Parameter Constraints, Degrees of Freedom = 2; Value and P-Value lost]
Considered as a set, the two predictors explain significant variation in the dependent variable

[Mplus output: MODEL RESULTS, with unstandardized coefficients (Estimate), standard errors (S.E.), z tests (Est./S.E.), and two-tailed p-values for JOBPERF ON WBEING and JOBSAT; WBEING WITH JOBSAT; the means and variances of WBEING and JOBSAT; and the intercept and residual variance of JOBPERF; values lost]

[Mplus output: STANDARDIZED MODEL RESULTS, STDYX Standardization, with beta weights for JOBPERF ON WBEING and JOBSAT, and R-SQUARE for JOBPERF; values lost]

Data set containing scores from 480 employees on eight work-related variables
Variables: age, gender, job tenure, IQ, psychological well-being, job satisfaction, job performance, and turnover intentions
Analysis: obtain ML descriptive statistics for all quantitative variables (gender and turnover intentions are dummy codes)

    DATA:
    file = employee.dat;
    VARIABLE:
    names = id age tenure female wbeing jobsat jobperf turnover iq;
    usevariables = age tenure wbeing jobsat jobperf iq;
    missing = all (-99);
    ANALYSIS:
    estimator = ml;
    MODEL:
    [age tenure wbeing jobsat jobperf iq];   ! Means;
    age tenure wbeing jobsat jobperf iq;     ! Variances;
    age tenure wbeing jobsat jobperf iq with
      age tenure wbeing jobsat jobperf iq;   ! Covariances;
    OUTPUT:
    standardized;

[Mplus output: MODEL RESULTS, with covariances (Estimate, S.E., Est./S.E., two-tailed P-Value) for every pair among AGE, TENURE, WBEING, JOBSAT, JOBPERF, and IQ, followed by the means and variances of the six variables; values lost]

[Mplus output: STANDARDIZED MODEL RESULTS, STDYX Standardization, with correlations (Estimate, S.E., Est./S.E., two-tailed P-Value) for every pair among AGE, TENURE, WBEING, JOBSAT, JOBPERF, and IQ; values lost]

Data set containing scores from 480 employees on eight work-related variables
Variables: age, gender, job tenure, IQ, psychological well-being, job satisfaction, job performance, and turnover intentions
Analysis: compare job performance means between employees who do and do not intend to quit in the next six months (the TURNOVER variable), while controlling for well-being, job satisfaction, and job tenure

Multiple regression provides a straightforward mechanism for estimating ANOVA models from between-group designs
TURNOVER is dummy coded (0 = intend to stay, 1 = intend to quit in the next 6 months)

$$jobperf = B_0 + B_1(wbeing) + B_2(jobsat) + B_3(tenure) + B_4(turnover) + \varepsilon$$

Consistent with ANCOVA models, the three covariates are centered at their grand means

    DATA:
    file = employee.dat;
    VARIABLE:
    names = id age tenure female wbeing jobsat jobperf turnover iq;
    usevariables = jobperf tenure wbeing jobsat turnover;
    missing = all (-99);
    centering = grandmean(tenure wbeing jobsat);
    ANALYSIS:
    estimator = ml;
    MODEL:
    jobperf on tenure wbeing jobsat turnover;
    wbeing jobsat;                     ! Incomplete predictors;
    tenure wbeing jobsat turnover with
      tenure wbeing jobsat turnover;   ! Covariances among IVs;
    OUTPUT:
    standardized sampstat;

[Mplus output: MODEL RESULTS, with unstandardized estimates, standard errors, z tests, and two-tailed p-values for JOBPERF ON TENURE, WBEING, JOBSAT, and TURNOVER, and the JOBPERF intercept; values lost]

Because the covariates are centered at their means, the intercept estimate (B_0 = 6.217) represents the adjusted mean for the group of employees who intend to stay on the job (TURNOVER = 0)
The employees who intend to quit in the next six months (TURNOVER = 1) have a significantly lower job performance mean (B_4 = -.645, p < .001)
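The interpretation of the intercept as an adjusted mean follows from the centering alone, and it can be verified with ordinary least squares on complete data. The sketch below is not the ML missing data analysis from the slides: it simulates hypothetical complete scores with assumed population values, then shows that the intercept from the centered model equals the predicted score at the covariate means for the TURNOVER = 0 group.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 480
tenure = rng.normal(10, 3, n)          # hypothetical covariates
wbeing = rng.normal(6, 1.2, n)
jobsat = rng.normal(6, 1.2, n)
turnover = (rng.random(n) < 0.3).astype(float)   # 0 = stay, 1 = quit
jobperf = (6 + 0.05 * (tenure - 10) + 0.3 * (wbeing - 6)
           + 0.1 * (jobsat - 6) - 0.6 * turnover + rng.normal(0, 1, n))

def fit(covariates):
    """OLS coefficients: intercept, three covariate slopes, turnover slope."""
    design = np.column_stack([np.ones(n), *covariates, turnover])
    coef, *_ = np.linalg.lstsq(design, jobperf, rcond=None)
    return coef

raw = [tenure, wbeing, jobsat]
centered = [c - c.mean() for c in raw]

b_raw = fit(raw)
b_centered = fit(centered)

# Predicted score for TURNOVER = 0 at the covariate means, from the uncentered model
adjusted_mean_stay = b_raw[0] + sum(b * c.mean() for b, c in zip(b_raw[1:4], raw))

print(f"centered intercept = {b_centered[0]:.3f}")
print(f"adjusted mean (stay group) = {adjusted_mean_stay:.3f}")
print(f"turnover coefficient = {b_centered[4]:.3f}")
```

Centering changes only the intercept; the slopes, including the group difference carried by the turnover coefficient, are identical in both models.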

The STDY section standardizes only the dependent variable
The estimate for the dummy variable predictor can be interpreted as a Cohen's d effect size (i.e., the adjusted means differ by .24 of a standard deviation unit)
[Mplus output: STDY Standardization, with estimates for JOBPERF ON TENURE, WBEING, JOBSAT, and TURNOVER, and R-SQUARE for JOBPERF; values lost]

Repeated measures data set containing six yearly assessments of antisocial behavior from 2000 children
Variables: gender (0 = male, 1 = female), six antisocial behavior scores
Analysis: compare change in the antisocial behavior averages across the six assessments

The Wald chi-square statistic can serve the same purpose as the omnibus F test from ANOVA
The hypothesis for the Wald test specifies that the means are equal across time (i.e., the null hypothesis in ANOVA)
The MODEL TEST command can implement the equality constraints on the means
Unlike a standard repeated measures ANOVA, the subsequent analysis does not impose a covariance structure on the data (e.g., compound symmetry, sphericity)

    DATA:
    file = antisocial.dat;
    VARIABLE:
    names = female anti1 anti2 anti3 anti4 anti5 anti6;
    usevariables = anti1 - anti6;
    missing = all (-99);
    ANALYSIS:
    estimator = ml;
    MODEL:
    [anti1 - anti6] (ybar1 - ybar6);   ! Means with labels;
    anti1 - anti6 (var1 - var6);       ! Variances with labels;
    anti1 - anti6 with anti1 - anti6;  ! Covariances;
    MODEL TEST:
    ! All means set equal;
    ybar1 = ybar2;
    ybar2 = ybar3;
    ybar3 = ybar4;
    ybar4 = ybar5;
    ybar5 = ybar6;
    OUTPUT:
    sampstat patterns;

SUMMARY OF MISSING DATA PATTERNS
MISSING DATA PATTERNS (x = not missing; pattern columns not recoverable)
    ANTI1  x x x x x x
    ANTI2  x x x x x
    ANTI3  x x x x
    ANTI4  x x x
    ANTI5  x x
    ANTI6  x
MISSING DATA PATTERN FREQUENCIES
    [Pattern frequencies lost]

The covariance coverage matrix gives the proportion of complete cases on each variable or variable pair
PROPORTION OF DATA PRESENT
    [Covariance coverage matrix for ANTI1 to ANTI6; values lost]

The Wald statistic (a chi-square with 5 degrees of freedom) is akin to the omnibus F test in ANOVA
[Mplus output: Wald Test of Parameter Constraints, Degrees of Freedom = 5; Value and P-Value lost]
The significant chi-square statistic indicates that the null hypothesis of equal means is not supported

[Mplus output: MODEL RESULTS, with ML means and variances (Estimate, S.E., Est./S.E., two-tailed P-Value) for ANTI1 to ANTI6; values lost]

Among other things, the MODEL CONSTRAINT command can compute new parameters from existing estimates
For example, the command can compute a standardized mean difference effect size (e.g., Cohen's d)
Use the parameter labels to program the following equation

    d = (ybar1 - ybar6) / sqrt(var1)

Syntax

    MODEL CONSTRAINT:
    new (d);
    d = (ybar1 - ybar6)/sqrt(var1);

Output
[Mplus output: New/Additional Parameters, with Estimate, S.E., Est./S.E., and two-tailed P-Value for D; values lost]

With nonnormal data, point estimates are relatively accurate
As kurtosis increases relative to the normal curve:
  Standard errors become too small
  Likelihood ratio tests become too large (i.e., Type I errors)
As kurtosis decreases relative to the normal curve:
  Standard errors become too large
  Likelihood ratio tests become too small (i.e., Type II errors)

Standard errors:
  Robust (i.e., sandwich estimator) standard errors
  Naïve bootstrapping
Likelihood ratio test:
  Rescaled test statistic (i.e., Satorra-Bentler chi-square)
  Bollen-Stine bootstrap
Procedures are available in some SEM programs

Robust standard errors rescale the standard errors (up or down) according to the degree of nonnormality in the data
The usual ML standard error is multiplied by a correction term that accounts for outlier scores (or lack thereof)
Robust standard errors are useful because they can be used to correct the Wald test
Implementing robust standard errors has no impact on the estimation routine or the resulting parameter estimates

Naïve bootstrap:
Treat the sample data as a miniature population and draw B samples (e.g., 1000) of size N with replacement
Perform the statistical analysis on each bootstrap sample
Save the parameter estimates from each analysis
Treat the parameter estimates as data points and compute the standard deviation of each parameter
The standard deviation of the estimates is the bootstrap standard error
[Figure: the sample data (N by k, with missing values shown as ?) are resampled into bootstrap samples 1 through 1000; each bootstrap sample yields a parameter estimate, and the standard deviation of the empirical sampling distribution of the 1000 estimates is the S.E.]
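The naïve bootstrap steps listed above translate almost line for line into code. The sketch below is a minimal version for complete data with a simple statistic (a mean); the function name `bootstrap_se` and the simulated scores are assumptions for illustration. In a missing data analysis, the `statistic` function would be the full ML analysis run on each resampled data set.

```python
import numpy as np

def bootstrap_se(data, statistic, n_boot=1000, seed=0):
    """Naive bootstrap standard error of a statistic computed from the rows of data."""
    rng = np.random.default_rng(seed)
    n = len(data)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)        # draw N rows with replacement
        estimates[b] = statistic(data[rows])     # analyze the bootstrap sample
    return estimates.std(ddof=1)                 # SD of the saved estimates

# Hypothetical bivariate data, N = 500
data = np.random.default_rng(42).normal(50, 10, size=(500, 2))

se_boot = bootstrap_se(data, lambda d: d[:, 0].mean())
se_analytic = data[:, 0].std(ddof=1) / np.sqrt(len(data))

print(f"bootstrap SE = {se_boot:.3f}, analytic SE = {se_analytic:.3f}")
```

For a sample mean the bootstrap and analytic standard errors agree closely, which makes the mean a convenient check; the bootstrap earns its keep for statistics and data conditions where the analytic standard error is not trustworthy.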

39 Bivariate data analysis, N = 500
X has skewness = 0 and kurtosis = -1
Y has skewness = 0 and kurtosis = 4
A normal distribution has skewness and kurtosis = 0

Robust standard errors
ANALYSIS:
! MLR specifies robust ML;
estimator = mlr;

Naïve bootstrap standard errors
ANALYSIS:
! 2000 samples; (standard) = naïve bootstrap;
bootstrap = 2000 (standard);

As is often the case, the bootstrap and robust procedures produced similar standard errors.
Table columns: Parameter, Standard SE, Robust SE, Bootstrap SE
Table rows: μX, μY, σ²X, σXY, σ²Y

The likelihood ratio test does not follow a chi-square distribution when the data are nonnormal. Two solutions:
Rescale the test statistic up or down so that it approximates the correct sampling distribution (i.e., Satorra-Bentler).
Leave the sample statistic intact, but use the bootstrap procedure to generate a new sampling distribution, from which a p-value is generated.
The likelihood ratio bootstrap (i.e., Bollen-Stine bootstrap) is a bit different from the naïve bootstrap.

40 Questionnaire data from a study of eating disorder risk in a sample of 500 college-aged women.
Variables: body mass index (BMI), 7 questionnaire items measuring body dissatisfaction, 6 questionnaire items measuring eating disorder risk, and a binary indicator of past sexual abuse history (0 = no abuse history, 1 = abuse history).
All questionnaire items are measured on a 7-point Likert scale.
Analysis: Fit a one-factor CFA model to the six eating disorder risk items.

By definition, Likert scales violate the ML normality assumption (a normal distribution requires continuous variables).
The questionnaire items have asymmetric distributions, with positive skewness (S ranging between .50 and 1.00) and kurtosis (K ranging between .20 and 1.00).
An appropriate analysis should implement corrective procedures for non-normal data (e.g., robust standard errors, the bootstrap).

Figure: one-factor CFA path diagram with the latent factor Eating Disorder Risk, indicators EDR1 to EDR6, and residuals e1 to e6.

41 TITLE: CFA with first factor loading constrained to 1;
DATA: file = eatingrisk.dat;
VARIABLE: names = abuse bmi bds1 - bds7 edr1 - edr6;
usevariables = edr1 - edr6;
missing = all (-99);
ANALYSIS:
! mlr = robust maximum likelihood;
estimator = mlr;
MODEL: edrisk by edr1 - edr6;
OUTPUT: sampstat standardized patterns;

By default, Mplus constrains the factor loading of the first indicator (EDR1) to a value of 1 for identification.
It is also possible to estimate all loadings and constrain the latent factor's variance to 1.
Placing @1 next to the factor name constrains its variance to 1.
Placing an * after the loading instructs Mplus to estimate the loading.

TITLE: CFA with factor variance constrained to 1;
DATA: file = eatingrisk.dat;
VARIABLE: names = abuse bmi bds1 - bds7 edr1 - edr6;
usevariables = edr1 - edr6;
missing = all (-99);
ANALYSIS:
! mlr = robust maximum likelihood;
estimator = mlr;
MODEL:
edrisk by edr1 - edr6*; ! * = estimate all loadings;
edrisk@1; ! Constrain factor variance to 1;
OUTPUT: sampstat standardized patterns;

SUMMARY OF MISSING DATA PATTERNS
MISSING DATA PATTERNS (x = not missing)
EDR1 x x x x x x x x
EDR2 x x x x
EDR3 x x x x
EDR4 x x x x x x x x
EDR5 x x x x
EDR6 x x x x x x x x

MISSING DATA PATTERN FREQUENCIES
Columns: Pattern, Frequency

42 The covariance coverage matrix gives the proportion of complete cases on each variable or variable pair.

PROPORTION OF DATA PRESENT
Covariance Coverage (rows and columns: EDR1, EDR2, EDR3, EDR4, EDR5, EDR6)

MODEL RESULTS (annotated columns: unstandardized estimates, robust standard errors, z tests)
Columns: Estimate, S.E., Est./S.E., Two-Tailed P-Value
EDRISK BY: EDR1 to EDR6
Intercepts (indicator means): EDR1 to EDR6
Variances: EDRISK
Residual Variances: EDR1 to EDR6
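To make the definition of covariance coverage concrete, the following Python sketch computes the proportion of cases with observed data on each variable and each variable pair. The data rows, the variable names, and the function name are hypothetical; only the -99 missing data code follows the handout's input files.

```python
def covariance_coverage(rows, names, missing=-99):
    """Proportion of cases observed on each variable (diagonal) or variable pair."""
    n = len(rows)
    coverage = {}
    for i, a in enumerate(names):
        for j in range(i + 1):
            b = names[j]
            # Count the cases where both variables in the pair are observed
            present = sum(1 for r in rows if r[i] != missing and r[j] != missing)
            coverage[(a, b)] = present / n
    return coverage

# Hypothetical item responses; -99 is the missing data code
rows = [
    [3, 4, 2],
    [5, -99, 1],
    [2, 6, -99],
    [4, -99, -99],
]
cov = covariance_coverage(rows, ["edr1", "edr2", "edr3"])
for pair, value in cov.items():
    print(pair, value)
```

The diagonal entries give the proportion of cases observed on a single variable, and the off-diagonal entries give the proportion of cases that contribute to each covariance.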

43 STANDARDIZED MODEL RESULTS
STDYX Standardization
Columns: Estimate, S.E., Est./S.E., Two-Tailed P-Value
EDRISK BY: EDR1 to EDR6

All loadings are positive and statistically significant, with all standardized values exceeding .60.
Measurement intercepts are a byproduct of missing data handling.
Mplus constrains the latent mean to 0, making the measurement intercepts equivalent to the variable means.
The MLR estimator gives the Satorra-Bentler rescaled chi-square.

Chi-Square Test of Model Fit
Value 9.440*
Degrees of Freedom 9
P-Value
Scaling Correction Factor for MLR

* The chi-square value for MLM, MLMV, MLR, ULSMV, WLSM and WLSMV cannot be used for chi-square difference testing in the regular way. MLM, MLR and WLSM chi-square difference testing is described on the Mplus website. MLMV, WLSMV, and ULSMV difference testing is done using the DIFFTEST option.

TITLE: CFA with first factor loading constrained to 1;
DATA: file = eatingrisk.dat;
VARIABLE: names = abuse bmi bds1 - bds7 edr1 - edr6;
usevariables = edr1 - edr6;
missing = all (-99);
ANALYSIS:
estimator = ml;
! (residual) gives Bollen-Stine bootstrap;
bootstrap = 2000 (residual);
MODEL: edrisk by edr1 - edr6;
OUTPUT: sampstat standardized patterns;
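Because rescaled chi-squares cannot be differenced directly, nested model comparisons use a scaled difference test. The Python sketch below follows the commonly cited Satorra-Bentler scaled difference computation for MLM/MLR chi-squares; the function name and all input values are hypothetical, and the formula should be checked against the Mplus website before use.

```python
def scaled_chi_square_difference(t0, c0, d0, t1, c1, d1):
    """Scaled difference test from two rescaled chi-squares.

    Model 0 is the nested (more restricted) model; model 1 is the comparison model.
    t = rescaled chi-square, c = scaling correction factor, d = degrees of freedom.
    """
    # Scaling correction for the difference test
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    # Multiplying t by c recovers the uncorrected ML chi-square for each model
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, d0 - d1

# Hypothetical fit results, for illustration only
trd, df = scaled_chi_square_difference(t0=20.0, c0=1.2, d0=10, t1=9.0, c1=1.1, d1=9)
print(round(trd, 3), df)
```

The resulting statistic is referred to a chi-square distribution with degrees of freedom equal to the difference in model degrees of freedom.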

44 Estimator ML
Information matrix OBSERVED
Maximum number of iterations 1000
Convergence criterion 0.500D-04
Maximum number of steepest descent iterations 20
Maximum number of iterations for H1
Convergence criterion for H1
Number of bootstrap draws: Requested 2000, Completed 2000

MODEL RESULTS (annotated columns: unstandardized estimates, bootstrap standard errors, z tests)
Columns: Estimate, S.E., Est./S.E., Two-Tailed P-Value
EDRISK BY: EDR1 to EDR6

The Bollen-Stine bootstrap gives the standard chi-square test and p-value along with a bootstrap p-value from an empirical bootstrap sampling distribution.

Chi-Square Test of Model Fit
Value
Degrees of Freedom 9
P-Value
Bootstrap P-Value

The positive kurtosis should cause normal-theory standard errors to be too small.
Both the robust and bootstrap standard errors corrected this downward bias (point estimates are identical in all analyses).

Table columns: Loading, Standard ML, Robust SE, Bootstrap SE
Table rows: EDR1 (N/A, N/A, N/A), EDR2, EDR3, EDR4, EDR5, EDR6
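The bootstrap p-value is simply the share of the empirical sampling distribution that lies at or beyond the observed test statistic. The Python sketch below shows only this final step; the Bollen-Stine procedure first transforms the data so that the null hypothesis holds before resampling, which is not shown here. The function name and the values are hypothetical.

```python
def bootstrap_p_value(observed_stat, bootstrap_stats):
    """Proportion of bootstrap test statistics at least as large as the observed one."""
    exceed = sum(1 for t in bootstrap_stats if t >= observed_stat)
    return exceed / len(bootstrap_stats)

# Hypothetical chi-square values, for illustration only
boot_chi_squares = [4.0, 7.5, 9.0, 12.0, 15.5]
p = bootstrap_p_value(9.0, boot_chi_squares)
print(p)
```

In practice the distribution would contain one chi-square per completed bootstrap draw (2000 in the analysis above).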

45 ML assumes an MAR mechanism where the propensity for missing data on a variable Y is related to other variables, but not to the would-be values of Y itself.
MAR is not automatically satisfied if the causes/correlates of missingness are measured variables.
The correlates of missingness must be part of the statistical analyses, even if they are not of substantive interest.

Auxiliary variables (AVs) are ancillary variables that are not of substantive interest.
The variables are included in the analysis for the purposes of reducing bias and/or improving power.
Good AVs are correlates of missingness (i.e., potential causes of missing data) or correlates of the incomplete analysis variables.

Consider an educational study that examines the change in self-report behavioral problems.
AVs that correlate with reasons for missingness:
Socioeconomic status
Student mobility (e.g., survey question asking how likely participant is to move)
Standardized test scores
AVs that correlate with self-report behavior scores:
Disciplinary referrals, absenteeism, juvenile justice incidents
Parental supervision
Quality of home environment

46 A study examines a number of health-related behaviors (e.g., smoking, drinking, sexual activity) in teens.
The risky sexual behavior questionnaire is only administered to participants above the age of 15.
The substantive analysis is a regression model that uses self-esteem to predict risky sexual behavior.
To satisfy MAR, age must be in the model, even though it is not of substantive interest.

The model below satisfies MAR, but it is an undesirable solution.
Figure: regression path diagram with Esteem and Age predicting Risk.
The esteem slope becomes a partial regression coefficient.
Including AVs should not affect the substantive interpretation of the model parameters.

Graham (2003) outlined two approaches for incorporating auxiliary variables into an ML analysis.
The saturated correlates model transmits information from the auxiliary variables to the analysis variables via a series of correlations.
Importantly, the model does not alter the substantive interpretation of the parameter estimates.
The saturated correlates model is easy to implement in SEM.

Rules for specifying the saturated correlates model with manifest (i.e., measured or observed) variables. Correlate an AV with:
a) Manifest predictor variables
b) Other auxiliary variables
c) The residual terms of any outcome variables

47 The AUXILIARY subcommand automatically implements the saturated correlates model.
The MODEL section does not mention the AVs.

VARIABLE: names = x y av1 av2;
usevariables = x y;
missing = all (-99);
auxiliary = (m) av1 av2;

Figure: path diagram of X predicting Y, with auxiliary variables AV1 and AV2.

Implement AVs manually via the MODEL commands.
Omit the AUXILIARY command with manual specification.

MODEL:
! Regression model parameters;
y on x;
x y;
[x y];
! AV model correlations;
av1 with av2;
x y with av1 av2;

Rules for specifying the saturated correlates model with latent variables. Correlate an AV with:
a) Manifest predictor variables
b) Other auxiliary variables
c) The residual terms of any manifest indicator variables
The AVs should never correlate with a latent variable or its residual term.
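When there are many auxiliary variables, the manual WITH statements become tedious to type. The Python sketch below generates them as text, mirroring the pattern of the manual specification shown above (AVs correlated with each other, then manifest analysis variables correlated with every AV). The helper name is hypothetical, and the output is plain text to paste into the MODEL section, not a call into Mplus itself.

```python
from itertools import combinations

def saturated_correlates(analysis_vars, aux_vars):
    """Build Mplus-style WITH statements for a manual saturated correlates model."""
    lines = []
    # Correlate the auxiliary variables with one another
    for a, b in combinations(aux_vars, 2):
        lines.append(f"{a} with {b};")
    # Correlate each manifest analysis variable (or its residual) with every AV
    lines.append(f"{' '.join(analysis_vars)} with {' '.join(aux_vars)};")
    return lines

print("\n".join(saturated_correlates(["x", "y"], ["av1", "av2"])))
```

For latent variable models, only the manifest indicators should be passed as analysis variables, since the AVs should never correlate with a latent variable or its residual term.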

48 Figure: latent variable regression of Y (indicators y1 to y3) on X (indicators x1 to x3), with residual terms on the indicators and auxiliary variables AV1 and AV2.

VARIABLE: names = x1 x2 x3 y1 y2 y3 av1 av2;
usevariables = x1 - y3;
missing = all (-99);
auxiliary = (m) av1 av2;
MODEL:
! Latent variable regression model parameters;
x by x1 x2 x3;
y by y1 y2 y3;
y on x;

VARIABLE: names = x1 x2 x3 y1 y2 y3 av1 av2;
usevariables = x1 - y3 av1 av2;
missing = all (-99);
MODEL:
! Latent variable regression model parameters;
x by x1 x2 x3;
y by y1 y2 y3;
y on x;
! AV model correlations;
av1 with av2;
x1 x2 x3 y1 y2 y3 with av1 av2;

Figure: path model with X1, Y1, and Y2, and auxiliary variables AV1 and AV2.

49 VARIABLE: names = x1 y1 y2 av1 av2;
usevariables = x1 - y2;
missing = all (-99);
auxiliary = (m) av1 av2;
MODEL:
! Path model parameters;
y1 on x1;
y2 on y1;
x1;

VARIABLE: names = x1 y1 y2 av1 av2;
usevariables = x1 - y2 av1 av2;
missing = all (-99);
MODEL:
! Path model parameters;
y1 on x1;
y2 on y1;
x1;
! AV model correlations;
av1 with av2;
x1 y1 y2 with av1 av2;

Figure: two-factor model with correlated latent variables X (indicators x1 to x3) and Y (indicators y1 to y3), residual terms on the indicators, and auxiliary variables AV1 and AV2.

VARIABLE: names = x1 x2 x3 y1 y2 y3 av1 av2;
usevariables = x1 - y3;
missing = all (-99);
auxiliary = (m) av1 av2;
MODEL:
! Factor model parameters;
x by x1 x2 x3;
y by y1 y2 y3;
x with y;


Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

Dealing with Missing Data

Dealing with Missing Data Dealing with Missing Data Roch Giorgi email: roch.giorgi@univ-amu.fr UMR 912 SESSTIM, Aix Marseille Université / INSERM / IRD, Marseille, France BioSTIC, APHM, Hôpital Timone, Marseille, France January

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

research/scientific includes the following: statistical hypotheses: you have a null and alternative you accept one and reject the other

research/scientific includes the following: statistical hypotheses: you have a null and alternative you accept one and reject the other 1 Hypothesis Testing Richard S. Balkin, Ph.D., LPC-S, NCC 2 Overview When we have questions about the effect of a treatment or intervention or wish to compare groups, we use hypothesis testing Parametric

More information

UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST

UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST UNDERSTANDING The independent-samples t test evaluates the difference between the means of two independent or unrelated groups. That is, we evaluate whether the means for two independent groups are significantly

More information

Factor Analysis. Principal components factor analysis. Use of extracted factors in multivariate dependency models

Factor Analysis. Principal components factor analysis. Use of extracted factors in multivariate dependency models Factor Analysis Principal components factor analysis Use of extracted factors in multivariate dependency models 2 KEY CONCEPTS ***** Factor Analysis Interdependency technique Assumptions of factor analysis

More information

Stephen du Toit Mathilda du Toit Gerhard Mels Yan Cheng. LISREL for Windows: PRELIS User s Guide

Stephen du Toit Mathilda du Toit Gerhard Mels Yan Cheng. LISREL for Windows: PRELIS User s Guide Stephen du Toit Mathilda du Toit Gerhard Mels Yan Cheng LISREL for Windows: PRELIS User s Guide Table of contents INTRODUCTION... 1 GRAPHICAL USER INTERFACE... 2 The Data menu... 2 The Define Variables

More information

CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA

CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA Examples: Mixture Modeling With Longitudinal Data CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA Mixture modeling refers to modeling with categorical latent variables that represent subpopulations

More information

Multivariate Logistic Regression

Multivariate Logistic Regression 1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

More information

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing. Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative

More information

Missing Data. Paul D. Allison INTRODUCTION

Missing Data. Paul D. Allison INTRODUCTION 4 Missing Data Paul D. Allison INTRODUCTION Missing data are ubiquitous in psychological research. By missing data, I mean data that are missing for some (but not all) variables and for some (but not all)

More information

Introduction to Multilevel Modeling Using HLM 6. By ATS Statistical Consulting Group

Introduction to Multilevel Modeling Using HLM 6. By ATS Statistical Consulting Group Introduction to Multilevel Modeling Using HLM 6 By ATS Statistical Consulting Group Multilevel data structure Students nested within schools Children nested within families Respondents nested within interviewers

More information

NOTES ON HLM TERMINOLOGY

NOTES ON HLM TERMINOLOGY HLML01cc 1 FI=HLML01cc NOTES ON HLM TERMINOLOGY by Ralph B. Taylor breck@rbtaylor.net All materials copyright (c) 1998-2002 by Ralph B. Taylor LEVEL 1 Refers to the model describing units within a grouping:

More information

Gender Effects in the Alaska Juvenile Justice System

Gender Effects in the Alaska Juvenile Justice System Gender Effects in the Alaska Juvenile Justice System Report to the Justice and Statistics Research Association by André Rosay Justice Center University of Alaska Anchorage JC 0306.05 October 2003 Gender

More information

Individual Growth Analysis Using PROC MIXED Maribeth Johnson, Medical College of Georgia, Augusta, GA

Individual Growth Analysis Using PROC MIXED Maribeth Johnson, Medical College of Georgia, Augusta, GA Paper P-702 Individual Growth Analysis Using PROC MIXED Maribeth Johnson, Medical College of Georgia, Augusta, GA ABSTRACT Individual growth models are designed for exploring longitudinal data on individuals

More information

Multiple Regression: What Is It?

Multiple Regression: What Is It? Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in

More information

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Introduction to Linear Regression and Correlation Analysis Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

More information

Statistics in Retail Finance. Chapter 2: Statistical models of default

Statistics in Retail Finance. Chapter 2: Statistical models of default Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision

More information

Running head: SCHOOL COMPUTER USE AND ACADEMIC PERFORMANCE. Using the U.S. PISA results to investigate the relationship between

Running head: SCHOOL COMPUTER USE AND ACADEMIC PERFORMANCE. Using the U.S. PISA results to investigate the relationship between Computer Use and Academic Performance- PISA 1 Running head: SCHOOL COMPUTER USE AND ACADEMIC PERFORMANCE Using the U.S. PISA results to investigate the relationship between school computer use and student

More information

The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

More information

Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test

Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test The t-test Outline Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test - Dependent (related) groups t-test - Independent (unrelated) groups t-test Comparing means Correlation

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection

Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection Chapter 311 Introduction Often, theory and experience give only general direction as to which of a pool of candidate variables (including transformed variables) should be included in the regression model.

More information

Multivariate Analysis of Variance. The general purpose of multivariate analysis of variance (MANOVA) is to determine

Multivariate Analysis of Variance. The general purpose of multivariate analysis of variance (MANOVA) is to determine 2 - Manova 4.3.05 25 Multivariate Analysis of Variance What Multivariate Analysis of Variance is The general purpose of multivariate analysis of variance (MANOVA) is to determine whether multiple levels

More information

Statistiek II. John Nerbonne. October 1, 2010. Dept of Information Science j.nerbonne@rug.nl

Statistiek II. John Nerbonne. October 1, 2010. Dept of Information Science j.nerbonne@rug.nl Dept of Information Science j.nerbonne@rug.nl October 1, 2010 Course outline 1 One-way ANOVA. 2 Factorial ANOVA. 3 Repeated measures ANOVA. 4 Correlation and regression. 5 Multiple regression. 6 Logistic

More information

Technical report. in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE

Technical report. in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE Linear mixedeffects modeling in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE Table of contents Introduction................................................................3 Data preparation for MIXED...................................................3

More information

Directions for using SPSS

Directions for using SPSS Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

[This document contains corrections to a few typos that were found on the version available through the journal s web page]

[This document contains corrections to a few typos that were found on the version available through the journal s web page] Online supplement to Hayes, A. F., & Preacher, K. J. (2014). Statistical mediation analysis with a multicategorical independent variable. British Journal of Mathematical and Statistical Psychology, 67,

More information

UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA)

UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA) UNDERSTANDING ANALYSIS OF COVARIANCE () In general, research is conducted for the purpose of explaining the effects of the independent variable on the dependent variable, and the purpose of research design

More information