Comparison of Estimation Methods for Complex Survey Data Analysis
|
|
|
- Peregrine McCormick
- 9 years ago
- Views:
Transcription
1 Comparison of Estimation Methods for Complex Survey Data Analysis Tihomir Asparouhov 1 Muthen & Muthen Bengt Muthen 2 UCLA 1 Tihomir Asparouhov, Muthen & Muthen, 3463 Stoner Ave. Los Angeles, CA Bengt Muthen, UCLA, 2023 Moore Hall, Los Angeles, CA This research was supported by SBIR grant R43 AA from NIAAA to Muthen & Muthen. The authors are thankful to Stephe Du Toit for assistance with the LISREL software package. 1
2 Abstract Recently structural equation modeling software packages have implemented more accurate statistical methodology for analyzing complex survey data. The computational algorithms however vary across the packages and produce different results even for simple models. In this note we conduct simulation studies to compare the performance of the methods implemented in Mplus and LISREL. The Mplus algorithm produced more accurate results. 2
3 Recently several structural equation modeling and multilevel software packages have implemented more accurate statistical methodology for analyzing complex survey data. Despite these improvements large differences in the results obtained from different packages are being reported in practical applications, see Chantala and Suchindran (2006) for example. In this note we conduct simulation studies to evaluate the performance of the methods implemented in Mplus 4.1 and LISREL 8.8. Mplus is published by Muthen & Muthen and LISREL is published by Scientific Software International. First we conduct a simulation study on a two-level random effect regression model estimated from data that includes within level sampling weights. In a second simulation study we evaluate the performance of the chi-square test statistic for complex survey data using a simple bivariate mean, variance and covariance model. TWO-LEVEL REGRESSION For analyzing two-level models with complex survey data Mplus 4.1 implements the multilevel pseudo maximum likelihood (MPML) estimation method, see Asparouhov (2006) and Asparouhov and Muthen (2006). This method allows us to estimate twolevel models when the data is obtained from a multistage stratified sampling design and the sampling units at each sampling level are selected with unequal probabilities. Sampling weights at both the within and the between level can be used with this estimator. The sampling weights on the cluster (between) level are obtained by w j = 1 p j (1) where p j is the probability that cluster j is included in the sample. weights on the individual (within) level are obtained by The sampling w ji = 1 p i j (2) where p i j is the probability that individual i in cluster j is selected, given that cluster j is selected. Using the unscaled within level weights in the estimation method can lead to bias in the parameter estimates. A number of different scaling methods have been proposed in Pfeffermann et al. (1998). The choice of scaling method affects the 3
4 parameter estimates to some extent. The simulation studies in Pfeffermann et alt. (1998) and Asparouhov (2004) conclude that scaling to cluster sample size tends to have the most robust performance. With this scaling method the scaled within level weights wji are obtained by w ji = w ji n j i w ji where n j is the size of cluster j. Note that i w ji = n j. Mplus 4.1 uses the total combined weight variable, which is the product of the within and the between level weight variables (3) w j w ji. (4) LISREL 8.8 implements the PWIGLS method described in Pfeffermann et alt. (1998) using the scaling to cluster sample size for the within level weights as well. We conduct a simulation study on a two-level regression model with a normally distributed dependent variable Y and two normally distributed independent variables X and Z. The covariate Z has a fixed effect on Y while the covariate X has a random effect on Y. This two-level regression model is described as follows Y ji = α j + β j X ji + γz ji + ε ji (5) where α j and β j are normally distributed cluster level random effects with means α = 0.5 and β = 0.1 and variances ψ α = 1 and ψ β = 0.2 and covariance ρ = 0.3. The residual effect ε ij is a mean zero independent normal random variable with variance θ = 1. The covariates X ji is generated from a normal distribution with mean 3 and variance 2 while Z ji is generated from a standard normal distribution. The fixed effect γ is set at 0.5. The model has a total of seven parameters. We generate 100 samples of size Each sample has 1000 clusters of size 25. To introduce unequal probability sampling on the within level we retain each observation in the sample with probability p i j = Exp( Y ij /2). (6) For all observations in the sample we compute the weight variable as w ji = 1 p i j = 1 + Exp( Y ij /2). (7) 4
5 Consequently we rescale the within level weights using formula (3). To introduce unequal probability sampling on the between level we retain clusters in the sample with probability p j = Exp( α j ). (8) For all clusters in the sample we compute the between level weight as w ji = 1 p j = 1 + Exp( α j ). (9) We estimate model (5) for each sample using Mplus and LISREL. Within the LISREL software package this kind of models are estimated by the MULTILEV module. Table 1 contains the bias, the mean squared errors (MSE) and the confidence interval coverage for both software packages. The Mplus bias for all parameters is very close to 0, however the LISREL bias is relatively large for the α and ψ α parameters. When conducting the simulation study with informative selection on the within level only or on the between level only the parameter estimates and standard errors between Mplus and LISREL are identical. The differences reported in Table 1 occur only when we use sampling weights at both levels. The LISREL bias is also directly affected by the informativeness of the selection on the between level. The stronger the association between α j and the probability of selection the bigger the bias is. This fact also explains why only the mean and the variance parameters α j have this bias. If the selection on the between level was associated with β j we would see this bias for the mean and variance of β j. The LISREL bias also resulted in larger MSE when compared to Mplus MSE. The coverage probabilities were overall better in Mplus although both packages were far from the nominal 95% probability. The ratio between the standard deviation of the parameter estimates and the standard errors were close to 1 in both programs. This means that the drop in the coverage is caused primarily by the bias in the parameter estimates, which tends to disappear as the number of clusters in the sample and the cluster sample sizes increase. Even though in our simulation study the results obtained with Mplus were somewhat more accurate than those obtained with LISREL, there is no guarantee that this will be the case for other simulation studies or in specific practical applications. When 5
6 the data is obtained via simple random sampling the maximum likelihood estimator (MLE) is known to be the most accurate estimator at least when the sample size is sufficiently large. Consequently most software packages are based on the MLE and the applied researchers are accustomed to obtaining the same results from different statistical packages. When the data is obtained from a complex survey design however, there is no one estimator that is always more accurate than all other estimators. Such most accurate estimator does not exist even for the most basic estimation problems with sampling weights. Consider for example the case when the sampling weights are noninformative. An estimator that completely ignores the weights will be more accurate than an estimator that facilitates the weights. However this will not be the case if the weights are informative. Because there is no one estimator that is the most accurate in all cases, the applied researchers should not expect to obtain identical results from different software packages since the packages could be based on different estimators. In cases when the software packages show critical differences, the applied researcher should conduct a simulation study similar to the one described in this note to evaluate the accuracy of the different packages. Note however that even if all software packages show identical results, these results may still not be very accurate. One example is the case of uninformative sampling weights. Thus the applied researcher should always include sampling weights analysis as an essential part of their overall data analysis. Stephen Du Toit communicated to the authors that it is possible to obtain the results given by Mplus within LISREL as well by using the total weight (4) as the within level weight and not using between level weights. Indeed using this approach LISREL will produce results very close to those obtained in Mplus. CHI-SQUARE ADJUSTMENTS In structural equation modeling chi-square tests are used to evaluate the overall fit of the model. Typically the chi-square test of fit is simply the likelihood ratio test (LRT) for the structural model against the unrestricted means, variance and covariance model. The test statistic is computed as twice the difference between the log-likelihoods of the two nested models. When we analyze complex survey data with a single or multilevel 6
7 Table 1: Bias and MSE of parameter estimates for two-level regression. Para- True Mplus LISREL Mplus LISREL Mplus LISREL meter Value Bias Bias MSE MSE Coverage Coverage α β γ ψ α ψ β ρ θ model we use the weighted pseudo log-likelihood. Using the pseudo log-likelihood we can again perform the LRT, however the distribution of this test statistic is no longer a chi-square distribution. This distribution depends on the entire sampling design, including the sampling weights, the stratification and the cluster sampling. In Asparouhov and Muthen (2005) we describe an adjustment of the single level LRT statistic which takes into account the sampling design and produces a test statistic which has approximately a chi-square distribution. This adjustment is constructed similarly to the adjustments of the Yuan-Bentler (2000) and the Satorra-Bentler (1988) robust chi-square tests for mean and variance structures. Similar first and second order adjustments are described also in Rao-Thomas (1989) for contingency tables. Consider a general hypothesis testing for two nested models M 1 and M 2. Let θ i be the true parameter values and ˆθ i the parameters estimates for model M i that maximize the pseudo log-likelihood function L i. Let d i be the number of parameters in model M i. The adjusted LRT statistic given in Asparouhov and Muthen (2005) is T = c 2(L 1 L 2 ), (10) 7
8 where c is the correction factor where L i and L i c = d 1 d 2 T r((l 1 ) 1 V ar(l 1 )) T r((l 2 ) 1 V ar(l 2 )) (11) are the first and second derivatives of the pseudo log-likelihoods. The effect of the sampling design on the correction factor is summarized in the score variance terms V ar(l i ). The statistic T has approximately a chi-square distribution with d 1 d 2 degrees of freedom. The components T r((l i ) 1 V ar(l i )) are easily available since they are part of the asymptotic covariance for the parameter estimates. adjusted LRT test performs well also in multilevel analysis of complex survey data, see Asparouhov and Muthen (2006). When complex survey data is analyzed in Mplus 4.1 the chi-square test of fit are automatically adjusted using formula (10). Note however that LRT testing is of interest not only to conduct chi-square test of fit but also for testing between two nested models. For many advanced latent variable models such as random slope or mixture models there is no naturally defined unconstrained model that can be used to evaluate model fit. In such cases it is important to be able to conduct LRT testing between competing nested models. The When we analyze complex survey data it is important to use the adjusted LRT. Mplus 4.1 computes the following log-likelihood correction factors for every likelihood based model estimation c i = T r((l i ) 1 V ar(l i )) d i. (12) Using the log-likelihood correction factors c 1 and c 2 for two nested models one can compute the LRT correction factor c = d 1 d 2 c 1 d 1 c 2 d 2. (13) The LRT adjustment corrects not only for complex survey designs but also for nonnormality and other distributional misspecifications. Thus the likelihood correction factors have now enabled us to conduct robust chi-square testing not just for models that have well defined chi-square test of fit but for any latent variable models. An alternative LRT adjustment has been proposed and implemented in LISREL 8.8. As described in the LISREL documentation (2005) accompanying the software 8
9 package, the adjustment is given again by equation (10) but the correction factor c is computed by d2 c = T r((l 2 ) 1 V ar(l (14) 2 )). This formula can also be found in Stapleton (2006). This adjustment is available in LISREL for single level models. We explore the differences between the adjustments implemented in Mplus and LISREL with a simple simulation study. We generate a target population of size 5000 with two observed variables Y 1 and Y 2 from a bivariate normal distribution with means µ 1 = µ 2 = 0, variances ψ 1 = ψ 2 = 1 and covariance ρ = 0. We reorder the target population so that the values of Y 1 are in ascending order. Clusters of size 10 are then constructed as follows. The first 10 observations are placed in cluster 1, the next 10 observations are placed in cluster 2, etc. The target population then contains 500 clusters. We select 100 samples from the target population by cluster sampling, i.e., for each sample we select at random L clusters and use all observations from that cluster. Thus the sample size is 10L. Using the entire target population we estimate the population values µ 1 = 0.018, µ 2 = 0.014, ψ 1 = 1.011, ψ 2 = and ρ = The LRT is used to test between the following two models, the saturated model where all 5 parameters are estimated and a restricted model where the parameters µ 1, ψ 1 and ρ are fixed to their population values. Since the model restrictions are correct the LRT test should have a rejection rate of approximately 5%. The test between the two models has 3 degrees of freedom and thus the mean value of the LRT statistic should be approximately 3. Table 2 shows the rejection rates for the three LRT statistics, the Mplus LRT adjustment, the LISREL LRT adjustment and the unadjusted LRT. Tables 3 shows the average values of these test statistics. It is clear from these results that the Mplus LRT adjustment performs very well in all cases, the rejection rates are close to the nominal 5% value and the average test statistic values is close to 3. In contrast the LISREL LRT adjustment and the unadjusted LRT produced incorrectly large rejection rates and inflated test statistic values. In some cases the Mplus approach (11) and the LISREL approach (14) will produce the same results. If the effect of the complex sampling design is similar across all 9
10 Table 2: LRT Rejection Rates Test L=50 L=100 L=200 Mplus LRT adjustment 10% 5% 6% LISREL LRT adjustment 66% 67% 65% Unadjusted LRT 68% 69% 67% Table 3: LRT Average Values Test L=50 L=100 L=200 Mplus LRT adjustment LISREL LRT adjustment Unadjusted LRT variables and parameters in the model then c 1 c 2, which in turn implies that formulas (11)and (14) produce the same result. CONCLUSION In this note we conducted simple simulation studies to evaluate the estimation methods available in Mplus and LISREL for complex survey data analysis. We found substantial differences between the two software packages. In our simulations studies the results obtained in Mplus 4.1 were more accurate than those obtained in LISREL 8.8. The simulation studies presented here have limited implications in practice. There are a number of factors that have a substantial impact on the quality of the estimation, see Asparouhov (2006). Four known factors in order of importance are the cluster sample size, the informativeness of the within level weights, the ICC (intra class correlation) and the UWE (unequal weighting effect). This study was not intended to give a complete account on all possible situations but to provide a simple comparison in 10
11 somewhat artificial settings to evaluate the methodological differences implemented in the two software packages. In a specific practical situation it is still unclear what the best estimation approach is. In Asparouhov (2006) a six step procedure is recommended as an optimal estimation strategy, however the procedure does not cover all possible practical aspects. For example the situation of large UWE such as the one found in Chantala and Suchindran (2006) is not covered. Additional simulation studies should be conducted to evaluate the quality of the estimation techniques. In addition, simulation study procedure based on the actual data should be developed to incorporate more data specific features. 11
12 1 References Asparouhov, T. (2006). General Multilevel Modeling with Sampling Weights. Communications in Statistics: Theory and Methods, Volume 35, Number 3, pp (22). Asparouhov, T. and Muthen, B. (2005). Multivariate Statistical Modeling with Survey Data. Proceedings of the Federal Committee on Statistical Methodology (FCSM) Research Conference. Asparouhov, T. and Muthen, B. (2006). Multilevel Modeling of Complex Survey Data. Proceedings of the American Statistical Association, Seattle, WA: American Statistical Association. Chantala, K. Suchindran, C. (2006) Adjusting for Unequal Selection Probability in Multilevel Models: A Comparison of Software Packages. Proceedings of the American Statistical Association, Seattle, WA: American Statistical Association. LISREL Documetation, Analysis of Structural Equation Models for Continuous Random Variables in the Case of Complex Survey Data (2005). Pfeffermann, D.; Skinner, C.J.; Holmes, D.J.; Goldstein, H.; Rasbash, J. (1998) Weighting for unequal selection probabilities in multilevel models. Journal of the Royal Statistical Society, Series B, # 60, Rao, J. N. K., & Thomas, D. R. (1989). Chi-Square Tests for Contingency Table. In Analysis of Complex Surveys (eds. C.J.Skinner, D.Holt and T.M.F. Smith) , Wiley. 12
13 Satorra, A., & Bentler, P.M. (1988). Scaling Corrections for Chi-Square Statistics in Covariance Structure Analysis. Proceedings of the Business and Economic Statistics Section of the American Statistical Association, Stapleton, L. (2006) An Assessment of Practical Solutions for Structural Equation Modeling with Complex Sample Data. Structural Equation Modeling, 13, No. 1, Yuan, K., & Bentler, P. M. (2000) Three Likelihood-Based Methods for Mean and Covariance Structure Analysis With Nonnormal Missing Data. Sociological Methodology 30,
Simple Second Order Chi-Square Correction
Simple Second Order Chi-Square Correction Tihomir Asparouhov and Bengt Muthén May 3, 2010 1 1 Introduction In this note we describe the second order correction for the chi-square statistic implemented
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA
Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or
A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models
A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models Grace Y. Yi 13, JNK Rao 2 and Haocheng Li 1 1. University of Waterloo, Waterloo, Canada
Multilevel Modeling of Complex Survey Data
Multilevel Modeling of Complex Survey Data Sophia Rabe-Hesketh, University of California, Berkeley and Institute of Education, University of London Joint work with Anders Skrondal, London School of Economics
lavaan: an R package for structural equation modeling
lavaan: an R package for structural equation modeling Yves Rosseel Department of Data Analysis Belgium Utrecht April 24, 2012 Yves Rosseel lavaan: an R package for structural equation modeling 1 / 20 Overview
Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses
Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses G. Gordon Brown, Celia R. Eicheldinger, and James R. Chromy RTI International, Research Triangle Park, NC 27709 Abstract
Handling attrition and non-response in longitudinal data
Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein
Multivariate Normal Distribution
Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues
CHAPTER 4 EXAMPLES: EXPLORATORY FACTOR ANALYSIS
Examples: Exploratory Factor Analysis CHAPTER 4 EXAMPLES: EXPLORATORY FACTOR ANALYSIS Exploratory factor analysis (EFA) is used to determine the number of continuous latent variables that are needed to
Indices of Model Fit STRUCTURAL EQUATION MODELING 2013
Indices of Model Fit STRUCTURAL EQUATION MODELING 2013 Indices of Model Fit A recommended minimal set of fit indices that should be reported and interpreted when reporting the results of SEM analyses:
CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES
Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical
Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
Least Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
Imputing Missing Data using SAS
ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are
Analyzing Structural Equation Models With Missing Data
Analyzing Structural Equation Models With Missing Data Craig Enders* Arizona State University [email protected] based on Enders, C. K. (006). Analyzing structural equation models with missing data. In G.
APPLIED MISSING DATA ANALYSIS
APPLIED MISSING DATA ANALYSIS Craig K. Enders Series Editor's Note by Todd D. little THE GUILFORD PRESS New York London Contents 1 An Introduction to Missing Data 1 1.1 Introduction 1 1.2 Chapter Overview
Statistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova
Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni
1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed
Probability Calculator
Chapter 95 Introduction Most statisticians have a set of probability tables that they refer to in doing their statistical wor. This procedure provides you with a set of electronic statistical tables that
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
α α λ α = = λ λ α ψ = = α α α λ λ ψ α = + β = > θ θ β > β β θ θ θ β θ β γ θ β = γ θ > β > γ θ β γ = θ β = θ β = θ β = β θ = β β θ = = = β β θ = + α α α α α = = λ λ λ λ λ λ λ = λ λ α α α α λ ψ + α =
ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2
University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages
A Basic Introduction to Missing Data
John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
Stephen du Toit Mathilda du Toit Gerhard Mels Yan Cheng. LISREL for Windows: SIMPLIS Syntax Files
Stephen du Toit Mathilda du Toit Gerhard Mels Yan Cheng LISREL for Windows: SIMPLIS Files Table of contents SIMPLIS SYNTAX FILES... 1 The structure of the SIMPLIS syntax file... 1 $CLUSTER command... 4
Deciding on the Number of Classes in Latent Class Analysis and Growth Mixture Modeling: A Monte Carlo Simulation Study
STRUCTURAL EQUATION MODELING, 14(4), 535 569 Copyright 2007, Lawrence Erlbaum Associates, Inc. Deciding on the Number of Classes in Latent Class Analysis and Growth Mixture Modeling: A Monte Carlo Simulation
Approaches for Analyzing Survey Data: a Discussion
Approaches for Analyzing Survey Data: a Discussion David Binder 1, Georgia Roberts 1 Statistics Canada 1 Abstract In recent years, an increasing number of researchers have been able to access survey microdata
Part 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance
How To Check For Differences In The One Way Anova
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way
Linda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén www.statmodel.com. Table Of Contents
Mplus Short Courses Topic 2 Regression Analysis, Eploratory Factor Analysis, Confirmatory Factor Analysis, And Structural Equation Modeling For Categorical, Censored, And Count Outcomes Linda K. Muthén
MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...
MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................
Introduction to Longitudinal Data Analysis
Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction
Introduction to Multilevel Modeling Using HLM 6. By ATS Statistical Consulting Group
Introduction to Multilevel Modeling Using HLM 6 By ATS Statistical Consulting Group Multilevel data structure Students nested within schools Children nested within families Respondents nested within interviewers
PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA
PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA ABSTRACT The decision of whether to use PLS instead of a covariance
From the help desk: Bootstrapped standard errors
The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution
CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS
Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships
Descriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation
Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
The Latent Variable Growth Model In Practice. Individual Development Over Time
The Latent Variable Growth Model In Practice 37 Individual Development Over Time y i = 1 i = 2 i = 3 t = 1 t = 2 t = 3 t = 4 ε 1 ε 2 ε 3 ε 4 y 1 y 2 y 3 y 4 x η 0 η 1 (1) y ti = η 0i + η 1i x t + ε ti
Reject Inference in Credit Scoring. Jie-Men Mok
Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business
Specifications for this HLM2 run
One way ANOVA model 1. How much do U.S. high schools vary in their mean mathematics achievement? 2. What is the reliability of each school s sample mean as an estimate of its true population mean? 3. Do
Methodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS). 1
Methodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS). 1 Vladislav Beresovsky ([email protected]), Janey Hsiao National Center for Health Statistics, CDC
problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved
4 Data Issues 4.1 Truncated Regression population model y i = x i β + ε i, ε i N(0, σ 2 ) given a random sample, {y i, x i } N i=1, then OLS is consistent and efficient problem arises when only a non-random
Use of deviance statistics for comparing models
A likelihood-ratio test can be used under full ML. The use of such a test is a quite general principle for statistical testing. In hierarchical linear models, the deviance test is mostly used for multiparameter
HLM software has been one of the leading statistical packages for hierarchical
Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush
Simple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
Gamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
Psychology 209. Longitudinal Data Analysis and Bayesian Extensions Fall 2012
Instructor: Psychology 209 Longitudinal Data Analysis and Bayesian Extensions Fall 2012 Sarah Depaoli ([email protected]) Office Location: SSM 312A Office Phone: (209) 228-4549 (although email will
Ordinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
Fitting Subject-specific Curves to Grouped Longitudinal Data
Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: [email protected] Currie,
No. 92. R. P. Chakrabarty Bureau of the Census
THE SURVEY OF INCOME AND PROGRAM PARTICIPATION MULTIVARIATE ANALYSIS BY USERS OF SIPP MICRO-DATA FILES No. 92 R. P. Chakrabarty Bureau of the Census U. S. Department of Commerce BUREAU OF THE CENSUS Acknowledgments
CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA
Examples: Mixture Modeling With Longitudinal Data CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA Mixture modeling refers to modeling with categorical latent variables that represent subpopulations
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
The Basic Two-Level Regression Model
2 The Basic Two-Level Regression Model The multilevel regression model has become known in the research literature under a variety of names, such as random coefficient model (de Leeuw & Kreft, 1986; Longford,
NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
Is Infrastructure Capital Productive? A Dynamic Heterogeneous Approach.
Is Infrastructure Capital Productive? A Dynamic Heterogeneous Approach. César Calderón a, Enrique Moral-Benito b, Luis Servén a a The World Bank b CEMFI International conference on Infrastructure Economics
Getting Correct Results from PROC REG
Getting Correct Results from PROC REG Nathaniel Derby, Statis Pro Data Analytics, Seattle, WA ABSTRACT PROC REG, SAS s implementation of linear regression, is often used to fit a line without checking
Power and sample size in multilevel modeling
Snijders, Tom A.B. Power and Sample Size in Multilevel Linear Models. In: B.S. Everitt and D.C. Howell (eds.), Encyclopedia of Statistics in Behavioral Science. Volume 3, 1570 1573. Chicester (etc.): Wiley,
Statistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
Package EstCRM. July 13, 2015
Version 1.4 Date 2015-7-11 Package EstCRM July 13, 2015 Title Calibrating Parameters for the Samejima's Continuous IRT Model Author Cengiz Zopluoglu Maintainer Cengiz Zopluoglu
Pearson's Correlation Tests
Chapter 800 Pearson's Correlation Tests Introduction The correlation coefficient, ρ (rho), is a popular statistic for describing the strength of the relationship between two variables. The correlation
Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software
STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used
Introduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
Geostatistics Exploratory Analysis
Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras [email protected]
Evaluating the Fit of Structural Equation Models: Tests of Significance and Descriptive Goodness-of-Fit Measures
Methods of Psychological Research Online 003, Vol.8, No., pp. 3-74 Department of Psychology Internet: http://www.mpr-online.de 003 University of Koblenz-Landau Evaluating the Fit of Structural Equation
Exploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016
and Principal Components Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016 Agenda Brief History and Introductory Example Factor Model Factor Equation Estimation of Loadings
Univariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
Multivariate Logistic Regression
1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation
Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
Lecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as
LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values
The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.
Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium
Poisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini
NEW YORK UNIVERSITY ROBERT F. WAGNER GRADUATE SCHOOL OF PUBLIC SERVICE Course Syllabus Spring 2016 Statistical Methods for Public, Nonprofit, and Health Management Section Format Day Begin End Building
Multiple Imputation for Missing Data: A Cautionary Tale
Multiple Imputation for Missing Data: A Cautionary Tale Paul D. Allison University of Pennsylvania Address correspondence to Paul D. Allison, Sociology Department, University of Pennsylvania, 3718 Locust
Two Correlated Proportions (McNemar Test)
Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with
ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA
m ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA Jiahe Qian, Educational Testing Service Rosedale Road, MS 02-T, Princeton, NJ 08541 Key Words" Complex sampling, NAEP data,
Modelling the Scores of Premier League Football Matches
Modelling the Scores of Premier League Football Matches by: Daan van Gemert The aim of this thesis is to develop a model for estimating the probabilities of premier league football outcomes, with the potential
UNIVERSITY OF WAIKATO. Hamilton New Zealand
UNIVERSITY OF WAIKATO Hamilton New Zealand Can We Trust Cluster-Corrected Standard Errors? An Application of Spatial Autocorrelation with Exact Locations Known John Gibson University of Waikato Bonggeun
II. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
Introduction to Data Analysis in Hierarchical Linear Models
Introduction to Data Analysis in Hierarchical Linear Models April 20, 2007 Noah Shamosh & Frank Farach Social Sciences StatLab Yale University Scope & Prerequisites Strong applied emphasis Focus on HLM
Prediction for Multilevel Models
Prediction for Multilevel Models Sophia Rabe-Hesketh Graduate School of Education & Graduate Group in Biostatistics University of California, Berkeley Institute of Education, University of London Joint
Association Between Variables
Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi
Life Table Analysis using Weighted Survey Data
Life Table Analysis using Weighted Survey Data James G. Booth and Thomas A. Hirschl June 2005 Abstract Formulas for constructing valid pointwise confidence bands for survival distributions, estimated using
