Multilevel Modeling of Complex Survey Data
1 Multilevel Modeling of Complex Survey Data
Sophia Rabe-Hesketh, University of California, Berkeley and Institute of Education, University of London
Joint work with Anders Skrondal, London School of Economics
2007 West Coast Stata Users Group Meeting, Marina del Rey, October 2007
GLLAMM p.1
2 Outline
- Model-based and design-based inference
- Multilevel models and pseudolikelihood
- Pseudo maximum likelihood estimation for U.S. PISA 2000 data
- Scaling of level-1 weights
- Simulation study
- Conclusions
3 Multistage sampling: U.S. PISA 2000 data
- Program for International Student Assessment (PISA): assess and compare 15-year-old students' reading, math, etc.
- Three-stage survey with different probabilities of selection
  - Stage 1: Geographic areas $k$ sampled
  - Stage 2: Schools $j = 1, \dots, n^{(2)}$ sampled with different probabilities $\pi_j$ (taking into account school non-response)
  - Stage 3: Students $i = 1, \dots, n_j^{(1)}$ sampled from school $j$, with conditional probabilities $\pi_{i|j}$
- Probability that student $i$ from school $j$ is sampled: $\pi_{ij} = \pi_{i|j}\,\pi_j$
4 Model-based and design-based inference
- Model-based inference:
  - Target of inference is parameter $\beta$ in infinite population (parameter of data-generating mechanism or statistical model), called superpopulation parameter
  - Consistent estimator (assuming simple random sampling), such as maximum likelihood estimator (MLE), yields estimate $\hat\beta$
- Design-based inference:
  - Target of inference is statistic in finite population (FP), e.g., mean score $\bar y^{FP}$ of all 15-year-olds in LA
  - Student who had a $\pi_{ij} = 1/5$ chance of being sampled represents $w_{ij} = 1/\pi_{ij} = 5$ similar students in finite population
  - Estimate of finite population mean (Horvitz-Thompson):
    $$\hat{\bar y}^{FP} = \frac{1}{\sum_{ij} w_{ij}} \sum_{ij} w_{ij}\, y_{ij}$$
  - Similar for proportions, totals, etc.
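As a small numerical illustration of the weighted mean on this slide, the following Python sketch computes $w_{ij} = 1/\pi_{ij}$ and the weighted estimate. The function name `weighted_mean` and the four scores and selection probabilities are hypothetical values chosen for illustration; they are not PISA data.

```python
import numpy as np

def weighted_mean(y, pi):
    """Weighted estimate of a finite population mean, with w_ij = 1 / pi_ij."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(pi, dtype=float)  # each unit represents w_ij population units
    return np.sum(w * y) / np.sum(w)

# Hypothetical sample: two students sampled with probability 1/5, two with 1/2
scores = [500.0, 520.0, 600.0, 640.0]
probs = [0.2, 0.2, 0.5, 0.5]

estimate = weighted_mean(scores, probs)
unweighted = float(np.mean(scores))
print(estimate, unweighted)
```

In this made-up sample the low-probability students have lower scores, so the weighted estimate falls below the unweighted sample mean.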
5 Model-based inference for complex surveys
- Target of inference is superpopulation parameter $\beta$
- View finite population as simple random sample from superpopulation (or as realization from model)
- MLE $\hat\beta^{FP}$ using finite population treated as target (consistent for $\beta$)
- Design-based estimator of $\hat\beta^{FP}$ applied to complex survey data
  - Replace usual log likelihood by weighted log likelihood, giving pseudo maximum likelihood estimator (PMLE)
- If PMLE is consistent for $\hat\beta^{FP}$, then it is consistent for $\beta$
6 Multilevel modeling: Levels
- Levels of a multilevel model can correspond to stages of a multistage survey
  - Level 1: Elementary units $i$ (stage 3), here students
  - Level 2: Units $j$ sampled in previous stage (stage 2), here schools
  - Top level: Units $k$ sampled at stage 1 (primary sampling units), here areas
- However, not all levels used in the survey will be of substantive interest, and there could be clustering not due to the survey design
- In PISA data, top level is geographical areas; details are undisclosed, so not represented as level in multilevel model
7 Two-level linear random intercept model
- Linear random intercept model for continuous $y_{ij}$:
  $$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \cdots + \beta_p x_{pij} + \zeta_j + \epsilon_{ij}$$
- $x_{1ij}, \dots, x_{pij}$ are student-level and/or school-level covariates
- $\beta_0, \dots, \beta_p$ are regression coefficients
- $\zeta_j \sim N(0, \psi)$ are school-specific random intercepts, uncorrelated across schools and uncorrelated with covariates
- $\epsilon_{ij} \sim N(0, \theta)$ are student-specific residuals, uncorrelated across students and schools, uncorrelated with $\zeta_j$ and with covariates
8 Two-level logistic random intercept model
- Logistic random intercept model for dichotomous $y_{ij}$
  - As generalized linear model:
    $$\mathrm{logit}[\Pr(y_{ij} = 1 \mid \mathbf{x}_{ij})] = \beta_0 + \beta_1 x_{1ij} + \cdots + \beta_p x_{pij} + \zeta_j$$
  - As latent response model:
    $$y^*_{ij} = \beta_0 + \beta_1 x_{1ij} + \cdots + \beta_p x_{pij} + \zeta_j + \epsilon_{ij}$$
    with $y_{ij} = 1$ if $y^*_{ij} > 0$, and $y_{ij} = 0$ if $y^*_{ij} \le 0$
- $\zeta_j \sim N(0, \psi)$ are school-specific random intercepts, uncorrelated across schools and uncorrelated with covariates
- $\epsilon_{ij} \sim \text{Logistic}$ are student-specific residuals, uncorrelated across students and schools, uncorrelated with $\zeta_j$ and with covariates
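9 Illustration of two-level linear and logistic random intercept model
- Linear: $E(y_{ij} \mid x_{ij}, \zeta_j) = \beta_0 + \beta_1 x_{ij} + \zeta_j$
- Logistic: $\Pr(y_{ij} = 1 \mid x_{ij}, \zeta_j) = \dfrac{\exp(\beta_0 + \beta_1 x_{ij} + \zeta_j)}{1 + \exp(\beta_0 + \beta_1 x_{ij} + \zeta_j)}$
[Figure: two panels plotting $E(y_{ij} \mid x_{ij}, \zeta_j)$ and $\Pr(y_{ij} = 1 \mid x_{ij}, \zeta_j)$ against $x_{ij}$]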
10 Pseudolikelihood
- Usual marginal log likelihood (without weights):
  $$\sum_{j=1}^{n^{(2)}} \log \int \prod_{i=1}^{n_j^{(1)}} f(y_{ij} \mid \zeta_j)\, g(\zeta_j)\, d\zeta_j = \sum_{j=1}^{n^{(2)}} \log \int \underbrace{\exp\Big\{ \sum_{i=1}^{n_j^{(1)}} \log f(y_{ij} \mid \zeta_j) \Big\}}_{\Pr(\mathbf{y}_j \mid \zeta_j)}\, g(\zeta_j)\, d\zeta_j$$
- Log pseudolikelihood (with weights):
  $$\sum_{j=1}^{n^{(2)}} w_j \log \int \exp\Big\{ \sum_{i=1}^{n_j^{(1)}} w_{i|j} \log f(y_{ij} \mid \zeta_j) \Big\}\, g(\zeta_j)\, d\zeta_j$$
- Note: need $w_j = 1/\pi_j$, $w_{i|j} = 1/\pi_{i|j}$; cannot use $w_{ij} = w_{i|j}\, w_j$
- Evaluate using adaptive quadrature, maximize using Newton-Raphson [Rabe-Hesketh et al., 2005] in gllamm
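To make the structure of the log pseudolikelihood concrete, here is a minimal Python sketch for an intercept-only logistic random intercept model. It is not the gllamm implementation: it uses ordinary (non-adaptive) Gauss-Hermite quadrature in place of adaptive quadrature, and the function name `log_pseudolikelihood`, its arguments, and the toy data are all hypothetical.

```python
import numpy as np

def log_pseudolikelihood(beta0, psi, y, cluster, w1, w2, n_quad=20):
    """Log pseudolikelihood of an intercept-only logistic random intercept model.

    y, cluster, w1: arrays with one entry per level-1 unit (w1 holds w_{i|j}).
    w2: dict mapping cluster id to the level-2 weight w_j.
    """
    # Gauss-Hermite rule: integral of exp(-x^2) h(x) dx ~ sum_k weights_k h(nodes_k)
    nodes, weights = np.polynomial.hermite.hermgauss(n_quad)
    zeta = np.sqrt(2.0 * psi) * nodes            # change of variable for N(0, psi)
    log_prior = np.log(weights / np.sqrt(np.pi))
    total = 0.0
    for j in np.unique(cluster):
        idx = cluster == j
        eta = beta0 + zeta[:, None]              # linear predictor at each node
        # Bernoulli-logit log density: y * eta - log(1 + exp(eta))
        logf = y[idx][None, :] * eta - np.logaddexp(0.0, eta)
        inner = (w1[idx][None, :] * logf).sum(axis=1)   # level-1 weights in the exponent
        a = inner + log_prior
        m = a.max()
        log_integral = m + np.log(np.exp(a - m).sum())  # log-sum-exp over nodes
        total += w2[int(j)] * log_integral              # level-2 weight outside the log
    return total

# Hypothetical toy data: two clusters with unequal weights at both levels
y = np.array([1, 0, 1, 1, 0])
cluster = np.array([0, 0, 0, 1, 1])
w1 = np.array([1.0, 2.0, 1.5, 1.0, 3.0])
w2 = {0: 4.0, 1: 1.3}
ll = log_pseudolikelihood(0.2, 1.0, y, cluster, w1, w2)
print(ll)
```

The sketch shows why the two sets of weights cannot be multiplied together: $w_{i|j}$ enters inside the integral over $\zeta_j$, while $w_j$ multiplies the log of the integral.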
11 Standard errors, taking into account survey design
- Conventional model-based standard errors not appropriate with sampling weights
- Sandwich estimator of standard errors (Taylor linearization):
  $$\widehat{\mathrm{Cov}}(\hat{\boldsymbol\vartheta}) = \mathcal{I}^{-1} \mathcal{J}\, \mathcal{I}^{-1}$$
  - $\mathcal{J}$: Expectation of outer product of gradients, approximated using PSU contributions to gradients
  - $\mathcal{I}$: Expected information, approximated by observed information ("model-based" standard errors obtained from $\mathcal{I}^{-1}$)
- Sandwich estimator accounts for
  - Stratification at stage 1
  - Clustering at levels above highest level of multilevel model
- Implemented in gllamm with cluster() and robust options
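The matrix algebra of the sandwich estimator can be sketched in a few lines of Python. This is an illustration under stated assumptions, not gllamm's code: `sandwich_covariance` is a hypothetical helper, the PSU score contributions and information matrix are made-up numbers, and the $n/(n-1)$ finite-sample factor is a common convention assumed here rather than something stated on the slide.

```python
import numpy as np

def sandwich_covariance(information, psu_scores, small_sample=True):
    """Return I^{-1} J I^{-1}, with J built from PSU-level gradient contributions.

    information: (p, p) observed information matrix at the estimates.
    psu_scores: (n_psu, p) array; row k is the gradient contribution of PSU k.
    """
    information = np.asarray(information, dtype=float)
    g = np.asarray(psu_scores, dtype=float)
    n_psu = g.shape[0]
    J = g.T @ g                          # sum over PSUs of outer products g_k g_k'
    if small_sample:
        J = J * n_psu / (n_psu - 1.0)    # assumed finite-sample adjustment
    I_inv = np.linalg.inv(information)
    return I_inv @ J @ I_inv

# Hypothetical inputs: four PSUs, two parameters; scores sum to zero at the estimate
info = 2.0 * np.eye(2)
scores = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])

cov_robust = sandwich_covariance(info, scores)
cov_model = np.linalg.inv(info)          # model-based covariance, for comparison
print(np.sqrt(np.diag(cov_robust)), np.sqrt(np.diag(cov_model)))
```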
12 Analysis of U.S. PISA 2000 data
- Two-level (students nested in schools) logistic random intercept model for reading proficiency (dichotomous)
- PSUs are areas; sampling weights $w_{i|j}$ for students and $w_j$ for schools provided
- Predictors:
  - [Female]: Student is female (dummy)
  - [ISEI]: International socioeconomic index
  - [MnISEI]: School mean ISEI
  - [Highschool]/[College]: Highest education level by either parent is high school/college (dummies)
  - [English]: Test language (English) spoken at home (dummy)
  - [Oneforeign]: One parent is foreign born (dummy)
  - [Bothforeign]: Both parents are foreign born (dummy)
13 Data structure and gllamm syntax in Stata
- Data structure
    . list id_school wt2 wt1 mn_isei isei in 28/37, clean noobs
    id_school wt2 wt1 mn_isei isei
- gllamm syntax
    gllamm pass_read female isei mn_isei high_school college english one_for both_for, i(id_school) cluster(wvarstr) link(logit) family(binom) pweight(wt) adapt
  - pweight(wt) names a stub: gllamm looks for the level-1 weights in wt1 and the level-2 weights in wt2
14 PISA 2000 estimates for multilevel regression model
                            Unweighted             Weighted
                            maximum likelihood     pseudo maximum likelihood
Parameter                   Est (SE)               Est (SE_R)    (SE_R^PSU)
$\beta_0$: [Constant]       (0.539)                (0.955)       (0.738)
$\beta_1$: [Female]         (0.103)                (0.154)       (0.161)
$\beta_2$: [ISEI]           (0.003)                (0.005)       (0.004)
$\beta_3$: [MnISEI]         (0.001)                (0.016)       (0.018)
$\beta_4$: [Highschool]     (0.256)                (0.477)       (0.429)
$\beta_5$: [College]        (0.255)                (0.505)       (0.543)
$\beta_6$: [English]        (0.283)                (0.382)       (0.391)
$\beta_7$: [Oneforeign]     (0.224)                (0.274)       (0.225)
$\beta_8$: [Bothforeign]    (0.236)                (0.326)       (0.292)
$\psi$                      (0.086)                (0.124)       (0.115)
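15 Problem with using weights in linear models
- Linear variance components model, constant cluster size $n_j^{(1)} = n^{(1)}$:
  $$y_{ij} = \beta_0 + \zeta_j + \epsilon_{ij}, \quad \mathrm{Var}(\zeta_j) = \psi, \quad \mathrm{Var}(\epsilon_{ij}) = \theta$$
- Assume sampling independent of $\epsilon_{ij}$, $w_{i|j} = a > 1$ for all $i, j$
- Get biased estimate of $\psi$:
  - Weighted sum of squares due to clusters (omitting the cross-product term, which has expectation zero):
    $$SSC^w = \sum_j (\bar y^w_{.j} - \bar y^w_{..})^2 = \sum_j (\zeta_j - \bar\zeta_{.})^2 + \sum_j (\bar\epsilon^w_{.j} - \bar\epsilon^w_{..})^2 = SSC$$
  - Expectation of $SSC^w$ same as expectation of unweighted $SSC$:
    $$E(SSC^w) = (n^{(2)} - 1) \left[ \psi + \frac{\theta}{n^{(1)}} \right]$$
  - Pseudo maximum likelihood estimator:
    $$\hat\psi^{PML} = \frac{SSC^w}{n^{(2)}} - \frac{\hat\theta^w}{a\, n^{(1)}} \;>\; \hat\psi^{ML} = \frac{SSC}{n^{(2)}} - \frac{\hat\theta^{ML}}{n^{(1)}}$$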
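The inequality on this slide can be checked numerically. The Python sketch below simulates balanced variance components data and evaluates both closed-form estimators. Several things are assumptions of the sketch rather than statements from the slides: the within-cluster estimators $\hat\theta^{ML} = SSE / \{n^{(2)}(n^{(1)} - 1)\}$ and $\hat\theta^{w} = a\,SSE / \{n^{(2)}(a n^{(1)} - 1)\}$ (the latter obtained by treating each unit as if it appeared $a$ times), level-2 weights equal to 1, no truncation of negative variance estimates, and all simulation settings.

```python
import numpy as np

def variance_component_estimates(y, a):
    """Closed-form ML and pseudo-ML estimates of psi for balanced data.

    y: (n2, n1) array, one row per cluster; a: constant level-1 weight.
    """
    n2, n1 = y.shape
    cluster_means = y.mean(axis=1)
    ssc = np.sum((cluster_means - cluster_means.mean()) ** 2)   # between clusters
    sse = np.sum((y - cluster_means[:, None]) ** 2)             # within clusters
    theta_ml = sse / (n2 * (n1 - 1))
    psi_ml = ssc / n2 - theta_ml / n1
    # Assumed pseudo-ML: clusters behave as if they had a * n1 units each
    theta_w = a * sse / (n2 * (a * n1 - 1))
    psi_pml = ssc / n2 - theta_w / (a * n1)
    return psi_ml, psi_pml

# Hypothetical simulation settings
rng = np.random.default_rng(0)
n2, n1, psi, theta, a = 200, 5, 1.0, 4.0, 2.0
zeta = rng.normal(0.0, np.sqrt(psi), size=(n2, 1))
eps = rng.normal(0.0, np.sqrt(theta), size=(n2, n1))
y = 10.0 + zeta + eps

psi_ml, psi_pml = variance_component_estimates(y, a)
print(psi_ml, psi_pml)
```

With $a = 1$ the two estimates coincide; for $a > 1$ the pseudo-ML estimate is larger because less of the between-cluster variability is attributed to $\theta$.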
16 Explanation for bias and anticipated results for logit/probit models
- Clusters appear bigger than they are ($a$ times as big)
- Between-cluster variability in $\bar\epsilon^w_{.j}$ greater than for clusters of size $a\, n^{(1)}$
- This extra between-cluster variability in $\bar\epsilon^w_{.j}$ is attributed to $\psi$
- However, if sampling at level 1 stratified according to $\epsilon_{ij}$, e.g.
  $$\pi_{i|j} \approx \begin{cases} 0.25 & \text{if } \epsilon_{ij} > 0 \\ 0.75 & \text{if } \epsilon_{ij} \le 0 \end{cases}$$
  variance of $\bar\epsilon^w_{.j}$ decreases, and upward bias of $\hat\psi^{PML}$ decreases
- Bias decreases as $n^{(1)}$ increases
- In logit/probit models, anticipate that $|\hat\beta^{PML}|$ increases when $\hat\psi^{PML}$ increases; therefore biased estimates of $\beta$
17 Solution: Scaling of weights?
- Scaling method 1 [Longford, 1995, 1996; Pfeffermann et al., 1998]:
  $$w^*_{i|j} = w_{i|j}\, \frac{\sum_i w_{i|j}}{\sum_i w_{i|j}^2} \quad \text{so that} \quad \sum_i w^*_{i|j} = \sum_i w^{*2}_{i|j}$$
  - In linear model example with sampling independent of $\epsilon_{ij}$, no bias
    egen sum_w = sum(w), by(id_school)
    egen sum_wsq = sum(w^2), by(id_school)
    generate wt1 = w*sum_w/sum_wsq
- Scaling method 2 [Pfeffermann et al., 1998]:
  $$w^*_{i|j} = w_{i|j}\, \frac{n_j^{(1)}}{\sum_i w_{i|j}} \quad \text{so that} \quad \sum_i w^*_{i|j} = n_j^{(1)}$$
  - In line with intuition (clusters do not appear bigger than they are)
    egen nj = count(w), by(id_school)
    generate wt1 = w*nj/sum_w
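For readers working outside Stata, the two scaling methods can be sketched in Python as follows. The function names `scale_method1` and `scale_method2` and the example weights are hypothetical; the arithmetic follows the two formulas on this slide.

```python
import numpy as np

def scale_method1(w, cluster):
    """Method 1: scaled weights sum to the sum of their squares within each cluster."""
    w = np.asarray(w, dtype=float)
    cluster = np.asarray(cluster)
    scaled = np.empty_like(w)
    for j in np.unique(cluster):
        idx = cluster == j
        scaled[idx] = w[idx] * w[idx].sum() / np.sum(w[idx] ** 2)
    return scaled

def scale_method2(w, cluster):
    """Method 2: scaled weights sum to the sample cluster size n_j."""
    w = np.asarray(w, dtype=float)
    cluster = np.asarray(cluster)
    scaled = np.empty_like(w)
    for j in np.unique(cluster):
        idx = cluster == j
        scaled[idx] = w[idx] * idx.sum() / w[idx].sum()
    return scaled

# Hypothetical raw level-1 weights for two schools
w = np.array([4.0 / 3.0, 4.0 / 3.0, 4.0, 4.0, 4.0 / 3.0])
school = np.array([1, 1, 1, 2, 2])

w_m1 = scale_method1(w, school)
w_m2 = scale_method2(w, school)
print(w_m1, w_m2)
```

Both methods only rescale weights within a school, so the relative weights of students in the same school are unchanged.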
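18 Simulations
- Dichotomous random intercept logistic regression (500 clusters, $N_j$ units per cluster in FP), with
  $$y^*_{ij} = \underbrace{1}_{\beta_0} + \underbrace{1}_{\beta_1} x_{1j} + \underbrace{1}_{\beta_2} x_{2ij} + \zeta_j + \epsilon_{ij}, \quad \psi = 1$$
- Stage 1: Sample clusters with probabilities
  $$\pi_j \approx \begin{cases} 0.25 & \text{if } \zeta_j > 1 \\ 0.75 & \text{if } \zeta_j \le 1 \end{cases}$$
- Stage 2: Sample units with probabilities
  $$\pi_{i|j} \approx \begin{cases} 0.25 & \text{if } \epsilon_{ij} > 0 \\ 0.75 & \text{if } \epsilon_{ij} \le 0 \end{cases}$$
- Vary $N_j$ from 5 to 100, 100 datasets per condition, 12-point adaptive quadrature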
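The sampling design of the simulation can be sketched in Python as below. This is an illustration only: the slide does not give the covariate distributions, so standard normal $x_{1j}$ and $x_{2ij}$ are assumed here, the selection rules are taken as they appear in this transcription, and the function `simulate_sample` and its return layout are hypothetical.

```python
import numpy as np

def simulate_sample(n_clusters=500, cluster_size=10, psi=1.0, seed=0):
    """Generate one finite population and draw a two-stage sample from it."""
    rng = np.random.default_rng(seed)
    zeta = rng.normal(0.0, np.sqrt(psi), n_clusters)       # random intercepts
    x1 = rng.normal(size=n_clusters)                       # assumed cluster-level covariate
    x2 = rng.normal(size=(n_clusters, cluster_size))       # assumed unit-level covariate
    eps = rng.logistic(size=(n_clusters, cluster_size))    # logistic residuals
    ystar = 1.0 + 1.0 * x1[:, None] + 1.0 * x2 + zeta[:, None] + eps
    y = (ystar > 0).astype(int)

    # Selection probabilities depend on the random intercept and the residual
    pi_j = np.where(zeta > 1.0, 0.25, 0.75)
    pi_i_given_j = np.where(eps > 0.0, 0.25, 0.75)

    keep_cluster = rng.random(n_clusters) < pi_j
    keep_unit = rng.random((n_clusters, cluster_size)) < pi_i_given_j
    rows, cols = np.nonzero(keep_unit & keep_cluster[:, None])
    return {
        "y": y[rows, cols],
        "cluster": rows,
        "x1": x1[rows],
        "x2": x2[rows, cols],
        "w_j": 1.0 / pi_j[rows],                        # level-2 weights
        "w_i_given_j": 1.0 / pi_i_given_j[rows, cols],  # level-1 weights
    }

sample = simulate_sample()
print(len(sample["y"]), np.unique(sample["w_j"]), np.unique(sample["w_i_given_j"]))
```

The returned weights are the raw $w_j$ and $w_{i|j}$; any scaling of the level-1 weights would be applied afterwards.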
19 Results for $N_j = 5$
                True     Unweighted    Weighted pseudo maximum likelihood
Parameter       value    ML            Raw        Method 1    Method 2
Model parameters: Conditional effects
$\beta_0$       1        (0.11)        (0.19)     (0.16)      (0.15)
$\beta_1$       1        (0.18)        (0.32)     (0.26)      (0.26)
$\beta_2$       1        (0.22)        (0.35)     (0.25)      (0.26)
$\psi$          1        (0.37)        (0.21)     (0.31)      (0.30)
20 Effect of level-1 stratification method ($N_j = 10$)
- (1) Strata based on sign of $\epsilon_{ij}$
- (2) Strata based on sign of $\xi_{ij}$, $\mathrm{Cor}(\epsilon_{ij}, \xi_{ij}) = 0.5$
- (3) Strata based on sign of $\xi_{ij}$, $\mathrm{Cor}(\epsilon_{ij}, \xi_{ij}) = 0$
                True     Raw                           Method 1
Parameter       value    (1)       (2)       (3)       (1)       (2)       (3)
$\beta_0$       1        (0.16)    (0.16)    (0.21)    (0.14)    (0.13)    (0.16)
$\beta_1$       1        (0.23)    (0.26)    (0.30)    (0.20)    (0.23)    (0.25)
$\beta_2$       1        (0.20)    (0.21)    (0.25)    (0.16)    (0.17)    (0.19)
$\psi$          1        (0.13)    (0.15)    (0.15)    (0.34)    (0.24)    (0.16)
21 Simulation results for pseudo maximum likelihood estimation
- Little bias for $\psi$ when $N_j \ge 50$ (cluster sizes in sample $n_j^{(1)}$ of about 25 or more)
- For smaller cluster sizes:
  - Raw level-1 weights produce positive bias for $\psi$
  - Scaling methods 1 and 2 overcorrect positive bias for $\psi$, apparently due to stratification based on sign of $\epsilon_{ij}$
  - Inflation of $\beta$ estimates whenever positive bias for $\psi$
- Good coverage using sandwich estimator (1000 simulations) for $N_j = 50$
22 Conclusions
- Pseudo maximum likelihood estimation allows for stratification, clustering, and weighting
- Three common methods for scaling level-1 weights: no scaling, scaling method 1, scaling method 2
- Inappropriate scaling can lead to biased estimates
- If clusters are sufficiently large, little bias; similar results with all three scaling methods
- If level-1 weights based on variables strongly associated with outcome, use no scaling
- If level-1 weights based on variables not associated with outcome, use method 1
- For intermediate situations, use method 2?
23 References
- Longford, N. T. (1995). Models for Uncertainty in Educational Testing. New York: Springer.
- Longford, N. T. (1996). Model-based variance estimation in surveys with stratified clustered designs. Australian Journal of Statistics, 38.
- Pfeffermann, D., Skinner, C. J., Holmes, D. J., Goldstein, H., & Rasbash, J. (1998). Weighting for unequal selection probabilities in multilevel models. Journal of the Royal Statistical Society, Series B, 60.
24 References: Our relevant work
- Rabe-Hesketh, S., & Skrondal, A. (2006). Multilevel modeling of complex survey data. Journal of the Royal Statistical Society, Series A, 169.
- Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128.
- Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models. Boca Raton, FL: Chapman & Hall/CRC.
- gllamm and manual from
Prediction for Multilevel Models
Prediction for Multilevel Models Sophia Rabe-Hesketh Graduate School of Education & Graduate Group in Biostatistics University of California, Berkeley Institute of Education, University of London Joint
A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models
A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models Grace Y. Yi 13, JNK Rao 2 and Haocheng Li 1 1. University of Waterloo, Waterloo, Canada
Comparison of Estimation Methods for Complex Survey Data Analysis
Comparison of Estimation Methods for Complex Survey Data Analysis Tihomir Asparouhov 1 Muthen & Muthen Bengt Muthen 2 UCLA 1 Tihomir Asparouhov, Muthen & Muthen, 3463 Stoner Ave. Los Angeles, CA 90066.
I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of
Handling attrition and non-response in longitudinal data
Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein
gllamm companion for Contents
gllamm companion for Rabe-Hesketh, S. and Skrondal, A. (2012). Multilevel and Longitudinal Modeling Using Stata (3rd Edition). Volume I: Continuous Responses. College Station, TX: Stata Press. Contents
A General Approach to Variance Estimation under Imputation for Missing Survey Data
A General Approach to Variance Estimation under Imputation for Missing Survey Data J.N.K. Rao Carleton University Ottawa, Canada 1 2 1 Joint work with J.K. Kim at Iowa State University. 2 Workshop on Survey
Introduction to Multilevel Modeling Using HLM 6. By ATS Statistical Consulting Group
Introduction to Multilevel Modeling Using HLM 6 By ATS Statistical Consulting Group Multilevel data structure Students nested within schools Children nested within families Respondents nested within interviewers
Approaches for Analyzing Survey Data: a Discussion
Approaches for Analyzing Survey Data: a Discussion David Binder 1, Georgia Roberts 1 Statistics Canada 1 Abstract In recent years, an increasing number of researchers have been able to access survey microdata
MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
Chapter 19 Statistical analysis of survey data. Abstract
Chapter 9 Statistical analysis of survey data James R. Chromy Research Triangle Institute Research Triangle Park, North Carolina, USA Savitri Abeyasekera The University of Reading Reading, UK Abstract
Lecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses
Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses G. Gordon Brown, Celia R. Eicheldinger, and James R. Chromy RTI International, Research Triangle Park, NC 27709 Abstract
The Probit Link Function in Generalized Linear Models for Data Mining Applications
Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications
LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as
LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values
Survey Data Analysis in Stata
Survey Data Analysis in Stata Jeff Pitblado Associate Director, Statistical Software StataCorp LP Stata Conference DC 2009 J. Pitblado (StataCorp) Survey Data Analysis DC 2009 1 / 44 Outline 1 Types of
10 Dichotomous or binary responses
10 Dichotomous or binary responses 10.1 Introduction Dichotomous or binary responses are widespread. Examples include being dead or alive, agreeing or disagreeing with a statement, and succeeding or failing
The SURVEYFREQ Procedure in SAS 9.2: Avoiding FREQuent Mistakes When Analyzing Survey Data ABSTRACT INTRODUCTION SURVEY DESIGN 101 WHY STRATIFY?
The SURVEYFREQ Procedure in SAS 9.2: Avoiding FREQuent Mistakes When Analyzing Survey Data Kathryn Martin, Maternal, Child and Adolescent Health Program, California Department of Public Health, ABSTRACT
Chapter 11 Introduction to Survey Sampling and Analysis Procedures
Chapter 11 Introduction to Survey Sampling and Analysis Procedures Chapter Table of Contents OVERVIEW...149 SurveySampling...150 SurveyDataAnalysis...151 DESIGN INFORMATION FOR SURVEY PROCEDURES...152
Electronic Theses and Dissertations UC Riverside
Electronic Theses and Dissertations UC Riverside Peer Reviewed Title: Bayesian and Non-parametric Approaches to Missing Data Analysis Author: Yu, Yao Acceptance Date: 01 Series: UC Riverside Electronic
New SAS Procedures for Analysis of Sample Survey Data
New SAS Procedures for Analysis of Sample Survey Data Anthony An and Donna Watts, SAS Institute Inc, Cary, NC Abstract Researchers use sample surveys to obtain information on a wide variety of issues Many
Introduction to mixed model and missing data issues in longitudinal studies
Introduction to mixed model and missing data issues in longitudinal studies Hélène Jacqmin-Gadda INSERM, U897, Bordeaux, France Inserm workshop, St Raphael Outline of the talk I Introduction Mixed models
A Basic Introduction to Missing Data
John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item
Linda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén www.statmodel.com. Table Of Contents
Mplus Short Courses Topic 2 Regression Analysis, Eploratory Factor Analysis, Confirmatory Factor Analysis, And Structural Equation Modeling For Categorical, Censored, And Count Outcomes Linda K. Muthén
Imputing Missing Data using SAS
ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are
CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA
Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
Stat 9100.3: Analysis of Complex Survey Data
Stat 9100.3: Analysis of Complex Survey Data 1 Logistics Instructor: Stas Kolenikov, [email protected] Class period: MWF 1-1:50pm Office hours: Middlebush 307A, Mon 1-2pm, Tue 1-2 pm, Thu 9-10am.
Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
Overview of Methods for Analyzing Cluster-Correlated Data. Garrett M. Fitzmaurice
Overview of Methods for Analyzing Cluster-Correlated Data Garrett M. Fitzmaurice Laboratory for Psychiatric Biostatistics, McLean Hospital Department of Biostatistics, Harvard School of Public Health Outline
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
Clarifying Some Issues in the Regression Analysis of Survey Data
Survey Research Methods (2007) http://w4.ub.uni-konstanz.de/srm Vol. 1, No. 1, pp. 11-18 c European Survey Research Association Clarifying Some Issues in the Regression Analysis of Survey Data Phillip
Standard errors of marginal effects in the heteroskedastic probit model
Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic
Multiple Choice Models II
Multiple Choice Models II Laura Magazzini University of Verona [email protected] http://dse.univr.it/magazzini Laura Magazzini (@univr.it) Multiple Choice Models II 1 / 28 Categorical data Categorical
Poisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2
University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages
Analyzing Structural Equation Models With Missing Data
Analyzing Structural Equation Models With Missing Data Craig Enders* Arizona State University [email protected] based on Enders, C. K. (006). Analyzing structural equation models with missing data. In G.
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional
Problem of Missing Data
VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
HLM software has been one of the leading statistical packages for hierarchical
Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
Models for Longitudinal and Clustered Data
Models for Longitudinal and Clustered Data Germán Rodríguez December 9, 2008, revised December 6, 2012 1 Introduction The most important assumption we have made in this course is that the observations
Descriptive Methods Ch. 6 and 7
Descriptive Methods Ch. 6 and 7 Purpose of Descriptive Research Purely descriptive research describes the characteristics or behaviors of a given population in a systematic and accurate fashion. Correlational
Best Practices in Using Large, Complex Samples: The Importance of Using Appropriate Weights and Design Effect Compensation
A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to the Practical Assessment, Research & Evaluation. Permission is granted to
Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering
Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Department of Industrial Engineering and Management Sciences Northwestern University September 15th, 2014
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
Efficient and Practical Econometric Methods for the SLID, NLSCY, NPHS
Efficient and Practical Econometric Methods for the SLID, NLSCY, NPHS Philip Merrigan ESG-UQAM, CIRPÉE Using Big Data to Study Development and Social Change, Concordia University, November 2103 Intro Longitudinal
Statistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova
Power and sample size in multilevel modeling
Snijders, Tom A.B. Power and Sample Size in Multilevel Linear Models. In: B.S. Everitt and D.C. Howell (eds.), Encyclopedia of Statistics in Behavioral Science. Volume 3, 1570 1573. Chicester (etc.): Wiley,
Lecture 14: GLM Estimation and Logistic Regression
Lecture 14: GLM Estimation and Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South
Statistics in Retail Finance. Chapter 2: Statistical models of default
Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision
Multinomial and Ordinal Logistic Regression
Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,
Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: [email protected] Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
2. Linear regression with multiple regressors
2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions
E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F
Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,
Logit Models for Binary Data
Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response
Simple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
Response variables assume only two values, say Y j = 1 or = 0, called success and failure (spam detection, credit scoring, contracting.
Prof. Dr. J. Franke All of Statistics 1.52 Binary response variables - logistic regression Response variables assume only two values, say Y j = 1 or = 0, called success and failure (spam detection, credit
Using SAS Proc Mixed for the Analysis of Clustered Longitudinal Data
Using SAS Proc Mixed for the Analysis of Clustered Longitudinal Data Kathy Welch Center for Statistical Consultation and Research The University of Michigan 1 Background ProcMixed can be used to fit Linear
Multinomial Logistic Regression
Multinomial Logistic Regression Dr. Jon Starkweather and Dr. Amanda Kay Moske Multinomial logistic regression is used to predict categorical placement in or the probability of category membership on a
Statistical and Methodological Issues in the Analysis of Complex Sample Survey Data: Practical Guidance for Trauma Researchers
Journal of Traumatic Stress, Vol. 2, No., October 2008, pp. 440 447 ( C 2008) Statistical and Methodological Issues in the Analysis of Complex Sample Survey Data: Practical Guidance for Trauma Researchers
Note on the EM Algorithm in Linear Regression Model
International Mathematical Forum 4 2009 no. 38 1883-1889 Note on the M Algorithm in Linear Regression Model Ji-Xia Wang and Yu Miao College of Mathematics and Information Science Henan Normal University
Parametric fractional imputation for missing data analysis
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Biometrika (????,??,?, pp. 1 14 C???? Biometrika Trust Printed in
Module 5: Introduction to Multilevel Modelling SPSS Practicals Chris Charlton 1 Centre for Multilevel Modelling
Module 5: Introduction to Multilevel Modelling SPSS Practicals Chris Charlton 1 Centre for Multilevel Modelling Pre-requisites Modules 1-4 Contents P5.1 Comparing Groups using Multilevel Modelling... 4
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
Visualization of Complex Survey Data: Regression Diagnostics
Visualization of Complex Survey Data: Regression Diagnostics Susan Hinkins 1, Edward Mulrow, Fritz Scheuren 3 1 NORC at the University of Chicago, 11 South 5th Ave, Bozeman MT 59715 NORC at the University
Methodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS). 1
Methodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS). 1 Vladislav Beresovsky ([email protected]), Janey Hsiao National Center for Health Statistics, CDC
Survey Data Analysis in Stata
Survey Data Analysis in Stata Jeff Pitblado Associate Director, Statistical Software StataCorp LP 2009 Canadian Stata Users Group Meeting Outline 1 Types of data 2 2 Survey data characteristics 4 2.1 Single
Ordinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
Hierarchical Logistic Regression Modeling with SAS GLIMMIX Jian Dai, Zhongmin Li, David Rocke University of California, Davis, CA
Hierarchical Logistic Regression Modeling with SAS GLIMMIX Jian Dai, Zhongmin Li, David Rocke University of California, Davis, CA ABSTRACT Data often have hierarchical or clustered structures, such as
Chapter XXI Sampling error estimation for survey data* Donna Brogan Emory University Atlanta, Georgia United States of America.
Chapter XXI Sampling error estimation for survey data* Donna Brogan Emory University Atlanta, Georgia United States of America Abstract Complex sample survey designs deviate from simple random sampling,
