# Module 14: Missing Data Stata Practical

 To view this video please enable JavaScript, and consider upgrading to a web browser that supports HTML5 video
Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Module 14: Missing Data Stata Practical Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine Supported by ESRC grant RES and MRC grant G Pre-requisites Stata version 12 or later. Online resources: Contents Introduction to the Youth Cohort Study dataset... 1 P14.1 The Model of Interest... 2 P14.2 Investigating Missingness... 2 P Investigating quantity and patterns of missingness... 2 P Investigating the missingness mechanism... 4 P14.3 Ad-hoc Methods... 7 P14.4 Complete Records Analysis... 8 P Complete records analysis results... 8 P Interpretation of complete records analysis... 9 P14.5 Multiple Imputation...12 P Imputation by Chained Equations in Stata P Analysing the multiple imputations P14.6 Inverse Probability Weighting...18 P Constructing the weights... 18

2 P Inverse probability weighted complete records analysis P14.7 Multilevel and Longitudinal Studies...23 P14.8 Summary and Conclusions...24 References Acknowledgements... 25

3 Introduction to the Youth Cohort Study dataset You will be analysing data from the Youth Cohort Study of England and Wales (YCS) 1. The YCS is a postal survey of young people. We will use data from the 1995 cohort, restricted to those young people who were at comprehensive schools (n=12,884) when the survey took place. Our analyses will focus on variables recording GCSE attainment, parental socio-economic class, gender, and ethnicity. In particular our interest will focus on models for the young person s GCSE attainment score, with the other variables as covariates or explanatory variables. Such a model is of interest in order to investigate differences in GCSE attainment between ethnic and social economic groups, relative to gender differences (Connolly 2006). Table 14.1 describes the variables included in the dataset. Table Variables contained in Youth Cohort Study dataset Variable name t0score2 gender t0ethnic Description and coding GCSE score truncated year 11 exam point score Gender (1 = boys, 0 = girls) Ethnicity 1 = white 4 = black 5 = Indian 6 = Pakistani 7 = Bangladeshi 9 = Other Asian 10 = other response t0parsc4 Parents National Statistics Socio-economic classification 1 = managerial & professional 2 = intermediate 3 = working 1 We thank the depositors of the Economic and Social Data Service (ESDS) data collection SN 5765 Youth Cohort Time Series for England, Wales and Scotland, , and the depositors of the constituent studies, for their permission to make these data available for teaching purposes. We also thank the ESDS (www.esds.ac.uk) for their assistance in obtaining these permissions and through whose website the data were made available to us. Centre for Multilevel Modelling,

4 The GCSE score is formed by assigning numerical scores to the grades obtained by a child at GCSE (A/A*=7 through to grade G=1), truncated at 12 grade A/A*s (giving a maximum score of 84). The original YCS data also contains a weight variable, based on the sampling scheme used in the survey. Since our aim here is to illustrate the missing data concepts and methods we have introduced, we do not use the weights in this analysis. We emphasise that the analyses shown here are intended to be illustrative of the missing data concepts and methods we have introduced, and should not be interpreted as a substantive analysis of these data. P14.1 The Model of Interest Throughout the practical we shall assume that our model of interest is the linear regression of GCSE score on gender, ethnicity and parental SEC. Ordinarily we would fit this model in Stata using: xi: regress t0score2 gender i.t0ethnic i.t0parsc4 We will keep this model of interest in mind when investigating missingness in the variables and when considering how to handle any missing values. P14.2 Investigating Missingness In this section we investigate missingness in the YCS data. Load 10.2.dta into memory and open the do-file for this lesson: From within the LEMMA Learning Environment Go to Module 14: Missing Data, and scroll down to Stata Datasets and Do-files Click 14.2.dta to open the dataset P Investigating quantity and patterns of missingness We begin by investigating how many missing values there are in the variables included in the dataset, using Stata s misstable summarize command: Centre for Multilevel Modelling,

5 . misstable summarize t0score2 t0parsc4 t0ethnic gender Unique Obs< Variable Obs=. Obs>. Obs<. values Min Max t0score , t0parsc4 1,576 11, t0ethnic , We first note that the gender variable has not been included in the output this is because the variable has no missing values. Next we see that the parental SEC has the most missing values (1,576), with GCSE score and ethnicity having fewer missing values. Next we examine the patterns of missingness in these three variables. We use the misstable patterns command to tabulate which patterns of missingness occur and how frequently each pattern occurs:. misstable patterns t0score2 t0parsc4 t0ethnic gender, freq Missing-value patterns (1 means complete) Pattern Frequency , , ,884 Variables are (1) t0score2 (2) t0ethnic (3) t0parsc4 Centre for Multilevel Modelling,

6 The output from misstable patterns shows, for the specified variables, each pattern of missing data which occurs, ordered according to the frequency with which they occur. From the first row in the table, we see that there are 11,188 young people for whom all three variables (ethnicity, GCSE score, and parental SEC) are observed. The most common pattern which has some missing values is when GCSE score and ethnicity are observed but parental SEC is missing (n=1,422). The next most commonly occurring pattern is where GCSE score is observed but ethnicity and parental SEC are missing (n=104). We then see that all the other possible missingness patterns occur, but with smaller frequencies. P Investigating the missingness mechanism Since missingness occurs in three of the variables in the dataset, we can think of there being an underlying mechanism which determines missingness for each of the variables ethnicity, GCSE score, and parental SEC. Since the majority of missing values occur in the parental SEC variable, however, we shall focus on investigating missingness in this variable, since the analysis is likely most sensitive to assumptions concerning this. The other patterns of missing data we (implicitly) assume are either MCAR or possibly MAR given other observed values. From the output from misstable patterns, we saw that when parental SEC is missing, ethnicity and GCSE score are mostly observed. We can therefore investigate how missingness in parental SEC is related both to these two variables and to the fully observed variable gender. To investigate which variables are predictive of missingness in the parental SEC variable we first define a binary variable which indicates whether the parental occupation variable is observed (=1) or missing (=0): gen r_t0parsc4=(t0parsc4!=.) Next, we fit a logistic regression model for the variable r_t0parsc4, with gender as covariate (we could also have simply performed a chi-squared test):. xi: logistic r_t0parsc4 gender Logistic regression Number of obs = LR chi2(1) = 1.21 Prob > chi2 = Log likelihood = Pseudo R2 = r_t0parsc4 Odds Ratio Std. Err. z P> z [95% Conf. Interval] gender Centre for Multilevel Modelling,

7 This document is only the first few pages of the full version. To see the complete document please go to learning materials and register: The course is completely free. We ask for a few details about yourself for our research purposes only. We will not give any details to any other organisation unless it is with your express permission.

### Handling missing data in Stata a whirlwind tour

Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled

### How to set the main menu of STATA to default factory settings standards

University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be

### Module 5: Introduction to Multilevel Modelling SPSS Practicals Chris Charlton 1 Centre for Multilevel Modelling

Module 5: Introduction to Multilevel Modelling SPSS Practicals Chris Charlton 1 Centre for Multilevel Modelling Pre-requisites Modules 1-4 Contents P5.1 Comparing Groups using Multilevel Modelling... 4

### Handling attrition and non-response in longitudinal data

Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

### Statistical modelling with missing data using multiple imputation. Session 4: Sensitivity Analysis after Multiple Imputation

Statistical modelling with missing data using multiple imputation Session 4: Sensitivity Analysis after Multiple Imputation James Carpenter London School of Hygiene & Tropical Medicine Email: james.carpenter@lshtm.ac.uk

### How Do We Test Multiple Regression Coefficients?

How Do We Test Multiple Regression Coefficients? Suppose you have constructed a multiple linear regression model and you have a specific hypothesis to test which involves more than one regression coefficient.

### Lecture 16: Logistic regression diagnostics, splines and interactions. Sandy Eckel 19 May 2007

Lecture 16: Logistic regression diagnostics, splines and interactions Sandy Eckel seckel@jhsph.edu 19 May 2007 1 Logistic Regression Diagnostics Graphs to check assumptions Recall: Graphing was used to

### Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015 References: Long 1997, Long and Freese 2003 & 2006 & 2014,

### From this it is not clear what sort of variable that insure is so list the first 10 observations.

MNL in Stata We have data on the type of health insurance available to 616 psychologically depressed subjects in the United States (Tarlov et al. 1989, JAMA; Wells et al. 1989, JAMA). The insurance is

### Module 4 - Multiple Logistic Regression

Module 4 - Multiple Logistic Regression Objectives Understand the principles and theory underlying logistic regression Understand proportions, probabilities, odds, odds ratios, logits and exponents Be

### Missing Data & How to Deal: An overview of missing data. Melissa Humphries Population Research Center

Missing Data & How to Deal: An overview of missing data Melissa Humphries Population Research Center Goals Discuss ways to evaluate and understand missing data Discuss common missing data methods Know

### Using Stata 11 & higher for Logistic Regression Richard Williams, University of Notre Dame, Last revised March 28, 2015

Using Stata 11 & higher for Logistic Regression Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised March 28, 2015 NOTE: The routines spost13, lrdrop1, and extremes are

### RESEARCH. Ethnicity and Degree Attainment. Stijn Broecke and Tom Nicholls Department for Education and Skills. Research Report RW92

RESEARCH Stijn Broecke and Tom Nicholls Department for Education and Skills Research Report RW92 Research Report No RW92 Stijn Broecke and Tom Nicholls Department for Education and Skills Contents Section

### Sample Size Calculation for Longitudinal Studies

Sample Size Calculation for Longitudinal Studies Phil Schumm Department of Health Studies University of Chicago August 23, 2004 (Supported by National Institute on Aging grant P01 AG18911-01A1) Introduction

### Investigating the Accuracy of Predicted A Level Grades as part of 2009 UCAS Admission Process

RESEARCH PAPER NUMBER 37 Investigating the Accuracy of Predicted A Level Grades as part of 2009 UCAS Admission Process JUNE 2011 Authors: Nick Everett and Joanna Papageorgiou, UCAS The views expressed

### Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

### Technical Report on Response in the Teacher Survey in MCS 4 (Age 7)

Millennium Cohort Study Technical Report on Response in the Teacher Survey in MCS 4 (Age 7) Tarek Mostafa with contributions from Rachel Rosenberg November 2013 Centre for Longitudinal Studies Following

### Module 3 - Multiple Linear Regressions

Module 3 - Multiple Linear Regressions Objectives Understand the strength of Multiple linear regression (MLR) in untangling cause and effect relationships Understand how MLR can answer substantive questions

### Discussion Section 4 ECON 139/239 2010 Summer Term II

Discussion Section 4 ECON 139/239 2010 Summer Term II 1. Let s use the CollegeDistance.csv data again. (a) An education advocacy group argues that, on average, a person s educational attainment would increase

### Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.

Analyzing Complex Survey Data: Some key issues to be aware of Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Rather than repeat material that is

### Regression in Stata. Alicia Doyle Lynch Harvard-MIT Data Center (HMDC)

Regression in Stata Alicia Doyle Lynch Harvard-MIT Data Center (HMDC) Documents for Today Find class materials at: http://libraries.mit.edu/guides/subjects/data/ training/workshops.html Several formats

### BRIEF OVERVIEW ON INTERPRETING COUNT MODEL RISK RATIOS

BRIEF OVERVIEW ON INTERPRETING COUNT MODEL RISK RATIOS An Addendum to Negative Binomial Regression Cambridge University Press (2007) Joseph M. Hilbe 2008, All Rights Reserved This short monograph is intended

### Simple Linear Regression One Binary Categorical Independent Variable

Simple Linear Regression Does sex influence mean GCSE score? In order to answer the question posed above, we want to run a linear regression of sgcseptsnew against sgender, which is a binary categorical

### Introduction. Survival Analysis. Censoring. Plan of Talk

Survival Analysis Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 01/12/2015 Survival Analysis is concerned with the length of time before an event occurs.

### VI. Introduction to Logistic Regression

VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models

### Editor Executive Editor Associate Editors Copyright Statement:

The Stata Journal Editor H. Joseph Newton Department of Statistics Texas A & M University College Station, Texas 77843 979-845-3142 979-845-3144 FAX jnewton@stata-journal.com Associate Editors Christopher

### Problem of Missing Data

VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;

### User Guide for Diagnostic Test

User Guide for Diagnostic Test The Results Are likelihood ratios for the test results presented (or calculatable)? Will it help with my patients? Will the reproducibility of the test result and its interpretation

### Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY ABSTRACT: This project attempted to determine the relationship

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Testing for serial correlation in linear panel-data models

The Stata Journal (2003) 3, Number 2, pp. 168 177 Testing for serial correlation in linear panel-data models David M. Drukker Stata Corporation Abstract. Because serial correlation in linear panel-data

### Competing-risks regression

Competing-risks regression Roberto G. Gutierrez Director of Statistics StataCorp LP Stata Conference Boston 2010 R. Gutierrez (StataCorp) Competing-risks regression July 15-16, 2010 1 / 26 Outline 1. Overview

### Re-analysis using Inverse Probability Weighting and Multiple Imputation of Data from the Southampton Women s Survey

Re-analysis using Inverse Probability Weighting and Multiple Imputation of Data from the Southampton Women s Survey MRC Biostatistics Unit Institute of Public Health Forvie Site Robinson Way Cambridge

### III. INTRODUCTION TO LOGISTIC REGRESSION. a) Example: APACHE II Score and Mortality in Sepsis

III. INTRODUCTION TO LOGISTIC REGRESSION 1. Simple Logistic Regression a) Example: APACHE II Score and Mortality in Sepsis The following figure shows 30 day mortality in a sample of septic patients as

### Using Stata for Categorical Data Analysis

Using Stata for Categorical Data Analysis NOTE: These problems make extensive use of Nick Cox s tab_chi, which is actually a collection of routines, and Adrian Mander s ipf command. From within Stata,

### CRJ Doctoral Comprehensive Exam Statistics Friday August 23, :00pm 5:30pm

CRJ Doctoral Comprehensive Exam Statistics Friday August 23, 23 2:pm 5:3pm Instructions: (Answer all questions below) Question I: Data Collection and Bivariate Hypothesis Testing. Answer the following

### MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

### STATA FUNDAMENTALS FOR MIDDLEBURY COLLEGE ECONOMICS STUDENTS

STATA FUNDAMENTALS FOR MIDDLEBURY COLLEGE ECONOMICS STUDENTS BY EMILY FORREST AUGUST 2008 CONTENTS INTRODUCTION STATA SYNTAX DATASET FILES OPENING A DATASET FROM EXCEL TO STATA WORKING WITH LARGE DATASETS

### Module 3: Multiple Regression Concepts

Contents Module 3: Multiple Regression Concepts Fiona Steele 1 Centre for Multilevel Modelling...4 What is Multiple Regression?... 4 Motivation... 4 Conditioning... 4 Data for multiple regression analysis...

### Uptake of GCSE and A-level subjects in England. by Ethnic Group Statistics Report Series No. 11

Uptake of GCSE and A-level subjects in England by Ethnic Group 2007 Statistics Report Series No. 11 Carmen L. Vidal Rodeiro July 2009 Research Division Statistics Group Assessment Research and Development

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Who Goes to Graduate School in Taiwan? Evidence from the 2005 College Graduate Survey and Follow- Up Surveys in 2006 and 2008

Who Goes to Graduate School in Taiwan? Evidence from the 2005 College Graduate Survey and Follow- Up Surveys in 2006 and 2008 Ping-Yin Kuan Department of Sociology Chengchi Unviersity Taiwan Presented

### How to choose an analysis to handle missing data in longitudinal observational studies

How to choose an analysis to handle missing data in longitudinal observational studies ICH, 25 th February 2015 Ian White MRC Biostatistics Unit, Cambridge, UK Plan Why are missing data a problem? Methods:

### Introduction to Stata

Introduction to Stata September 23, 2014 Stata is one of a few statistical analysis programs that social scientists use. Stata is in the mid-range of how easy it is to use. Other options include SPSS,

### The attainment of Muslim pupils in England and Scotland and parental aspirations

The attainment of Muslim pupils in England and Scotland and parental aspirations Muslim Pupils Educational Experiences in England and Scotland: The project Literature and statistical review Family case

### Econ 371 Problem Set #3 Answer Sheet

Econ 371 Problem Set #3 Answer Sheet 4.1 In this question, you are told that a OLS regression analysis of third grade test scores as a function of class size yields the following estimated model. T estscore

Mplus Short Courses Topic 2 Regression Analysis, Eploratory Factor Analysis, Confirmatory Factor Analysis, And Structural Equation Modeling For Categorical, Censored, And Count Outcomes Linda K. Muthén

### Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic

### HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal

### Child Weight. According to the HSE, in 2010, obesity prevalence among year olds was 18.3% (Table 2).

NOO data factsheet Child Weight July 2012 Key points Obesity among 2 10 year olds rose from 10.1% in 1995 to 14.6% in 2010 according to Health Survey for England (HSE) figures. There are growing indications

### Logistic Regression in Stata

Logistic Regression in Stata Danstan Bagenda, PhD MUSPH Danstan Bagenda, PhD, 1 Jan 2009 1 Logistic Regression in STATA The logistic regression programs in STATA use maximum likelihood estimation to generate

### Medical school applications in the UK

Medical school applications in the UK Wiji Arulampalam *, Robin A. Naylor, and Jeremy Smith University of Warwick December 25 Keywords: Medical students, Applications, Matching, endogenous selection. JEL

### From the help desk: hurdle models

The Stata Journal (2003) 3, Number 2, pp. 178 184 From the help desk: hurdle models Allen McDowell Stata Corporation Abstract. This article demonstrates that, although there is no command in Stata for

### BIOS 312: MODERN REGRESSION ANALYSIS

BIOS 312: MODERN REGRESSION ANALYSIS James C (Chris) Slaughter Department of Biostatistics Vanderbilt University School of Medicine james.c.slaughter@vanderbilt.edu biostat.mc.vanderbilt.edu/coursebios312

### Interaction Terms Vs. Interaction Effects in Logistic and Probit Regression

--------------------------------------- Background: In probit or logistic regressions, one can not base statistical inferences based on simply looking at the co-efficient and statistical significance of

### Wednesday PM. Multiple regression. Multiple regression in SPSS. Presentation of AM results Multiple linear regression. Logistic regression

Wednesday PM Presentation of AM results Multiple linear regression Simultaneous Stepwise Hierarchical Logistic regression Multiple regression Multiple regression extends simple linear regression to consider

### HANDLING DROPOUT AND WITHDRAWAL IN LONGITUDINAL CLINICAL TRIALS

HANDLING DROPOUT AND WITHDRAWAL IN LONGITUDINAL CLINICAL TRIALS Mike Kenward London School of Hygiene and Tropical Medicine Acknowledgements to James Carpenter (LSHTM) Geert Molenberghs (Universities of

### Institut für Soziologie Eberhard Karls Universität Tübingen www.maartenbuis.nl

from Indirect Extracting from Institut für Soziologie Eberhard Karls Universität Tübingen www.maartenbuis.nl from Indirect What is the effect of x on y? Which effect do I choose: average marginal or marginal

### Learning from Futuretrack: Dropout from higher education

BIS RESEARCH PAPER NO. 168 Learning from Futuretrack: Dropout from higher education MARCH 2014 1 Acknowledgments This report, written by Dr Andrew McCulloch and colleagues at the Higher Education Careers

### 2. Incidence, prevalence and duration of breastfeeding

2. Incidence, prevalence and duration of breastfeeding Key Findings Mothers in the UK are breastfeeding their babies for longer with one in three mothers still breastfeeding at six months in 2010 compared

### 11 Logistic Regression - Interpreting Parameters

11 Logistic Regression - Interpreting Parameters Let us expand on the material in the last section, trying to make sure we understand the logistic regression model and can interpret Stata output. Consider

### Practical For Session 8: Categorical Outcomes 22/11/2016

Practical For Session 8: Categorical Outcomes 22/11/2016 1 Practical For Session 8: Categorical Outcomes Datasets The datasets that you will use in this practical can be accessed via http from within

### Lecture 10: Logistical Regression II Multinomial Data. Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II

Lecture 10: Logistical Regression II Multinomial Data Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II Logit vs. Probit Review Use with a dichotomous dependent variable Need a link

### ECON Introductory Econometrics. Lecture 15: Binary dependent variables

ECON4150 - Introductory Econometrics Lecture 15: Binary dependent variables Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 11 Lecture Outline 2 The linear probability model Nonlinear probability

### Getting started with Stata

Getting started with Stata Stat 104: 9/3/09 The purpose of this tutorial is to learn how to download, install and use Stata for data manipulation, visualization and simple analysis. 1. Downloading and

### A secondary analysis of primary care survey data to explore differences in response by ethnicity.

A secondary analysis of primary care survey data to explore differences in response by ethnicity. A report commissioned by the National Association for Patient Participation Autumn 2006 1.1 Introduction

### From the help desk: Swamy s random-coefficients model

The Stata Journal (2003) 3, Number 3, pp. 302 308 From the help desk: Swamy s random-coefficients model Brian P. Poi Stata Corporation Abstract. This article discusses the Swamy (1970) random-coefficients

### Weight of Evidence Module

Formula Guide The purpose of the Weight of Evidence (WoE) module is to provide flexible tools to recode the values in continuous and categorical predictor variables into discrete categories automatically,

### Logistic Regression, Part III: Hypothesis Testing, Comparisons to OLS

Logistic Regression, Part III: Hypothesis Testing, Comparisons to OLS Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 22, 2015 This handout steals heavily

### The Probit Link Function in Generalized Linear Models for Data Mining Applications

Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/\$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications

### Surveying Prisoner Crime Reduction (SPCR) Adjusting for Missing Data Technical Report

Surveying Prisoner Crime Reduction (SPCR) Adjusting for Missing Data Technical Report Ian Brunton-Smith, University of Surrey James Carpenter and Mike Kenward, London School of Hygiene and Tropical Medicine

### Introduction to structural equation modeling using the sem command

Introduction to structural equation modeling using the sem command Gustavo Sanchez Senior Econometrician StataCorp LP Mexico City, Mexico Gustavo Sanchez (StataCorp) November 13, 2014 1 / 33 Outline Outline

### Binary Diagnostic Tests Two Independent Samples

Chapter 537 Binary Diagnostic Tests Two Independent Samples Introduction An important task in diagnostic medicine is to measure the accuracy of two diagnostic tests. This can be done by comparing summary

### National Child Measurement Programme:

National Child Measurement Programme: Detailed Analysis of the 2007/08 National Dataset April 2009 National Child Measurement Programme: Detailed Analysis of the 2007/08 Dataset ii Authors Caroline Ridler

### Raul Cruz-Cano, HLTH653 Spring 2013

Multilevel Modeling-Logistic Schedule 3/18/2013 = Spring Break 3/25/2013 = Longitudinal Analysis 4/1/2013 = Midterm (Exercises 1-5, not Longitudinal) Introduction Just as with linear regression, logistic

### Exploratory Data Analysis for Mean Function

Exploratory Data Analysis for Mean Function I. Read-in Data & Explore Data Structure Analysis of Pigs Data Increase the memory to hold large data.. set memory 40m (40960k) Increase the number of variables

### The earnings and employment returns to A levels. A report to the Department for Education

The earnings and employment returns to A levels A report to the Department for Education February 2015 About is one of Europe's leading specialist economics and policy consultancies and has its head office

### Econ 371 Problem Set #3 Answer Sheet

Econ 371 Problem Set #3 Answer Sheet 4.3 In this question, you are told that a OLS regression analysis of average weekly earnings yields the following estimated model. AW E = 696.7 + 9.6 Age, R 2 = 0.023,

### I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

### Combining Multiple Imputation and Inverse Probability Weighting

Combining Multiple Imputation and Inverse Probability Weighting Shaun Seaman 1, Ian White 1, Andrew Copas 2,3, Leah Li 4 1 MRC Biostatistics Unit, Cambridge 2 MRC Clinical Trials Unit, London 3 UCL Research

### Ordinal Regression. Chapter

Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

### Mapping and evaluating shadow education

Mapping and evaluating shadow education ESRC Research Project RES-000-23-0117 End of Award Report January 2005 Judith Ireson and Katie Rushforth Institute of Education, University of London Background

### Linear Models in STATA and ANOVA

Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

### Lab 5 Linear Regression with Within-subject Correlation. Goals: Data: Use the pig data which is in wide format:

Lab 5 Linear Regression with Within-subject Correlation Goals: Data: Fit linear regression models that account for within-subject correlation using Stata. Compare weighted least square, GEE, and random

### NEET: Young People Not in Education, Employment or Training

NEET: Young People Not in Education, Employment or Training Standard Note: SN/EP/06705 Last updated: 13 December 2013 Author: Section: James Mirza-Davies Economic Policy and Statistics In the third quarter

### Generalized Linear Models

Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

### 2. Making example missing-value datasets: MCAR, MAR, and MNAR

Lecture 20 1. Types of missing values 2. Making example missing-value datasets: MCAR, MAR, and MNAR 3. Common methods for missing data 4. Compare results on example MCAR, MAR, MNAR data 1 Missing Data

### Subject to Background

Subject to Background What promotes better achievement for bright but disadvantaged students? Pam Sammons, Katalin Toth & Kathy Sylva University of Oxford Department of Education March 2015 Improving social

### Nested Logit. Brad Jones 1. April 30, 2008. University of California, Davis. 1 Department of Political Science. POL 213: Research Methods

Nested Logit Brad 1 1 Department of Political Science University of California, Davis April 30, 2008 Nested Logit Interesting model that does not have IIA property. Possible candidate model for structured

### CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical

### Revised GCSE and equivalent results in England, 2014 to 2015

Revised GCSE and equivalent results in England, 2014 to 2015 SFR 01/2016, 21 January 2016 Attainment in the headline 5+ A*-C including English and maths measure is stable in 2015 Percentage of pupils achieving

### Some issues on the uptake of Modern Foreign Languages at GCSE

Some issues on the uptake of Modern Foreign Languages at GCSE Statistics Report Series No. 10 Carmen L. Vidal Rodeiro June 2009 Research Division Statistics Group Assessment Research and Development Cambridge

### Two Correlated Proportions (McNemar Test)

Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with

### Econometrics II. Lecture 9: Sample Selection Bias

Econometrics II Lecture 9: Sample Selection Bias Måns Söderbom 5 May 2011 Department of Economics, University of Gothenburg. Email: mans.soderbom@economics.gu.se. Web: www.economics.gu.se/soderbom, www.soderbom.net.

### How To Use A Monte Carlo Study To Decide On Sample Size and Determine Power

How To Use A Monte Carlo Study To Decide On Sample Size and Determine Power Linda K. Muthén Muthén & Muthén 11965 Venice Blvd., Suite 407 Los Angeles, CA 90066 Telephone: (310) 391-9971 Fax: (310) 391-8971

### MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

Goodman, A; Goodman, R (2009) Strengths and Difficulties Questionnaire as a Dimensional Measure of Child Mental Health. Journal of the American Academy of Child and Adolescent Psychiatry, 48 (4). pp. 400-3.