# A General Approach to Variance Estimation under Imputation for Missing Survey Data

Save this PDF as:

Size: px
Start display at page:

Download "A General Approach to Variance Estimation under Imputation for Missing Survey Data"

## Transcription

1 A General Approach to Variance Estimation under Imputation for Missing Survey Data J.N.K. Rao Carleton University Ottawa, Canada Joint work with J.K. Kim at Iowa State University. 2 Workshop on Survey Sampling in honor of Jean-Claude Deville, Neuchâtel, Switzerland, June 24-26, 2009

2 Outline Item Nonresponse Deterministic imputation: Population model approach Imputed estimator Linearization variance estimator Examples: Domain estimation, Composite imputation Stochastic imputation: variance estimation Examples: Multiple imputation, binary response Simulation results Doubly robust approach Extensions

3 Survey Data Design features: clustering, stratification, unequal probability of selection Source of error: 1. Sampling errors 2. Non-sampling errors: Nonresponse (missing data) Noncoverage Measurement errors

4 Types of nonresponse Unit (or total) nonresponse: refusal, not-at-home Remedy: weight adjustment within classes Item nonresponse: sensitive item, answer not known, inconsistent answer Remedy: imputation (fill in missing data)

5 Advantages of imputation Complete data file: standard complete data methods Different analyses consistent with each other Reduce nonresponse bias Auxiliary x observed can be used to get good imputed values Same survey weight for all items

6 Commonly used imputation methods Marginal imputation methods: 1. Business surveys: Ratio, Regression, Nearest neighbor (NN) 2. Socio-economic surveys: Random donor (within classes), Stochastic ratio or regression, Fractional imputation (FI), Multiple imputation (MI)

7 Complete response set-up Population total: θ N = N i=1 y i NHT estimator: ˆθ n = i s d i y i where d i = π 1 i : design weight π i = inclusion probability = Pr (i s) Variance estimator: ˆV n = i s Ω ij y i y j j s Ω ij depends on joint inclusion probabilities π ij > 0

8 Deterministic imputation Population model approach (Deville and Särndal, 1994): E ζ (y i x i ) = m (x i, β 0 ) a i = 1 if y i observed when i s = 0 otherwise for i U = {1, 2,, N} MAR: Distribution of a i depends only on x i Imputed value: ŷ i = m(x i, ˆβ) ˆβ: unique solution of EE Û (β) = d i a i {y i m (x i, β)} h (x i, β) = 0 i s

9 Model specification Further model specification: Var ζ (y i x i ) = σ 2 q (x i, β 0 ) h (x i, β) = ṁ (x i, β) /q (x i, β) h i Examples: commonly used imputations 1. Ratio imputation: h i = 1 E ζ (y i x i ) = β 0 x i, Var ζ (y i x i ) = σ 2 x i 2. Linear regression imputation: h i = x i E ζ (y i x i ) = x i β 0, Var ζ (y i x i ) = σ 2 3. Logistic regression imputation (y i = 0 or 1): h i = x i log {m i / (1 m i )} = x i β 0, Var ζ (y i x i ) = m i (1 m i ) where m i = E ζ (y i x i )

10 Imputed estimator Imputed estimator of total θ N : ˆθ Id = } d i {a i y i + (1 a i ) m(x i, ˆβ) i s i s d i ỹ i Examples 1. Ratio imputation: m(x i, ˆβ) = x i ˆβ where ˆβ = ( i s d ) 1 ia i x i i s d ia i y i 2. Linear regression imputation: m(x i, ˆβ) = x i ˆβ where ˆβ = ( i s d ) ia i x i x i 1 i s d ia i x i y i 3. Logistic regression imputation: ˆβ is the solution to i s d i {y i m (x i, β)} x i = 0 Imputed estimator of domain total θ z = N i=1 z iy i : ˆθ I,z = i s d i z i ỹ i where z i = 1 if i D; z i = 0 otherwise.

11 Variance estimation Treating imputed values as if observed: Underestimation if ỹ i used in ˆV n for y i Methods that account for imputation: Adjusted jackknife: Rao and Shao (1992) Linearization (Pop. model): Deville and Särndal (1994) Fractional imputation method: Fuller and Kim (2005) Bootstrap: Shao and Sitter (1996) Reverse approach: Shao and Steel (1999)

12 Variance estimation (Cont d) Linearization method: Theorem 1 (Kim and Rao, 2009): Under regularity conditions, n 1/2 N 1 (ˆθId θ Id ) = o p (1) where θ Id = i s w i η i { η i = m (x i ; β 0 ) + a i 1 + c } h i {yi m (x i ; β 0 )}, { N } 1 N c = a i ṁ (x i ; β 0 ) h i (1 a i ) ṁ (x i ; β 0 ). i=1 Reference distribution: Joint distribution of population model and sampling mechanism, conditional on realized (x i, a i ) in the population. i=1

13 Variance estimation (Cont d) Reverse approach: 1. ˆV 1d = Ω ij ˆη i ˆη j i s j s 2. where ˆη i = η i ( ˆβ). ˆV 2d = i s ( ) 2 ( )} d i a i 1 + ĉ ĥ i {y i m x i ; ˆβ ˆV 2d valid even if V ζ (y i x i ) is misspecified. 3. Variance estimator of ˆθ Id ( θ Id ): ˆV d = ˆV 1d + ˆV 2d ˆV d approximately design-model unbiased. If the overall sampling rate negligible: ˆV d = ˆV1d

14 Variance estimation (Cont d) Domain estimation: 1. ˆθ I,z : design-model unbiased for θ z 2. Use ˆV 1d = Ω ij ˆη iz ˆη jz i s j s where ˆη iz = z i m(x i ; ˆβ) + a i {z i + ĉ zh i } { } y i m(x i ; ˆβ), ĉ z = { } 1 d i a i ṁ(x i ; ˆβ)ĥ i d i z i (1 a i ) ṁ(x i ; ˆβ) i s i s

15 Composite imputation x, y, z: z always observed Imputation model: s = s RR s RM s MR s MM θ N = N i=1 y i s RM : x observed and y missing s MM : x and y missing E ζ (y i x i, z i ) = β y x x i E ζ (x i z i ) = β x z x i Imputed estimator: ˆθ Id = ( ) d i y i + d i ˆβy x x i + i s +R i s RM i s MM d i ( ˆβy x ˆβx z z i )

16 Composite imputation (Cont d) ˆβ y x and ˆβ x z solutions of estimation equations: ( ) ( ) Û 1 βy x = yi β y x x i = 0 i S RR d i Û 2 ( βx z ) = i S R+ d i ( xi β x z z i ) = 0 Taylor linearization of the imputed estimator: ˆθ Id ( ˆβ) = ˆθ Id (β) ( ˆθ Id β ) ( Û β where Û = (Û1, Û 2 ) and β = ( βy x, β x z ). ) 1 Û (β)

17 Stochastic imputation y i = imputed value of y i such that Imputed estimator of θ N : E I (y i ) = m(x i, ˆβ) ˆm i ˆθ I = i s d i {a i y i + (1 a i ) y i } Variance estimator of ˆθ I : E I (ˆθI ) = ˆθ Id ˆV I = ˆV d + ˆV where ˆV = i s d 2 i (1 a i ) (y i ˆm i ) 2

18 Multiple imputation: Rubin y (1) i,..., y (M) i = imputed values of y i (M 2) ˆθ (k) I Imputed estimator = i s Rubin s variance estimator: { } d i a i y i + (1 a i ) y (k) ˆθ MI = M 1 M k=1 ˆθ (k) I ˆV R = W M + M + 1 M B M where W M is the average of M naive variance estimators and B M = (M 1) 1 M k=1 (ˆθ(k) I ˆθ ) 2 MI i

19 Multiple imputation (Cont d) ˆV R theoretically justified when ) ) V (ˆθId = V (ˆθn + V (ˆθId ˆθ ) n (A) (Congenialty assumption) ˆVR seriously biased if assumption (A) violated. (A) not satisfied for domain estimation when domains not specified at the imputation stage. Our proposal: ˆV MI = ˆV d + M 1 B M ˆVMI valid for ˆθ Id as well as ˆθ I,z without (A).

20 Binary response Model: y i x i Bernoulli {m i = m (x i, β 0 )} logit (m i ) = x i β; q (x i, β 0 ) = m i (1 m i ) q i ( ˆm i = m x i, ˆβ ) where ˆβ is the solution to d i a i {y i m (x i, β)} x i = 0 i s Stochastic hot deck imputation { yi 1 with prob ˆmi = 0 with prob 1 ˆm i ˆη i = ˆm i + a i (1 + ĉ x i ) (y i ˆm i ) ĉ = { i s d ia i ˆq i x i x i } 1 i s d i (1 a i ) ˆq i x i.

21 Binary response (Cont d) Fractional imputation (FI): Eliminate imputation variance V by FI M = 2 fractions: impute { yi 1 with fractional weight ˆmi = 0 with fractional weight 1 ˆm i Data file reports real values 1 and 0 with associated fractions ˆm i and 1 ˆm i. ˆθ FI = ˆθ Id : V eliminated Estimation of domain total and mean: ˆθ FI,z, ( i s d iz i ) 1 ˆθ FI,z

22 Binary response (Cont d) Multiple imputation (MI): { 1. Generate β N ( ˆβ, i s a ) } i ˆq i x i x i 1 2. Generate yi Bernoulli (mi ) with m i = m (x i, β ) 3. Repeat steps 1 and 2 independently M times.

23 Simulation Study : Binary response Finite population of size N = 10, 000 from x i N (3, 1) y i x i Bernoulli (m i ), where logit (m i ) = 0.5x i 2 z i Bernoulli (0.4) (z i : Domain indicator) SRS of size n = 100 x i and z i : always observed. y i subject to missing. Missing response mechanism a i Bernoulli (π i ) ; logit (π i ) = φ 0 + φ 1 (x i 3) + φ 2 x i 3 (a) φ 1 = 0, φ 2 = 0; (b) φ 1 = 1, φ 2 = 0; (c) φ 1 = 0, φ 2 = 1 φ 0 is determined to achieve 70% response rate. Two variance estimates of multiple imputation are computed.

24 Simulation Study (Cont d) Table: Relative bias (RB) of the Rubin s variance estimator (R) and proposed variance estimator (KR) for multiple imputation Parameter Response RB (%) Mechanism R KR Case Population Case Mean Case Case Domain Case Mean Case Conclusion: 1. KR has small RB in all cases 2. R leads to large RB in the case of domain mean: 28% to 34%

25 Doubly robust method Case 1: p i known (p i = probability of response) Let β be the solution to Û (β) = ( ) 1 d i a i 1 {y i m (x i, β)} h (x i, β) = 0 p i i s Imputed estimator: θ Id = i s d i {a i y i + (1 a i ) m(x i, β) } If 1 is an element of h i, then θ Id = { ( ai d i y i + 1 a ) } i m(x i, β) p i p i i s

26 Doubly robust method (Cont d) Properties of θ Id : 1. Under the assumed response model, E R ( θ Id ) = ˆθ n regardless of the choice of m(x i, β). 2. Under the imputation model, E ζ ( θ Id ˆθ n ) = 0. (1) and (2) imply that θ Id is doubly robust.

27 Doubly robust method (Cont d) Case 2: p i unknown (p i = p i (α)) Linearization variance estimator: Haziza and Rao (2006): linear regression imputation Deville (1999), Demnati and Rao (2004) approach: general case

28 Extensions Calibration estimators Davison and Sardy (2007): deterministic linear regression imputation, stratified SRS Pseudo-empirical likelihood intervals Other parameters

### A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models

A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models Grace Y. Yi 13, JNK Rao 2 and Haocheng Li 1 1. University of Waterloo, Waterloo, Canada

### Parametric fractional imputation for missing data analysis

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Biometrika (????,??,?, pp. 1 14 C???? Biometrika Trust Printed in

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### Missing Data Dr Eleni Matechou

1 Statistical Methods Principles Missing Data Dr Eleni Matechou matechou@stats.ox.ac.uk References: R.J.A. Little and D.B. Rubin 2nd edition Statistical Analysis with Missing Data J.L. Schafer and J.W.

### Missing Data in Quantitative Social Research

PSC Discussion Papers Series Volume 15 Issue 14 Article 1 10-1-2001 Missing Data in Quantitative Social Research S. Obeng-Manu Gyimah University of Western Ontario Follow this and additional works at:

### Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples

### Combining Multiple Imputation and Inverse Probability Weighting

Combining Multiple Imputation and Inverse Probability Weighting Shaun Seaman 1, Ian White 1, Andrew Copas 2,3, Leah Li 4 1 MRC Biostatistics Unit, Cambridge 2 MRC Clinical Trials Unit, London 3 UCL Research

### Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses

Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses G. Gordon Brown, Celia R. Eicheldinger, and James R. Chromy RTI International, Research Triangle Park, NC 27709 Abstract

### Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

### Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

### 2. Linear regression with multiple regressors

2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Reject Inference in Credit Scoring. Jie-Men Mok

Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business

### Handling missing data in Stata a whirlwind tour

Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled

### Comparison of Imputation Methods in the Survey of Income and Program Participation

Comparison of Imputation Methods in the Survey of Income and Program Participation Sarah McMillan U.S. Census Bureau, 4600 Silver Hill Rd, Washington, DC 20233 Any views expressed are those of the author

### problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved

4 Data Issues 4.1 Truncated Regression population model y i = x i β + ε i, ε i N(0, σ 2 ) given a random sample, {y i, x i } N i=1, then OLS is consistent and efficient problem arises when only a non-random

### Multilevel Modeling of Complex Survey Data

Multilevel Modeling of Complex Survey Data Sophia Rabe-Hesketh, University of California, Berkeley and Institute of Education, University of London Joint work with Anders Skrondal, London School of Economics

### Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis

Int. Journal of Math. Analysis, Vol. 5, 2011, no. 1, 1-13 Review of the Methods for Handling Missing Data in Longitudinal Data Analysis Michikazu Nakai and Weiming Ke Department of Mathematics and Statistics

### Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and

### PS 271B: Quantitative Methods II. Lecture Notes

PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.

### From the help desk: hurdle models

The Stata Journal (2003) 3, Number 2, pp. 178 184 From the help desk: hurdle models Allen McDowell Stata Corporation Abstract. This article demonstrates that, although there is no command in Stata for

### Models for Count Data With Overdispersion

Models for Count Data With Overdispersion Germán Rodríguez November 6, 2013 Abstract This addendum to the WWS 509 notes covers extra-poisson variation and the negative binomial model, with brief appearances

### Analyzing Structural Equation Models With Missing Data

Analyzing Structural Equation Models With Missing Data Craig Enders* Arizona State University cenders@asu.edu based on Enders, C. K. (006). Analyzing structural equation models with missing data. In G.

### Workpackage 11 Imputation and Non-Response. Deliverable 11.2

Workpackage 11 Imputation and Non-Response Deliverable 11.2 2004 II List of contributors: Seppo Laaksonen, Statistics Finland; Ueli Oetliker, Swiss Federal Statistical Office; Susanne Rässler, University

### Linear Classification. Volker Tresp Summer 2015

Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

### Handling attrition and non-response in longitudinal data

Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

### A Basic Introduction to Missing Data

John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

### Chapter 10: Basic Linear Unobserved Effects Panel Data. Models:

Chapter 10: Basic Linear Unobserved Effects Panel Data Models: Microeconomic Econometrics I Spring 2010 10.1 Motivation: The Omitted Variables Problem We are interested in the partial effects of the observable

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### Missing data and net survival analysis Bernard Rachet

Workshop on Flexible Models for Longitudinal and Survival Data with Applications in Biostatistics Warwick, 27-29 July 2015 Missing data and net survival analysis Bernard Rachet General context Population-based,

### Multiple Choice Models II

Multiple Choice Models II Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini Laura Magazzini (@univr.it) Multiple Choice Models II 1 / 28 Categorical data Categorical

### 3. Regression & Exponential Smoothing

3. Regression & Exponential Smoothing 3.1 Forecasting a Single Time Series Two main approaches are traditionally used to model a single time series z 1, z 2,..., z n 1. Models the observation z t as a

### 2. What are the theoretical and practical consequences of autocorrelation?

Lecture 10 Serial Correlation In this lecture, you will learn the following: 1. What is the nature of autocorrelation? 2. What are the theoretical and practical consequences of autocorrelation? 3. Since

### Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Department of Industrial Engineering and Management Sciences Northwestern University September 15th, 2014

### Credit Risk Models: An Overview

Credit Risk Models: An Overview Paul Embrechts, Rüdiger Frey, Alexander McNeil ETH Zürich c 2003 (Embrechts, Frey, McNeil) A. Multivariate Models for Portfolio Credit Risk 1. Modelling Dependent Defaults:

### MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

### DEPARTMENT OF ECONOMICS. Unit ECON 12122 Introduction to Econometrics. Notes 4 2. R and F tests

DEPARTMENT OF ECONOMICS Unit ECON 11 Introduction to Econometrics Notes 4 R and F tests These notes provide a summary of the lectures. They are not a complete account of the unit material. You should also

### Modern Methods for Missing Data

Modern Methods for Missing Data Paul D. Allison, Ph.D. Statistical Horizons LLC www.statisticalhorizons.com 1 Introduction Missing data problems are nearly universal in statistical practice. Last 25 years

### Fraternity & Sorority Academic Report Spring 2016

Fraternity & Sorority Academic Report Organization Overall GPA Triangle 17-17 1 Delta Chi 88 12 100 2 Alpha Epsilon Pi 77 3 80 3 Alpha Delta Chi 28 4 32 4 Alpha Delta Pi 190-190 4 Phi Gamma Delta 85 3

### Classification Problems

Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems

### MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

### Fraternity & Sorority Academic Report Fall 2015

Fraternity & Sorority Academic Report Organization Lambda Upsilon Lambda 1-1 1 Delta Chi 77 19 96 2 Alpha Delta Chi 30 1 31 3 Alpha Delta Pi 134 62 196 4 Alpha Sigma Phi 37 13 50 5 Sigma Alpha Epsilon

### Note on the EM Algorithm in Linear Regression Model

International Mathematical Forum 4 2009 no. 38 1883-1889 Note on the M Algorithm in Linear Regression Model Ji-Xia Wang and Yu Miao College of Mathematics and Information Science Henan Normal University

### Introduction to mixed model and missing data issues in longitudinal studies

Introduction to mixed model and missing data issues in longitudinal studies Hélène Jacqmin-Gadda INSERM, U897, Bordeaux, France Inserm workshop, St Raphael Outline of the talk I Introduction Mixed models

### VI. Introduction to Logistic Regression

VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models

### Basic Statistcs Formula Sheet

Basic Statistcs Formula Sheet Steven W. ydick May 5, 0 This document is only intended to review basic concepts/formulas from an introduction to statistics course. Only mean-based procedures are reviewed,

### Introduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.

Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference is used frequently used concerns tests of hypotheses.

### Some of Statistics Canada s Contributions to Survey Methodology

2 Some of Statistics Canada s Contributions to Survey Methodology Jean-François Beaumont, Susie Fortier, Jack Gambino, Mike Hidiroglou, and Pierre Lavallée Statistics Canada, Ottawa, ON The conduct of

### Need for Sampling. Very large populations Destructive testing Continuous production process

Chapter 4 Sampling and Estimation Need for Sampling Very large populations Destructive testing Continuous production process The objective of sampling is to draw a valid inference about a population. 4-

### I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

### Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

### Bivariate Regression Analysis. The beginning of many types of regression

Bivariate Regression Analysis The beginning of many types of regression TOPICS Beyond Correlation Forecasting Two points to estimate the slope Meeting the BLUE criterion The OLS method Purpose of Regression

### APPLIED MISSING DATA ANALYSIS

APPLIED MISSING DATA ANALYSIS Craig K. Enders Series Editor's Note by Todd D. little THE GUILFORD PRESS New York London Contents 1 An Introduction to Missing Data 1 1.1 Introduction 1 1.2 Chapter Overview

### ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

### Problem of Missing Data

VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;

### Imputing Missing Data using SAS

ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are

### An Internal Model for Operational Risk Computation

An Internal Model for Operational Risk Computation Seminarios de Matemática Financiera Instituto MEFF-RiskLab, Madrid http://www.risklab-madrid.uam.es/ Nicolas Baud, Antoine Frachot & Thierry Roncalli

### 5. Linear Regression

5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

### Extreme Value Modeling for Detection and Attribution of Climate Extremes

Extreme Value Modeling for Detection and Attribution of Climate Extremes Jun Yan, Yujing Jiang Joint work with Zhuo Wang, Xuebin Zhang Department of Statistics, University of Connecticut February 2, 2016

### Chapter 9: Hypothesis Testing Sections

Chapter 9: Hypothesis Testing Sections 9.1 Problems of Testing Hypotheses Skip: 9.2 Testing Simple Hypotheses Skip: 9.3 Uniformly Most Powerful Tests Skip: 9.4 Two-Sided Alternatives 9.6 Comparing the

### Chapter 19 Statistical analysis of survey data. Abstract

Chapter 9 Statistical analysis of survey data James R. Chromy Research Triangle Institute Research Triangle Park, North Carolina, USA Savitri Abeyasekera The University of Reading Reading, UK Abstract

### Econometric Analysis of Cross Section and Panel Data Second Edition. Jeffrey M. Wooldridge. The MIT Press Cambridge, Massachusetts London, England

Econometric Analysis of Cross Section and Panel Data Second Edition Jeffrey M. Wooldridge The MIT Press Cambridge, Massachusetts London, England Preface Acknowledgments xxi xxix I INTRODUCTION AND BACKGROUND

### Approaches for Analyzing Survey Data: a Discussion

Approaches for Analyzing Survey Data: a Discussion David Binder 1, Georgia Roberts 1 Statistics Canada 1 Abstract In recent years, an increasing number of researchers have been able to access survey microdata

ALLISON 1 TABLE OF CONTENTS 1. INTRODUCTION... 3 2. ASSUMPTIONS... 6 MISSING COMPLETELY AT RANDOM (MCAR)... 6 MISSING AT RANDOM (MAR)... 7 IGNORABLE... 8 NONIGNORABLE... 8 3. CONVENTIONAL METHODS... 10

### P (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i )

Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =

### Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional

### Factorial experimental designs and generalized linear models

Statistics & Operations Research Transactions SORT 29 (2) July-December 2005, 249-268 ISSN: 1696-2281 www.idescat.net/sort Statistics & Operations Research c Institut d Estadística de Transactions Catalunya

### Logit Models for Binary Data

Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response

### The University of Kansas

All Greek Summary Rank Chapter Name Total Membership Chapter GPA 1 Beta Theta Pi 3.57 2 Chi Omega 3.42 3 Kappa Alpha Theta 3.36 4 Kappa Kappa Gamma 3.28 *5 Pi Beta Phi 3.27 *5 Gamma Phi Beta 3.27 *7 Alpha

### IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

### Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification

Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification Presented by Work done with Roland Bürgi and Roger Iles New Views on Extreme Events: Coupled Networks, Dragon

### Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

### University of Maryland Fraternity & Sorority Life Spring 2015 Academic Report

University of Maryland Fraternity & Sorority Life Academic Report Academic and Population Statistics Population: # of Students: # of New Members: Avg. Size: Avg. GPA: % of the Undergraduate Population

### Comparison of Estimation Methods for Complex Survey Data Analysis

Comparison of Estimation Methods for Complex Survey Data Analysis Tihomir Asparouhov 1 Muthen & Muthen Bengt Muthen 2 UCLA 1 Tihomir Asparouhov, Muthen & Muthen, 3463 Stoner Ave. Los Angeles, CA 90066.

### Monte Carlo Simulation

1 Monte Carlo Simulation Stefan Weber Leibniz Universität Hannover email: sweber@stochastik.uni-hannover.de web: www.stochastik.uni-hannover.de/ sweber Monte Carlo Simulation 2 Quantifying and Hedging

### SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation

SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu September 19, 2015 Outline

### ARMA, GARCH and Related Option Pricing Method

ARMA, GARCH and Related Option Pricing Method Author: Yiyang Yang Advisor: Pr. Xiaolin Li, Pr. Zari Rachev Department of Applied Mathematics and Statistics State University of New York at Stony Brook September

### Variance of OLS Estimators and Hypothesis Testing. Randomness in the model. GM assumptions. Notes. Notes. Notes. Charlie Gibbons ARE 212.

Variance of OLS Estimators and Hypothesis Testing Charlie Gibbons ARE 212 Spring 2011 Randomness in the model Considering the model what is random? Y = X β + ɛ, β is a parameter and not random, X may be

### What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and

### Survey Data Analysis in Stata

Survey Data Analysis in Stata Jeff Pitblado Associate Director, Statistical Software StataCorp LP Stata Conference DC 2009 J. Pitblado (StataCorp) Survey Data Analysis DC 2009 1 / 44 Outline 1 Types of

### Erdős on polynomials

Erdős on polynomials Vilmos Totik University of Szeged and University of South Florida totik@mail.usf.edu Vilmos Totik (SZTE and USF) Polynomials 1 / * Erdős on polynomials Vilmos Totik (SZTE and USF)

### Imputation of missing data under missing not at random assumption & sensitivity analysis

Imputation of missing data under missing not at random assumption & sensitivity analysis S. Jolani Department of Methodology and Statistics, Utrecht University, the Netherlands Advanced Multiple Imputation,

### Double Sampling: What is it?

FOR 474: Forest Inventory Techniques Double Sampling What is it? Why is it Used? Double Sampling: What is it? In many cases in forestry it is too expensive or difficult to measure what you want (e.g. total

### ECON Introductory Econometrics. Lecture 15: Binary dependent variables

ECON4150 - Introductory Econometrics Lecture 15: Binary dependent variables Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 11 Lecture Outline 2 The linear probability model Nonlinear probability

### Heteroskedasticity and Weighted Least Squares

Econ 507. Econometric Analysis. Spring 2009 April 14, 2009 The Classical Linear Model: 1 Linearity: Y = Xβ + u. 2 Strict exogeneity: E(u) = 0 3 No Multicollinearity: ρ(x) = K. 4 No heteroskedasticity/

### New SAS Procedures for Analysis of Sample Survey Data

New SAS Procedures for Analysis of Sample Survey Data Anthony An and Donna Watts, SAS Institute Inc, Cary, NC Abstract Researchers use sample surveys to obtain information on a wide variety of issues Many

### Risk Preferences and Demand Drivers of Extended Warranties

Risk Preferences and Demand Drivers of Extended Warranties Online Appendix Pranav Jindal Smeal College of Business Pennsylvania State University July 2014 A Calibration Exercise Details We use sales data

### ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA

m ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA Jiahe Qian, Educational Testing Service Rosedale Road, MS 02-T, Princeton, NJ 08541 Key Words" Complex sampling, NAEP data,

### Christfried Webers. Canberra February June 2015

c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic

### Sovereign Defaults. Iskander Karibzhanov. October 14, 2014

Sovereign Defaults Iskander Karibzhanov October 14, 214 1 Motivation Two recent papers advance frontiers of sovereign default modeling. First, Aguiar and Gopinath (26) highlight the importance of fluctuations

### An extension of the factoring likelihood approach for non-monotone missing data

An extension of the factoring likelihood approach for non-monotone missing data Jae Kwang Kim Dong Wan Shin January 14, 2010 ABSTRACT We address the problem of parameter estimation in multivariate distributions

### Bayesian Statistics in One Hour. Patrick Lam

Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical

### Ordinal Regression. Chapter

Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

### arxiv:1206.6666v1 [stat.ap] 28 Jun 2012

The Annals of Applied Statistics 2012, Vol. 6, No. 2, 772 794 DOI: 10.1214/11-AOAS521 In the Public Domain arxiv:1206.6666v1 [stat.ap] 28 Jun 2012 ANALYZING ESTABLISHMENT NONRESPONSE USING AN INTERPRETABLE

### Course 4 Examination Questions And Illustrative Solutions. November 2000

Course 4 Examination Questions And Illustrative Solutions Novemer 000 1. You fit an invertile first-order moving average model to a time series. The lag-one sample autocorrelation coefficient is 0.35.

### Deflator Selection and Generalized Linear Modelling in Market-based Accounting Research

Deflator Selection and Generalized Linear Modelling in Market-based Accounting Research Changbao Wu and Bixia Xu 1 Abstract The scale factor refers to an unknown size variable which affects some or all