A General Approach to Variance Estimation under Imputation for Missing Survey Data

1 A General Approach to Variance Estimation under Imputation for Missing Survey Data
J.N.K. Rao, Carleton University, Ottawa, Canada
Joint work with J.K. Kim at Iowa State University.
Workshop on Survey Sampling in honor of Jean-Claude Deville, Neuchâtel, Switzerland, June 24-26, 2009

2 Outline
Item nonresponse
Deterministic imputation: population model approach
  Imputed estimator
  Linearization variance estimator
  Examples: domain estimation, composite imputation
Stochastic imputation: variance estimation
  Examples: multiple imputation, binary response
Simulation results
Doubly robust approach
Extensions

3 Survey Data
Design features: clustering, stratification, unequal probability of selection
Sources of error:
1. Sampling errors
2. Non-sampling errors: nonresponse (missing data), noncoverage, measurement errors

4 Types of nonresponse
Unit (or total) nonresponse: refusal, not-at-home
  Remedy: weight adjustment within classes
Item nonresponse: sensitive item, answer not known, inconsistent answer
  Remedy: imputation (fill in missing data)

5 Advantages of imputation
Complete data file: standard complete data methods
Different analyses consistent with each other
Reduce nonresponse bias
Auxiliary x observed can be used to get good imputed values
Same survey weight for all items

6 Commonly used imputation methods
Marginal imputation methods:
1. Business surveys: ratio, regression, nearest neighbor (NN)
2. Socio-economic surveys: random donor (within classes), stochastic ratio or regression, fractional imputation (FI), multiple imputation (MI)

7 Complete response set-up
Population total: $\theta_N = \sum_{i=1}^{N} y_i$
NHT estimator: $\hat\theta_n = \sum_{i \in s} d_i y_i$, where $d_i = \pi_i^{-1}$ is the design weight and $\pi_i = \Pr(i \in s)$ is the inclusion probability
Variance estimator: $\hat V_n = \sum_{i \in s} \sum_{j \in s} \Omega_{ij} y_i y_j$
$\Omega_{ij}$ depends on the joint inclusion probabilities $\pi_{ij} > 0$
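A minimal sketch of the complete-response estimators, assuming simple random sampling without replacement (a design the slide does not single out). Under that design the double sum over $\Omega_{ij} y_i y_j$ reduces to the familiar $N^2 (1 - n/N) s_y^2 / n$. The function names and toy data are illustrative only.

```python
def ht_total(y, d):
    """NHT estimate of the population total: sum of d_i * y_i over the sample."""
    return sum(di * yi for di, yi in zip(d, y))


def srs_variance(y, N):
    """Variance estimator of the NHT total under SRS without replacement.

    For this design the general double sum over Omega_ij * y_i * y_j
    equals N^2 * (1 - n/N) * s^2 / n, with s^2 the sample variance.
    """
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    return N ** 2 * (1 - n / N) * s2 / n


# Toy sample of n = 4 from a population of N = 20 (hypothetical values).
y = [2.0, 4.0, 6.0, 8.0]
N = 20
d = [N / len(y)] * len(y)  # design weights d_i = 1 / pi_i = N / n

theta_hat = ht_total(y, d)
v_hat = srs_variance(y, N)
```

For other designs only `srs_variance` would change, since the weights $\Omega_{ij}$ are design-specific.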

8 Deterministic imputation
Population model approach (Deville and Särndal, 1994): $E_\zeta(y_i \mid x_i) = m(x_i, \beta_0)$
Response indicator, for $i \in U = \{1, 2, \ldots, N\}$: $a_i = 1$ if $y_i$ is observed when $i \in s$, and $a_i = 0$ otherwise
MAR: the distribution of $a_i$ depends only on $x_i$
Imputed value: $\hat y_i = m(x_i, \hat\beta)$
$\hat\beta$: unique solution of the estimating equation (EE)
$\hat U(\beta) = \sum_{i \in s} d_i a_i \{y_i - m(x_i, \beta)\} h(x_i, \beta) = 0$

9 Model specification
Further model specification: $\mathrm{Var}_\zeta(y_i \mid x_i) = \sigma^2 q(x_i, \beta_0)$
$h(x_i, \beta) = \dot m(x_i, \beta) / q(x_i, \beta) \equiv h_i$
Examples: commonly used imputations
1. Ratio imputation: $h_i = 1$; $E_\zeta(y_i \mid x_i) = \beta_0 x_i$, $\mathrm{Var}_\zeta(y_i \mid x_i) = \sigma^2 x_i$
2. Linear regression imputation: $h_i = x_i$; $E_\zeta(y_i \mid x_i) = x_i' \beta_0$, $\mathrm{Var}_\zeta(y_i \mid x_i) = \sigma^2$
3. Logistic regression imputation ($y_i = 0$ or $1$): $h_i = x_i$; $\log\{m_i / (1 - m_i)\} = x_i' \beta_0$, $\mathrm{Var}_\zeta(y_i \mid x_i) = m_i (1 - m_i)$, where $m_i = E_\zeta(y_i \mid x_i)$

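10 Imputed estimator
Imputed estimator of the total $\theta_N$:
$\hat\theta_{Id} = \sum_{i \in s} d_i \{a_i y_i + (1 - a_i) m(x_i, \hat\beta)\} \equiv \sum_{i \in s} d_i \tilde y_i$
Examples
1. Ratio imputation: $m(x_i, \hat\beta) = x_i \hat\beta$, where $\hat\beta = (\sum_{i \in s} d_i a_i x_i)^{-1} \sum_{i \in s} d_i a_i y_i$
2. Linear regression imputation: $m(x_i, \hat\beta) = x_i' \hat\beta$, where $\hat\beta = (\sum_{i \in s} d_i a_i x_i x_i')^{-1} \sum_{i \in s} d_i a_i x_i y_i$
3. Logistic regression imputation: $\hat\beta$ is the solution to $\sum_{i \in s} d_i a_i \{y_i - m(x_i, \beta)\} x_i = 0$
Imputed estimator of the domain total $\theta_z = \sum_{i=1}^{N} z_i y_i$:
$\hat\theta_{I,z} = \sum_{i \in s} d_i z_i \tilde y_i$, where $z_i = 1$ if $i \in D$ and $z_i = 0$ otherwise.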
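The ratio-imputation example on this slide can be sketched as follows. This is an illustration under assumed toy data, with missing $y_i$ coded as `None`; the function names are hypothetical, not part of the talk.

```python
def ratio_beta(y, x, a, d):
    """beta_hat = (sum d_i a_i x_i)^(-1) * sum d_i a_i y_i, respondents only."""
    num = sum(di * yi for yi, ai, di in zip(y, a, d) if ai == 1)
    den = sum(di * xi for xi, ai, di in zip(x, a, d) if ai == 1)
    return num / den


def imputed_total(y, x, a, d):
    """Return (theta_hat_Id, y_tilde): observed y_i kept, missing y_i set to x_i * beta_hat."""
    beta = ratio_beta(y, x, a, d)
    y_tilde = [yi if ai == 1 else beta * xi for yi, xi, ai in zip(y, x, a)]
    theta = sum(di * yt for di, yt in zip(d, y_tilde))
    return theta, y_tilde


# Hypothetical sample: the last two units did not respond to y.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, None, None]
a = [1, 1, 0, 0]
d = [5.0, 5.0, 5.0, 5.0]

theta_id, y_tilde = imputed_total(y, x, a, d)
```

A domain total follows by multiplying each term by the indicator $z_i$ before summing.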

11 Variance estimation
Treating imputed values as if observed: underestimation if $\tilde y_i$ is used in $\hat V_n$ in place of $y_i$
Methods that account for imputation:
Adjusted jackknife: Rao and Shao (1992)
Linearization (population model): Deville and Särndal (1994)
Fractional imputation method: Fuller and Kim (2005)
Bootstrap: Shao and Sitter (1996)
Reverse approach: Shao and Steel (1999)

12 Variance estimation (Cont'd)
Linearization method: Theorem 1 (Kim and Rao, 2009). Under regularity conditions,
$n^{1/2} N^{-1} (\hat\theta_{Id} - \tilde\theta_{Id}) = o_p(1)$,
where $\tilde\theta_{Id} = \sum_{i \in s} w_i \eta_i$,
$\eta_i = m(x_i; \beta_0) + a_i \{1 + c' h_i\} \{y_i - m(x_i; \beta_0)\}$,
$c = \{\sum_{i=1}^{N} a_i \dot m(x_i; \beta_0) h_i'\}^{-1} \sum_{i=1}^{N} (1 - a_i) \dot m(x_i; \beta_0)$.
Reference distribution: joint distribution of the population model and the sampling mechanism, conditional on the realized $(x_i, a_i)$ in the population.

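13 Variance estimation (Cont'd)
Reverse approach:
1. $\hat V_{1d} = \sum_{i \in s} \sum_{j \in s} \Omega_{ij} \hat\eta_i \hat\eta_j$, where $\hat\eta_i = \eta_i(\hat\beta)$.
2. $\hat V_{2d} = \sum_{i \in s} d_i a_i (1 + \hat c' \hat h_i)^2 \{y_i - m(x_i; \hat\beta)\}^2$
   $\hat V_{2d}$ is valid even if $V_\zeta(y_i \mid x_i)$ is misspecified.
3. Variance estimator of $\hat\theta_{Id}$ ($\tilde\theta_{Id}$): $\hat V_d = \hat V_{1d} + \hat V_{2d}$
$\hat V_d$ is approximately design-model unbiased.
If the overall sampling rate is negligible: $\hat V_d \approx \hat V_{1d}$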
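To make the linearized values concrete, here is a minimal sketch for ratio imputation, where $h_i = 1$ and $\dot m(x_i; \beta) = x_i$, so that $\hat c$ is a scalar. It computes $\hat\eta_i$ and the first term $\hat V_{1d}$ only, and it assumes simple random sampling without replacement so that the $\Omega_{ij}$ double sum has a closed form. Names and data are hypothetical.

```python
def linearized_values(y, x, a, d):
    """Return (beta_hat, c_hat, eta_hat) for ratio imputation (h_i = 1, m-dot = x_i)."""
    sum_resp_x = sum(di * xi for xi, ai, di in zip(x, a, d) if ai == 1)
    sum_resp_y = sum(di * yi for yi, ai, di in zip(y, a, d) if ai == 1)
    sum_miss_x = sum(di * xi for xi, ai, di in zip(x, a, d) if ai == 0)
    beta = sum_resp_y / sum_resp_x
    c = sum_miss_x / sum_resp_x  # sample version of c with m-dot = x_i, h_i = 1
    eta = []
    for yi, xi, ai in zip(y, x, a):
        m = beta * xi
        resid = (yi - m) if ai == 1 else 0.0  # residual enters only for respondents
        eta.append(m + (1.0 + c) * resid)
    return beta, c, eta


def v1_srs(eta, N):
    """V_1d under SRS without replacement: the complete-data formula applied to eta_hat."""
    n = len(eta)
    mean = sum(eta) / n
    s2 = sum((e - mean) ** 2 for e in eta) / (n - 1)
    return N ** 2 * (1 - n / N) * s2 / n


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 5.0, None, None]
a = [1, 1, 0, 0]
N = 20
d = [N / len(x)] * len(x)

beta, c, eta = linearized_values(y, x, a, d)
theta_id = sum(di * (yi if ai == 1 else beta * xi)
               for yi, xi, ai, di in zip(y, x, a, d))
theta_lin = sum(di * ei for di, ei in zip(d, eta))  # weighted sum of eta_hat
v1 = v1_srs(eta, N)
```

Because the respondents' weighted residuals sum to zero at $\hat\beta$, the weighted sum of $\hat\eta_i$ reproduces the imputed estimator exactly in this example.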

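14 Variance estimation (Cont'd)
Domain estimation:
1. $\hat\theta_{I,z}$: design-model unbiased for $\theta_z$
2. Use $\hat V_{1d} = \sum_{i \in s} \sum_{j \in s} \Omega_{ij} \hat\eta_{iz} \hat\eta_{jz}$, where
$\hat\eta_{iz} = z_i m(x_i; \hat\beta) + a_i \{z_i + \hat c_z' \hat h_i\} \{y_i - m(x_i; \hat\beta)\}$,
$\hat c_z = \{\sum_{i \in s} d_i a_i \dot m(x_i; \hat\beta) \hat h_i'\}^{-1} \sum_{i \in s} d_i z_i (1 - a_i) \dot m(x_i; \hat\beta)$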

15 Composite imputation
x, y, z: z always observed
$s = s_{RR} \cup s_{RM} \cup s_{MR} \cup s_{MM}$, $\theta_N = \sum_{i=1}^{N} y_i$
$s_{RM}$: x observed and y missing; $s_{MM}$: x and y missing
Imputation model: $E_\zeta(y_i \mid x_i, z_i) = \beta_{y|x} x_i$, $E_\zeta(x_i \mid z_i) = \beta_{x|z} z_i$
Imputed estimator:
$\hat\theta_{Id} = \sum_{i \in s_{+R}} d_i y_i + \sum_{i \in s_{RM}} d_i \hat\beta_{y|x} x_i + \sum_{i \in s_{MM}} d_i (\hat\beta_{y|x} \hat\beta_{x|z} z_i)$

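16 Composite imputation (Cont'd)
$\hat\beta_{y|x}$ and $\hat\beta_{x|z}$ are solutions of the estimating equations
$\hat U_1(\beta_{y|x}) = \sum_{i \in s_{RR}} d_i (y_i - \beta_{y|x} x_i) = 0$
$\hat U_2(\beta_{x|z}) = \sum_{i \in s_{R+}} d_i (x_i - \beta_{x|z} z_i) = 0$
Taylor linearization of the imputed estimator:
$\hat\theta_{Id}(\hat\beta) \cong \hat\theta_{Id}(\beta) - (\partial \hat\theta_{Id} / \partial \beta') (\partial \hat U / \partial \beta')^{-1} \hat U(\beta)$
where $\hat U = (\hat U_1, \hat U_2)'$ and $\beta = (\beta_{y|x}, \beta_{x|z})'$.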

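17 Stochastic imputation
$y_i^*$ = imputed value of $y_i$ such that $E_I(y_i^*) = m(x_i, \hat\beta) \equiv \hat m_i$
Imputed estimator of $\theta_N$: $\hat\theta_I = \sum_{i \in s} d_i \{a_i y_i + (1 - a_i) y_i^*\}$, with $E_I(\hat\theta_I) = \hat\theta_{Id}$
Variance estimator of $\hat\theta_I$: $\hat V_I = \hat V_d + \hat V^*$, where $\hat V^* = \sum_{i \in s} d_i^2 (1 - a_i) (y_i^* - \hat m_i)^2$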

18 Multiple imputation: Rubin
$y_i^{*(1)}, \ldots, y_i^{*(M)}$ = imputed values of $y_i$ ($M \ge 2$)
Imputed estimator: $\hat\theta_I^{(k)} = \sum_{i \in s} d_i \{a_i y_i + (1 - a_i) y_i^{*(k)}\}$, $\hat\theta_{MI} = M^{-1} \sum_{k=1}^{M} \hat\theta_I^{(k)}$
Rubin's variance estimator: $\hat V_R = W_M + \frac{M + 1}{M} B_M$
where $W_M$ is the average of the $M$ naive variance estimators and $B_M = (M - 1)^{-1} \sum_{k=1}^{M} (\hat\theta_I^{(k)} - \hat\theta_{MI})^2$

19 Multiple imputation (Cont'd)
$\hat V_R$ is theoretically justified when
$V(\hat\theta_{Id}) = V(\hat\theta_n) + V(\hat\theta_{Id} - \hat\theta_n)$   (A)
(congeniality assumption)
$\hat V_R$ is seriously biased if assumption (A) is violated.
(A) is not satisfied for domain estimation when domains are not specified at the imputation stage.
Our proposal: $\hat V_{MI} = \hat V_d + M^{-1} B_M$
$\hat V_{MI}$ is valid for $\hat\theta_{Id}$ as well as $\hat\theta_{I,z}$ without (A).
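The two combining rules for multiple imputation, Rubin's $\hat V_R$ and the proposed $\hat V_{MI}$, can be sketched as below. The inputs (point estimates, naive variances and a value for $\hat V_d$) are made-up numbers for illustration; in practice $\hat V_d$ would come from the linearization formulas.

```python
def rubin_variance(estimates, naive_vars):
    """Return (theta_MI, W_M, B_M, V_R) from M completed-data analyses."""
    M = len(estimates)
    theta_mi = sum(estimates) / M
    W = sum(naive_vars) / M  # average naive variance
    B = sum((t - theta_mi) ** 2 for t in estimates) / (M - 1)  # between-imputation variance
    V_R = W + (M + 1) / M * B
    return theta_mi, W, B, V_R


def proposed_variance(v_d, B, M):
    """V_MI = V_d + B_M / M, with V_d supplied by the caller."""
    return v_d + B / M


# Hypothetical results from M = 3 imputed data sets.
estimates = [10.0, 12.0, 14.0]
naive_vars = [4.0, 5.0, 6.0]
theta_mi, W, B, V_R = rubin_variance(estimates, naive_vars)

v_d = 7.0  # assumed linearization variance estimate, for illustration only
V_MI = proposed_variance(v_d, B, len(estimates))
```

The two rules differ only in the first term ($W_M$ versus $\hat V_d$) and in the multiplier applied to $B_M$.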

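20 Binary response
Model: $y_i \mid x_i \sim \mathrm{Bernoulli}\{m_i = m(x_i, \beta_0)\}$, $\mathrm{logit}(m_i) = x_i' \beta_0$; $q(x_i, \beta_0) = m_i (1 - m_i) \equiv q_i$
$\hat m_i = m(x_i, \hat\beta)$, where $\hat\beta$ is the solution to $\sum_{i \in s} d_i a_i \{y_i - m(x_i, \beta)\} x_i = 0$
Stochastic hot deck imputation: $y_i^* = 1$ with probability $\hat m_i$, and $y_i^* = 0$ with probability $1 - \hat m_i$
$\hat\eta_i = \hat m_i + a_i (1 + \hat c' x_i) (y_i - \hat m_i)$
$\hat c = \{\sum_{i \in s} d_i a_i \hat q_i x_i x_i'\}^{-1} \sum_{i \in s} d_i (1 - a_i) \hat q_i x_i$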

21 Binary response (Cont'd)
Fractional imputation (FI): eliminate the imputation variance $V^*$ by FI
$M = 2$ fractions: impute $y_i^* = 1$ with fractional weight $\hat m_i$, and $y_i^* = 0$ with fractional weight $1 - \hat m_i$
The data file reports real values 1 and 0 with associated fractions $\hat m_i$ and $1 - \hat m_i$.
$\hat\theta_{FI} = \hat\theta_{Id}$: $V^*$ eliminated
Estimation of domain total and mean: $\hat\theta_{FI,z}$, $(\sum_{i \in s} d_i z_i)^{-1} \hat\theta_{FI,z}$

22 Binary response (Cont'd)
Multiple imputation (MI):
1. Generate $\beta^* \sim N(\hat\beta, \{\sum_{i \in s} a_i \hat q_i x_i x_i'\}^{-1})$
2. Generate $y_i^* \sim \mathrm{Bernoulli}(m_i^*)$ with $m_i^* = m(x_i, \beta^*)$
3. Repeat steps 1 and 2 independently $M$ times.
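The three MI steps above can be sketched in a few lines. To stay dependency-free this illustration assumes a single scalar covariate and no intercept, so $\beta$ is a scalar and the covariance in step 1 is a single number; that simplification, the Newton solver, the function names and the toy data are all assumptions made here, not part of the talk.

```python
import math
import random


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_beta(y, x, a, d, iters=50):
    """Solve sum d_i a_i {y_i - m(x_i, beta)} x_i = 0 by Newton's method (scalar beta)."""
    beta = 0.0
    for _ in range(iters):
        score = info = 0.0
        for yi, xi, ai, di in zip(y, x, a, d):
            if ai == 1:
                m = expit(beta * xi)
                score += di * (yi - m) * xi
                info += di * m * (1.0 - m) * xi * xi
        beta += score / info
    return beta


def multiply_impute(y, x, a, d, M, rng):
    """Return (beta_hat, list of M completed y-vectors)."""
    beta_hat = fit_beta(y, x, a, d)
    # Step 1 variance: inverse of sum a_i q_i x_i^2 (unweighted, as written on the slide).
    info = sum(expit(beta_hat * xi) * (1.0 - expit(beta_hat * xi)) * xi * xi
               for xi, ai in zip(x, a) if ai == 1)
    completed = []
    for _ in range(M):  # Step 3: repeat independently M times
        beta_star = rng.gauss(beta_hat, math.sqrt(1.0 / info))  # Step 1
        y_star = [yi if ai == 1 else int(rng.random() < expit(beta_star * xi))  # Step 2
                  for yi, xi, ai in zip(y, x, a)]
        completed.append(y_star)
    return beta_hat, completed


x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 1.5, -1.5]
y = [0, 0, 1, 0, 1, 1, None, None]
a = [1, 1, 1, 1, 1, 1, 0, 0]
d = [1.0] * len(x)

beta_hat, completed = multiply_impute(y, x, a, d, M=5, rng=random.Random(2009))
```

Each completed vector keeps the respondents' observed values and fills the two missing entries with Bernoulli draws.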

23 Simulation Study: Binary response
Finite population of size $N = 10{,}000$ from
$x_i \sim N(3, 1)$
$y_i \mid x_i \sim \mathrm{Bernoulli}(m_i)$, where $\mathrm{logit}(m_i) = 0.5 x_i - 2$
$z_i \sim \mathrm{Bernoulli}(0.4)$ ($z_i$: domain indicator)
SRS of size $n = 100$
$x_i$ and $z_i$: always observed. $y_i$: subject to missingness.
Missing response mechanism: $a_i \sim \mathrm{Bernoulli}(\pi_i)$; $\mathrm{logit}(\pi_i) = \phi_0 + \phi_1 (x_i - 3) + \phi_2 |x_i - 3|$
(a) $\phi_1 = 0$, $\phi_2 = 0$; (b) $\phi_1 = 1$, $\phi_2 = 0$; (c) $\phi_1 = 0$, $\phi_2 = 1$
$\phi_0$ is determined to achieve a 70% response rate.
Two variance estimates for multiple imputation are computed.
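A rough sketch of how the simulated population, sample and response indicators could be generated. Only the $\phi_1$ term of the response model is implemented here (cases (a) and (b), where $\phi_2 = 0$), and $\phi_0$ is found by bisection so that the average response probability over the population is 0.70; the bisection, the seed and the function names are assumptions of this sketch.

```python
import math
import random


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def make_population(N, rng):
    """Generate (x_i, y_i, z_i) for the finite population described on the slide."""
    pop = []
    for _ in range(N):
        x = rng.gauss(3.0, 1.0)
        y = int(rng.random() < expit(0.5 * x - 2.0))
        z = int(rng.random() < 0.4)  # domain indicator
        pop.append((x, y, z))
    return pop


def mean_response_rate(xs, phi0, phi1):
    return sum(expit(phi0 + phi1 * (x - 3.0)) for x in xs) / len(xs)


def solve_phi0(xs, phi1, target=0.7, iters=60):
    """Bisection on phi0; the mean response rate is increasing in phi0."""
    lo, hi = -20.0, 20.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if mean_response_rate(xs, mid, phi1) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


rng = random.Random(12345)
pop = make_population(10_000, rng)
xs = [x for x, _, _ in pop]

phi1 = 1.0  # case (b)
phi0 = solve_phi0(xs, phi1)
rate = mean_response_rate(xs, phi0, phi1)

sample = rng.sample(pop, 100)  # SRS without replacement, n = 100
a = [int(rng.random() < expit(phi0 + phi1 * (x - 3.0))) for x, _, _ in sample]
```

Setting `phi1 = 0.0` gives case (a), a uniform response probability of 0.70.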

24 Simulation Study (Cont'd)
Table: Relative bias (RB, %) of Rubin's variance estimator (R) and the proposed variance estimator (KR) for multiple imputation
Columns: Parameter | Response mechanism | RB (%) of R | RB (%) of KR
Rows: Population mean and Domain mean, each under three response-mechanism cases [numeric entries not recoverable]
Conclusion:
1. KR has small RB in all cases.
2. R leads to large RB in the case of the domain mean: 28% to 34%.

25 Doubly robust method
Case 1: $p_i$ known ($p_i$ = probability of response)
Let $\beta$ be the solution to
$\hat U(\beta) = \sum_{i \in s} d_i a_i (1/p_i - 1) \{y_i - m(x_i, \beta)\} h(x_i, \beta) = 0$
Imputed estimator: $\theta_{Id} = \sum_{i \in s} d_i \{a_i y_i + (1 - a_i) m(x_i, \beta)\}$
If 1 is an element of $h_i$, then
$\theta_{Id} = \sum_{i \in s} d_i \{(a_i / p_i) y_i + (1 - a_i / p_i) m(x_i, \beta)\}$
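The equivalence of the two forms of the doubly robust estimator when 1 is an element of $h_i$ can be checked numerically. This sketch assumes the simplest such choice, $h_i = 1$ with $m(x_i, \beta) = \beta x_i$, for which the estimating equation has a closed-form solution; the response probabilities and data are invented for illustration.

```python
def dr_beta(y, x, a, d, p):
    """Solve sum d_i a_i (1/p_i - 1) (y_i - beta x_i) = 0 for beta (h_i = 1)."""
    num = sum(di * (1.0 / pi - 1.0) * yi
              for yi, ai, di, pi in zip(y, a, d, p) if ai == 1)
    den = sum(di * (1.0 / pi - 1.0) * xi
              for xi, ai, di, pi in zip(x, a, d, p) if ai == 1)
    return num / den


def imputed_form(y, x, a, d, beta):
    """sum d_i {a_i y_i + (1 - a_i) m_i}"""
    return sum(di * (yi if ai == 1 else beta * xi)
               for yi, xi, ai, di in zip(y, x, a, d))


def weighted_form(y, x, a, d, p, beta):
    """sum d_i {(a_i / p_i) y_i + (1 - a_i / p_i) m_i}"""
    total = 0.0
    for yi, xi, ai, di, pi in zip(y, x, a, d, p):
        m = beta * xi
        total += di * (yi / pi + (1.0 - 1.0 / pi) * m) if ai == 1 else di * m
    return total


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 5.0, None, 7.0]
a = [1, 1, 0, 1]
p = [0.5, 0.8, 0.6, 0.7]  # assumed known response probabilities
d = [5.0] * 4

beta = dr_beta(y, x, a, d, p)
form1 = imputed_form(y, x, a, d, beta)
form2 = weighted_form(y, x, a, d, p, beta)
```

The difference between the two forms is exactly the estimating function evaluated at its root, which is why they agree.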

26 Doubly robust method (Cont'd)
Properties of $\theta_{Id}$:
1. Under the assumed response model, $E_R(\theta_{Id}) = \hat\theta_n$ regardless of the choice of $m(x_i, \beta)$.
2. Under the imputation model, $E_\zeta(\theta_{Id} - \hat\theta_n) = 0$.
(1) and (2) imply that $\theta_{Id}$ is doubly robust.

27 Doubly robust method (Cont'd)
Case 2: $p_i$ unknown ($p_i = p_i(\alpha)$)
Linearization variance estimator:
Haziza and Rao (2006): linear regression imputation
Deville (1999), Demnati and Rao (2004) approach: general case

28 Extensions
Calibration estimators
Davison and Sardy (2007): deterministic linear regression imputation, stratified SRS
Pseudo-empirical likelihood intervals
Other parameters


More information

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

More information

Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification

Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification Presented by Work done with Roland Bürgi and Roger Iles New Views on Extreme Events: Coupled Networks, Dragon

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

University of Maryland Fraternity & Sorority Life Spring 2015 Academic Report

University of Maryland Fraternity & Sorority Life Spring 2015 Academic Report University of Maryland Fraternity & Sorority Life Academic Report Academic and Population Statistics Population: # of Students: # of New Members: Avg. Size: Avg. GPA: % of the Undergraduate Population

More information

Comparison of Estimation Methods for Complex Survey Data Analysis

Comparison of Estimation Methods for Complex Survey Data Analysis Comparison of Estimation Methods for Complex Survey Data Analysis Tihomir Asparouhov 1 Muthen & Muthen Bengt Muthen 2 UCLA 1 Tihomir Asparouhov, Muthen & Muthen, 3463 Stoner Ave. Los Angeles, CA 90066.

More information

Monte Carlo Simulation

Monte Carlo Simulation 1 Monte Carlo Simulation Stefan Weber Leibniz Universität Hannover email: sweber@stochastik.uni-hannover.de web: www.stochastik.uni-hannover.de/ sweber Monte Carlo Simulation 2 Quantifying and Hedging

More information

SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation

SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu September 19, 2015 Outline

More information

ARMA, GARCH and Related Option Pricing Method

ARMA, GARCH and Related Option Pricing Method ARMA, GARCH and Related Option Pricing Method Author: Yiyang Yang Advisor: Pr. Xiaolin Li, Pr. Zari Rachev Department of Applied Mathematics and Statistics State University of New York at Stony Brook September

More information

Variance of OLS Estimators and Hypothesis Testing. Randomness in the model. GM assumptions. Notes. Notes. Notes. Charlie Gibbons ARE 212.

Variance of OLS Estimators and Hypothesis Testing. Randomness in the model. GM assumptions. Notes. Notes. Notes. Charlie Gibbons ARE 212. Variance of OLS Estimators and Hypothesis Testing Charlie Gibbons ARE 212 Spring 2011 Randomness in the model Considering the model what is random? Y = X β + ɛ, β is a parameter and not random, X may be

More information

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and

More information

Survey Data Analysis in Stata

Survey Data Analysis in Stata Survey Data Analysis in Stata Jeff Pitblado Associate Director, Statistical Software StataCorp LP Stata Conference DC 2009 J. Pitblado (StataCorp) Survey Data Analysis DC 2009 1 / 44 Outline 1 Types of

More information

Erdős on polynomials

Erdős on polynomials Erdős on polynomials Vilmos Totik University of Szeged and University of South Florida totik@mail.usf.edu Vilmos Totik (SZTE and USF) Polynomials 1 / * Erdős on polynomials Vilmos Totik (SZTE and USF)

More information

Imputation of missing data under missing not at random assumption & sensitivity analysis

Imputation of missing data under missing not at random assumption & sensitivity analysis Imputation of missing data under missing not at random assumption & sensitivity analysis S. Jolani Department of Methodology and Statistics, Utrecht University, the Netherlands Advanced Multiple Imputation,

More information

Double Sampling: What is it?

Double Sampling: What is it? FOR 474: Forest Inventory Techniques Double Sampling What is it? Why is it Used? Double Sampling: What is it? In many cases in forestry it is too expensive or difficult to measure what you want (e.g. total

More information

ECON Introductory Econometrics. Lecture 15: Binary dependent variables

ECON Introductory Econometrics. Lecture 15: Binary dependent variables ECON4150 - Introductory Econometrics Lecture 15: Binary dependent variables Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 11 Lecture Outline 2 The linear probability model Nonlinear probability

More information

Heteroskedasticity and Weighted Least Squares

Heteroskedasticity and Weighted Least Squares Econ 507. Econometric Analysis. Spring 2009 April 14, 2009 The Classical Linear Model: 1 Linearity: Y = Xβ + u. 2 Strict exogeneity: E(u) = 0 3 No Multicollinearity: ρ(x) = K. 4 No heteroskedasticity/

More information

New SAS Procedures for Analysis of Sample Survey Data

New SAS Procedures for Analysis of Sample Survey Data New SAS Procedures for Analysis of Sample Survey Data Anthony An and Donna Watts, SAS Institute Inc, Cary, NC Abstract Researchers use sample surveys to obtain information on a wide variety of issues Many

More information

Risk Preferences and Demand Drivers of Extended Warranties

Risk Preferences and Demand Drivers of Extended Warranties Risk Preferences and Demand Drivers of Extended Warranties Online Appendix Pranav Jindal Smeal College of Business Pennsylvania State University July 2014 A Calibration Exercise Details We use sales data

More information

ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA

ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA m ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA Jiahe Qian, Educational Testing Service Rosedale Road, MS 02-T, Princeton, NJ 08541 Key Words" Complex sampling, NAEP data,

More information

Christfried Webers. Canberra February June 2015

Christfried Webers. Canberra February June 2015 c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic

More information

Sovereign Defaults. Iskander Karibzhanov. October 14, 2014

Sovereign Defaults. Iskander Karibzhanov. October 14, 2014 Sovereign Defaults Iskander Karibzhanov October 14, 214 1 Motivation Two recent papers advance frontiers of sovereign default modeling. First, Aguiar and Gopinath (26) highlight the importance of fluctuations

More information

An extension of the factoring likelihood approach for non-monotone missing data

An extension of the factoring likelihood approach for non-monotone missing data An extension of the factoring likelihood approach for non-monotone missing data Jae Kwang Kim Dong Wan Shin January 14, 2010 ABSTRACT We address the problem of parameter estimation in multivariate distributions

More information

Bayesian Statistics in One Hour. Patrick Lam

Bayesian Statistics in One Hour. Patrick Lam Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical

More information

Ordinal Regression. Chapter

Ordinal Regression. Chapter Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

More information

arxiv:1206.6666v1 [stat.ap] 28 Jun 2012

arxiv:1206.6666v1 [stat.ap] 28 Jun 2012 The Annals of Applied Statistics 2012, Vol. 6, No. 2, 772 794 DOI: 10.1214/11-AOAS521 In the Public Domain arxiv:1206.6666v1 [stat.ap] 28 Jun 2012 ANALYZING ESTABLISHMENT NONRESPONSE USING AN INTERPRETABLE

More information

Course 4 Examination Questions And Illustrative Solutions. November 2000

Course 4 Examination Questions And Illustrative Solutions. November 2000 Course 4 Examination Questions And Illustrative Solutions Novemer 000 1. You fit an invertile first-order moving average model to a time series. The lag-one sample autocorrelation coefficient is 0.35.

More information

Deflator Selection and Generalized Linear Modelling in Market-based Accounting Research

Deflator Selection and Generalized Linear Modelling in Market-based Accounting Research Deflator Selection and Generalized Linear Modelling in Market-based Accounting Research Changbao Wu and Bixia Xu 1 Abstract The scale factor refers to an unknown size variable which affects some or all

More information

The University of Kansas

The University of Kansas Fall 2011 Scholarship Report All Greek Summary Rank Chapter Name Chapter GPA 1 Beta Theta Pi 3.57 2 Chi Omega 3.42 3 Kappa Alpha Theta 3.36 *4 Gamma Phi Beta 3.28 4 Kappa Kappa Gamma 3.28 6 Pi Beta Phi

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

More information