A General Approach to Variance Estimation under Imputation for Missing Survey Data


1 A General Approach to Variance Estimation under Imputation for Missing Survey Data
J.N.K. Rao, Carleton University, Ottawa, Canada
Joint work with J.K. Kim at Iowa State University.
Workshop on Survey Sampling in honor of Jean-Claude Deville, Neuchâtel, Switzerland, June 24-26, 2009
2 Outline
Item nonresponse
Deterministic imputation: population model approach
  Imputed estimator
  Linearization variance estimator
  Examples: domain estimation, composite imputation
Stochastic imputation: variance estimation
  Examples: multiple imputation, binary response
Simulation results
Doubly robust approach
Extensions
3 Survey Data
Design features: clustering, stratification, unequal probability of selection
Sources of error:
  1. Sampling errors
  2. Nonsampling errors: nonresponse (missing data), noncoverage, measurement errors
4 Types of nonresponse
Unit (or total) nonresponse: refusal, not-at-home
  Remedy: weight adjustment within classes
Item nonresponse: sensitive item, answer not known, inconsistent answer
  Remedy: imputation (fill in missing data)
5 Advantages of imputation
Complete data file: standard complete-data methods can be applied
Different analyses are consistent with each other
Reduces nonresponse bias
Observed auxiliary $x$ can be used to get good imputed values
Same survey weight for all items
6 Commonly used imputation methods
Marginal imputation methods:
  1. Business surveys: ratio, regression, nearest neighbor (NN)
  2. Socio-economic surveys: random donor (within classes), stochastic ratio or regression, fractional imputation (FI), multiple imputation (MI)
7 Complete response setup
Population total: $\theta_N = \sum_{i=1}^{N} y_i$
NHT estimator: $\hat\theta_n = \sum_{i \in s} d_i y_i$
  where $d_i = \pi_i^{-1}$ is the design weight and $\pi_i = \Pr(i \in s)$ is the inclusion probability
Variance estimator: $\hat V_n = \sum_{i \in s} \sum_{j \in s} \Omega_{ij} y_i y_j$
  $\Omega_{ij}$ depends on the joint inclusion probabilities $\pi_{ij} > 0$
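As a minimal sketch of the complete-response setup, the following computes the weighted total and a variance estimate for one specific design. The design is an assumption not stated on the slide: simple random sampling without replacement, for which $d_i = N/n$ and the double sum over $\Omega_{ij}$ reduces to $N^2 (1 - n/N) s_y^2 / n$. The function names and data values are hypothetical.

```python
from statistics import variance


def nht_total(y, d):
    """Weighted total: sum of d_i * y_i over the sample."""
    return sum(di * yi for di, yi in zip(d, y))


def srs_variance_of_total(y, N):
    """Variance estimator of the estimated total.

    Assumption: simple random sampling without replacement, so the
    general double sum over Omega_ij reduces to N^2 (1 - n/N) s^2 / n.
    """
    n = len(y)
    return N ** 2 * (1 - n / N) * variance(y) / n


y = [1.0, 2.0, 3.0, 4.0]    # hypothetical sample values
N = 8                       # hypothetical population size
d = [N / len(y)] * len(y)   # design weights d_i = 1 / pi_i = N / n
theta_hat = nht_total(y, d)
v_hat = srs_variance_of_total(y, N)
```

Other designs need their own $\Omega_{ij}$; only the weighted-total function carries over unchanged.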
8 Deterministic imputation
Population model approach (Deville and Särndal, 1994):
  $E_\zeta(y_i \mid x_i) = m(x_i, \beta_0)$
Response indicator, for $i \in U = \{1, 2, \ldots, N\}$:
  $a_i = 1$ if $y_i$ is observed when $i \in s$; $a_i = 0$ otherwise
MAR: the distribution of $a_i$ depends only on $x_i$
Imputed value: $\hat y_i = m(x_i, \hat\beta)$
$\hat\beta$: unique solution of the estimating equation (EE)
  $\hat U(\beta) = \sum_{i \in s} d_i a_i \{y_i - m(x_i, \beta)\} h(x_i, \beta) = 0$
9 Model specification
Further model specification:
  $\mathrm{Var}_\zeta(y_i \mid x_i) = \sigma^2 q(x_i, \beta_0)$
  $h(x_i, \beta) = \dot m(x_i, \beta) / q(x_i, \beta) \equiv h_i$
Examples: commonly used imputations
  1. Ratio imputation: $h_i = 1$
     $E_\zeta(y_i \mid x_i) = \beta_0 x_i$, $\mathrm{Var}_\zeta(y_i \mid x_i) = \sigma^2 x_i$
  2. Linear regression imputation: $h_i = x_i$
     $E_\zeta(y_i \mid x_i) = x_i' \beta_0$, $\mathrm{Var}_\zeta(y_i \mid x_i) = \sigma^2$
  3. Logistic regression imputation ($y_i = 0$ or $1$): $h_i = x_i$
     $\log\{m_i / (1 - m_i)\} = x_i' \beta_0$, $\mathrm{Var}_\zeta(y_i \mid x_i) = m_i (1 - m_i)$, where $m_i = E_\zeta(y_i \mid x_i)$
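10 Imputed estimator
Imputed estimator of the total $\theta_N$:
  $\hat\theta_{Id} = \sum_{i \in s} d_i \{a_i y_i + (1 - a_i) m(x_i, \hat\beta)\} \equiv \sum_{i \in s} d_i \tilde y_i$
Examples
  1. Ratio imputation: $m(x_i, \hat\beta) = x_i \hat\beta$, where
     $\hat\beta = \left(\sum_{i \in s} d_i a_i x_i\right)^{-1} \sum_{i \in s} d_i a_i y_i$
  2. Linear regression imputation: $m(x_i, \hat\beta) = x_i' \hat\beta$, where
     $\hat\beta = \left(\sum_{i \in s} d_i a_i x_i x_i'\right)^{-1} \sum_{i \in s} d_i a_i x_i y_i$
  3. Logistic regression imputation: $\hat\beta$ is the solution to
     $\sum_{i \in s} d_i a_i \{y_i - m(x_i, \beta)\} x_i = 0$
Imputed estimator of the domain total $\theta_z = \sum_{i=1}^{N} z_i y_i$:
  $\hat\theta_{I,z} = \sum_{i \in s} d_i z_i \tilde y_i$
  where $z_i = 1$ if $i \in D$; $z_i = 0$ otherwise.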
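The ratio-imputation case can be sketched in a few lines. This is only an illustration with made-up data: it computes $\hat\beta$ from respondents, fills in $\tilde y_i = \hat\beta x_i$ for nonrespondents, and forms the imputed total and a domain total. The function names and the use of `None` for a missing $y_i$ are conventions chosen here, not something the slides prescribe.

```python
def ratio_impute(y, x, a, d):
    """Deterministic ratio imputation.

    y[i] may be None when a[i] == 0 (item nonresponse).
    Returns (beta_hat, y_tilde), where y_tilde is the completed vector.
    """
    resp = [i for i in range(len(x)) if a[i] == 1]
    beta_hat = (sum(d[i] * y[i] for i in resp)
                / sum(d[i] * x[i] for i in resp))
    y_tilde = [y[i] if a[i] == 1 else beta_hat * x[i]
               for i in range(len(x))]
    return beta_hat, y_tilde


def imputed_total(y_tilde, d, z=None):
    """Imputed estimator of the total, or of a domain total when z is given."""
    if z is None:
        z = [1] * len(d)
    return sum(di * zi * yi for di, zi, yi in zip(d, z, y_tilde))


# Hypothetical sample of four units, the third with y missing.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, None, 8.0]
a = [1, 1, 0, 1]
d = [10.0] * 4
z = [1, 0, 1, 0]            # domain indicator

beta_hat, y_tilde = ratio_impute(y, x, a, d)
theta_Id = imputed_total(y_tilde, d)
theta_Iz = imputed_total(y_tilde, d, z)
```

Because the domain indicator enters only at the estimation step, the same completed vector `y_tilde` serves every domain.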
11 Variance estimation
Treating imputed values as if observed:
  Underestimation if $\tilde y_i$ is used in $\hat V_n$ in place of $y_i$
Methods that account for imputation:
  Adjusted jackknife: Rao and Shao (1992)
  Linearization (population model): Deville and Särndal (1994)
  Fractional imputation method: Fuller and Kim (2005)
  Bootstrap: Shao and Sitter (1996)
  Reverse approach: Shao and Steel (1999)
12 Variance estimation (Cont'd)
Linearization method:
Theorem 1 (Kim and Rao, 2009): Under regularity conditions,
  $n^{1/2} N^{-1} (\hat\theta_{Id} - \tilde\theta_{Id}) = o_p(1)$
where
  $\tilde\theta_{Id} = \sum_{i \in s} w_i \eta_i$
  $\eta_i = m(x_i; \beta_0) + a_i \{1 + c' h_i\} \{y_i - m(x_i; \beta_0)\}$
  $c = \left\{\sum_{i=1}^{N} a_i \dot m(x_i; \beta_0) h_i'\right\}^{-1} \sum_{i=1}^{N} (1 - a_i) \dot m(x_i; \beta_0)$
Reference distribution: joint distribution of the population model and the sampling mechanism, conditional on the realized $(x_i, a_i)$ in the population.
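13 Variance estimation (Cont'd)
Reverse approach:
  1. $\hat V_{1d} = \sum_{i \in s} \sum_{j \in s} \Omega_{ij} \hat\eta_i \hat\eta_j$, where $\hat\eta_i = \eta_i(\hat\beta)$.
  2. $\hat V_{2d} = \sum_{i \in s} d_i a_i (1 + \hat c' \hat h_i)^2 \{y_i - m(x_i; \hat\beta)\}^2$
     $\hat V_{2d}$ is valid even if $V_\zeta(y_i \mid x_i)$ is misspecified.
  3. Variance estimator of $\hat\theta_{Id}$ ($\tilde\theta_{Id}$): $\hat V_d = \hat V_{1d} + \hat V_{2d}$
     $\hat V_d$ is approximately design-model unbiased.
     If the overall sampling rate is negligible: $\hat V_d \doteq \hat V_{1d}$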
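To make the two components concrete, here is a hedged sketch for the ratio-imputation case ($m = \beta x$, $\dot m = x$, $h_i = 1$), so that $\hat c$ is the scalar $\sum d_i (1 - a_i) x_i / \sum d_i a_i x_i$. It assumes simple random sampling without replacement with equal weights $d_i = N/n$, which is not stated on the slide; under that assumption the double sum in $\hat V_{1d}$ becomes $N^2 (1 - n/N) s_{\hat\eta}^2 / n$. Names and data are hypothetical.

```python
from statistics import variance


def linearization_variance_ratio(y, x, a, N):
    """Linearization variance components for ratio imputation under SRS.

    Assumptions (for illustration only): SRS without replacement with
    d_i = N / n, ratio model m(x, beta) = beta * x, and h_i = 1.
    y[i] may be None when a[i] == 0.
    Returns (eta_hat, v1, v2).
    """
    n = len(x)
    d = N / n
    resp = [i for i in range(n) if a[i] == 1]
    miss = [i for i in range(n) if a[i] == 0]
    beta = sum(y[i] for i in resp) / sum(x[i] for i in resp)
    # c_hat = (sum d a m_dot h)^{-1} sum d (1 - a) m_dot, with m_dot = x, h = 1
    c = sum(x[i] for i in miss) / sum(x[i] for i in resp)
    eta = []
    for i in range(n):
        m_i = beta * x[i]
        if a[i] == 1:
            eta.append(m_i + (1 + c) * (y[i] - m_i))
        else:
            eta.append(m_i)
    # V1: design variance estimator applied to eta_hat (SRS form)
    v1 = N ** 2 * (1 - n / N) * variance(eta) / n
    # V2: sum over respondents of d (1 + c)^2 times the squared residual
    v2 = sum(d * (1 + c) ** 2 * (y[i] - beta * x[i]) ** 2 for i in resp)
    return eta, v1, v2


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 5.0, None, 7.0]
a = [1, 1, 0, 1]
N = 40
eta_hat, v1, v2 = linearization_variance_ratio(y, x, a, N)
v_d = v1 + v2
```

A quick sanity check on the linearization is that the weighted total of $\hat\eta_i$ reproduces the imputed estimator, because the respondents' residuals sum to zero under the estimating equation.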
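14 Variance estimation (Cont'd)
Domain estimation:
  1. $\hat\theta_{I,z}$: design-model unbiased for $\theta_z$
  2. Use $\hat V_{1d} = \sum_{i \in s} \sum_{j \in s} \Omega_{ij} \hat\eta_{iz} \hat\eta_{jz}$, where
     $\hat\eta_{iz} = z_i m(x_i; \hat\beta) + a_i \{z_i + \hat c_z' \hat h_i\} \{y_i - m(x_i; \hat\beta)\}$
     $\hat c_z = \left\{\sum_{i \in s} d_i a_i \dot m(x_i; \hat\beta) \hat h_i'\right\}^{-1} \sum_{i \in s} d_i z_i (1 - a_i) \dot m(x_i; \hat\beta)$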
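15 Composite imputation
Variables $x$, $y$, $z$: $z$ always observed
Target: $\theta_N = \sum_{i=1}^{N} y_i$
Sample partition: $s = s_{RR} \cup s_{RM} \cup s_{MR} \cup s_{MM}$ (first subscript: status of $x$; second: status of $y$)
  $s_{RM}$: $x$ observed and $y$ missing
  $s_{MM}$: $x$ and $y$ missing
Imputation model:
  $E_\zeta(y_i \mid x_i, z_i) = \beta_{y|x} x_i$
  $E_\zeta(x_i \mid z_i) = \beta_{x|z} z_i$
Imputed estimator:
  $\hat\theta_{Id} = \sum_{i \in s_{+R}} d_i y_i + \sum_{i \in s_{RM}} d_i \hat\beta_{y|x} x_i + \sum_{i \in s_{MM}} d_i (\hat\beta_{y|x} \hat\beta_{x|z} z_i)$
  where $s_{+R} = s_{RR} \cup s_{MR}$ is the set of units with $y$ observed.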
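16 Composite imputation (Cont'd)
$\hat\beta_{y|x}$ and $\hat\beta_{x|z}$ are solutions of the estimating equations:
  $\hat U_1(\beta_{y|x}) = \sum_{i \in s_{RR}} d_i (y_i - \beta_{y|x} x_i) = 0$
  $\hat U_2(\beta_{x|z}) = \sum_{i \in s_{R+}} d_i (x_i - \beta_{x|z} z_i) = 0$
Taylor linearization of the imputed estimator:
  $\hat\theta_{Id}(\hat\beta) \cong \hat\theta_{Id}(\beta) - \left(\dfrac{\partial \hat\theta_{Id}}{\partial \beta'}\right) \left(\dfrac{\partial \hat U}{\partial \beta'}\right)^{-1} \hat U(\beta)$
  where $\hat U = (\hat U_1, \hat U_2)'$ and $\beta = (\beta_{y|x}, \beta_{x|z})'$.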
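17 Stochastic imputation
$y_i^*$ = imputed value of $y_i$ such that
  $E_I(y_i^*) = m(x_i, \hat\beta) \equiv \hat m_i$
Imputed estimator of $\theta_N$:
  $\hat\theta_I = \sum_{i \in s} d_i \{a_i y_i + (1 - a_i) y_i^*\}$, with $E_I(\hat\theta_I) = \hat\theta_{Id}$
Variance estimator of $\hat\theta_I$:
  $\hat V_I = \hat V_d + \hat V^*$, where $\hat V^* = \sum_{i \in s} d_i^2 (1 - a_i) (y_i^* - \hat m_i)^2$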
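The extra term for stochastic imputation is a direct sum over the imputed units. A minimal sketch, reading the slide's additional variance term as $\sum_{i \in s} d_i^2 (1 - a_i)(y_i^* - \hat m_i)^2$; the function name and the toy values are hypothetical, and the deterministic part $\hat V_d$ is assumed to come from a separate linearization step.

```python
def imputation_variance(d, a, y_star, m_hat):
    """Imputation-variance term: sum of d_i^2 (y*_i - m_hat_i)^2 over imputed units.

    y_star[i] is only used where a[i] == 0; it may be None for respondents.
    """
    return sum(
        d[i] ** 2 * (y_star[i] - m_hat[i]) ** 2
        for i in range(len(d))
        if a[i] == 0
    )


d = [10.0, 10.0, 10.0]
a = [1, 0, 0]
y_star = [None, 1, 0]        # hypothetical imputed values for units 2 and 3
m_hat = [0.3, 0.6, 0.6]      # hypothetical conditional means E_I(y*_i)
v_star = imputation_variance(d, a, y_star, m_hat)
```

The total variance estimate would then be the linearization estimate for the deterministic estimator plus `v_star`.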
18 Multiple imputation: Rubin
$y_i^{*(1)}, \ldots, y_i^{*(M)}$ = imputed values of $y_i$ ($M \ge 2$)
Imputed estimator:
  $\hat\theta_I^{(k)} = \sum_{i \in s} d_i \{a_i y_i + (1 - a_i) y_i^{*(k)}\}$
  $\hat\theta_{MI} = M^{-1} \sum_{k=1}^{M} \hat\theta_I^{(k)}$
Rubin's variance estimator:
  $\hat V_R = W_M + \dfrac{M + 1}{M} B_M$
  where $W_M$ is the average of the $M$ naive variance estimators and
  $B_M = (M - 1)^{-1} \sum_{k=1}^{M} (\hat\theta_I^{(k)} - \hat\theta_{MI})^2$
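Rubin's combining rule takes the $M$ point estimates and the $M$ naive variance estimates as inputs. A short sketch with hypothetical numbers; the function name is illustrative.

```python
from statistics import mean, variance


def rubin_combine(estimates, naive_variances):
    """Return (theta_MI, V_R) from M imputed-data estimates.

    W_M is the mean of the naive variance estimates; B_M is the
    between-imputation sample variance with divisor M - 1.
    """
    M = len(estimates)
    if M < 2:
        raise ValueError("multiple imputation needs M >= 2")
    theta_mi = mean(estimates)
    w_m = mean(naive_variances)
    b_m = variance(estimates)
    return theta_mi, w_m + (M + 1) / M * b_m


theta_mi, v_r = rubin_combine([10.0, 12.0, 14.0], [4.0, 4.0, 4.0])
```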
19 Multiple imputation (Cont'd)
$\hat V_R$ is theoretically justified when
  $V(\hat\theta_{Id}) = V(\hat\theta_n) + V(\hat\theta_{Id} - \hat\theta_n)$  (A)
  (congeniality assumption)
$\hat V_R$ is seriously biased if assumption (A) is violated.
(A) is not satisfied for domain estimation when the domains are not specified at the imputation stage.
Our proposal:
  $\hat V_{MI} = \hat V_d + M^{-1} B_M$
$\hat V_{MI}$ is valid for $\hat\theta_{Id}$ as well as $\hat\theta_{I,z}$ without (A).
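The proposed estimator replaces the within-imputation term by the linearization estimate and scales the between-imputation variance by $1/M$. A minimal sketch, in which `v_d` is a hypothetical input assumed to have been computed by the linearization method for the deterministic estimator (or its domain version):

```python
from statistics import variance


def proposed_mi_variance(v_d, estimates):
    """Proposed MI variance: V_d + B_M / M.

    v_d       : linearization variance estimate (computed elsewhere)
    estimates : the M imputed-data point estimates
    """
    M = len(estimates)
    if M < 2:
        raise ValueError("need M >= 2 to compute B_M")
    return v_d + variance(estimates) / M


v_mi = proposed_mi_variance(5.0, [10.0, 12.0, 14.0])
```

Unlike Rubin's rule, nothing here depends on the naive variance estimators, which is the point of the comparison on the slide.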
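20 Binary response
Model: $y_i \mid x_i \sim \text{Bernoulli}\{m_i = m(x_i, \beta_0)\}$
  $\text{logit}(m_i) = x_i' \beta_0$; $q(x_i, \beta_0) = m_i (1 - m_i) \equiv q_i$
$\hat m_i = m(x_i, \hat\beta)$, where $\hat\beta$ is the solution to
  $\sum_{i \in s} d_i a_i \{y_i - m(x_i, \beta)\} x_i = 0$
Stochastic hot deck imputation:
  $y_i^* = 1$ with probability $\hat m_i$; $y_i^* = 0$ with probability $1 - \hat m_i$
Linearization:
  $\hat\eta_i = \hat m_i + a_i (1 + \hat c' x_i)(y_i - \hat m_i)$
  $\hat c = \left\{\sum_{i \in s} d_i a_i \hat q_i x_i x_i'\right\}^{-1} \sum_{i \in s} d_i (1 - a_i) \hat q_i x_i$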
21 Binary response (Cont'd)
Fractional imputation (FI): eliminate the imputation variance $V^*$ by FI
$M = 2$ fractions: impute
  $y_i^* = 1$ with fractional weight $\hat m_i$; $y_i^* = 0$ with fractional weight $1 - \hat m_i$
The data file reports real values 1 and 0 with associated fractions $\hat m_i$ and $1 - \hat m_i$.
$\hat\theta_{FI} = \hat\theta_{Id}$: $V^*$ eliminated
Estimation of domain total and mean: $\hat\theta_{FI,z}$ and $\left(\sum_{i \in s} d_i z_i\right)^{-1} \hat\theta_{FI,z}$
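The fractional-imputation data file can be sketched as a list of records, two per nonrespondent. This is an illustration only: the record layout `(unit, value, weight)` and the function names are choices made here, and the fitted probabilities $\hat m_i$ are taken as given rather than estimated.

```python
def fractional_records(y, a, d, m_hat):
    """Build (unit, value, weight) records for a binary item.

    Respondents keep one record with weight d_i. Each nonrespondent gets
    two records: value 1 with weight d_i * m_hat_i and value 0 with
    weight d_i * (1 - m_hat_i).
    """
    records = []
    for i in range(len(a)):
        if a[i] == 1:
            records.append((i, y[i], d[i]))
        else:
            records.append((i, 1, d[i] * m_hat[i]))
            records.append((i, 0, d[i] * (1.0 - m_hat[i])))
    return records


def weighted_total(records, z=None):
    """Weighted total over all records, or over a domain when z is given."""
    return sum(w * v for i, v, w in records if z is None or z[i] == 1)


y = [1, 0, None, None]
a = [1, 1, 0, 0]
d = [10.0] * 4
m_hat = [0.2, 0.4, 0.6, 0.8]   # hypothetical fitted probabilities
z = [1, 0, 1, 0]               # domain indicator

records = fractional_records(y, a, d, m_hat)
theta_fi = weighted_total(records)
theta_fi_z = weighted_total(records, z)
domain_mean = theta_fi_z / sum(di * zi for di, zi in zip(d, z))
```

Every stored value is a real 0 or 1, yet the weighted total matches the deterministic estimator, so no random imputation noise is added.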
22 Binary response (Cont'd)
Multiple imputation (MI):
  1. Generate $\beta^* \sim N\left(\hat\beta, \left\{\sum_{i \in s} a_i \hat q_i x_i x_i'\right\}^{-1}\right)$
  2. Generate $y_i^* \sim \text{Bernoulli}(m_i^*)$ with $m_i^* = m(x_i, \beta^*)$
  3. Repeat steps 1 and 2 independently $M$ times.
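The three steps can be sketched for the simplest case of a single scalar covariate and no intercept, so that the normal draw is one-dimensional with variance $1 / \sum_{i \in s} a_i \hat q_i x_i^2$. That simplification, the function names, and the input values are assumptions made for this illustration; $\hat\beta$ is taken as already fitted.

```python
import math
import random


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def mi_draw(beta_hat, x, a, rng):
    """One multiple-imputation draw for logit(m_i) = beta * x_i (scalar x)."""
    # Step 1: draw beta* from N(beta_hat, {sum a_i q_hat_i x_i^2}^{-1})
    info = 0.0
    for xi, ai in zip(x, a):
        if ai == 1:
            m = expit(beta_hat * xi)
            info += m * (1.0 - m) * xi * xi
    beta_star = rng.gauss(beta_hat, math.sqrt(1.0 / info))
    # Step 2: draw y*_i ~ Bernoulli(m(x_i, beta*)) for each nonrespondent
    y_star = {}
    for i, (xi, ai) in enumerate(zip(x, a)):
        if ai == 0:
            y_star[i] = 1 if rng.random() < expit(beta_star * xi) else 0
    return beta_star, y_star


def multiple_imputations(beta_hat, x, a, M, seed=0):
    """Step 3: repeat steps 1 and 2 independently M times."""
    rng = random.Random(seed)
    return [mi_draw(beta_hat, x, a, rng) for _ in range(M)]


draws = multiple_imputations(0.5, [1.0, 2.0, 3.0, 4.0], [1, 1, 0, 0], M=5)
```

With a vector covariate, step 1 would need a multivariate normal draw with the full inverse information matrix.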
23 Simulation Study: Binary response
Finite population of size $N = 10{,}000$ from
  $x_i \sim N(3, 1)$
  $y_i \mid x_i \sim \text{Bernoulli}(m_i)$, where $\text{logit}(m_i) = 0.5 x_i - 2$
  $z_i \sim \text{Bernoulli}(0.4)$ ($z_i$: domain indicator)
SRS of size $n = 100$
$x_i$ and $z_i$: always observed. $y_i$: subject to missingness.
Missing response mechanism:
  $a_i \sim \text{Bernoulli}(\pi_i)$; $\text{logit}(\pi_i) = \phi_0 + \phi_1 (x_i - 3) + \phi_2 |x_i - 3|$
  (a) $\phi_1 = 0$, $\phi_2 = 0$; (b) $\phi_1 = 1$, $\phi_2 = 0$; (c) $\phi_1 = 0$, $\phi_2 = 1$
  $\phi_0$ is determined to achieve a 70% response rate.
Two variance estimates for multiple imputation are computed.
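The data-generating part of this study can be sketched as follows. Two points are assumptions rather than facts from the slide: the third term of the response model is read as $\phi_2 |x_i - 3|$ (the transcription is garbled there), and $\phi_0$ is chosen by bisection so that the average response probability over the population is 0.70. Function names and seeds are hypothetical.

```python
import math
import random


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def make_population(N=10_000, seed=1):
    """Generate (x, y, z) for each population unit."""
    rng = random.Random(seed)
    pop = []
    for _ in range(N):
        x = rng.gauss(3.0, 1.0)
        y = 1 if rng.random() < expit(0.5 * x - 2.0) else 0
        z = 1 if rng.random() < 0.4 else 0
        pop.append((x, y, z))
    return pop


def mean_response_rate(xs, phi0, phi1, phi2):
    return sum(
        expit(phi0 + phi1 * (x - 3.0) + phi2 * abs(x - 3.0)) for x in xs
    ) / len(xs)


def solve_phi0(xs, phi1, phi2, target=0.7):
    """Bisection on phi0; the mean response rate is increasing in phi0."""
    lo, hi = -20.0, 20.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if mean_response_rate(xs, mid, phi1, phi2) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def draw_sample(pop, phi0, phi1, phi2, n=100, seed=2):
    """SRS of size n, then response indicators a_i ~ Bernoulli(pi_i)."""
    rng = random.Random(seed)
    sample = rng.sample(pop, n)
    out = []
    for x, y, z in sample:
        pi = expit(phi0 + phi1 * (x - 3.0) + phi2 * abs(x - 3.0))
        a = 1 if rng.random() < pi else 0
        out.append((x, y if a == 1 else None, z, a))
    return out


pop = make_population(N=2000)          # smaller N here just to keep the demo fast
xs = [x for x, _, _ in pop]
phi0_b = solve_phi0(xs, phi1=1.0, phi2=0.0)   # mechanism (b)
sample = draw_sample(pop, phi0_b, 1.0, 0.0)
```

Repeating `draw_sample` with different seeds gives the Monte Carlo replicates over which the relative bias of each variance estimator would be averaged.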
24 Simulation Study (Cont'd)
Table: Relative bias (RB, %) of Rubin's variance estimator (R) and the proposed variance estimator (KR) for multiple imputation
  Columns: Parameter | Response mechanism | RB (%) of R | RB (%) of KR
  Rows: population mean (three response-mechanism cases); domain mean (three response-mechanism cases)
  [Numeric entries not preserved in this transcription]
Conclusion:
  1. KR has small RB in all cases
  2. R leads to large RB in the case of the domain mean: 28% to 34%
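25 Doubly robust method
Case 1: $p_i$ known ($p_i$ = probability of response)
Let $\tilde\beta$ be the solution to
  $\hat U(\beta) = \sum_{i \in s} d_i a_i \left(\dfrac{1}{p_i} - 1\right) \{y_i - m(x_i, \beta)\} h(x_i, \beta) = 0$
Imputed estimator:
  $\tilde\theta_{Id} = \sum_{i \in s} d_i \{a_i y_i + (1 - a_i) m(x_i, \tilde\beta)\}$
If 1 is an element of $h_i$, then
  $\tilde\theta_{Id} = \sum_{i \in s} d_i \left\{\dfrac{a_i}{p_i} y_i + \left(1 - \dfrac{a_i}{p_i}\right) m(x_i, \tilde\beta)\right\}$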
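The equivalence of the two forms of the estimator can be checked numerically. The sketch below assumes the ratio model $m(x_i, \beta) = \beta x_i$ with $h_i = 1$, so the estimating equation has a closed-form solution; the data, the response probabilities, and the function name are hypothetical. Note that the weight $a_i (1/p_i - 1)$ vanishes for respondents with $p_i = 1$, so at least some respondents need $p_i < 1$ for the solution to exist.

```python
def doubly_robust_total(y, x, a, p, d):
    """Doubly robust imputed total for m(x, beta) = beta * x and h = 1.

    y[i] may be None when a[i] == 0. Returns (beta, form1, form2), where
    form1 is the imputed-estimator form and form2 the weighted form.
    """
    n = len(x)
    resp = [i for i in range(n) if a[i] == 1]
    w = {i: d[i] * (1.0 / p[i] - 1.0) for i in resp}
    beta = sum(w[i] * y[i] for i in resp) / sum(w[i] * x[i] for i in resp)
    form1 = sum(
        d[i] * (y[i] if a[i] == 1 else beta * x[i]) for i in range(n)
    )
    form2 = 0.0
    for i in range(n):
        m_i = beta * x[i]
        if a[i] == 1:
            form2 += d[i] * (y[i] / p[i] + (1.0 - 1.0 / p[i]) * m_i)
        else:
            form2 += d[i] * m_i
    return beta, form1, form2


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 5.0, None, 7.0]
a = [1, 1, 0, 1]
p = [0.5, 0.8, 0.5, 0.5]     # hypothetical known response probabilities
d = [10.0] * 4
beta_dr, form1, form2 = doubly_robust_total(y, x, a, p, d)
```

The two forms differ by $\sum d_i a_i (1/p_i - 1)(y_i - m_i)$, which is exactly the estimating equation with $h_i = 1$ and is therefore zero at the solution.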
26 Doubly robust method (Cont'd)
Properties of $\tilde\theta_{Id}$:
  1. Under the assumed response model, $E_R(\tilde\theta_{Id}) = \hat\theta_n$ regardless of the choice of $m(x_i, \beta)$.
  2. Under the imputation model, $E_\zeta(\tilde\theta_{Id} - \hat\theta_n) = 0$.
(1) and (2) imply that $\tilde\theta_{Id}$ is doubly robust.
27 Doubly robust method (Cont'd)
Case 2: $p_i$ unknown ($p_i = p_i(\alpha)$)
Linearization variance estimator:
  Haziza and Rao (2006): linear regression imputation
  Deville (1999), Demnati and Rao (2004) approach: general case
28 Extensions
Calibration estimators
  Davison and Sardy (2007): deterministic linear regression imputation, stratified SRS
Pseudo-empirical likelihood intervals
Other parameters
More informationRisk Preferences and Demand Drivers of Extended Warranties
Risk Preferences and Demand Drivers of Extended Warranties Online Appendix Pranav Jindal Smeal College of Business Pennsylvania State University July 2014 A Calibration Exercise Details We use sales data
More informationESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN TTYPE TESTS FOR COMPLEX DATA
m ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN TTYPE TESTS FOR COMPLEX DATA Jiahe Qian, Educational Testing Service Rosedale Road, MS 02T, Princeton, NJ 08541 Key Words" Complex sampling, NAEP data,
More informationChristfried Webers. Canberra February June 2015
c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic
More informationSovereign Defaults. Iskander Karibzhanov. October 14, 2014
Sovereign Defaults Iskander Karibzhanov October 14, 214 1 Motivation Two recent papers advance frontiers of sovereign default modeling. First, Aguiar and Gopinath (26) highlight the importance of fluctuations
More informationAn extension of the factoring likelihood approach for nonmonotone missing data
An extension of the factoring likelihood approach for nonmonotone missing data Jae Kwang Kim Dong Wan Shin January 14, 2010 ABSTRACT We address the problem of parameter estimation in multivariate distributions
More informationBayesian Statistics in One Hour. Patrick Lam
Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical
More informationOrdinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
More informationarxiv:1206.6666v1 [stat.ap] 28 Jun 2012
The Annals of Applied Statistics 2012, Vol. 6, No. 2, 772 794 DOI: 10.1214/11AOAS521 In the Public Domain arxiv:1206.6666v1 [stat.ap] 28 Jun 2012 ANALYZING ESTABLISHMENT NONRESPONSE USING AN INTERPRETABLE
More informationCourse 4 Examination Questions And Illustrative Solutions. November 2000
Course 4 Examination Questions And Illustrative Solutions Novemer 000 1. You fit an invertile firstorder moving average model to a time series. The lagone sample autocorrelation coefficient is 0.35.
More informationDeflator Selection and Generalized Linear Modelling in Marketbased Accounting Research
Deflator Selection and Generalized Linear Modelling in Marketbased Accounting Research Changbao Wu and Bixia Xu 1 Abstract The scale factor refers to an unknown size variable which affects some or all
More informationThe University of Kansas
Fall 2011 Scholarship Report All Greek Summary Rank Chapter Name Chapter GPA 1 Beta Theta Pi 3.57 2 Chi Omega 3.42 3 Kappa Alpha Theta 3.36 *4 Gamma Phi Beta 3.28 4 Kappa Kappa Gamma 3.28 6 Pi Beta Phi
More informationMULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance
More information