A General Approach to Variance Estimation under Imputation for Missing Survey Data
|
|
|
- Charleen Terry
- 9 years ago
- Views:
Transcription
1 A General Approach to Variance Estimation under Imputation for Missing Survey Data J.N.K. Rao Carleton University Ottawa, Canada Joint work with J.K. Kim at Iowa State University. 2 Workshop on Survey Sampling in honor of Jean-Claude Deville, Neuchâtel, Switzerland, June 24-26, 2009
2 Outline Item Nonresponse Deterministic imputation: Population model approach Imputed estimator Linearization variance estimator Examples: Domain estimation, Composite imputation Stochastic imputation: variance estimation Examples: Multiple imputation, binary response Simulation results Doubly robust approach Extensions
3 Survey Data Design features: clustering, stratification, unequal probability of selection Source of error: 1. Sampling errors 2. Non-sampling errors: Nonresponse (missing data) Noncoverage Measurement errors
4 Types of nonresponse Unit (or total) nonresponse: refusal, not-at-home Remedy: weight adjustment within classes Item nonresponse: sensitive item, answer not known, inconsistent answer Remedy: imputation (fill in missing data)
5 Advantages of imputation Complete data file: standard complete data methods Different analyses consistent with each other Reduce nonresponse bias Auxiliary x observed can be used to get good imputed values Same survey weight for all items
6 Commonly used imputation methods Marginal imputation methods: 1. Business surveys: Ratio, Regression, Nearest neighbor (NN) 2. Socio-economic surveys: Random donor (within classes), Stochastic ratio or regression, Fractional imputation (FI), Multiple imputation (MI)
7 Complete response set-up Population total: θ N = N i=1 y i NHT estimator: ˆθ n = i s d i y i where d i = π 1 i : design weight π i = inclusion probability = Pr (i s) Variance estimator: ˆV n = i s Ω ij y i y j j s Ω ij depends on joint inclusion probabilities π ij > 0
8 Deterministic imputation Population model approach (Deville and Särndal, 1994): E ζ (y i x i ) = m (x i, β 0 ) a i = 1 if y i observed when i s = 0 otherwise for i U = {1, 2,, N} MAR: Distribution of a i depends only on x i Imputed value: ŷ i = m(x i, ˆβ) ˆβ: unique solution of EE Û (β) = d i a i {y i m (x i, β)} h (x i, β) = 0 i s
9 Model specification Further model specification: Var ζ (y i x i ) = σ 2 q (x i, β 0 ) h (x i, β) = ṁ (x i, β) /q (x i, β) h i Examples: commonly used imputations 1. Ratio imputation: h i = 1 E ζ (y i x i ) = β 0 x i, Var ζ (y i x i ) = σ 2 x i 2. Linear regression imputation: h i = x i E ζ (y i x i ) = x i β 0, Var ζ (y i x i ) = σ 2 3. Logistic regression imputation (y i = 0 or 1): h i = x i log {m i / (1 m i )} = x i β 0, Var ζ (y i x i ) = m i (1 m i ) where m i = E ζ (y i x i )
10 Imputed estimator Imputed estimator of total θ N : ˆθ Id = } d i {a i y i + (1 a i ) m(x i, ˆβ) i s i s d i ỹ i Examples 1. Ratio imputation: m(x i, ˆβ) = x i ˆβ where ˆβ = ( i s d ) 1 ia i x i i s d ia i y i 2. Linear regression imputation: m(x i, ˆβ) = x i ˆβ where ˆβ = ( i s d ) ia i x i x i 1 i s d ia i x i y i 3. Logistic regression imputation: ˆβ is the solution to i s d i {y i m (x i, β)} x i = 0 Imputed estimator of domain total θ z = N i=1 z iy i : ˆθ I,z = i s d i z i ỹ i where z i = 1 if i D; z i = 0 otherwise.
11 Variance estimation Treating imputed values as if observed: Underestimation if ỹ i used in ˆV n for y i Methods that account for imputation: Adjusted jackknife: Rao and Shao (1992) Linearization (Pop. model): Deville and Särndal (1994) Fractional imputation method: Fuller and Kim (2005) Bootstrap: Shao and Sitter (1996) Reverse approach: Shao and Steel (1999)
12 Variance estimation (Cont d) Linearization method: Theorem 1 (Kim and Rao, 2009): Under regularity conditions, n 1/2 N 1 (ˆθId θ Id ) = o p (1) where θ Id = i s w i η i { η i = m (x i ; β 0 ) + a i 1 + c } h i {yi m (x i ; β 0 )}, { N } 1 N c = a i ṁ (x i ; β 0 ) h i (1 a i ) ṁ (x i ; β 0 ). i=1 Reference distribution: Joint distribution of population model and sampling mechanism, conditional on realized (x i, a i ) in the population. i=1
13 Variance estimation (Cont d) Reverse approach: 1. ˆV 1d = Ω ij ˆη i ˆη j i s j s 2. where ˆη i = η i ( ˆβ). ˆV 2d = i s ( ) 2 ( )} d i a i 1 + ĉ ĥ i {y i m x i ; ˆβ ˆV 2d valid even if V ζ (y i x i ) is misspecified. 3. Variance estimator of ˆθ Id ( θ Id ): ˆV d = ˆV 1d + ˆV 2d ˆV d approximately design-model unbiased. If the overall sampling rate negligible: ˆV d = ˆV1d
14 Variance estimation (Cont d) Domain estimation: 1. ˆθ I,z : design-model unbiased for θ z 2. Use ˆV 1d = Ω ij ˆη iz ˆη jz i s j s where ˆη iz = z i m(x i ; ˆβ) + a i {z i + ĉ zh i } { } y i m(x i ; ˆβ), ĉ z = { } 1 d i a i ṁ(x i ; ˆβ)ĥ i d i z i (1 a i ) ṁ(x i ; ˆβ) i s i s
15 Composite imputation x, y, z: z always observed Imputation model: s = s RR s RM s MR s MM θ N = N i=1 y i s RM : x observed and y missing s MM : x and y missing E ζ (y i x i, z i ) = β y x x i E ζ (x i z i ) = β x z x i Imputed estimator: ˆθ Id = ( ) d i y i + d i ˆβy x x i + i s +R i s RM i s MM d i ( ˆβy x ˆβx z z i )
16 Composite imputation (Cont d) ˆβ y x and ˆβ x z solutions of estimation equations: ( ) ( ) Û 1 βy x = yi β y x x i = 0 i S RR d i Û 2 ( βx z ) = i S R+ d i ( xi β x z z i ) = 0 Taylor linearization of the imputed estimator: ˆθ Id ( ˆβ) = ˆθ Id (β) ( ˆθ Id β ) ( Û β where Û = (Û1, Û 2 ) and β = ( βy x, β x z ). ) 1 Û (β)
17 Stochastic imputation y i = imputed value of y i such that Imputed estimator of θ N : E I (y i ) = m(x i, ˆβ) ˆm i ˆθ I = i s d i {a i y i + (1 a i ) y i } Variance estimator of ˆθ I : E I (ˆθI ) = ˆθ Id ˆV I = ˆV d + ˆV where ˆV = i s d 2 i (1 a i ) (y i ˆm i ) 2
18 Multiple imputation: Rubin y (1) i,..., y (M) i = imputed values of y i (M 2) ˆθ (k) I Imputed estimator = i s Rubin s variance estimator: { } d i a i y i + (1 a i ) y (k) ˆθ MI = M 1 M k=1 ˆθ (k) I ˆV R = W M + M + 1 M B M where W M is the average of M naive variance estimators and B M = (M 1) 1 M k=1 (ˆθ(k) I ˆθ ) 2 MI i
19 Multiple imputation (Cont d) ˆV R theoretically justified when ) ) V (ˆθId = V (ˆθn + V (ˆθId ˆθ ) n (A) (Congenialty assumption) ˆVR seriously biased if assumption (A) violated. (A) not satisfied for domain estimation when domains not specified at the imputation stage. Our proposal: ˆV MI = ˆV d + M 1 B M ˆVMI valid for ˆθ Id as well as ˆθ I,z without (A).
20 Binary response Model: y i x i Bernoulli {m i = m (x i, β 0 )} logit (m i ) = x i β; q (x i, β 0 ) = m i (1 m i ) q i ( ˆm i = m x i, ˆβ ) where ˆβ is the solution to d i a i {y i m (x i, β)} x i = 0 i s Stochastic hot deck imputation { yi 1 with prob ˆmi = 0 with prob 1 ˆm i ˆη i = ˆm i + a i (1 + ĉ x i ) (y i ˆm i ) ĉ = { i s d ia i ˆq i x i x i } 1 i s d i (1 a i ) ˆq i x i.
21 Binary response (Cont d) Fractional imputation (FI): Eliminate imputation variance V by FI M = 2 fractions: impute { yi 1 with fractional weight ˆmi = 0 with fractional weight 1 ˆm i Data file reports real values 1 and 0 with associated fractions ˆm i and 1 ˆm i. ˆθ FI = ˆθ Id : V eliminated Estimation of domain total and mean: ˆθ FI,z, ( i s d iz i ) 1 ˆθ FI,z
22 Binary response (Cont d) Multiple imputation (MI): { 1. Generate β N ( ˆβ, i s a ) } i ˆq i x i x i 1 2. Generate yi Bernoulli (mi ) with m i = m (x i, β ) 3. Repeat steps 1 and 2 independently M times.
23 Simulation Study : Binary response Finite population of size N = 10, 000 from x i N (3, 1) y i x i Bernoulli (m i ), where logit (m i ) = 0.5x i 2 z i Bernoulli (0.4) (z i : Domain indicator) SRS of size n = 100 x i and z i : always observed. y i subject to missing. Missing response mechanism a i Bernoulli (π i ) ; logit (π i ) = φ 0 + φ 1 (x i 3) + φ 2 x i 3 (a) φ 1 = 0, φ 2 = 0; (b) φ 1 = 1, φ 2 = 0; (c) φ 1 = 0, φ 2 = 1 φ 0 is determined to achieve 70% response rate. Two variance estimates of multiple imputation are computed.
24 Simulation Study (Cont d) Table: Relative bias (RB) of the Rubin s variance estimator (R) and proposed variance estimator (KR) for multiple imputation Parameter Response RB (%) Mechanism R KR Case Population Case Mean Case Case Domain Case Mean Case Conclusion: 1. KR has small RB in all cases 2. R leads to large RB in the case of domain mean: 28% to 34%
25 Doubly robust method Case 1: p i known (p i = probability of response) Let β be the solution to Û (β) = ( ) 1 d i a i 1 {y i m (x i, β)} h (x i, β) = 0 p i i s Imputed estimator: θ Id = i s d i {a i y i + (1 a i ) m(x i, β) } If 1 is an element of h i, then θ Id = { ( ai d i y i + 1 a ) } i m(x i, β) p i p i i s
26 Doubly robust method (Cont d) Properties of θ Id : 1. Under the assumed response model, E R ( θ Id ) = ˆθ n regardless of the choice of m(x i, β). 2. Under the imputation model, E ζ ( θ Id ˆθ n ) = 0. (1) and (2) imply that θ Id is doubly robust.
27 Doubly robust method (Cont d) Case 2: p i unknown (p i = p i (α)) Linearization variance estimator: Haziza and Rao (2006): linear regression imputation Deville (1999), Demnati and Rao (2004) approach: general case
28 Extensions Calibration estimators Davison and Sardy (2007): deterministic linear regression imputation, stratified SRS Pseudo-empirical likelihood intervals Other parameters
A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models
A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models Grace Y. Yi 13, JNK Rao 2 and Haocheng Li 1 1. University of Waterloo, Waterloo, Canada
Parametric fractional imputation for missing data analysis
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Biometrika (????,??,?, pp. 1 14 C???? Biometrika Trust Printed in
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: [email protected] Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
Econometrics Simple Linear Regression
Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight
2. Linear regression with multiple regressors
2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions
Reject Inference in Credit Scoring. Jie-Men Mok
Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business
Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses
Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses G. Gordon Brown, Celia R. Eicheldinger, and James R. Chromy RTI International, Research Triangle Park, NC 27709 Abstract
Comparison of Imputation Methods in the Survey of Income and Program Participation
Comparison of Imputation Methods in the Survey of Income and Program Participation Sarah McMillan U.S. Census Bureau, 4600 Silver Hill Rd, Washington, DC 20233 Any views expressed are those of the author
Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis
Int. Journal of Math. Analysis, Vol. 5, 2011, no. 1, 1-13 Review of the Methods for Handling Missing Data in Longitudinal Data Analysis Michikazu Nakai and Weiming Ke Department of Mathematics and Statistics
Multilevel Modeling of Complex Survey Data
Multilevel Modeling of Complex Survey Data Sophia Rabe-Hesketh, University of California, Berkeley and Institute of Education, University of London Joint work with Anders Skrondal, London School of Economics
PS 271B: Quantitative Methods II. Lecture Notes
PS 271B: Quantitative Methods II Lecture Notes Langche Zeng [email protected] The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.
Linear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved
4 Data Issues 4.1 Truncated Regression population model y i = x i β + ε i, ε i N(0, σ 2 ) given a random sample, {y i, x i } N i=1, then OLS is consistent and efficient problem arises when only a non-random
Missing data and net survival analysis Bernard Rachet
Workshop on Flexible Models for Longitudinal and Survival Data with Applications in Biostatistics Warwick, 27-29 July 2015 Missing data and net survival analysis Bernard Rachet General context Population-based,
Chapter 10: Basic Linear Unobserved Effects Panel Data. Models:
Chapter 10: Basic Linear Unobserved Effects Panel Data Models: Microeconomic Econometrics I Spring 2010 10.1 Motivation: The Omitted Variables Problem We are interested in the partial effects of the observable
Analyzing Structural Equation Models With Missing Data
Analyzing Structural Equation Models With Missing Data Craig Enders* Arizona State University [email protected] based on Enders, C. K. (006). Analyzing structural equation models with missing data. In G.
From the help desk: hurdle models
The Stata Journal (2003) 3, Number 2, pp. 178 184 From the help desk: hurdle models Allen McDowell Stata Corporation Abstract. This article demonstrates that, although there is no command in Stata for
A Basic Introduction to Missing Data
John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item
Handling missing data in Stata a whirlwind tour
Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled
Basics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu [email protected] Modern machine learning is rooted in statistics. You will find many familiar
Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza
Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and
Multiple Choice Models II
Multiple Choice Models II Laura Magazzini University of Verona [email protected] http://dse.univr.it/magazzini Laura Magazzini (@univr.it) Multiple Choice Models II 1 / 28 Categorical data Categorical
Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering
Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Department of Industrial Engineering and Management Sciences Northwestern University September 15th, 2014
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
Classification Problems
Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems
MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
3. Regression & Exponential Smoothing
3. Regression & Exponential Smoothing 3.1 Forecasting a Single Time Series Two main approaches are traditionally used to model a single time series z 1, z 2,..., z n 1. Models the observation z t as a
MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...
MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................
Note on the EM Algorithm in Linear Regression Model
International Mathematical Forum 4 2009 no. 38 1883-1889 Note on the M Algorithm in Linear Regression Model Ji-Xia Wang and Yu Miao College of Mathematics and Information Science Henan Normal University
VI. Introduction to Logistic Regression
VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models
Handling attrition and non-response in longitudinal data
Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein
Introduction to mixed model and missing data issues in longitudinal studies
Introduction to mixed model and missing data issues in longitudinal studies Hélène Jacqmin-Gadda INSERM, U897, Bordeaux, France Inserm workshop, St Raphael Outline of the talk I Introduction Mixed models
Need for Sampling. Very large populations Destructive testing Continuous production process
Chapter 4 Sampling and Estimation Need for Sampling Very large populations Destructive testing Continuous production process The objective of sampling is to draw a valid inference about a population. 4-
Credit Risk Models: An Overview
Credit Risk Models: An Overview Paul Embrechts, Rüdiger Frey, Alexander McNeil ETH Zürich c 2003 (Embrechts, Frey, McNeil) A. Multivariate Models for Portfolio Credit Risk 1. Modelling Dependent Defaults:
APPLIED MISSING DATA ANALYSIS
APPLIED MISSING DATA ANALYSIS Craig K. Enders Series Editor's Note by Todd D. little THE GUILFORD PRESS New York London Contents 1 An Introduction to Missing Data 1 1.1 Introduction 1 1.2 Chapter Overview
Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
An Internal Model for Operational Risk Computation
An Internal Model for Operational Risk Computation Seminarios de Matemática Financiera Instituto MEFF-RiskLab, Madrid http://www.risklab-madrid.uam.es/ Nicolas Baud, Antoine Frachot & Thierry Roncalli
5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of
Problem of Missing Data
VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;
Workpackage 11 Imputation and Non-Response. Deliverable 11.2
Workpackage 11 Imputation and Non-Response Deliverable 11.2 2004 II List of contributors: Seppo Laaksonen, Statistics Finland; Ueli Oetliker, Swiss Federal Statistical Office; Susanne Rässler, University
ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2
University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages
An extension of the factoring likelihood approach for non-monotone missing data
An extension of the factoring likelihood approach for non-monotone missing data Jae Kwang Kim Dong Wan Shin January 14, 2010 ABSTRACT We address the problem of parameter estimation in multivariate distributions
Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification
Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification Presented by Work done with Roland Bürgi and Roger Iles New Views on Extreme Events: Coupled Networks, Dragon
Extreme Value Modeling for Detection and Attribution of Climate Extremes
Extreme Value Modeling for Detection and Attribution of Climate Extremes Jun Yan, Yujing Jiang Joint work with Zhuo Wang, Xuebin Zhang Department of Statistics, University of Connecticut February 2, 2016
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
Chapter 19 Statistical analysis of survey data. Abstract
Chapter 9 Statistical analysis of survey data James R. Chromy Research Triangle Institute Research Triangle Park, North Carolina, USA Savitri Abeyasekera The University of Reading Reading, UK Abstract
IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results
IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the
Monte Carlo Simulation
1 Monte Carlo Simulation Stefan Weber Leibniz Universität Hannover email: [email protected] web: www.stochastik.uni-hannover.de/ sweber Monte Carlo Simulation 2 Quantifying and Hedging
Chapter 11 Introduction to Survey Sampling and Analysis Procedures
Chapter 11 Introduction to Survey Sampling and Analysis Procedures Chapter Table of Contents OVERVIEW...149 SurveySampling...150 SurveyDataAnalysis...151 DESIGN INFORMATION FOR SURVEY PROCEDURES...152
University of Maryland Fraternity & Sorority Life Spring 2015 Academic Report
University of Maryland Fraternity & Sorority Life Academic Report Academic and Population Statistics Population: # of Students: # of New Members: Avg. Size: Avg. GPA: % of the Undergraduate Population
Approaches for Analyzing Survey Data: a Discussion
Approaches for Analyzing Survey Data: a Discussion David Binder 1, Georgia Roberts 1 Statistics Canada 1 Abstract In recent years, an increasing number of researchers have been able to access survey microdata
Imputing Missing Data using SAS
ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are
Factorial experimental designs and generalized linear models
Statistics & Operations Research Transactions SORT 29 (2) July-December 2005, 249-268 ISSN: 1696-2281 www.idescat.net/sort Statistics & Operations Research c Institut d Estadística de Transactions Catalunya
Christfried Webers. Canberra February June 2015
c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic
Course 4 Examination Questions And Illustrative Solutions. November 2000
Course 4 Examination Questions And Illustrative Solutions Novemer 000 1. You fit an invertile first-order moving average model to a time series. The lag-one sample autocorrelation coefficient is 0.35.
What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling
What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and
Imputation of missing data under missing not at random assumption & sensitivity analysis
Imputation of missing data under missing not at random assumption & sensitivity analysis S. Jolani Department of Methodology and Statistics, Utrecht University, the Netherlands Advanced Multiple Imputation,
Sovereign Defaults. Iskander Karibzhanov. October 14, 2014
Sovereign Defaults Iskander Karibzhanov October 14, 214 1 Motivation Two recent papers advance frontiers of sovereign default modeling. First, Aguiar and Gopinath (26) highlight the importance of fluctuations
Differential privacy in health care analytics and medical research An interactive tutorial
Differential privacy in health care analytics and medical research An interactive tutorial Speaker: Moritz Hardt Theory Group, IBM Almaden February 21, 2012 Overview 1. Releasing medical data: What could
Confidence Intervals for the Difference Between Two Means
Chapter 47 Confidence Intervals for the Difference Between Two Means Introduction This procedure calculates the sample size necessary to achieve a specified distance from the difference in sample means
Erdős on polynomials
Erdős on polynomials Vilmos Totik University of Szeged and University of South Florida [email protected] Vilmos Totik (SZTE and USF) Polynomials 1 / * Erdős on polynomials Vilmos Totik (SZTE and USF)
Bayesian Statistics in One Hour. Patrick Lam
Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical
R 2 -type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models
Faculty of Health Sciences R 2 -type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models Inference & application to prediction of kidney graft failure Paul Blanche joint work with M-C.
Standard errors of marginal effects in the heteroskedastic probit model
Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic
Discussion. Seppo Laaksonen 1. 1. Introduction
Journal of Official Statistics, Vol. 23, No. 4, 2007, pp. 467 475 Discussion Seppo Laaksonen 1 1. Introduction Bjørnstad s article is a welcome contribution to the discussion on multiple imputation (MI)
New SAS Procedures for Analysis of Sample Survey Data
New SAS Procedures for Analysis of Sample Survey Data Anthony An and Donna Watts, SAS Institute Inc, Cary, NC Abstract Researchers use sample surveys to obtain information on a wide variety of issues Many
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional
Modeling the Implied Volatility Surface. Jim Gatheral Stanford Financial Mathematics Seminar February 28, 2003
Modeling the Implied Volatility Surface Jim Gatheral Stanford Financial Mathematics Seminar February 28, 2003 This presentation represents only the personal opinions of the author and not those of Merrill
Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure
Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure Belyaev Mikhail 1,2,3, Burnaev Evgeny 1,2,3, Kapushev Yermek 1,2 1 Institute for Information Transmission
Comparison of Estimation Methods for Complex Survey Data Analysis
Comparison of Estimation Methods for Complex Survey Data Analysis Tihomir Asparouhov 1 Muthen & Muthen Bengt Muthen 2 UCLA 1 Tihomir Asparouhov, Muthen & Muthen, 3463 Stoner Ave. Los Angeles, CA 90066.
arxiv:1206.6666v1 [stat.ap] 28 Jun 2012
The Annals of Applied Statistics 2012, Vol. 6, No. 2, 772 794 DOI: 10.1214/11-AOAS521 In the Public Domain arxiv:1206.6666v1 [stat.ap] 28 Jun 2012 ANALYZING ESTABLISHMENT NONRESPONSE USING AN INTERPRETABLE
Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni
1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed
Logit Models for Binary Data
Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response
Risk Preferences and Demand Drivers of Extended Warranties
Risk Preferences and Demand Drivers of Extended Warranties Online Appendix Pranav Jindal Smeal College of Business Pennsylvania State University July 2014 A Calibration Exercise Details We use sales data
Linear Discrimination. Linear Discrimination. Linear Discrimination. Linearly Separable Systems Pairwise Separation. Steven J Zeil.
Steven J Zeil Old Dominion Univ. Fall 200 Discriminant-Based Classification Linearly Separable Systems Pairwise Separation 2 Posteriors 3 Logistic Discrimination 2 Discriminant-Based Classification Likelihood-based:
Interpretation of Somers D under four simple models
Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms
α α λ α = = λ λ α ψ = = α α α λ λ ψ α = + β = > θ θ β > β β θ θ θ β θ β γ θ β = γ θ > β > γ θ β γ = θ β = θ β = θ β = β θ = β β θ = = = β β θ = + α α α α α = = λ λ λ λ λ λ λ = λ λ α α α α λ ψ + α =
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
individualdifferences
1 Simple ANalysis Of Variance (ANOVA) Oftentimes we have more than two groups that we want to compare. The purpose of ANOVA is to allow us to compare group means from several independent samples. In general,
Stat 9100.3: Analysis of Complex Survey Data
Stat 9100.3: Analysis of Complex Survey Data 1 Logistics Instructor: Stas Kolenikov, [email protected] Class period: MWF 1-1:50pm Office hours: Middlebush 307A, Mon 1-2pm, Tue 1-2 pm, Thu 9-10am.
Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation
Monte Carlo Methods in Finance
Author: Yiyang Yang Advisor: Pr. Xiaolin Li, Pr. Zari Rachev Department of Applied Mathematics and Statistics State University of New York at Stony Brook October 2, 2012 Outline Introduction 1 Introduction
ARMA, GARCH and Related Option Pricing Method
ARMA, GARCH and Related Option Pricing Method Author: Yiyang Yang Advisor: Pr. Xiaolin Li, Pr. Zari Rachev Department of Applied Mathematics and Statistics State University of New York at Stony Brook September
Survey Data Analysis in Stata
Survey Data Analysis in Stata Jeff Pitblado Associate Director, Statistical Software StataCorp LP Stata Conference DC 2009 J. Pitblado (StataCorp) Survey Data Analysis DC 2009 1 / 44 Outline 1 Types of
Chapter 4: Statistical Hypothesis Testing
Chapter 4: Statistical Hypothesis Testing Christophe Hurlin November 20, 2015 Christophe Hurlin () Advanced Econometrics - Master ESA November 20, 2015 1 / 225 Section 1 Introduction Christophe Hurlin
Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010
Simulation Methods Leonid Kogan MIT, Sloan 15.450, Fall 2010 c Leonid Kogan ( MIT, Sloan ) Simulation Methods 15.450, Fall 2010 1 / 35 Outline 1 Generating Random Numbers 2 Variance Reduction 3 Quasi-Monte
Tests for Two Survival Curves Using Cox s Proportional Hazards Model
Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.
ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA
m ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA Jiahe Qian, Educational Testing Service Rosedale Road, MS 02-T, Princeton, NJ 08541 Key Words" Complex sampling, NAEP data,
Chapter XXI Sampling error estimation for survey data* Donna Brogan Emory University Atlanta, Georgia United States of America.
Chapter XXI Sampling error estimation for survey data* Donna Brogan Emory University Atlanta, Georgia United States of America Abstract Complex sample survey designs deviate from simple random sampling,
