Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza


2 The problem Official statistics often involve large data sets with many variables and many missing values. We cannot simply delete incomplete records, because that would discard a substantial amount of data that was costly to collect. In some cases the data are missing completely at random (MCAR), i.e. the presence of missing values is unrelated to the values of the variables. In a true MCAR situation, almost all methods for handling missing data work well.
3 Missing data: selective loss A more realistic hypothesis is that the data are missing at random (MAR): the probability that an observation is missing may depend on the observed data, but not on the missing values themselves or on unobserved variables. When MAR cannot be assumed, the mechanism is missing not at random (MNAR).
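The difference between the three mechanisms can be illustrated with a small simulation. The following is a minimal sketch and not part of the original slides: the function name `make_missing`, the median thresholds and the `rate` parameter are illustrative assumptions.

```python
import random

def make_missing(rows, mechanism, rate=0.3, seed=0):
    """Return a copy of rows [(x, y), ...] with some y values set to None.

    x is always observed; y is the variable that loses values.
    (Hypothetical helper, for illustration only.)
    """
    rng = random.Random(seed)
    xs = sorted(x for x, _ in rows)
    ys = sorted(y for _, y in rows)
    x_median = xs[len(xs) // 2]
    y_median = ys[len(ys) // 2]
    out = []
    for x, y in rows:
        if mechanism == "MCAR":
            p = rate                                # unrelated to any value
        elif mechanism == "MAR":
            p = 2 * rate if x > x_median else 0.0   # depends on observed x only
        elif mechanism == "MNAR":
            p = 2 * rate if y > y_median else 0.0   # depends on the unseen y itself
        else:
            raise ValueError(mechanism)
        out.append((x, None if rng.random() < p else y))
    return out

# Usage: compare the mean of the observed y values under each mechanism.
gen = random.Random(1)
rows = []
for _ in range(2000):
    x = gen.gauss(0, 1)
    rows.append((x, x + gen.gauss(0, 1)))   # y is correlated with x
true_mean = sum(y for _, y in rows) / len(rows)
for mech in ("MCAR", "MAR", "MNAR"):
    kept = [y for _, y in make_missing(rows, mech) if y is not None]
    print(mech, round(sum(kept) / len(kept) - true_mean, 3))
```

In this sketch the complete-case mean of y is expected to stay close to the true mean only under MCAR; under MAR and MNAR the large values of y are lost more often, so deleting incomplete records distorts the estimate.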
4 Single and Multiple Imputation Single imputation: we create one «completed dataset». This is the obvious choice if the dataset has to be distributed. Multiple imputation: we create m «completed datasets». If we want to estimate a model, a single imputation cannot reflect the uncertainty in the predictions of the unknown missing values, and consequently the variances of the parameter estimates will be biased downward.
5 Recommendations from Eurostat In relation to the imputation: 1. The procedure applied to the data should preserve variation of and correlation between variables. Methods that incorporate error components into the imputed values shall be preferable to those that simply impute a predicted value. 2. Methods which take into account the correlation structure (or other characteristics of the joint distribution of the variables) shall be preferable to the marginal or univariate approach.
6 Properties of a good imputation method A good imputation method should: be general enough to handle nonmonotone patterns of missing data and mixed variable types; preserve associations between variables that have missing values; preserve marginal distributions (means, variances and shape). Moreover, the inference procedures applied to the imputed data should take account of the uncertainty due to imputation. [Figure: monotone versus nonmonotone missing-data patterns]
7 Multiple Imputation (Rubin 1987) This method yields valid statistical inferences (parameter estimates, tests and confidence intervals) that properly reflect the uncertainty due to missing values. Multiple imputation inference involves three distinct phases: (1) the missing data are filled in m times to generate m complete data sets; (2) the m complete data sets are analyzed using standard procedures; (3) the results from the m complete data sets are combined for inference, using rules that combine within-imputation and between-imputation variability.
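The combining step in phase (3) is usually done with Rubin's rules: the pooled estimate is the average of the m estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. A minimal sketch for a scalar parameter follows; the function name `pool_rubin` is an assumption made for illustration.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors.

    estimates: the m estimates of one scalar parameter
    variances: the m within-imputation variances (squared standard errors)
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m >= 2 estimates and matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Usage with three imputed data sets (made-up numbers).
estimate, total_var = pool_rubin([10.0, 12.0, 11.0], [4.0, 4.0, 4.0])
```

The between-imputation term is what a single imputation cannot supply, which is why its standard errors are too small.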
8 Problems with Multiple Imputation Absence of a single complete data matrix, which is convenient to have in many cases; difficulties in analysing a large number of variables; difficulties in analysing data with mixed measurement levels; multivariate normality may be unrealistic; MI cannot take constraints on the imputations into account; MI cannot take bounds or complex survey designs into account.
9 IVEware and MICE Sequential Regression for Multiple Imputations (Raghunathan et al. 2001) is implemented in the IVEware software. A similar approach is used by MICE, Multivariate Imputation by Chained Equations (Van Buuren et al., 2000). Both require specifying a conditional distribution for the missing data in each incomplete variable, under the assumption that a corresponding multivariate distribution exists, and both iterate over all the conditionally specified imputation models. Advantages with respect to MI: the univariate problems are simpler than the multivariate one, and it is possible to handle mixed measurement levels, bounds, constraints between variables, and interactions.
10 Hot-deck imputation Hot deck is an imputation method in which a pool of donors is defined for each recipient, and a donor is drawn from the pool at random. The pool is defined so that it contains the subjects who are similar to the recipient. Benefits of hot-deck imputation: 1) imputations tend to be realistic, since they are based on values observed elsewhere; 2) imputations will not fall outside the range of possible values; 3) it is not necessary to define an explicit model for the distribution of the missing values; 4) it can handle variables with mixed measurement levels. Because of its simplicity and these desirable properties, hot deck is a popular imputation method, especially in large sample surveys where there is a large pool of donors.
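As an illustration, here is a minimal sketch of a random hot deck in which the donor pool is defined by exact agreement on a few classification variables. This is only one simple way of defining "similar" subjects; the function name `hot_deck` and the record layout (a list of dicts with `None` for missing values) are assumptions.

```python
import random

def hot_deck(records, target, class_vars, seed=0):
    """Fill missing `target` values with a value drawn from a random donor.

    Donors are the records that share the recipient's values on class_vars
    and have `target` observed. Recipients with an empty pool stay missing.
    """
    rng = random.Random(seed)
    pools = {}
    for r in records:
        if r[target] is not None:
            key = tuple(r[v] for v in class_vars)
            pools.setdefault(key, []).append(r[target])
    completed = []
    for r in records:
        r = dict(r)  # copy, so the input records are not modified
        if r[target] is None:
            donors = pools.get(tuple(r[v] for v in class_vars))
            if donors:
                r[target] = rng.choice(donors)  # an observed, hence realistic, value
        completed.append(r)
    return completed

# Usage (made-up records).
data = [
    {"region": "A", "income": 10},
    {"region": "A", "income": 20},
    {"region": "A", "income": None},
    {"region": "B", "income": 100},
    {"region": "B", "income": None},
]
filled = hot_deck(data, "income", ["region"])
```

Because every imputed value is copied from a donor, it cannot fall outside the observed range, which is benefit 2) above.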
11 Hot-deck imputation: weaknesses Definition of a distance/dissimilarity between units: this is very difficult with many variables (curse of dimensionality) and mixed measurement levels. Relationships among the variables: to maintain multivariate relationships, a single donor should supply all of the recipient's missing variables; otherwise some relationships can be distorted.
12 Predictive mean matching (PMM) PMM is a hot-deck imputation method that tries to overcome the difficulty of defining a distance measure. The observed values Y_obs are regressed on a set of observed variables, say X. Predicted values are calculated for all Y. Finally, the Y_mis values are imputed using the Y_obs values of units whose predicted values are similar. The bootstrap and the approximate Bayesian bootstrap are methods for incorporating parameter uncertainty into hot-deck imputation models.
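The matching step can be sketched as follows for a single predictor. This is a simplified illustration, assuming ordinary least squares on one variable and a donor pool of the k nearest predicted values; the function name `pmm_impute` and the parameter `k` are assumptions, and the bootstrap step that would add parameter uncertainty is omitted.

```python
import random

def pmm_impute(x, y, k=3, seed=0):
    """Predictive mean matching with one predictor.

    x: fully observed predictor values; y: target values with None for missing.
    Each missing y is replaced by the observed y of a donor drawn at random
    from the k observed units whose predicted values are closest.
    """
    rng = random.Random(seed)
    obs = [i for i, v in enumerate(y) if v is not None]
    n = len(obs)
    mean_x = sum(x[i] for i in obs) / n
    mean_y = sum(y[i] for i in obs) / n
    sxx = sum((x[i] - mean_x) ** 2 for i in obs)
    slope = sum((x[i] - mean_x) * (y[i] - mean_y) for i in obs) / sxx
    intercept = mean_y - slope * mean_x
    pred = [intercept + slope * xi for xi in x]  # predictions for ALL units
    out = list(y)
    for j, v in enumerate(y):
        if v is None:
            nearest = sorted(obs, key=lambda i: abs(pred[i] - pred[j]))[:k]
            out[j] = y[rng.choice(nearest)]  # impute an observed value
    return out

# Usage (made-up values).
completed = pmm_impute([1.0, 2.0, 2.4, 4.0, 5.0, 7.0],
                       [2.1, 3.9, None, 8.2, 9.8, None])
```

The regression is used only to measure closeness between units; the imputed values themselves are always observed values, as in any hot deck.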
13 MIDAS (Siddique & Belin 2008) MIDAS (multiple imputation using distance-aided selection of donors) implements an iterative predictive mean matching hot deck for imputing missing data. It can handle continuous and categorical data.
14 Imputation by Decision Trees Decision trees split the sample into more homogeneous subsamples, and the variables analysed can be categorical or quantitative. We don't need to define a distance, but the units in a leaf can be assumed to be very close, especially if we don't prune the tree. Moreover, the leaves express the relationships between the target and the predictors. We can use this property to apply a hot-deck approach without its weak points. Many authors have proposed imputation by decision trees based on prediction. Di Ciaccio (2008) and Burgette & Reiter (2010) proposed multiple imputation via sequential regression trees.
15 Algorithm MultiTree for single/multiple imputation
1. For all variables: initialize the missing data by random hot deck
2. Iterate (j = 1 to number of variables)
3. Set variable j as the target variable
4. Select the cases that have no missing value for variable j
5. Estimate a large decision tree without pruning
6. For each missing value, determine the corresponding leaf
7. Estimate the missing values of variable j by random hot deck within the leaves
8. Update the missing values of variable j. Go to the next variable (step 3)
9. If d < 0.001 or iterations > T then STOP, else go to step 2
To introduce multiple imputation, as step 0 we can select several bootstrap samples and carry out the analysis for each sample.
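Steps 3 to 7 for a single target variable can be sketched as below. This is a hedged illustration and not the author's implementation: it assumes a quantitative target, numeric predictors that are already complete (as they would be after the initialization in step 1), and a simple variance-reducing binary tree written from scratch. The names `grow`, `sse`, `leaf_donors`, `impute_with_tree` and the `min_leaf` parameter are all hypothetical. The outer iteration over variables and the stopping rule of step 9 are not shown.

```python
import random

def sse(rows, target):
    """Sum of squared deviations of the target within a node."""
    m = sum(r[target] for r in rows) / len(rows)
    return sum((r[target] - m) ** 2 for r in rows)

def grow(rows, target, predictors, min_leaf=5):
    """Grow an unpruned tree; each leaf keeps the observed target values."""
    best = None
    if len(rows) >= 2 * min_leaf:
        for p in predictors:
            values = sorted({r[p] for r in rows})
            for lo, hi in zip(values, values[1:]):
                cut = (lo + hi) / 2
                left = [r for r in rows if r[p] <= cut]
                right = [r for r in rows if r[p] > cut]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                score = sse(left, target) + sse(right, target)
                if best is None or score < best[0]:
                    best = (score, p, cut, left, right)
    if best is None:  # no admissible split: this node is a leaf
        return {"donors": [r[target] for r in rows]}
    _, p, cut, left, right = best
    return {"var": p, "cut": cut,
            "left": grow(left, target, predictors, min_leaf),
            "right": grow(right, target, predictors, min_leaf)}

def leaf_donors(tree, row):
    """Step 6: drop a record down the tree and return its leaf's donors."""
    while "donors" not in tree:
        tree = tree["left"] if row[tree["var"]] <= tree["cut"] else tree["right"]
    return tree["donors"]

def impute_with_tree(rows, target, predictors, min_leaf=5, seed=0):
    """Steps 4 to 7: fit on complete cases, then random hot deck in the leaf."""
    rng = random.Random(seed)
    complete = [r for r in rows if r[target] is not None]
    tree = grow(complete, target, predictors, min_leaf)
    out = []
    for r in rows:
        r = dict(r)
        if r[target] is None:
            r[target] = rng.choice(leaf_donors(tree, r))
        out.append(r)
    return out
```

Drawing a donor at random from the leaf, rather than imputing the leaf mean, is what keeps the imputed values realistic and preserves their variability.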
16 Simulation (single imputation) 5000 units, 8 variables: X1 to X6 quantitative; A, B categorical. [Table: number of units for each combination of A (a1, a2) and B (b1 to b4), with the number of missing values in parentheses; surviving fragment: A\B b1 b2 b3 b4 tot a (300) 1000 (300) (400) 3000 a tot] Cluster 1: A=a1, B=b1 or b2; X1 to X3 generated from Uniform(50; 100), X4 to X6 linear combinations of X1 to X3. Cluster 2: (A=a2, B=b3 or b4) or (A=a1, B=b4); X1 to X6 generated from a multivariate normal with covariance matrix given by E(X_i X_j) = σ² ρ_ij, with ρ = 0.5 and σ = 30. We inserted missing values randomly in the variables X1 to X3 and B. The table shows the number of missing values for each combination of the categorical variables (in red in the original slide). Moreover, 1000 observations of B were set to missing (MNAR).
17 Results for the categorical variable B [Figures: distribution of the categorical variable B; prediction of B]
18 Results for the variable X1: mean and standard deviation [Table: true mean and true standard deviation of X1 compared with the values imputed by MultiTree and by IVEware, by combination of A and B. Figure: distribution of X1 given A=a1 & B=b1]
19 Comparison of results: correlations [Tables: correlation matrices of X1 to X6 given A=a1 & B=b1: true correlations, IVEware correlations, MultiTree correlations]
SPSSSA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spsssa.com SPSSSA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING
More informationRegression III: Advanced Methods
Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models
More informationA Split Questionnaire Survey Design applied to German Media and Consumer Surveys
A Split Questionnaire Survey Design applied to German Media and Consumer Surveys Susanne Rässler, Florian Koller, Christine Mäenpää Lehrstuhl für Statistik und Ökonometrie Universität ErlangenNürnberg
More informationChallenges in Longitudinal Data Analysis: Baseline Adjustment, Missing Data, and Dropout
Challenges in Longitudinal Data Analysis: Baseline Adjustment, Missing Data, and Dropout Sandra Taylor, Ph.D. IDDRC BBRD Core 23 April 2014 Objectives Baseline Adjustment Introduce approaches Guidance
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationAn analysis method for a quantitative outcome and two categorical explanatory variables.
Chapter 11 TwoWay ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that
More informationFairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
More informationSome Practical Issues Related to the Integration of Data from Sample Surveys
ANALYSES Some Practical Issues Related to the Integration of Data from Sample Surveys Wojciech Roszka 1 Poznań University of Economics, Poznań, Poland Abstract The users of official statistics data expect
More informationA Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
More informationMultivariate Statistical Inference and Applications
Multivariate Statistical Inference and Applications ALVIN C. RENCHER Department of Statistics Brigham Young University A WileyInterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim
More information