Alternatives to logistic regression. Laura Rosella, PhD Scientist, Public Health Ontario
1 Alternatives to logistic regression Laura Rosella, PhD Scientist, Public Health Ontario
2 Acknowledgments
- Course: Categorical Data Analysis for Epidemiologic Studies (Course director: Laura Rosella, PhD)
- Dr. Marcelo Urquia, SMH
3 Objectives
- To understand the pros and cons of the logistic regression approach
- To discuss the appropriate use of logistic regression
- To identify alternatives to logistic regression and discuss their strengths and weaknesses
- To provide an example to walk through the approaches
Goal: Thoughtful use of logistic regression
4 Binomial Logistic Regression Model
Binomial regression is based on the binomial distribution.
$\operatorname{logit}\,\pi(y) = \ln\dfrac{\pi(y)}{1-\pi(y)}$
5 Binomial Logistic Regression Model
$\operatorname{logit}\,\pi(y) = \ln\dfrac{\pi(y)}{1-\pi(y)}$
ODDS RATIO
6 Binomial Logistic Regression Model
$\operatorname{logit}\,\pi(y) = \ln\dfrac{\pi(y)}{1-\pi(y)}$
LOGIT: the logit (i.e. log-odds) function serves to bound the predicted probability between 0 and 1
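7 $\ln\dfrac{\pi(y)}{1-\pi(y)} = \alpha + \beta x$
- Logistic regression is a linear model on the log-odds scale
- For each one-unit increase in x, the log-odds increase linearly by $\beta$; equivalently, the odds are multiplied by $e^{\beta}$ (an exponential increase in odds)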
8 Epi 101
Exposure | Disease present | Disease absent
Present | a | b
Absent | c | d
- Relative Risk: $RR = \dfrac{a/(a+b)}{c/(c+d)}$, i.e. risk in the exposed / risk in the unexposed
- Odds Ratio: $OR = \dfrac{a/b}{c/d} = \dfrac{ad}{bc}$, i.e. the ratio of the odds of developing the outcome in the exposed compared to the unexposed
- Consensus: relative risk is preferred over the odds ratio for most prospective investigations
9 The strengths of the logistic regression approach
- Logistic regression can be applied to many different study designs (cohort, case-control, cross-sectional)
- The Odds Ratio (OR) provides a good approximation of the Relative Risk when the outcome is rare
- Fairly easy to run using many different statistical software packages (too easy?)
- Multivariate
10 The problem with logistic regression
- The OR overestimates the Relative Risk when the outcome is common (rule of thumb: > 10%)
- Despite advice on the rare-event assumption, consumers of the health research literature often interpret the OR as a Relative Risk (RR), which can exaggerate the apparent effect
- Logistic regression became easy to use and very popular, and there is a perception that alternative methods do not exist
- But there are easy and potentially more appropriate approaches when you want to estimate relative risk
11 Example
Six 2x2 tables (X by Y), each reporting Po, RR and OR:
- Relative Risk = 2 at prevalence among non-exposed (Po) = 0.1, 0.2 and 0.3
- Relative Risk = 3 at prevalence among non-exposed (Po) = 0.1, 0.2 and 0.3
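The six scenarios in this example can be recomputed directly. The sketch below is a minimal Python illustration, not part of the original slides; the function name `odds_ratio_from_rr` is hypothetical, and it assumes the risk in the exposed is simply RR x Po.

```python
def odds_ratio_from_rr(rr, p0):
    """Odds ratio implied by a relative risk `rr` and a non-exposed risk `p0`."""
    p1 = rr * p0  # risk in the exposed group
    if not 0 < p0 < 1 or not 0 < p1 < 1:
        raise ValueError("risks must lie strictly between 0 and 1")
    odds_exposed = p1 / (1 - p1)
    odds_unexposed = p0 / (1 - p0)
    return odds_exposed / odds_unexposed


# Recompute the OR for each scenario on the slide
for rr in (2, 3):
    for p0 in (0.1, 0.2, 0.3):
        print(f"RR = {rr}, Po = {p0}: OR = {odds_ratio_from_rr(rr, p0):.2f}")
```

With RR fixed at 2, the implied OR rises from 2.25 at Po = 0.1 to 3.5 at Po = 0.3; with RR fixed at 3 it rises from about 3.86 to 21. The more common the outcome, the further the OR drifts from the RR.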
13 Zhang & Yu's simple formula, JAMA 1998
The formula can be used to correct the adjusted OR derived from logistic regression to obtain a treatment-effect estimate that better represents the true relative risk.
Zhang and Yu, 1998, JAMA
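As published by Zhang and Yu (1998), the correction is RR = OR / ((1 - P0) + P0 x OR), where P0 is the incidence of the outcome in the non-exposed group. The following is a minimal Python sketch of that formula; the function name `zhang_yu_rr` is illustrative and does not come from the slides.

```python
def zhang_yu_rr(odds_ratio, p0):
    """Zhang and Yu correction: RR = OR / ((1 - P0) + P0 * OR).

    `p0` is the incidence of the outcome in the non-exposed group.
    """
    if not 0 <= p0 < 1:
        raise ValueError("p0 must be in [0, 1)")
    return odds_ratio / ((1 - p0) + p0 * odds_ratio)


# A crude OR of 3.5 with 30% incidence in the non-exposed corresponds to RR = 2
print(round(zhang_yu_rr(3.5, 0.3), 2))
# When the outcome is rare the correction changes very little
print(round(zhang_yu_rr(3.5, 0.01), 2))
```

For a crude (unadjusted) OR the formula recovers the RR exactly. Applied to an adjusted OR it is only an approximation, which is where the limitations discussed on the next slide arise.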
14 Limitations of Zhang and Yu's formula
- Trade-off between simplicity and precision
- Not very reliable in the presence of covariates
- Produces confidence intervals narrower than they should be
- May slightly overestimate the RR when confounding exists
- Ignores the covariance between the estimated incidence and the estimated odds ratio
- SHOULD NOT BE USED ON AN ADJUSTED OR: using the formula in this manner is incorrect and will produce a biased estimate when confounding is present
15 Other alternatives
- Log-binomial regression
- Poisson regression (and negative binomial)
- Poisson with robust variance estimator (modified Poisson)
- Cox regression
16 Hypothetical working example
- WCGS cohort study: a cohort of men in the 1960s followed up to study CVD risk factors
- Outcome: hbp (indicates whether study participants have HBP at follow-up)
- Exposure: obesity; over = 1 if they were classified as obese at baseline, = 0 if not
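17
    proc freq data = talk;
      tables over*hbp / nopercent nocol relrisk;
    run;

Obese | HBP at follow-up: Yes | HBP at follow-up: No | Total
Yes | 49 | 37 | 86
No | 644 | 2424 | 3068
Total | 693 | 2461 | 3154

The OR and RR for those who were classified as obese at baseline (compared with those who were not):
OR = (49 x 2424)/(37 x 644) = 4.99
RR = (49/86)/(644/3068) = 2.71
Overall frequency of HBP at follow-up: 22%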
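The crude estimates on this slide follow from the four cell counts alone. A minimal Python sketch, using the counts that appear in the slide's own OR and RR formulas (the function names are illustrative):

```python
def risk_ratio(a, b, c, d):
    """RR = [a / (a + b)] / [c / (c + d)] for a 2x2 table
    with rows exposed / unexposed and columns outcome yes / no."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR = (a * d) / (b * c) for the same table layout."""
    return (a * d) / (b * c)


# Obese: 49 with HBP, 37 without; not obese: 644 with HBP, 2424 without
a, b, c, d = 49, 37, 644, 2424
print(f"RR = {risk_ratio(a, b, c, d):.3f}")
print(f"OR = {odds_ratio(a, b, c, d):.3f}")
print(f"HBP overall = {(a + c) / (a + b + c + d):.1%}")
```

With these counts the RR is about 2.71 and the OR about 4.98 (reported as 4.99 on the slide). With HBP present in roughly 22% of the cohort, the outcome is far from rare and the OR is nearly double the RR.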
18 Logistic regression
    proc genmod data = talk descending;
      model hbp = over / dist = binomial link = logit;
      /* over is a numeric 0/1 variable (no CLASS statement), so one coefficient is enough */
      estimate 'Beta' over 1 / exp;
      title1 'Logistic Regression';
    run;
Output: Contrast Estimate Results (Estimate and Confidence Limits for Exp(Beta))
The same model with PROC LOGISTIC:
    proc logistic data = talk descending;
      model hbp = over;
      title1 'Logistic Regression';
    run;
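19 Log-Binomial
Logistic (logit link): $\log\dfrac{P_j}{1-P_j} = \beta_0 + \beta_1 X_j$
- X = 0: $\log\dfrac{P_0}{1-P_0} = \beta_0$
- X = 1: $\log\dfrac{P_1}{1-P_1} = \beta_0 + \beta_1$
- $\beta_1 = \log\dfrac{P_1}{1-P_1} - \log\dfrac{P_0}{1-P_0} = \log(OR)$, so $OR = e^{\beta_1}$
Log-binomial (log link): $\log P_j = \beta_0 + \beta_1 X_j$
- X = 0: $\log P_0 = \beta_0$
- X = 1: $\log P_1 = \beta_0 + \beta_1$
- $\beta_1 = \log P_1 - \log P_0 = \log(RR)$, so $RR = e^{\beta_1}$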
20 Log-binomial regression
    proc genmod data = talk descending;
      model hbp = over / dist = binomial link = log;
      /* over is a numeric 0/1 variable, so exp(Beta) is the RR for obese vs. not obese */
      estimate 'Beta' over 1 / exp;
      title1 'Log Binomial Regression';
    run;
Output: Contrast Estimate Results (Estimate and Confidence Limits for Exp(Beta))
21 Poisson Regression
- The model specifies the log(rate) of the outcome as a linear function of the covariates
- Used when the outcomes of interest are rates (and rate ratios)
- Applied to a binary outcome, a Poisson model without robust error variances will give a confidence interval that is too wide (i.e. it tends to overestimate the variance)
22 Poisson regression
    proc genmod data = talk descending;
      model hbp = over / dist = poisson link = log;
      estimate 'Beta' over 1 / exp;
      title1 'Poisson Regression';
    run;
Output: Contrast Estimate Results (Estimate and Confidence Limits for Exp(Beta))
23 Poisson regression with robust variance (modified Poisson)
    proc genmod data = talk;
      class id;
      model hbp = over / dist = poisson link = log;
      /* the REPEATED statement requests robust (sandwich) standard errors, one cluster per subject */
      repeated subject = id / type = unstr;
      estimate 'Beta' over 1 / exp;
      title1 'Poisson Regression Robust Variance';
    run;
Output: Contrast Estimate Results (Estimate and Confidence Limits for Exp(Beta))
24 Cox regression
    data talk;
      set talk;
      time = 1;  /* same follow-up time assigned to every subject */
    run;
    proc phreg data = talk;
      model time*hbp(0) = over / rl;
    run;
Output: Analysis of Maximum Likelihood Estimates (Hazard Ratio and Confidence Limits)
25 Comparison (crude OR)
Model | Estimate (95% CI)
Logistic regression | OR: 4.99 (3.22, 7.71)
Zhang and Yu's formula | RR: 2.71 (2.20, 3.20)
Log-binomial regression | RR: 2.71 (2.23, 3.30)
Poisson regression | RR: 2.71 (2.03, 3.63)
Poisson regression with robust variance | RR: 2.71 (2.23, 3.39)
Cox regression | RR: 2.71 (2.03, 3.63)
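With a single binary exposure, the logistic, log-binomial and ordinary Poisson fits are saturated, so their Wald intervals have closed forms that can be checked by hand. The Python sketch below is an illustration under that assumption, not the SAS output itself. It uses the cell counts from the working example and z = 1.96; the function names are hypothetical.

```python
import math


def wald_ci(estimate, se_log, z=1.96):
    """Wald interval for a ratio measure, computed on the log scale."""
    log_est = math.log(estimate)
    return math.exp(log_est - z * se_log), math.exp(log_est + z * se_log)


def crude_intervals(a, b, c, d):
    """Closed-form estimates and 95% CIs for one binary exposure.

    a, b = exposed with / without the outcome; c, d = unexposed with / without.
    """
    odds_ratio = (a * d) / (b * c)
    risk_ratio = (a / (a + b)) / (c / (c + d))
    # Standard errors of the log estimate under each model
    se_logistic = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    se_log_binomial = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    se_poisson = math.sqrt(1 / a + 1 / c)  # model-based, no robust correction
    return {
        "logistic (OR)": (odds_ratio, *wald_ci(odds_ratio, se_logistic)),
        "log-binomial (RR)": (risk_ratio, *wald_ci(risk_ratio, se_log_binomial)),
        "Poisson (RR)": (risk_ratio, *wald_ci(risk_ratio, se_poisson)),
    }


for model, (est, lo, hi) in crude_intervals(49, 37, 644, 2424).items():
    print(f"{model}: {est:.2f} ({lo:.2f}, {hi:.2f})")
```

The Poisson standard error omits the negative terms 1/(a+b) and 1/(c+d) that appear in the log-binomial one. That is why the ordinary Poisson interval is wider even though the point estimate is identical.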
26 Comparison (adjusted OR) McNutt et al, AJE 2003;157:
27 Pros and cons
Alternative | Pros | Cons
Zhang and Yu's formula | Easy to use | Ignores covariance; 10-15% bias in multivariable analyses; underestimates CIs
Log-binomial regression | Natural approximation to binomial distribution; small standard error | May result in convergence problems (increase iterations or try modified Poisson)
Poisson regression | Good approximation to binomial distribution when N is large | Conservative CIs; may estimate probabilities greater than 1
Poisson regression with robust variance (modified Poisson) | Good approximation to binomial distribution when N is large; small standard error | May estimate probabilities greater than 1
Cox regression | Good approximation to binomial distribution | Does not estimate probabilities (no intercept)
28 What to do?
If alternative regression methods are not feasible:
1. Zhang and Yu's approximation (acknowledging the limitations)
2. Interpret the OR as an OR, not as an RR
If alternative regression methods are feasible:
1. Log-binomial regression
2. Modified Poisson regression (robust variance)
3. Ordinary Poisson or Cox regression
29 Other consequences
- Etiologic fraction (EF): the proportion of cases in which the exposure played a causal role in the development of the disease
- $EF = (I_E - I_O)/I_E$, where $I_E$ = incidence in the exposed and $I_O$ = incidence in the non-exposed
- Population attributable fraction: $PAF = (I_T - I_O)/I_T$, where $I_T$ = incidence in the population
- Also $PAF = \dfrac{P_E(RR-1)}{P_E(RR-1)+1}$, where $P_E$ = prevalence of the exposure in the population
- Ideally (i.e., in the absence of confounding, measurement error and ignorance), the sum of all EFs or PAFs is expected to be 1 (or 100%)
- Based on risk, not odds! If ORs are used instead of RRs, EF and PAF may be inflated
- Use of the OR may artefactually increase EFs and PAFs
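The size of this inflation can be illustrated with the working example's counts (49 and 37 among the obese, 644 and 2424 among the non-obese). The Python sketch below is a hypothetical illustration, with function names chosen here. It computes the PAF from incidences and from the RR, then shows what happens when the OR is substituted for the RR.

```python
def paf_from_incidence(i_total, i_unexposed):
    """PAF = (I_T - I_O) / I_T."""
    return (i_total - i_unexposed) / i_total


def paf_from_ratio(p_exposed, ratio):
    """PAF = P_E (RR - 1) / (P_E (RR - 1) + 1); `ratio` should be a risk ratio."""
    excess = p_exposed * (ratio - 1)
    return excess / (excess + 1)


a, b, c, d = 49, 37, 644, 2424
n = a + b + c + d
p_exposed = (a + b) / n                    # prevalence of obesity
rr = (a / (a + b)) / (c / (c + d))         # risk ratio
odds_ratio = (a * d) / (b * c)             # odds ratio

print(f"PAF from incidences: {paf_from_incidence((a + c) / n, c / (c + d)):.3f}")
print(f"PAF using the RR:    {paf_from_ratio(p_exposed, rr):.3f}")
print(f"PAF using the OR:    {paf_from_ratio(p_exposed, odds_ratio):.3f}")
```

The two risk-based calculations agree at about 4.5%. Plugging the OR into the same formula gives roughly 9.8%, more than double.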
30 Why do we use odds-ratios in case-control studies?
31 Why do we use odds-ratios in case-control studies?
- Cohort study: start from Exposed / Not Exposed (X) and follow forward to the Disease Outcome (Y). In statistical terms, Y is the random variable
32 Why do we use odds-ratios in case-control studies?
- Cohort study: start from Exposed / Not Exposed (X) and follow forward to the Disease Outcome (Y). In statistical terms, Y is the random variable
- Case-control study: start from the Disease Outcome (Y) and look back to Exposed / Not Exposed (X). In statistical terms, X is the random variable
33 Why do we use odds-ratios in case-control studies?
- When the sampling design is retrospective, we can construct conditional distributions for the exposure (X) within the levels of the outcome variable
- We cannot estimate probabilities of the outcome with this type of design
- However, the odds ratio is the same whether it is defined as X given Y or as Y given X
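This symmetry is easy to verify numerically. The Python sketch below is an illustration only: the counts are borrowed from the working example, and the 10% control-sampling fraction is an arbitrary assumption. It computes the OR both ways, then mimics case-control sampling by keeping all cases and only a fraction of the non-cases.

```python
import math


def or_outcome_given_exposure(a, b, c, d):
    """Odds of disease in the exposed / odds of disease in the unexposed."""
    return (a / b) / (c / d)


def or_exposure_given_outcome(a, b, c, d):
    """Odds of exposure among cases / odds of exposure among non-cases."""
    return (a / c) / (b / d)


def naive_risk_ratio(a, b, c, d):
    """Risk ratio computed as if the table came from a cohort."""
    return (a / (a + b)) / (c / (c + d))


# a, c = exposed / unexposed cases; b, d = exposed / unexposed non-cases
a, b, c, d = 49, 37, 644, 2424
print(math.isclose(or_outcome_given_exposure(a, b, c, d),
                   or_exposure_given_outcome(a, b, c, d)))

# Keep every case but sample only 10% of non-cases (assumed fraction)
f = 0.10
print(f"OR, full cohort:     {or_outcome_given_exposure(a, b, c, d):.2f}")
print(f"OR, sampled:         {or_outcome_given_exposure(a, b * f, c, d * f):.2f}")
print(f"'RR', full cohort:   {naive_risk_ratio(a, b, c, d):.2f}")
print(f"'RR', sampled:       {naive_risk_ratio(a, b * f, c, d * f):.2f}")
```

The sampling fraction cancels out of the OR, so it is unchanged. The "risk ratio" computed from the sampled table depends on how many controls were drawn and no longer estimates the cohort RR.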
34 Interpretations in case control versus cohort
- Interpretation of the regression coefficients (i.e. the log of the odds ratio) is identical
- In a case-control study the intercept is not readily interpretable for epidemiology, due to the nature of the sampling of the study
- Therefore the probability is also not directly interpretable
35 Thoughtful use of logistic regression
- In case-control studies, it is an excellent choice because relative risk is not directly estimable
- In cohort or cross-sectional studies remember that:
  - The odds ratio is used as a surrogate of the relative risk (cohort) or prevalence rate ratio (cross-sectional)
  - When the frequency of the outcome is high (e.g. > 10% or > 20%) the odds ratio is biased (usually biased upwards)
  - Consider alternative approaches and/or transformations of the odds ratio estimate
36 Further readings I: Alternatives to logistic regression
- Zhang J, Yu KF. What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA Nov 18;280(19):
- Spiegelman D, Hertzmark E. Easy SAS calculations for risk or prevalence ratios and differences. Am J Epidemiol Aug 1;162(3): Epub 2005 Jun
- McNutt LA, Wu C, Xue X, Hafner JP. Estimating the relative risk in cohort studies and clinical trials of common outcomes. Am J Epidemiol May 15;157(10):
- Zou G. A modified Poisson regression approach to prospective studies with binary data. Am J Epidemiol Apr 1;159(7):
- UCLA Stat Computing > SAS > FAQ > How can I estimate relative risk in SAS using proc genmod for common outcomes in cohort studies?
37 Further readings II: About proper use of EF, PAF, etc.
- Northridge ME. Public health methods--attributable risk as a link between causality and public health action. Am J Public Health Sep;85(9): Nice discussion about the interpretation and usefulness for public health
- Rockhill B, Newman B, Weinberg C. Use and misuse of population attributable fractions. Am J Public Health Jan;88(1): Presents appropriate formulae for unadjusted and adjusted RR, and for multicategory exposures
More informationMissing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional
More informationPS 271B: Quantitative Methods II. Lecture Notes
PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.
More informationDiabetes Prevention in Latinos
Diabetes Prevention in Latinos Matthew O Brien, MD, MSc Assistant Professor of Medicine and Public Health Northwestern Feinberg School of Medicine Institute for Public Health and Medicine October 17, 2013
More informationASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS
DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.
More informationMultiple Imputation for Missing Data: A Cautionary Tale
Multiple Imputation for Missing Data: A Cautionary Tale Paul D. Allison University of Pennsylvania Address correspondence to Paul D. Allison, Sociology Department, University of Pennsylvania, 3718 Locust
More informationUse of the Chi-Square Statistic. Marie Diener-West, PhD Johns Hopkins University
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationThe Cross-Sectional Study:
The Cross-Sectional Study: Investigating Prevalence and Association Ronald A. Thisted Departments of Health Studies and Statistics The University of Chicago CRTP Track I Seminar, Autumn, 2006 Lecture Objectives
More informationGLM, insurance pricing & big data: paying attention to convergence issues.
GLM, insurance pricing & big data: paying attention to convergence issues. Michaël NOACK - michael.noack@addactis.com Senior consultant & Manager of ADDACTIS Pricing Copyright 2014 ADDACTIS Worldwide.
More informationSAMPLE SIZE TABLES FOR LOGISTIC REGRESSION
STATISTICS IN MEDICINE, VOL. 8, 795-802 (1989) SAMPLE SIZE TABLES FOR LOGISTIC REGRESSION F. Y. HSIEH* Department of Epidemiology and Social Medicine, Albert Einstein College of Medicine, Bronx, N Y 10461,
More informationA Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
More informationRandomized trials versus observational studies
Randomized trials versus observational studies The case of postmenopausal hormone therapy and heart disease Miguel Hernán Harvard School of Public Health www.hsph.harvard.edu/causal Joint work with James
More informationP (B) In statistics, the Bayes theorem is often used in the following way: P (Data Unknown)P (Unknown) P (Data)
22S:101 Biostatistics: J. Huang 1 Bayes Theorem For two events A and B, if we know the conditional probability P (B A) and the probability P (A), then the Bayes theorem tells that we can compute the conditional
More informationIntroduction to Statistics and Quantitative Research Methods
Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.
More informationLOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as
LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values
More informationAssociation Between Variables
Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi
More informationSUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
More informationChapter 29 The GENMOD Procedure. Chapter Table of Contents
Chapter 29 The GENMOD Procedure Chapter Table of Contents OVERVIEW...1365 WhatisaGeneralizedLinearModel?...1366 ExamplesofGeneralizedLinearModels...1367 TheGENMODProcedure...1368 GETTING STARTED...1370
More informationPrinciples of Hypothesis Testing for Public Health
Principles of Hypothesis Testing for Public Health Laura Lee Johnson, Ph.D. Statistician National Center for Complementary and Alternative Medicine johnslau@mail.nih.gov Fall 2011 Answers to Questions
More informationA Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic
A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic Report prepared for Brandon Slama Department of Health Management and Informatics University of Missouri, Columbia
More information