A Simple Method for Estimating Relative Risk using Logistic Regression. Fredi Alexander DiazQuijano


 George Sparks
 2 years ago
 Views:
Transcription
1 1 A Simple Method for Estimating Relative Risk using Logistic Regression. Fredi Alexander DiazQuijano Grupo Latinoamericano de Investigaciones Epidemiológicas, Organización Latinoamericana para el Fomento de la Investigación en Salud (OLFIS), Bucaramanga, Colombia. Address correspondence to: Fredi Alexander Díaz Quijano, Organización Latinoamericana para el Fomento de la Investigación en Salud (OLFIS). Av. Búcaros No Laureles C 60. Bucaramanga, Colombia. Telephone: (57) Fax: (57)
2 2 Abstract Background: Odds ratios (OR) significantly overestimate associations between risk factors and common outcomes. The estimation of relative risks (RR) or prevalence ratios (PR) has represented a statistical challenge in multivariate analysis and, furthermore, some researchers do not have access to the available methods. Objective: To propose and evaluate a new method for estimating RR and PR by logistic regression. Methods: A provisional database was designed in which events were duplicated but identified as nonevents. After, a logistic regression was performed and effect measures were calculated, which were considered RR estimations. This method was compared with binomial regression, Cox regression with robust variance and ordinary logistic regression in analyses with three outcomes of different frequencies. Results: ORs estimated by ordinary logistic regression progressively overestimated RRs as the outcome frequency increased. RRs estimated by Cox regression and the method proposed in this article were similar to those estimated by binomial regression for every outcome. However, confidence intervals were wider with the proposed method. Conclusion: This simple tool could be useful for calculating the effect of risk factors and the impact of health interventions in developing countries when other statistical strategies are not available.
3 3 Keywords: Logistic regression. Odds ratio. Prevalence ratio. Relative risk. Abbreviations: CI: Confidence interval. OR: Odds ratio. PR: Prevalence ratio. RR: Relative risk. SE: Standard Error.
4 4 BACKGROUND The odds ratio (OR) is commonly used to assess associations between exposure and outcome and can be estimated by logistic regression, which is widely available in statistics software. OR has been considered an approximation to the prevalence ratio (PR) in crosssectional studies or the risk ratio (RR, which is mathematically equivalent to PR) in cohort studies or clinical trials. This is acceptable when the outcome is relatively rare (<10%). However, since many health outcomes are common, the interpretation of OR as RR is questionable because OR overstates RR, sometimes dramatically [13]. Moreover, the OR has been considered an unintelligible effect measure in some contexts [3]. Binomial regression has been recommended for the estimation of RRs (and PRs) in multivariate analysis [4]. However, sometimes this statistical method cannot estimate RR because convergence problems are frequent. Therefore, the Cox regression with robust variance has been recommended as a suitable method for estimating RRs [5,6]. However, these statistical methods (binomial and Cox regression) are not widely available in freeware (such as Epidat or EpiInfo). Therefore, the ability to estimate PRs and RRs in multivariate models could be limited in research groups with scant resources. In this article, a strategy for estimating RRs with ordinary logistic regression is proposed. This new method could be useful for identifying risk factors and estimating the impact of health interventions in developing countries.
5 5 METHODS Database A database of 1000 observations with dichotomous variables was created to simulate a cohort study in which a common event (incidence of 50%) would be strongly related to two independent predictors (A and B). These predictors would also be statistically associated with one another, resulting in a moderate confounding effect. Then, a third independent variable with a prevalence of 40% was included (predictor C). This variable was randomly distributed, but more often in positive than negative predictor A group. Thus, this variable was statistically associated with the outcome in a univariate analysis but the association would be explained by the presence of predictor A in a multivariate model. Finally, additional dependent variables were generated by randomly selecting a proportion of cases. Thus, outcome variables with frequencies of 20% and 5% were obtained. The first table shows the hypothetical distribution of subjects according to the predictors and outcomes (Table 1). Statistical analysis: Statistical analysis was performed using STATA software (STATA /IC 11.0). RRs and 95% confidence intervals (CI) were estimated by applying logbinomial regression and Cox regression with a constant in the time variable [6]. In order to obtain corrected CIs by Cox regression, the robust variance option was applied [7]. ORs and their correspondent CIs
6 6 were also estimated using an ordinary logistic regression. After univariate estimations were calculated, ORs and RRs were obtained in multivariate models including all independent variables (predictors A, B and C). Proposed modification to logistic regression analysis: The logbinomial model is similar to logistic regression in assuming a binomial distribution of the outcome. However, in a logistic regression the link function is the logarithm of the odds, which is the ratio between cases and noncases, while in binomial regression the link function is the logarithm of the proportion, i.e., the ratio between cases and cases plus noncases [4]. In a binomial regression model with k covariates, the function is written as: Log [a/(a+b)] = β 0 +β 1 X β k X k where a is the number of cases and b is the number of noncases (e. g., the proportion of sick persons in a group), and X i the covariates. Thus, a/(a+b) is the probability of success and the RR (or PR) estimated of a given covariate X i is e βi. On the other hand, in a logistic regression model, the function is written as: Log (a/b) = β 0 +β 1 X β k X k where a/b is the odds of success and the OR estimated of a given covariate X i is e βi.
7 7 In order for the case information to be included in the denominator of the estimates in a logistic regression, all observed cases were duplicated in a provisional database and identified as noncases. Thus, a number of observations was included equaling that of the cases and containing the same information about the covariates. Thus, this new logistic function could be written as: Log [a/(y)] = β 0 +β 1 X β k X k where y includes noncases as well as cases, although all of them are identified as noncases. Afterwards, a logistic regression procedure was performed with the modified dataset. The ORs obtained were considered direct estimations of RRs because β i defined the relationship between X i and the Log [a/(y)], which in this model would be mathematically similar to Log [a/(a+b)] of the logbinomial model. For each outcome, a provisional database was prepared. This strategy for logistic regression recognizes an entire cohort as controls. This trick is innovative but analogous to the analysis of casecohort studies. In that design, cases of a particular outcome are compared with a sample (subcohort) of the entire cohort that gave rise to all cases [8]. The objective of selecting this subcohort is to estimate the frequency of exposure in the entire cohort. For this reason, such studies have also been called caseexposure studies [9].
8 8 This subcohort may include some cases, which would consequently be overrepresented in the analysis. Then, by comparing the frequency of exposure between the cases and the subcohort set, we obtain a direct estimate of RR (not OR) [911]. Similarly, in the method proposed here, the cases would be compared against the entire cohort and thus all cases would be overrepresented. This affects the variance of the estimates and for this reason the CIs are wider [11]. Therefore, an inflation factor for the Standard Error (SE) of each predictor and outcome incidence was calculated as the ratio between SE obtained with the proposed method and SE resulting from binomial regression (as reference method).
9 9 RESULTS For the rarer event (incidence of 5%), RRs estimated by logbinomial were similar to those calculated both by the Cox regressions and the proposed method (modified logistic regression) (Table 1). Few differences were identified among the CIs of RRs: CIs from the modified method were wider than those estimated by logbinomial and Cox regression with the robust variance option. ORs estimated by ordinary logistic regression were close to RR values. Predictors A and B were statistically associated with the outcome in univariate analysis but only A was independently associated in the multivariate model (Table 1). For the second and third outcomes, with incidences of 20% and 50% respectively, the differences between RRs in logbinomial regression and ORs in ordinary logistic regression were more evident (Tables 2 and 3). This was especially remarkable for the commonest event, where the ORs of predictors A and B were at least twice the corresponding RR values (Table 3). On the other hand, RRs estimated in Cox regressions and modified logistic regression were similar or virtually identical to those estimated by logbinomial regression. However, the CIs outputted by the proposed method were wider than those obtained by the other models (Tables 2 and 3). Consequently, the SE inflation factor rose for each predictor as the outcome frequency increased (Figure 1).
10 10 DISCUSSION The use of an adjusted odds ratio to estimate an adjusted relative risk or prevalence ratio is appropriate for studies of rare outcome but may be misleading when the outcome is common. Such overestimation may inappropriately affect clinical decisionmaking or policy development [3]. For example, overestimation of the importance of a risk factor may lead to unintentional errors in the economical analysis of potential intervention programs or treatment, which could be particularly harmful in developing countries. The ordinary logistic model estimates OR (not RR) and was initially adapted for casecontrol studies since data from this type of study design can only determine OR [12]. Moreover, a casecontrol study is an optimal choice for analyzing rareevent risk factors, for which OR is a close approximation of RR. Thus, ordinary logistic regression is eminently useful for casecontrol studies mainly because the numeric value of OR mimics RR [12]. On the other hand, RR and PR can be directly determined from data based on cohort and crosssectional studies, respectively, which are practical only for relatively common outcomes. However, in such circumstances OR estimated by ordinary logistic regression will be more discrepant than RR (or PR). This was exemplified in the results of this paper in that ORs progressively overestimated RRs as the outcome frequency increased. Indeed, OR will always be greater than RR if RR is greater than 1 (adverse event) and OR will also be less than RR if RR less than 1 (protective effect). Therefore, the uncritical application of logistic regression and the misinterpretation of OR as RR can lead to serious
11 11 errors in determination of both the importance of risk factors and the impact of interventions on clinical practice and public health [13]. For these reasons, several strategies for estimating RRs in multivariate analysis have been proposed [7,1416]. Binomial regression is considered the most adequate choice. However, binomial models often predict probabilities greater than one and sometimes this regression cannot find possible values and converge in a model. Consequently, other alternative methods have been proposed when binomial regression cannot converge in a model. Cox regression with robust variance using a constant in the time variable seems like a good alternative [7]. However, these options and other statistical alternatives are only available in sophisticated software that some research groups cannot afford. This paper presents a strategy for logistic regression that recognizes an entire cohort as controls. As the results show, this method can appropriately estimate RRs or PRs, even in analyses with common outcomes. Moreover, the method proposed in this article could be easily performed using free statistics programs that include only logistic regression for multivariate analysis of dichotomous outcomes. However, the proposed method is associated with SE inflation, which increases confidence intervals. A simple and practical correction factor cannot be established for this problem because, in a multivariate regression, the standard error for each predictor depends on its correlation with all variables included in the model. Therefore, since the obtained CIs can be wider than those estimated by other models, investigators must be aware that the risk of Type II error could be higher. For this reason,
12 12 when an association is not statistically significant with the proposed method, ordinary logistic regression could be used for testing the hypothesis that association measure is different than unity. This is possible since the null hypothesis is mathematically equivalent for both OR and RR, because when RR is equal to 1, OR is also equal to 1. In conclusion, the proposed method may be useful for estimating RRs or PRs appropriately in analysis of common outcomes. However, because the resultant CIs are wider than those derived from other methods, this strategy should be employed when logistic regression is the only method available. This new method may help research groups from developing countries where access to sophisticated programs is limited. Competing Interests: None declared.
13 13 REFERENCES 1. McNutt LA, Wu C, Xue X, et al. Estimating the relative risk in cohort studies and clinical trials of common outcomes. Am J Epidemiol 2003, 157: Zhang J, Yu KF. What's the Relative Risk? A Method of Correcting the Odds Ratio in Cohort Studies of Common Outcomes. JAMA 1998, 280: Pearce N. Effect measure in prevalence studies. Environ Health Perspect 2004, 112: Wacholder S. Binomial regression in GLIM: estimating risk ratios and risk differences. Am J Epidemiol 1986, 123: Nijem K, Kristensen P, AlKhatib A, et al. Application of different statistical methods to estimate risk for selfreported health complaints among shoe factory workers exposed to organic solvents and plastic compounds. Norsk Epidemiologi 2005; 15: Lee J, Chia KS. Estimation of prevalence rate ratios for cross sectional data: an example in occupational epidemiology. Br J Ind Med 1993, 50: Barros AJD, Hirakata VN. Alternatives for logistic regression in crosssectional studies: an empirical comparison of models that directly estimate the prevalence ratio. BMC Med Res Methodol 2003; 3: Kulathinal S, Karvanen J, Saarela O, Kuulasmaa K. Casecohort design in practice  experiences from the MORGAM Project. Epidemiol Perspect Innov 2007, 4:15.
14 14 9. Flanders WD. Limitations of the caseexposure study. Epidemiology 1990, 1: Sato T. Estimation of a common risk ratio in stratified casecohort studies. Stat Med 1992, 11: Sato T. Risk ratio estimation in casecohort studies. Environ Health Perspect 1994, 102 Suppl 8: Lee J, Tan CS, Chia KS. A practical guide for multivariate analysis of dichotomous outcomes. Ann Acad Med Singapore 2009, 38: Schwartz LM, Woloshin S, Welch HG. Misunderstandings about the effects of race and sex on physicians referrals for cardiac catheterization. N Engl J Med 1999, 341: Localio AR, Margolis DJ, Berlin JA. Relative risks and confidence intervals were easily computed indirectly from multivariable logistic regression. J Clin Epidemiol 2007, 60: Thompson ML, Myers JE, Kriebel D. Prevalence odds ratio or prevalence ratio in the analysis of cross sectional data: what is to be done? Occup Environ Med 1998, 55: Coutinho LM, Scazufca M, Menezes PR. Methods for estimating prevalence ratios in crosssectional studies. Rev Saude Publica 2008, 42:9928.
15 15
16 Table 1. Hypothetical distribution of subjects according to the predictors and outcome incidence. Independent variable High incidence (50%) Intermediate incidence Low incidence (5%) (20%) Cases (n=500) Noncases (n=500) Cases (n=200) Noncases (n=800) Cases (n=50) Noncases (n=950) Total (n=1000) Predictor A positive negative Predictor B positive negative Predictor C positive negative
17 Table 2. RRs and ORs and corresponding CIs of associations between a rare event (incidence= 5%) and three independent variables, estimated by Logbinomial regression, ordinary logistic regression, Cox regression with robust variance and logistic regression with the proposed modification. Independent variable Logbinomial regression: RR (CI) Logistic regression: OR (CI) Cox regression  robust: RR (CI) Modified Logistic regression: RR (CI) Predictor A Unadjusted 6 ( ) 6.41 ( ) 6 ( ) 6 ( ) Adjusted * 4.96 ( ) 5.26 ( ) 4.97 ( ) 4.99 ( ) Predictor B Unadjusted 2.57 ( ) 2.69 ( ) 2.57 ( ) 2.57 ( ) Adjusted * 1.59 ( ) 1.64 ( ) 1.59 ( ) 1.59 ( ) Predictor C Unadjusted 1.28 ( ) 1.29 ( ) 1.28 ( ) 1.28 ( ) Adjusted * 0.98 ( ) 0.97 ( ) 0.97 ( ) 0.96 ( ) * Adjusted by the other independent variables. 17
18 Table 3. RRs and ORs and corresponding CIs of associations between an intermediate frequency event (incidence= 20%) and three independent variables, estimated by Logbinomial regression, ordinary logistic regression, Cox regression with robust variance and logistic regression with the proposed modification. Independent variable Logbinomial regression: RR (CI) Logistic regression: OR (CI) Cox regression  robust: RR (CI) Modified Logistic regression: RR (CI) Predictor A Unadjusted 2.75 ( ) 3.39 ( ) 2.75 ( ) 2.75 ( ) Adjusted * 1.79 ( ) 2.06 ( ) 1.77 ( ) 1.75 ( ) Predictor B Unadjusted 3.88 ( ) 5.22 ( ) 3.88 ( ) 3.88 ( ) Adjusted * 3.15 ( ) 4.07 ( ) 3.15 ( ) 3.15 ( ) Predictor C Unadjusted 1.09 ( ) 1.11 ( ) 1.09 ( ) 1.09 ( ) Adjusted * 0.92 ( ) 0.89 ( ) 0.92 ( ) 0.93 ( ) * Adjusted by the other independent variables. 18
19 Table 4. RRs and ORs and corresponding CIs of associations between a common event (incidence= 50%) and three independent variables, estimated by Logbinomial regression, ordinary logistic regression, Cox regression with robust variance and logistic regression with the proposed modification. Independent variable Logbinomial regression: RR (CI) Logistic regression: OR (CI) Cox regression  robust: RR (CI) Modified Logistic regression: RR (CI) Predictor A Unadjusted 3 ( ) 7.27 ( ) 3 ( ) 3 ( ) Adjusted * 1.9 ( ) 4.07 ( ) 1.89 ( ) 1.88 ( ) Predictor B Unadjusted 3.9 ( ) ( ) 3.9 ( ) 3.9 ( ) Adjusted * 3.08 ( ) ( ) 3.09 ( ) 3.09 ( ) Predictor C Unadjusted 1.25 ( ) 1.57 ( ) 1.25 ( ) 1.25 (1 1.55) Adjusted * 1.02 ( ) 1.12 ( ) 1.05 ( ) 1.06 ( ) * Adjusted by the other independent variables. 19
20
21 Figure 1
Prevalence odds ratio or prevalence ratio in the analysis of cross sectional data: what is to be done?
272 Occup Environ Med 1998;55:272 277 Prevalence odds ratio or prevalence ratio in the analysis of cross sectional data: what is to be done? Mary Lou Thompson, J E Myers, D Kriebel Department of Biostatistics,
More informationA Practical Guide for Multivariate Analysis of Dichotomous Outcomes
714 Multivariate Analysis for Dichotomous Outcomes James Lee et al Risk Adjustment for Comparing Health Outcomes Yew Yoong Ding Review Article A Practical Guide for Multivariate Analysis of Dichotomous
More informationAdvanced Quantitative Methods for Health Care Professionals PUBH 742 Spring 2015
1 Advanced Quantitative Methods for Health Care Professionals PUBH 742 Spring 2015 Instructor: Joanne M. Garrett, PhD email: joanne_garrett@med.unc.edu Class Notes: Copies of the class lecture slides
More informationSize of a study. Chapter 15
Size of a study 15.1 Introduction It is important to ensure at the design stage that the proposed number of subjects to be recruited into any study will be appropriate to answer the main objective(s) of
More informationGuide to Biostatistics
MedPage Tools Guide to Biostatistics Study Designs Here is a compilation of important epidemiologic and common biostatistical terms used in medical research. You can use it as a reference guide when reading
More informationDealing with confounding in the analysis
Chapter 14 Dealing with confounding in the analysis In the previous chapter we discussed briefly how confounding could be dealt with at both the design stage of a study and during the analysis of the results.
More informationMultivariate Logistic Regression
1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation
More informationCalculating the number needed to be exposed with adjustment for confounding variables in epidemiological studies
Journal of Clinical Epidemiology 55 (2002) 525 530 Calculating the number needed to be exposed with adjustment for confounding variables in epidemiological studies Ralf Bender*, Maria Blettner Department
More informationData Analysis, Research Study Design and the IRB
Minding the pvalues p and Quartiles: Data Analysis, Research Study Design and the IRB Don AllensworthDavies, MSc Research Manager, Data Coordinating Center Boston University School of Public Health IRB
More informationProper Estimation of Relative Risk Using PROC GENMOD in Population Studies Kechen Zhao, University of Southern California, Los Angeles, California
Proper Estimation of Relative Risk Using PROC GENMOD in Population Studies Kechen Zhao, University of Southern California, Los Angeles, California ABSTRACT Relative risk (RR) is usually the parameter of
More informationToday s lecture. Lecture 6: Dichotomous Variables & Chisquare tests. Proportions and 2 2 tables. How do we compare these proportions
Today s lecture Lecture 6: Dichotomous Variables & Chisquare tests Sandy Eckel seckel@jhsph.edu Dichotomous Variables Comparing two proportions using 2 2 tables Study Designs Relative risks, odds ratios
More informationDiabetes Prevention in Latinos
Diabetes Prevention in Latinos Matthew O Brien, MD, MSc Assistant Professor of Medicine and Public Health Northwestern Feinberg School of Medicine Institute for Public Health and Medicine October 17, 2013
More informationDepartment/Academic Unit: Public Health Sciences Degree Program: Biostatistics Collaborative Program
Department/Academic Unit: Public Health Sciences Degree Program: Biostatistics Collaborative Program Department of Mathematics and Statistics Degree Level Expectations, Learning Outcomes, Indicators of
More informationLogistic Regression: Basics
Evidence is no evidence if based solely on p value Logistic Regression: Basics Prediction Model: Binary Outcomes Nemours Stats 101 Laurens Holmes, Jr. General Linear Model OUTCOME Continuous Counts Data
More informationIntroduction. Hypothesis Testing. Hypothesis Testing. Significance Testing
Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Ecellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters
More informationPrognostic Outcome Studies
Prognostic Outcome Studies Edwin Chan PhD Singapore Clinical Research Institute Singapore Branch, Australasian Cochrane Centre DukeNUS Graduate Medical School Contents Objectives Incidence studies (Prognosis)
More informationUnit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)
Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.) Logistic regression generalizes methods for 2way tables Adds capability studying several predictors, but Limited to
More informationHYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
More informationAlways Start with PECO
Goals of This Course Be able to understand a study design (very basic concept) Be able to understand statistical concepts in a medical paper Be able to perform a data analysis Understanding: PECO study
More informationMEASURING ASSOCIATIONS IN EPIDEMIOLOGIC STUDIES. Dr. Ian Gardner University of California Davis
MEASURING ASSOCIATIONS IN EPIDEMIOLOGIC STUDIES Dr. Ian Gardner University of California Davis Measures of association Association means relationship Various type of measures of how numerically strong
More informationEBM Cheat Sheet Measurements Card
EBM Cheat Sheet Measurements Card Basic terms: Prevalence = Number of existing cases of disease at a point in time / Total population. Notes: Numerator includes old and new cases Prevalence is crosssectional
More informationHow to set the main menu of STATA to default factory settings standards
University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be
More informationBasic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
More informationSample Size Estimation and Power Analysis
yumi Shintani, Ph.D., M.P.H. Sample Size Estimation and Power nalysis March 2008 yumi Shintani, PhD, MPH Department of Biostatistics Vanderbilt University 1 researcher conducted a study comparing the effect
More informationSection 3 Part 2. Describing Relationships Between Two Nominal Characteristics. PubH 6414 Section 3 Part 2 1
Section 3 Part 2 Describing Relationships Between Two Nominal Characteristics PubH 6414 Section 3 Part 2 1 Measuring relationship between two variables The Relative Risk and the Odds Ratio are measures
More information11. Analysis of Casecontrol Studies Logistic Regression
Research methods II 113 11. Analysis of Casecontrol Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationCasecontrol studies. Alfredo Morabia
Casecontrol studies Alfredo Morabia Division d épidémiologie Clinique, Département de médecine communautaire, HUG Alfredo.Morabia@hcuge.ch www.epidemiologie.ch Outline Casecontrol study Relation to cohort
More informationEpidemiologyBiostatistics Exam Exam 2, 2001 PRINT YOUR LEGAL NAME:
EpidemiologyBiostatistics Exam Exam 2, 2001 PRINT YOUR LEGAL NAME: Instructions: This exam is 30% of your course grade. The maximum number of points for the course is 1,000; hence this exam is worth 300
More informationChi Squared and Fisher's Exact Tests. Observed vs Expected Distributions
BMS 617 Statistical Techniques for the Biomedical Sciences Lecture 11: ChiSquared and Fisher's Exact Tests Chi Squared and Fisher's Exact Tests This lecture presents two similarly structured tests, Chisquared
More informationStatistical Rules of Thumb
Statistical Rules of Thumb Second Edition Gerald van Belle University of Washington Department of Biostatistics and Department of Environmental and Occupational Health Sciences Seattle, WA WILEY AJOHN
More informationSample Size Planning, Calculation, and Justification
Sample Size Planning, Calculation, and Justification Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationStudy Design and Statistical Analysis
Study Design and Statistical Analysis Anny H Xiang, PhD Department of Preventive Medicine University of Southern California Outline Designing Clinical Research Studies Statistical Data Analysis Designing
More informationSPSS Multivariable Linear Models and Logistic Regression
1 SPSS Multivariable Linear Models and Logistic Regression Multivariable Models Single continuous outcome (dependent variable), one main exposure (independent) variable, and one or more potential confounders
More informationQuantitative Methods for Finance
Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain
More informationX X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
More informationDoes referral from an emergency department to an. alcohol treatment center reduce subsequent. emergency room visits in patients with alcohol
Does referral from an emergency department to an alcohol treatment center reduce subsequent emergency room visits in patients with alcohol intoxication? Robert Sapien, MD Department of Emergency Medicine
More informationImpact of Diabetes on Treatment Outcomes among Maryland Tuberculosis Cases, 20042005. Tania Tang PHASE Symposium May 12, 2007
Impact of Diabetes on Treatment Outcomes among Maryland Tuberculosis Cases, 20042005 Tania Tang PHASE Symposium May 12, 2007 Presentation Outline Background Research Questions Methods Results Discussion
More informationLesson 14 14 Outline Outline
Lesson 14 Confidence Intervals of Odds Ratio and Relative Risk Lesson 14 Outline Lesson 14 covers Confidence Interval of an Odds Ratio Review of Odds Ratio Sampling distribution of OR on natural log scale
More informationWhen to Use a Particular Statistical Test
When to Use a Particular Statistical Test Central Tendency Univariate Descriptive Mode the most commonly occurring value 6 people with ages 21, 22, 21, 23, 19, 21  mode = 21 Median the center value the
More informationTips for surviving the analysis of survival data. Philip TwumasiAnkrah, PhD
Tips for surviving the analysis of survival data Philip TwumasiAnkrah, PhD Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes
More informationA CASECONTROL STUDY OF DOG BITE RISK FACTORS IN A DOMESTIC SETTING TO CHILDREN AGED 9 YEARS AND UNDER
A CASECONTROL STUDY OF DOG BITE RISK FACTORS IN A DOMESTIC SETTING TO CHILDREN AGED 9 YEARS AND UNDER L. Watson, K. Ashby, L. Day, S. Newstead, E. Cassell Background In Victoria, Australia, an average
More informationMeasuring reliability and agreement
Measuring reliability and agreement Madhukar Pai, MD, PhD Assistant Professor of Epidemiology, McGill University Montreal, Canada Professor Extraordinary, Stellenbosch University, S Africa Email: madhukar.pai@mcgill.ca
More informationTo use or not to use the odds ratio in epidemiologic studies?
To use or not to use the odds ratio in epidemiologic studies? Markku Nurminen European Journal of Epidemiology 1995; 11: 365371 Abstract. This paper argues that the use of the odds ratio parameter in
More informationChris Slaughter, DrPH. GI Research Conference June 19, 2008
Chris Slaughter, DrPH Assistant Professor, Department of Biostatistics Vanderbilt University School of Medicine GI Research Conference June 19, 2008 Outline 1 2 3 Factors that Impact Power 4 5 6 Conclusions
More informationIf several different trials are mentioned in one publication, the data of each should be extracted in a separate data extraction form.
General Remarks This template of a data extraction form is intended to help you to start developing your own data extraction form, it certainly has to be adapted to your specific question. Delete unnecessary
More informationJournal of Statistical Software
JSS Journal of Statistical Software October 2013, Volume 55, Issue 5. http://www.jstatsoft.org/ Converting Odds Ratio to Relative Risk in Cohort Studies with Partial Data Information Zhu Wang Connecticut
More informationFrom the help desk: Bootstrapped standard errors
The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution
More informationGender Effects in the Alaska Juvenile Justice System
Gender Effects in the Alaska Juvenile Justice System Report to the Justice and Statistics Research Association by André Rosay Justice Center University of Alaska Anchorage JC 0306.05 October 2003 Gender
More informationMissing Data in Longitudinal Studies: To Impute or not to Impute? Robert Platt, PhD McGill University
Missing Data in Longitudinal Studies: To Impute or not to Impute? Robert Platt, PhD McGill University 1 Outline Missing data definitions Longitudinal data specific issues Methods Simple methods Multiple
More information13. Poisson Regression Analysis
136 Poisson Regression Analysis 13. Poisson Regression Analysis We have so far considered situations where the outcome variable is numeric and Normally distributed, or binary. In clinical work one often
More informationPoisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study loglinear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
More informationModule 223 Major A: Concepts, methods and design in Epidemiology
Module 223 Major A: Concepts, methods and design in Epidemiology Module : 223 UE coordinator Concepts, methods and design in Epidemiology Dates December 15 th to 19 th, 2014 Credits/ECTS UE description
More informationAn Application of the Gformula to Asbestos and Lung Cancer. Stephen R. Cole. Epidemiology, UNC Chapel Hill. Slides: www.unc.
An Application of the Gformula to Asbestos and Lung Cancer Stephen R. Cole Epidemiology, UNC Chapel Hill Slides: www.unc.edu/~colesr/ 1 Acknowledgements Collaboration with David B. Richardson, Haitao
More informationBasic research methods. Basic research methods. Question: BRM.2. Question: BRM.1
BRM.1 The proportion of individuals with a particular disease who die from that condition is called... BRM.2 This study design examines factors that may contribute to a condition by comparing subjects
More informationApplication of Odds Ratio and Logistic Models in Epidemiology and Health Research
Application of Odds Ratio and Logistic Models in Epidemiology and Health Research By: Min Qi Wang, PhD; James M. Eddy, DEd; Eugene C. Fitzhugh, PhD Wang, M.Q., Eddy, J.M., & Fitzhugh, E.C.* (1995). Application
More informationThe CrossSectional Study:
The CrossSectional Study: Investigating Prevalence and Association Ronald A. Thisted Departments of Health Studies and Statistics The University of Chicago CRTP Track I Seminar, Autumn, 2006 Lecture Objectives
More informationSafety & Effectiveness of Drug Therapies for Type 2 Diabetes: Are pharmacoepi studies part of the problem, or part of the solution?
Safety & Effectiveness of Drug Therapies for Type 2 Diabetes: Are pharmacoepi studies part of the problem, or part of the solution? IDEG Training Workshop Melbourne, Australia November 29, 2013 Jeffrey
More informationLOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as
LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values
More informationFamily physician job satisfaction in different medical care organization models
Family Practice Vol. 17, No. 4 Oxford University Press 2000 Printed in Great Britain Family physician job satisfaction in different medical care organization models Carmen GarcíaPeña a, Sandra ReyesFrausto
More informationModels for Count Data With Overdispersion
Models for Count Data With Overdispersion Germán Rodríguez November 6, 2013 Abstract This addendum to the WWS 509 notes covers extrapoisson variation and the negative binomial model, with brief appearances
More informationLecture 1: Introduction to Epidemiology
Lecture 1: Introduction to Epidemiology Lecture 1: Introduction to Epidemiology Dankmar Böhning Department of Mathematics and Statistics University of Reading, UK Summer School in Cesme, May/June 2011
More informationQuantitative relationship of sick building syndrome symptoms with ventilation rates
Published in Indoor Air Journal, volume 19, 2009 Quantitative relationship of sick building syndrome symptoms with ventilation rates William J. Fisk *, Anna G. Mirer, and Mark J. Mendell Indoor Environment
More informationMeasures of disease frequency
Measures of disease frequency Madhukar Pai, MD, PhD McGill University, Montreal Email: madhukar.pai@mcgill.ca 1 Overview Big picture Measures of Disease frequency Measures of Association (i.e. Effect)
More informationConverting Odds Ratio to Relative Risk in Cohort Studies with Partial Data Information
Converting Odds Ratio to Relative Risk in Cohort Studies with Partial Data Information Zhu Wang Connecticut Children s Medical Center University of Connecticut School of Medicine Abstract In medical and
More informationPredicting The Risk Of Rheumatoid Arthritis
Predicting The Risk Of Rheumatoid Arthritis Modelling Genetic And Environmental Risk Factors Ian Scott Arthritis Research UK Clinical Research Fellow Declaration Of Interests: No Competing Interests Describe
More informationSAMPLE SIZE TABLES FOR LOGISTIC REGRESSION
STATISTICS IN MEDICINE, VOL. 8, 795802 (1989) SAMPLE SIZE TABLES FOR LOGISTIC REGRESSION F. Y. HSIEH* Department of Epidemiology and Social Medicine, Albert Einstein College of Medicine, Bronx, N Y 10461,
More informationDEVELOPING AN ANALYTICAL PLAN
The Fundamentals of International Clinical Research Workshop February 2004 DEVELOPING AN ANALYTICAL PLAN Mario Chen, PhD. Family Health International 1 The Analysis Plan for a Study Summary Analysis Plan:
More informationNominal and ordinal logistic regression
Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome
More informationUse of the ChiSquare Statistic. Marie DienerWest, PhD Johns Hopkins University
This work is licensed under a Creative Commons AttributionNonCommercialShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationVI. Introduction to Logistic Regression
VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models
More informationRelative Risk Regression in Medical Research: Models, Contrasts, Estimators, and Algorithms
UW Biostatistics Working Paper Series 7192006 Relative Risk Regression in Medical Research: Models, Contrasts, Estimators, and Algorithms Thomas Lumley University of Washington, tlumley@u.washington.edu
More informationAn Example of SAS Application in Public Health Research  Predicting Smoking Behavior in Changqiao District, Shanghai, China
An Example of SAS Application in Public Health Research  Predicting Smoking Behavior in Changqiao District, Shanghai, China Ding Ding, San Diego State University, San Diego, CA ABSTRACT Finding predictors
More informationAdministration of Emergency Medicine
doi:10.1016/j.jemermed.2005.07.008 The Journal of Emergency Medicine, Vol. 30, No. 4, pp. 455 460, 2006 Copyright 2006 Elsevier Inc. Printed in the USA. All rights reserved 07364679/06 $ see front matter
More informationAnswer keys for Assignment 16: Principles of data collection
Answer keys for Assignment 16: Principles of data collection (The correct answer is underlined in bold text) 1. Reliability denotes Precision Repeatability Reproducibility All of the above 2. Ability of
More informationCRITICAL APPRAISAL SKILLS PROGRAMME making sense of evidence. 11 questions to help you make sense of a Case Control Study
CRITICAL APPRAISAL SKILLS PROGRAMME making sense of evidence 11 questions to help you make sense of a Case Control Study General comments Three broad issues need to be considered when appraising a case
More informationBig data size isn t enough! Irene Petersen, PhD Primary Care & Population Health
Big data size isn t enough! Irene Petersen, PhD Primary Care & Population Health Introduction Reader (Statistics and Epidemiology) Research team epidemiologists/statisticians/phd students Primary care
More informationPREDICTION OF INDIVIDUAL CELL FREQUENCIES IN THE COMBINED 2 2 TABLE UNDER NO CONFOUNDING IN STRATIFIED CASECONTROL STUDIES
International Journal of Mathematical Sciences Vol. 10, No. 34, JulyDecember 2011, pp. 411417 Serials Publications PREDICTION OF INDIVIDUAL CELL FREQUENCIES IN THE COMBINED 2 2 TABLE UNDER NO CONFOUNDING
More informationLecture 25. December 19, 2007. Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.
This work is licensed under a Creative Commons AttributionNonCommercialShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationBivariate Analysis. Comparisons of proportions: Chi Square Test (X 2 test) Variable 1. Variable 2 2 LEVELS >2 LEVELS CONTINUOUS
Bivariate Analysis Variable 1 2 LEVELS >2 LEVELS CONTINUOUS Variable 2 2 LEVELS X 2 chi square test >2 LEVELS X 2 chi square test CONTINUOUS ttest X 2 chi square test X 2 chi square test ANOVA (Ftest)
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationIntroduction to Statistics and Quantitative Research Methods
Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.
More informationRandomized trials versus observational studies
Randomized trials versus observational studies The case of postmenopausal hormone therapy and heart disease Miguel Hernán Harvard School of Public Health www.hsph.harvard.edu/causal Joint work with James
More informationRisk Factors for Alcoholism among Taiwanese Aborigines
Risk Factors for Alcoholism among Taiwanese Aborigines Introduction Like most mental disorders, Alcoholism is a complex disease involving naturenurture interplay (1). The influence from the biopsychosocial
More informationIS 30 THE MAGIC NUMBER? ISSUES IN SAMPLE SIZE ESTIMATION
Current Topic IS 30 THE MAGIC NUMBER? ISSUES IN SAMPLE SIZE ESTIMATION Sitanshu Sekhar Kar 1, Archana Ramalingam 2 1Assistant Professor; 2 Post graduate, Department of Preventive and Social Medicine,
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002Topics in StatisticsBiological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationMarital Status and Suicide in the National Longitudinal Mortality Study
Marital Status and Suicide in the National Longitudinal Mortality Study Augustine J. Kposowa, Ph.D. University of California Riverside Journal of Epidemiology & Community Health March, 2000 Volume 54,
More informationLecture 7 Logistic Regression with Random Intercept
Lecture 7 Logistic Regression with Random Intercept Logistic Regression Odds: expected number of successes for each failure P(y log i x i ) = β 1 + β 2 x i 1 P(y i x i ) log{ Od(y i =1 x i =a +1) } log{
More informationLikelihood Approaches for Trial Designs in Early Phase Oncology
Likelihood Approaches for Trial Designs in Early Phase Oncology Clinical Trials Elizabeth GarrettMayer, PhD Cody Chiuzan, PhD Hollings Cancer Center Department of Public Health Sciences Medical University
More informationMissing data and net survival analysis Bernard Rachet
Workshop on Flexible Models for Longitudinal and Survival Data with Applications in Biostatistics Warwick, 2729 July 2015 Missing data and net survival analysis Bernard Rachet General context Populationbased,
More informationChapter 11: Two Variable Regression Analysis
Department of Mathematics Izmir University of Economics Week 1415 20142015 In this chapter, we will focus on linear models and extend our analysis to relationships between variables, the definitions
More informationThe Effectiveness of Standard Care, Early Intervention, and Occupational Management in Worker s Compensation Claims
The Effectiveness of Standard Care, Early Intervention, and Occupational Management in Worker s Compensation Claims SPINE Volume 28, Number 3, pp 299 304 2003, Lippincott Williams & Wilkins, Inc. Mark
More informationCritical Appraisal of a Randomized Controlled Trial Are the Results of the Study Valid?
Critical Appraisal of a Randomized Controlled Trial Are the Results of the Study Valid? Objectives: 1. Compare and contrast observational and experimental studies. 2. Define study population and inclusion
More informationMedical expenditures of workrelated injuries among immigrant workers in USA
Medical expenditures of workrelated injuries among immigrant workers in USA Huiyun Xiang, MD, MPH, PhD Associate Professor Director for International Programs Center for Injury Research and Policy The
More informationPEER REVIEW HISTORY ARTICLE DETAILS VERSION 1  REVIEW. Elizabeth Comino Centre fo Primary Health Care and Equity 12Aug2015
PEER REVIEW HISTORY BMJ Open publishes all reviews undertaken for accepted manuscripts. Reviewers are asked to complete a checklist review form (http://bmjopen.bmj.com/site/about/resources/checklist.pdf)
More informationEPIDEMIOLOGY EXAM I REVIEW, FALL 2009
Topic 1. Case study: Hepatitis outbreak. 1. Characterizing outbreaks by time. Point source. Calculating likely exposure period... (latest reported case longest incubation time) (first reported case shortest
More informationBasic of Epidemiology in Ophthalmology Rajiv Khandekar. Presented in the 2nd Series of the MEACO Live Scientific Lectures 11 August 2014 Riyadh, KSA
Basic of Epidemiology in Ophthalmology Rajiv Khandekar Presented in the 2nd Series of the MEACO Live Scientific Lectures 11 August 2014 Riyadh, KSA Basics of Epidemiology in Ophthalmology Dr Rajiv Khandekar
More informationBiostatistics: Types of Data Analysis
Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS
More informationIn the mid1960s, the need for greater patient access to primary care. Physician Assistants in Primary Care: Trends and Characteristics
Physician Assistants in Primary Care: Trends and Characteristics Bettie Coplan, MPAS, PAC 1 James Cawley, MPH, PAC 2 James Stoehr, PhD 1 1 Physician Assistant Program, College of Health Sciences, Midwestern
More informationBivariate Analysis. Correlation. Correlation. Pearson's Correlation Coefficient. Variable 1. Variable 2
Bivariate Analysis Variable 2 LEVELS >2 LEVELS COTIUOUS Correlation Used when you measure two continuous variables. Variable 2 2 LEVELS X 2 >2 LEVELS X 2 COTIUOUS ttest X 2 X 2 AOVA (Ftest) ttest AOVA
More information