Types of Biostatistics. Lecture 18: Review Lecture.
- Charlene King
- 7 years ago
Transcription
Lecture 18: Review Lecture
Ani Manichaikul
15 May

Types of Biostatistics
1) Descriptive Statistics
- Exploratory Data Analysis
- Often not in the literature
- Summaries: "Table 1" in a paper
- Goal: visualize relationships, generate hypotheses
2) Inferential Statistics
- Confirmatory Data Analysis
- Methods section of a paper
- Goal: quantify relationships, test hypotheses

Approach to Modeling
A general approach for most statistical modeling is to:
- Define the Population of Interest
- State the Scientific Questions & Underlying Theories
- Describe and Explore the Observed Data
- Define the Model
  - Probability part (models the randomness / noise)
  - Systematic part (models the expectation / signal)
Approach to Modeling
- Estimate the Parameters in the Model
- Fit the Model to the Observed Data
- Make Inferences about Covariates
- Check the Validity of the Model
- Verify the Model Assumptions
- Re-define, Re-fit, and Re-check the Model if necessary
- Interpret the results of the Analysis in terms of the Scientific Questions of Interest

Stem-and-Leaf Plots
- Age in years (10 observations): 25, 26, 29, 32, 35, 36, 38, 44, 49, 51

Grouping: Frequency Distribution Tables
- Show the number of observations for each range of data
- Intervals can be chosen in ways similar to stem-and-leaf displays
- [Table: Age Interval, Frequency]

Histograms
- Pictures of the frequency or relative frequency distribution
- [Table: Age Interval, Observations, Frequency]
- [Figure: Histogram of Age, by Age Category]
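The stem-and-leaf display and the frequency table for the 10 ages can be sketched in a few lines of Python. This is only an illustration: the interval width of 10 years is an assumption (the slide's own table values are not shown here), and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# The 10 observations listed on the slide (age in years).
ages = [25, 26, 29, 32, 35, 36, 38, 44, 49, 51]


def stem_and_leaf(values):
    """Map each tens digit (stem) to the sorted list of units digits (leaves)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)


def frequency_table(values, width=10):
    """Count integer observations per interval start..start+width-1.

    The width of 10 is an assumed choice, mirroring the stems above.
    """
    counts = Counter((v // width) * width for v in values)
    return {(start, start + width - 1): counts[start] for start in sorted(counts)}


stems = stem_and_leaf(ages)
freq = frequency_table(ages)

for stem, leaves in stems.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
for (low, high), n in freq.items():
    print(f"{low}-{high}: {n}")
```

Choosing the same breakpoints for the table as for the stems keeps the two displays directly comparable, which is the point the slide makes about interval choice.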
Box-and-Whisker Plots
[Figure: Box Plot of Age; axis: Age in Years]
- IQR = 44 - 29 = 15
- Upper Fence = 44 + 15*1.5 = 66.5
- Lower Fence = 29 - 15*1.5 = 6.5

Continuous Variables: Scatterplot
[Figure: Age by Height in cm; axes: Age in Years, Height in Centimeters]
- Scatterplots visually display the relationship between two continuous variables

Why is the power of a test important?
- Power indicates the chance of finding a significant difference when there really is one
- Low power: likely to obtain non-significant results even when significant differences exist
- High power is desirable!
- Low power is usually caused by small sample size
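The box plot quantities (an IQR of 15 and an upper fence of 66.5) can be reproduced from the 10 ages listed with the stem-and-leaf plot. The sketch below assumes the quartiles are taken as medians of the lower and upper halves of the sorted data; other quartile conventions give slightly different values. The helper name `hinges` is hypothetical.

```python
from statistics import median

# Assumed to be the same 10 ages used for the stem-and-leaf plot.
ages = sorted([25, 26, 29, 32, 35, 36, 38, 44, 49, 51])


def hinges(sorted_values):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    For an odd number of observations the middle value is left out of
    both halves; this is one of several common conventions.
    """
    n = len(sorted_values)
    half = n // 2
    lower = sorted_values[:half]
    upper = sorted_values[half + (n % 2):]
    return median(lower), median(upper)


q1, q3 = hinges(ages)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(q1, q3, iqr, lower_fence, upper_fence)
```

Observations outside the two fences would be drawn as individual points; here all ages fall inside them.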
We're not always right

Errors in Hypothesis Testing: $\alpha$
- Aim: to keep Type I error small by specifying a small rejection region
- $\alpha$ is set before performing a test, usually at 0.05

Errors in Hypothesis Testing: $\beta$
- Aim: to keep Type II error small and thus power high

$\beta$: Probability of Type II Error
- The value of $\beta$ is usually unknown since it depends on a specified alternative value.
- $\beta$ depends on sample size and $\alpha$.
- Before data collection, scientists decide:
  - the test they will perform
  - $\alpha$
  - the desired $\beta$
- They will use this information to choose the sample size
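To make the link between sample size, the Type I error rate and power concrete, here is a minimal sketch for a two-sided one-sample z-test. The z-test setting, the effect size `delta`, the known standard deviation `sigma`, and both function names are assumptions chosen for illustration; the lecture does not specify a particular test.

```python
from statistics import NormalDist


def z_test_power(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    delta: true difference from the null value (assumed known here)
    sigma: known standard deviation of a single observation
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta * n ** 0.5 / sigma
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


def sample_size_for_power(delta, sigma, target_power=0.80, alpha=0.05):
    """Smallest n whose power reaches the target (simple linear search)."""
    n = 1
    while z_test_power(delta, sigma, n, alpha) < target_power:
        n += 1
    return n


# Hypothetical planning values: half a standard deviation difference.
n_needed = sample_size_for_power(delta=0.5, sigma=1.0)
print(n_needed, round(z_test_power(0.5, 1.0, 10), 3))
```

With these hypothetical inputs the power at n = 10 is well below 80%, which illustrates the slide's point that low power usually comes from a small sample.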
P-Values
Definition: The p-value for a hypothesis test is the probability of obtaining, by chance alone when $H_0$ is true, a value of the test statistic as extreme or more extreme (in the appropriate direction) than the one actually observed.

Steps of Hypothesis Testing
- Define the null hypothesis, $H_0$.
- Define the alternative hypothesis, $H_a$, where $H_a$ is usually of the form "not $H_0$".
- Define the type 1 error, $\alpha$, usually 0.05.
- Calculate the test statistic.
- Calculate the P-value.
- If the P-value is less than $\alpha$, reject $H_0$. Otherwise fail to reject $H_0$.

Why use linear regression?
Linear regression is very powerful. It can be used for many things:
- Binary X
- Continuous X
- Categorical X
- Adjustment for confounding
- Interaction
- Curved relationships between X and Y

SLR: $Y = \beta_0 + \beta_1 X_1 + \varepsilon$
- Linear regression is used for continuous outcome variables
- $\beta_0$: mean outcome when X=0 (Center!)
- Binary X = dummy variable for group
  - $\beta_1$: mean difference in outcome between groups
- Continuous X
  - $\beta_1$: mean difference in outcome corresponding to a 1-unit increase in X
  - Center X to give meaning to $\beta_0$
- Test $\beta_1 = 0$ in the population
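The interpretation of the slope for a binary X can be checked numerically. The sketch below fits simple linear regression by ordinary least squares on made-up data (the six outcome values and the function name are hypothetical) and shows that the intercept equals the mean of the X = 0 group and the slope equals the difference in group means.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Ordinary least squares for Y = b0 + b1 * X; returns (b0, b1)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: X is a dummy variable for group membership.
group = [0, 0, 0, 1, 1, 1]
outcome = [10.0, 12.0, 11.0, 15.0, 14.0, 16.0]

b0, b1 = simple_linear_regression(group, outcome)
mean_group0 = mean(y for g, y in zip(group, outcome) if g == 0)
mean_group1 = mean(y for g, y in zip(group, outcome) if g == 1)

print(b0, b1, mean_group0, mean_group1 - mean_group0)
```

For a continuous X the same function applies; centering X before fitting changes only the intercept, which then becomes the mean outcome at the average X.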
Assumptions of Linear Regression
- L: Linear relationship
- I: Independent observations
- N: Normally distributed around line
- E: Equal variance across X's

Regression Methods
- In simple linear regression (SLR):
  - One Predictor / Covariate / Explanatory Variable: X
- In multiple linear regression (MLR):
  - Same assumptions as SLR (i.e. L.I.N.E.), but:
  - More than one Covariate: $X_1, X_2, X_3, \ldots, X_p$
- Model: $Y \sim N(\mu, \sigma^2)$, with $\mu = E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_p X_p$
Nested models
One model is nested within another if the parent model contains one set of variables and the extended model contains all of the original variables plus one or more additional variables.

Difference in assessing variables: nested models
- Other predictor(s)
  - assess with t test if a single variable defines the predictor
  - assess with F test (today) if two or more variables are needed to define the predictor
- Potential confounder(s)
  - compare CI of primary predictor to see whether the new parameter is significantly different

The F test
- $H_0$: all new $\beta$'s = 0 in the population
- $H_A$: at least one new $\beta$ is not 0 in the population
- $F_{obs} = \dfrac{(RSS_{parent} - RSS_{nested}) / (\#\text{ of new variables added})}{RSS_{nested} / (\text{residual df}_{nested})}$
- In this formula the subscript "nested" marks the extended model, the one that includes the new variables.
- Example: the numerator is $(69.6 - 49.8)/2$
- What is $F_{cr}$?

The F test: notes
- The F test can be used to compare any two nested models
- If only one variable is added, it's easier to compare the models using the t test for that variable
- $t^2 = F$ if one variable is added
- For any regression, the estimated variance of the residuals is RSS/(residual df)
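The F statistic for comparing nested linear models can be computed directly from the two residual sums of squares. In the sketch below the RSS values 69.6 and 49.8 and the 2 added variables come from the slide; the residual degrees of freedom (100) is a hypothetical placeholder, since that value is not shown here, so the resulting number is illustrative only.

```python
def f_statistic(rss_parent, rss_extended, n_new, resid_df_extended):
    """F statistic for an extended model versus its parent model.

    rss_parent: residual sum of squares of the smaller (parent) model
    rss_extended: residual sum of squares of the model with new variables
    n_new: number of variables added
    resid_df_extended: residual degrees of freedom of the extended model
    """
    numerator = (rss_parent - rss_extended) / n_new
    denominator = rss_extended / resid_df_extended
    return numerator / denominator


# RSS values from the slide; residual df = 100 is an assumed value.
f_obs = f_statistic(rss_parent=69.6, rss_extended=49.8, n_new=2,
                    resid_df_extended=100)
print(round(f_obs, 2))
```

The observed value would then be compared with the critical value of an F distribution with (number of new variables, residual df of the extended model) degrees of freedom.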
Nested Models
- Comparing nested models
  - 1 new variable: use t test for that variable
  - 2+ new variables: use F test
- Categorical predictor
  - set one group as reference
  - create dummy variable for other groups
  - include/exclude all dummy variables
  - evaluate categorical predictor with F test

Splines and Quadratic Terms
- Splines are used to allow the regression line to bend
  - the breakpoint is arbitrary and decided graphically or by hypothesis
  - the actual slope above and below the breakpoint is usually of more interest than the coefficient for the spline (i.e. the change in slope)
- Quadratic term allows for curvature in the model

Effect Modification
- In linear regression, effect modification is a way of allowing the association between the primary predictor and the outcome to change with the level of another predictor.
- If the 3rd predictor is binary, that results in a graph in which the two lines (for the two groups) are no longer parallel.

Logistic regression
- For binary outcomes
- Model the log odds of the probability, which we also call the logit
- Baseline term interpreted as log odds
- Other coefficients are log odds ratios
Logistic regression model
$\log[\text{odds}(\text{relief} \mid Tx)] = \log\left(\dfrac{P(\text{relief} \mid Tx)}{P(\text{no relief} \mid Tx)}\right) = \beta_0 + \beta_1 Tx$
where: Tx = 0 if Placebo, 1 if Drug

Then:
- $\log(\text{odds}(\text{relief} \mid \text{Drug})) = \beta_0 + \beta_1$
- $\log(\text{odds}(\text{relief} \mid \text{Placebo})) = \beta_0$
- $\log(\text{odds}(R \mid D)) - \log(\text{odds}(R \mid P)) = \beta_1$

And thus: $\log\left(\dfrac{\text{odds}(R \mid D)}{\text{odds}(R \mid P)}\right) = \beta_1$
And: $OR = \exp(\beta_1) = e^{\beta_1}$
So: $\exp(\beta_1)$ = odds ratio of relief for patients taking the Drug vs patients taking the Placebo.

Logistic Regression
[Stata output, "Logit estimates": Number of obs = 70; LR chi2(1) = 2.83; coefficient table with rows drug and _cons and columns Coef., Std. Err., z, P>|z|, 95% Conf. Interval]
Estimates: $\log(\text{odds}(\text{relief})) = \hat\beta_0 + \hat\beta_1 \text{Drug}$, with $\hat\beta_1 = 0.814$
Therefore: OR = exp(0.814) = 2.26!
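The last step, turning the estimated log odds ratio into an odds ratio, is a single exponentiation. A minimal check in Python, using the coefficient 0.814 quoted on the slide (the variable names are illustrative):

```python
import math

# Estimated log odds ratio for Drug versus Placebo, as quoted on the slide.
beta1_hat = 0.814

# The odds ratio is the exponential of the log odds ratio.
odds_ratio = math.exp(beta1_hat)
print(round(odds_ratio, 2))
```

Going the other way, `math.log(odds_ratio)` recovers the coefficient, which is why the coefficient table and the odds ratio carry the same information.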
Adding other variables
- What if Pr(relief) = function of Drug or Placebo AND Age?
- We could easily include age in a model such as: $\log(\text{odds}(\text{relief})) = \beta_0 + \beta_1 \text{Drug} + \beta_2 \text{Age}$

Types of interpretation
- $\beta_0 + \beta_1$ = ln(odds) (for X=1)
- $\beta_1$ = difference in log odds
- $e^{\beta_0 + \beta_1}$ = odds (for X=1)
- $e^{\beta_1}$ = odds ratio
- But we started with P(Y=1). Can we find that?

Logistic Regression
- As in MLR, we can include many additional covariates.
- For a Logistic Regression model with p predictors: $\log(\text{odds}(Y=1)) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$
- where: $\text{odds}(Y=1) = \dfrac{\Pr(Y=1)}{1 - \Pr(Y=1)} = \dfrac{\Pr(Y=1)}{\Pr(Y=0)}$

More useful math
- $\text{odds} = \dfrac{\text{probability}}{1 - \text{probability}}$
- $\text{probability} = \dfrac{\text{odds}}{1 + \text{odds}}$
- so probability for (X=1) $= \dfrac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$
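The conversions between probability, odds and log odds are easy to get backwards, so a short sketch may help. The function names and the two coefficient values are hypothetical, chosen only to show that the route through the odds and the direct logistic formula agree.

```python
import math


def odds_from_probability(p):
    """odds = p / (1 - p)"""
    return p / (1.0 - p)


def probability_from_odds(odds):
    """probability = odds / (1 + odds)"""
    return odds / (1.0 + odds)


def probability_from_log_odds(log_odds):
    """Logistic function: exp(x) / (1 + exp(x)), written in a stable form."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients for a model with one binary X.
beta0, beta1 = -1.0, 0.8

odds_x1 = math.exp(beta0 + beta1)           # odds when X = 1
p_x1_via_odds = probability_from_odds(odds_x1)
p_x1_direct = probability_from_log_odds(beta0 + beta1)

print(round(odds_x1, 3), round(p_x1_via_odds, 3), round(p_x1_direct, 3))
```

Both routes give the same probability, which answers the slide's question of how to get back to P(Y=1) from the fitted log odds.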
Nested models
Adding a single new variable to the model
- null model: $\ln\left(\dfrac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Age} - 30)$
- full model: $\ln\left(\dfrac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Age} - 30) + \beta_2(\text{Multivitamin})$

Comparing nested models that differ by one variable
- Compare models with p-value or CI
- What method is this? The Wald test, a test that applies the CLT
  - like the Z test comparing proportions in a 2x2 table
  - analogous to the t test for linear regression
- $H_0$: the new variable is not needed, or $H_0$: $\beta_{new} = 0$ in the population

Conclusion from the Wald test
- The p-value for multivitamin is below 0.05, and the CI for the multivitamin coefficient does not include 0 (the CI for the OR doesn't include 1)
- Reject $H_0$
- Conclude that the larger model is better: after adjusting for age, multivitamin use is still an important predictor of physician visits in the population

Interpretation: log odds
- $\beta_0$: the log odds of not visiting a physician for a 30-year-old person who reports not regularly taking multivitamins
- $\beta_1$: the log odds ratio of not visiting a physician for a one year increase in age, controlling for multivitamin use
- $\beta_2$: the log odds ratio of not visiting a physician for those who take multivitamins compared with those who do not, adjusting for age
Interpretation: odds and odds ratio
- $\exp\{\beta_0\}$: the odds of not visiting a physician for a 30-year-old person who reports not regularly taking multivitamins
- $\exp\{\beta_1\}$: after adjusting for multivitamin use, the odds of not visiting a physician change by a factor of $\exp\{\beta_1\} = 1.001$ for each additional year of age
  - additional age is associated with lower frequency of physician visits in these students, but the association is not statistically significant (p>0.05)
- $\exp\{\beta_2\}$: the odds ratio of not visiting a physician for those who take multivitamins compared with those who do not is $\exp\{\beta_2\} = 0.46$, adjusting for age
  - taking multivitamins is associated with regular physician visits (p=0.007)

Interpretation: In General
- Also: $\log\left(\dfrac{\text{odds}(Y=1 \mid X_1 + 1, X_2)}{\text{odds}(Y=1 \mid X_1, X_2)}\right) = \beta_1$
- And: $OR = \exp(\beta_1)$
- $\exp(\beta_1)$ is the multiplicative change in odds for a 1 unit increase in $X_1$, provided $X_2$ is held constant.
- The result is similar for $X_2$.
CHD by smoking and coffee
- $Y_i$ = 1 if CHD case, 0 if control
- $COF_i$ = 1 if Coffee Drinker, 0 if not
- $SMK_i$ = 1 if Smoker, 0 if not
- $p_i = \Pr(Y_i = 1)$
- $n_i$ = Number observed at pattern i of Xs
- [Table: columns i, $X_{i1}$, $X_{i2}$, $X_{i1}X_{i2}$]

Logistic Regression Model
$\log\left(\dfrac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 COF_i + \beta_2 SMK_i + \beta_3 COF_i \cdot SMK_i$
Which implies that $\Pr(Y_i = 1)$ is the logistic function
$p_i = \dfrac{e^{\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2}}}{1 + e^{\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2}}}$

Logistic Regression Model
- $Y_i$ are from a Binomial($n_i$, $p_i$) distribution
- $Y_i$ are independent
- log odds($Y_i = 1$) (or, logit($Y_i = 1$)) is a function of
  - Coffee
  - Smoking
  - coffee x smoking interaction

Interpretations
- $\exp\{\beta_1\}$: odds ratio of being a CHD case for coffee drinkers vs non-drinkers among non-smokers
- $\exp\{\beta_1 + \beta_3\}$: odds ratio of being a CHD case for coffee drinkers vs non-drinkers among smokers
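The two interpretations follow directly from the model by writing out the odds in each of the four coffee-by-smoking cells. A short derivation, using only the model stated on the slide and writing $\beta_0, \ldots, \beta_3$ for its four coefficients:

```latex
\begin{align*}
\text{odds}(COF=0,\ SMK=0) &= e^{\beta_0} \\
\text{odds}(COF=1,\ SMK=0) &= e^{\beta_0 + \beta_1} \\
\text{odds}(COF=0,\ SMK=1) &= e^{\beta_0 + \beta_2} \\
\text{odds}(COF=1,\ SMK=1) &= e^{\beta_0 + \beta_1 + \beta_2 + \beta_3} \\[6pt]
OR_{\text{coffee} \mid \text{non-smokers}}
  &= \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1} \\
OR_{\text{coffee} \mid \text{smokers}}
  &= \frac{e^{\beta_0 + \beta_1 + \beta_2 + \beta_3}}{e^{\beta_0 + \beta_2}}
   = e^{\beta_1 + \beta_3} \\
\frac{OR_{\text{coffee} \mid \text{smokers}}}{OR_{\text{coffee} \mid \text{non-smokers}}}
  &= e^{\beta_3}
\end{align*}
```

The last line shows why the exponentiated interaction coefficient is a ratio of odds ratios: it is the factor by which the coffee odds ratio is multiplied when moving from non-smokers to smokers.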
Interpretations
- $\exp\{\beta_2\}$: odds ratio of being a CHD case for smokers vs non-smokers among non-coffee drinkers
- $\exp\{\beta_2 + \beta_3\}$: odds ratio of being a CHD case for smokers vs non-smokers among coffee drinkers

Interpretations: $\exp\{\beta_3\}$
- $\exp\{\beta_3\}$: factor by which the odds ratio of being a CHD case for coffee drinkers vs non-drinkers is multiplied for smokers as compared to non-smokers
- or $\exp\{\beta_3\}$: factor by which the odds ratio of being a CHD case for smokers vs non-smokers is multiplied for coffee drinkers as compared to non-coffee drinkers
- $\exp\{\beta_3\}$: ratio of odds ratios

Interpretations
- $\dfrac{e^{\beta_0}}{1 + e^{\beta_0}}$: fraction of cases among nonsmoking, non-coffee drinking individuals in the sample (determined by sampling plan)

Some Special Cases
Given $\log\left(\dfrac{\Pr(Y=1)}{\Pr(Y=0)}\right) = \beta_0 + \beta_1 COF + \beta_2 SMK + \beta_3 COF \cdot SMK$
- If $\beta_1 = \beta_2 = \beta_3 = 0$: neither smoking nor coffee drinking is associated with increased risk of CHD
Some Special Cases
Given $\log\left(\dfrac{\Pr(Y=1)}{\Pr(Y=0)}\right) = \beta_0 + \beta_1 COF + \beta_2 SMK + \beta_3 COF \cdot SMK$
- If $\beta_1 = \beta_3 = 0$: smoking, but not coffee drinking, is associated with increased risk of CHD
- If $\beta_3 = 0$:
  - Smoking and coffee drinking are both associated with risk of CHD, but the odds ratio of CHD-smoking is the same at all levels of coffee
  - Smoking and coffee drinking are both associated with risk of CHD, but the odds ratio of CHD-coffee is the same at all levels of smoking

Confounding
- In epidemiological terms, Z is a confounder of the relationship of Y with X if Z is related to both X and Y and Z is not in the causal pathway between X and Y
- In statistical terms, Z is a confounder of the relationship of Y with X if the X coefficient changes when Z is added to a regression of Y on X
- For example, consider the two models
  - $Y = \beta_0 + \beta_1 X + \varepsilon_1$
  - $Y = \gamma_0 + \gamma_1 X + \gamma_2 Z + \varepsilon_2$
- then Z is a confounder of the X, Y relationship if $\gamma_1 \neq \beta_1$
Look at Confidence Intervals
Without Smoking
- OR = $e^{0.79}$ = 2.2
- 95% CI for log(OR): 0.79 ± 1.96(0.33) = (0.13, 1.44)
- 95% CI for OR: ($e^{0.13}$, $e^{1.44}$) = (1.14, 4.24)

Look at Confidence Intervals
With Smoking (adjusting for smoking)
- OR = $e^{0.53}$ = 1.7
- 95% CI for log(OR): 0.53 ± 1.96(0.35) = (-0.17, 1.22)
- 95% CI for OR: ($e^{-0.17}$, $e^{1.22}$) = (0.85, 3.39)

Conclusion
- So, ignoring smoking, the CHD and coffee OR is 2.2 (95% CI: 1.14, 4.24)
- Adjusting for smoking gives more modest evidence for a coffee effect
- In this case-control study, smoking is a weak-to-moderate confounder of the coffee-CHD association

Interaction Model
[Table: Model 3; columns Variable, Est, se, z; rows Intercept, Coffee, Smoking, Coffee*Smoking]
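The confidence intervals above are built on the log scale and then exponentiated. A small sketch of that calculation, using the rounded estimates and standard errors quoted on the slides (0.79 with se 0.33, and 0.53 with se 0.35). Because these inputs are rounded, the endpoints come out slightly different from the slide's, which were presumably computed from unrounded values. The function name is hypothetical.

```python
import math


def odds_ratio_ci(log_or, se, z=1.96):
    """Return (OR, lower, upper): a Wald interval built on the log scale."""
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), lower, upper


# Coffee log odds ratio without, then with, adjustment for smoking.
unadjusted = odds_ratio_ci(0.79, 0.33)
adjusted = odds_ratio_ci(0.53, 0.35)

for label, (or_, low, high) in [("unadjusted", unadjusted), ("adjusted", adjusted)]:
    print(f"{label}: OR = {or_:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The qualitative conclusion matches the slides: the unadjusted interval excludes 1, while the interval adjusted for smoking includes 1.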
Testing Interaction Term
- Z = -0.59, p-value = [value missing]
- 95% Confidence interval for $\beta_1 + \beta_3$: (0.42, 3.99)
- Both of the above suggest that there is little evidence that smoking is an effect modifier!

Likelihood Ratio Test
- The Likelihood Ratio Test will help decide whether or not additional term(s) significantly improve the model fit
- The Likelihood Ratio Test (LRT) statistic for comparing nested models is -2 times the difference between the log likelihoods (LLs) for the Null vs Extended models
- The statistic obtained is identical to the one from an analysis of variance test for linear regression models

Likelihood Ratio Test
- Deviance is a term used for the difference in -2*log likelihood relative to the best possible value from a perfectly predicting model.
- Change in deviance is the same as change in -2LL.

LRT Example
Model comparisons using likelihood ratio test

Summary: Unadjusted ORs
- The odds of CHD were estimated to be 3.4 times higher among smokers compared to non-smokers, 95% CI: (1.7, 7.9)
- The odds of CHD were estimated to be 2.2 times higher among coffee drinkers compared to non-coffee drinkers, 95% CI: (1.1, 4.3)

Summary: Adjusted ORs
- Controlling for the potential confounding of smoking, the coffee odds ratio was estimated to be 1.7 with 95% CI: (0.85, 3.4).
- Hence, the evidence in these data is insufficient to conclude coffee has an independent effect on CHD beyond that of smoking.

Comparing the models
- Models C and F are both nested in Model A
- Models C and F cannot be directly compared to one another, but we can see which has a smaller p-value when compared to Model A
- C vs. A: $\chi^2$ = 26.5 with 2 df
- F vs. A: $\chi^2$ = 21.7 with 3 df
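To see which comparison has the smaller p-value, the two chi-square statistics can be converted to upper-tail probabilities. The sketch below uses only the standard library and a closed-form expression that holds for integer degrees of freedom; the function names are hypothetical, and a statistics package would normally be used instead.

```python
import math


def lrt_statistic(loglik_null, loglik_extended):
    """-2 times the difference in log likelihoods (null minus extended)."""
    return -2.0 * (loglik_null - loglik_extended)


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    term = math.sqrt(half) / math.gamma(1.5)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Statistics and degrees of freedom quoted on the slide.
p_c_vs_a = chi2_sf(26.5, 2)
p_f_vs_a = chi2_sf(21.7, 3)
print(f"C vs. A: p = {p_c_vs_a:.2e}; F vs. A: p = {p_f_vs_a:.2e}")
```

Both p-values are very small, and the one for C vs. A is the smaller of the two, consistent with treating Model C as the stronger improvement over Model A.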
What next?
- Model C improves prediction beyond gender alone (Model A) more than Model F.
- Model C should be the next parent model, and we should test the new variables in Model F to see if they continue to improve prediction within the context of Model C.
- When a tentative final model is identified, the assumptions of logistic regression should be checked.

Flexibility in linear models
- A spline allows the slope for a continuous predictor to change at a given point; the coefficient is for the difference in log odds ratio
- An interaction term allows the odds ratio for one variable to differ by the value of a second variable; the coefficient is for the difference in log odds ratio

Poisson regression model
- Log-linear model for mean rate: [equation missing], where p is the number of predictors in the model
- Random component: [equation missing]

Exponentiating Poisson regression models
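The slide names a log-linear model for the mean rate with p predictors and a random component. In standard textbook notation (an assumption, since the slide's own equations are not reproduced here; the symbols $\lambda_i$ and $\beta_j$ are chosen for illustration) such a model is usually written as:

```latex
\begin{align*}
\text{Systematic part:}\quad
  & \log(\lambda_i) = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip} \\
\text{Random part:}\quad
  & Y_i \sim \text{Poisson}(\lambda_i) \\
\text{Exponentiating:}\quad
  & \lambda_i = e^{\beta_0} \cdot e^{\beta_1 X_{i1}} \cdots e^{\beta_p X_{ip}}
\end{align*}
```

Under this form, $e^{\beta_0}$ is the mean rate when all predictors are 0, and each $e^{\beta_j}$ is the multiplicative change in the rate (a rate ratio) for a 1-unit increase in $X_j$ with the other predictors held constant.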
Interpreting Poisson regression parameters

Modelling rates
- Of key interest in Poisson regression models is to make inference about rates of events
- We are often interested in whether the rate of cancer, or some other disease, varies by population subgroups such as gender, race, or age

Person-years
- In defining rates, it is crucial to state what denominator we have in mind
- For disease, we are usually interested in disease rate per person, per year
- If the HIV incidence rate is 5 per 1 million person-years, that means we expect to see 5 new cases of HIV per 1 million persons per year

Modelling Danish Cancer cases with an offset
- We observed Danish cancer cases in 6 age groups over a period of 4 years
- The model: [equation missing] predicts log rates per 10,000 person-years
Interpretation of coefficients

More about offsets
- The purpose of an offset is to specify the denominator of the predicted rates
- We should always try to use an offset if we suspect the underlying population sizes vary for the observed counts
- Typically, we'll use log(N) as the offset, where N is the sample size or number of person-years generating each count

Poisson regression for cohort studies
- Log-linear regression can be used to estimate relative risks for cohort studies (but not case-control studies)
- Relative risks are like relative rates, but we are comparing risks (probability of disease) instead of rates (expected cases per person-year) across groups
- Could also estimate relative risk by transforming results from logistic regression

Grand summary
- Exploratory analysis includes graphs and tables; good to get a feel for the data
- Confirmatory analysis is useful for making definitive conclusions
- Linear models provide us with a framework in which to perform confirmatory analysis in many settings
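The role of the offset can be illustrated with a few lines of arithmetic. With log(N) as the offset, the model for the expected count is log(E[count]) = log(N) + linear predictor, so the linear predictor is the log rate per unit of N. The sketch below applies this to the rate of 5 cases per 1 million person-years mentioned earlier in the lecture; the function name, the `per` argument and the population of 2 million are hypothetical.

```python
import math


def expected_count(person_years, log_rate, per=1.0):
    """Expected count when log(E[count]) = log(person_years / per) + log_rate.

    log_rate is the log of the rate expressed per `per` person-years;
    log(person_years / per) plays the role of the offset.
    """
    offset = math.log(person_years / per)
    return math.exp(offset + log_rate)


# Rate of 5 cases per 1 million person-years.
log_rate_per_million = math.log(5.0)

# Hypothetical cohort: 2 million persons followed for 1 year.
cases = expected_count(2_000_000, log_rate_per_million, per=1_000_000)
print(round(cases, 6))
```

Changing `per` only rescales the units of the rate (for example per 10,000 person-years), which is why stating the denominator matters when reporting the fitted rates.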
Grand summary: linear models
- Linear regression: for continuous (normal) outcomes
- Logistic regression: for binary outcomes
- Poisson regression: for counts

Grand summary: testing
- We can test significance of a single predictor using a z-test (or t-test for linear regression)
- Test significance of several covariates using a pair of nested models and a likelihood ratio test
- Know how to interpret p-values and confidence intervals!

Grand summary: modelling
In all generalized linear models, we can use the following tools to make models more flexible:
- Adjust for confounders using additive covariates
- Allow effect modification through interaction terms
- Fit curved and bent lines through polynomials and splines
More informationMTH 140 Statistics Videos
MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative
More informationTips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD
Tips for surviving the analysis of survival data Philip Twumasi-Ankrah, PhD Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes
More informationIntroduction. Hypothesis Testing. Hypothesis Testing. Significance Testing
Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Ecellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters
More informationList of Examples. Examples 319
Examples 319 List of Examples DiMaggio and Mantle. 6 Weed seeds. 6, 23, 37, 38 Vole reproduction. 7, 24, 37 Wooly bear caterpillar cocoons. 7 Homophone confusion and Alzheimer s disease. 8 Gear tooth strength.
More informationChapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
More informationGLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
More informationA Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
More informationDiscussion Section 4 ECON 139/239 2010 Summer Term II
Discussion Section 4 ECON 139/239 2010 Summer Term II 1. Let s use the CollegeDistance.csv data again. (a) An education advocacy group argues that, on average, a person s educational attainment would increase
More informationA full analysis example Multiple correlations Partial correlations
A full analysis example Multiple correlations Partial correlations New Dataset: Confidence This is a dataset taken of the confidence scales of 41 employees some years ago using 4 facets of confidence (Physical,
More informationPrinciples of Hypothesis Testing for Public Health
Principles of Hypothesis Testing for Public Health Laura Lee Johnson, Ph.D. Statistician National Center for Complementary and Alternative Medicine johnslau@mail.nih.gov Fall 2011 Answers to Questions
More informationAdvanced Statistical Analysis of Mortality. Rhodes, Thomas E. and Freitas, Stephen A. MIB, Inc. 160 University Avenue. Westwood, MA 02090
Advanced Statistical Analysis of Mortality Rhodes, Thomas E. and Freitas, Stephen A. MIB, Inc 160 University Avenue Westwood, MA 02090 001-(781)-751-6356 fax 001-(781)-329-3379 trhodes@mib.com Abstract
More informationNominal and ordinal logistic regression
Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome
More informationLOGISTIC REGRESSION ANALYSIS
LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic
More informationUnit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)
Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.) Logistic regression generalizes methods for 2-way tables Adds capability studying several predictors, but Limited to
More informationCHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS
Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships
More informationFailure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.
Analyzing Complex Survey Data: Some key issues to be aware of Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Rather than repeat material that is
More informationHow to set the main menu of STATA to default factory settings standards
University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be
More informationBasic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
More informationLikelihood: Frequentist vs Bayesian Reasoning
"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B University of California, Berkeley Spring 2009 N Hallinan Likelihood: Frequentist vs Bayesian Reasoning Stochastic odels and
More informationChapter 18. Effect modification and interactions. 18.1 Modeling effect modification
Chapter 18 Effect modification and interactions 18.1 Modeling effect modification weight 40 50 60 70 80 90 100 male female 40 50 60 70 80 90 100 male female 30 40 50 70 dose 30 40 50 70 dose Figure 18.1:
More informationLOGIT AND PROBIT ANALYSIS
LOGIT AND PROBIT ANALYSIS A.K. Vasisht I.A.S.R.I., Library Avenue, New Delhi 110 012 amitvasisht@iasri.res.in In dummy regression variable models, it is assumed implicitly that the dependent variable Y
More information10 Dichotomous or binary responses
10 Dichotomous or binary responses 10.1 Introduction Dichotomous or binary responses are widespread. Examples include being dead or alive, agreeing or disagreeing with a statement, and succeeding or failing
More informationLecture 19: Conditional Logistic Regression
Lecture 19: Conditional Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina
More informationX X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
More informationExamining a Fitted Logistic Model
STAT 536 Lecture 16 1 Examining a Fitted Logistic Model Deviance Test for Lack of Fit The data below describes the male birth fraction male births/total births over the years 1931 to 1990. A simple logistic
More informationLecture Notes Module 1
Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific
More informationLogistic Regression (a type of Generalized Linear Model)
Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge
More informationModule 3: Correlation and Covariance
Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis
More informationLatent Class Regression Part II
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationIAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results
IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the
More informationSection 13, Part 1 ANOVA. Analysis Of Variance
Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability
More informationLongitudinal Data Analysis
Longitudinal Data Analysis Acknowledge: Professor Garrett Fitzmaurice INSTRUCTOR: Rino Bellocco Department of Statistics & Quantitative Methods University of Milano-Bicocca Department of Medical Epidemiology
More informationClass 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)
Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the
More informationDESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.
DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,
More informationLogistic Regression (1/24/13)
STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used
More informationCALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents
More informationData Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationSample Size and Power in Clinical Trials
Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance
More informationStatistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova
More informationTwo Correlated Proportions (McNemar Test)
Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with
More informationIntroduction to Fixed Effects Methods
Introduction to Fixed Effects Methods 1 1.1 The Promise of Fixed Effects for Nonexperimental Research... 1 1.2 The Paired-Comparisons t-test as a Fixed Effects Method... 2 1.3 Costs and Benefits of Fixed
More informationChapter 5 Analysis of variance SPSS Analysis of variance
Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,
More informationStat 411/511 THE RANDOMIZATION TEST. Charlotte Wickham. stat511.cwick.co.nz. Oct 16 2015
Stat 411/511 THE RANDOMIZATION TEST Oct 16 2015 Charlotte Wickham stat511.cwick.co.nz Today Review randomization model Conduct randomization test What about CIs? Using a t-distribution as an approximation
More information