Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni


Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty

by Chi Wang, Giovanni Parmigiani, and Francesca Dominici

Biometrics

In Web Appendix A, we provide detailed MCMC algorithms for BAC and TBAC. In Web Appendices B-E, we provide additional simulation results to further evaluate the performance of TBAC and BAC under various situations. A summary of these simulation results is provided in Section 4.1 of the paper.

Web Appendix A: Details of the MCMC Algorithms

BAC

The posterior samples of $(\alpha^X, \alpha^Y, \beta^{\alpha^Y})$ are obtained by iteratively sampling from $P(\alpha^X \mid \beta^{\alpha^Y}, \alpha^Y, D)$, $P(\alpha^Y \mid \beta^{\alpha^Y}, \alpha^X, D)$ and $P(\beta^{\alpha^Y} \mid \alpha^X, \alpha^Y, D)$. The three full conditionals are as follows.

1) For $\alpha^X$,

$$P(\alpha^X \mid \beta^{\alpha^Y}, \alpha^Y, D) \overset{\text{using A3}}{=} P(\alpha^X \mid \alpha^Y, X) = \frac{P(X \mid \alpha^X, \alpha^Y)\,P(\alpha^X \mid \alpha^Y)}{P(X \mid \alpha^Y)} \overset{\text{using A2}}{=} \frac{P(X \mid \alpha^X)\,P(\alpha^X \mid \alpha^Y)}{P(X \mid \alpha^Y)} \propto P(X \mid \alpha^X)\,P(\alpha^X \mid \alpha^Y),$$

where the proportionality holds because the denominator does not depend on $\alpha^X$ and, based on Raftery et al. (1997),

$$P(X \mid \alpha^X) = \frac{\Gamma\!\left(\frac{\nu+n}{2}\right)(\nu\lambda)^{\nu/2}}{\pi^{n/2}\,\Gamma\!\left(\frac{\nu}{2}\right)\left|I_n + \phi^2 W_{\alpha^X}\Sigma_{0\alpha^X}W_{\alpha^X}^{\top}\right|^{1/2}} \left\{\lambda\nu + (X - W_{\alpha^X}\mu_{0\alpha^X})^{\top}\left(I_n + \phi^2 W_{\alpha^X}\Sigma_{0\alpha^X}W_{\alpha^X}^{\top}\right)^{-1}(X - W_{\alpha^X}\mu_{0\alpha^X})\right\}^{-\frac{\nu+n}{2}}.$$

Here $W_{\alpha^X}$ is the design matrix of the exposure regression, $I_n$ is the $n \times n$ identity matrix, and $n$ is the sample size.

2) For $\alpha^Y$,

$$P(\alpha^Y \mid \beta^{\alpha^Y}, \alpha^X, D) \overset{\text{using A1}}{=} P(\alpha^Y \mid \alpha^X, \tilde{Y}) = \frac{P(\tilde{Y} \mid \alpha^X, \alpha^Y)\,P(\alpha^Y \mid \alpha^X)}{P(\tilde{Y} \mid \alpha^X)} \overset{\text{using A4}}{=} \frac{P(\tilde{Y} \mid \alpha^Y)\,P(\alpha^Y \mid \alpha^X)}{P(\tilde{Y} \mid \alpha^X)} \propto P(\tilde{Y} \mid \alpha^Y)\,P(\alpha^Y \mid \alpha^X),$$

where $\tilde{Y} = Y - \beta^{\alpha^Y} X$. Let $W_{\alpha^Y}$ be the design matrix of the outcome regression and suppose the observations of $X$ are placed in the first column of $W_{\alpha^Y}$. Let $\tilde{W}_{\alpha^Y}$ be the second to the $(M+1)$th columns of $W_{\alpha^Y}$. Based on Raftery et al. (1997),

$$P(\tilde{Y} \mid \alpha^Y) = \frac{\Gamma\!\left(\frac{\nu+n}{2}\right)(\nu\lambda)^{\nu/2}}{\pi^{n/2}\,\Gamma\!\left(\frac{\nu}{2}\right)\left|I_n + \phi^2 \tilde{W}_{\alpha^Y}\tilde{\Sigma}_{0\alpha^Y}\tilde{W}_{\alpha^Y}^{\top}\right|^{1/2}} \left\{\lambda\nu + (\tilde{Y} - \tilde{W}_{\alpha^Y}\tilde{\mu}_{0\alpha^Y})^{\top}\left(I_n + \phi^2 \tilde{W}_{\alpha^Y}\tilde{\Sigma}_{0\alpha^Y}\tilde{W}_{\alpha^Y}^{\top}\right)^{-1}(\tilde{Y} - \tilde{W}_{\alpha^Y}\tilde{\mu}_{0\alpha^Y})\right\}^{-\frac{\nu+n}{2}},$$

where $\tilde{\mu}_{0\alpha^Y}$ consists of the second to the $(M+1)$th elements of $\mu_{0\alpha^Y}$ and $\tilde{\Sigma}_{0\alpha^Y}$ consists of the second to the $(M+1)$th rows and the second to the $(M+1)$th columns of $\Sigma_{0\alpha^Y}$.

3) For $\beta^{\alpha^Y}$,

$$P(\beta^{\alpha^Y} \mid \alpha^X, \alpha^Y, D) \overset{\text{using A3}}{=} P(\beta^{\alpha^Y} \mid \alpha^Y, D).$$

Based on Bernardo and Smith (2000), we obtain

$$\beta^{\alpha^Y} \mid \alpha^Y, D \sim t_{n+\nu}\!\left(\beta_{n\alpha^Y}, \sigma^2_{n\alpha^Y}\right),$$

where $\beta_{n\alpha^Y}$ is the first element of $\theta_{n\alpha^Y}$, $\sigma^2_{n\alpha^Y}$ is the $(1,1)$ element of $S_{n\alpha^Y}$, and

$$\theta_{n\alpha^Y} = \left(W_{\alpha^Y}^{\top}W_{\alpha^Y} + \Sigma_{0\alpha^Y}^{-1}/\phi^2\right)^{-1}\left(\Sigma_{0\alpha^Y}^{-1}\mu_{0\alpha^Y}/\phi^2 + W_{\alpha^Y}^{\top}Y\right),$$

$$S_{n\alpha^Y} = (n+\nu)^{-1}\left\{\nu\lambda + (Y - W_{\alpha^Y}\theta_{n\alpha^Y})^{\top}Y + (\mu_{0\alpha^Y} - \theta_{n\alpha^Y})^{\top}\Sigma_{0\alpha^Y}^{-1}\mu_{0\alpha^Y}/\phi^2\right\}\left(W_{\alpha^Y}^{\top}W_{\alpha^Y} + \Sigma_{0\alpha^Y}^{-1}/\phi^2\right)^{-1}.$$

TBAC

We implement two separate MCMC algorithms to draw from $P(\alpha^X \mid X)$ and from $P(\beta^{\alpha^Y}, \alpha^Y \mid D)$. First, we use the MC$^3$ method (Madigan and York, 1995) to sample from $P(\alpha^X \mid X)$ and count the appearance frequency of each $\alpha^X$. Second, we draw a posterior sample of $(\alpha^Y, \beta^{\alpha^Y})$ to approximate $P(\alpha^Y, \beta^{\alpha^Y} \mid D)$. Using $P(\alpha^Y \mid X)$ in equation (7) as the prior of $\alpha^Y$, we have

$$\begin{aligned}
P(\alpha^Y, \beta^{\alpha^Y} \mid D) &= \frac{P(Y \mid \alpha^Y, \beta^{\alpha^Y})\,P(\beta^{\alpha^Y} \mid \alpha^Y)\,P(\alpha^Y)}{P(Y)} \\
&= \frac{P(Y \mid \alpha^Y, \beta^{\alpha^Y})\,P(\beta^{\alpha^Y} \mid \alpha^Y)\sum_{\alpha^X} P(\alpha^Y \mid \alpha^X)\,P(\alpha^X \mid X)}{P(Y)} \\
&= \sum_{\alpha^X} P_{\alpha^X}(\alpha^Y, \beta^{\alpha^Y} \mid Y)\,P(\alpha^X \mid X),
\end{aligned}$$

where $P_{\alpha^X}(\alpha^Y, \beta^{\alpha^Y} \mid Y) = P(Y \mid \alpha^Y, \beta^{\alpha^Y})\,P(\beta^{\alpha^Y} \mid \alpha^Y)\,P(\alpha^Y \mid \alpha^X)/P(Y)$ is the joint posterior of $(\alpha^Y, \beta^{\alpha^Y})$ with the prior on $\alpha^Y$ specified as $P(\alpha^Y \mid \alpha^X)$. For each given $\alpha^X$, we draw a sample of $(\alpha^Y, \beta^{\alpha^Y})$ from $P_{\alpha^X}(\alpha^Y, \beta^{\alpha^Y} \mid Y)$ by iteratively sampling from $P_{\alpha^X}(\alpha^Y \mid \beta^{\alpha^Y}, Y)$ and $P_{\alpha^X}(\beta^{\alpha^Y} \mid \alpha^Y, Y)$. These two full conditionals can be derived as follows:

1) $P_{\alpha^X}(\alpha^Y \mid \beta^{\alpha^Y}, Y) \overset{\text{using A1}}{=} P_{\alpha^X}(\alpha^Y \mid \tilde{Y}) \propto P(\tilde{Y} \mid \alpha^Y)\,P(\alpha^Y \mid \alpha^X)$.

2) $P_{\alpha^X}(\beta^{\alpha^Y} \mid \alpha^Y, Y)$ follows the t-distribution $t_{n+\nu}(\beta_{n\alpha^Y}, \sigma^2_{n\alpha^Y})$.

For each $\alpha^X$, we take the sample size equal to the frequency of that $\alpha^X$ in the Markov chain from $P(\alpha^X \mid X)$, and we combine the samples from the different $\alpha^X$'s to obtain the sample of $(\alpha^Y, \beta^{\alpha^Y})$ from their joint posterior $P(\alpha^Y, \beta^{\alpha^Y} \mid D)$.

Web Appendix B: Simulation Results in the Presence of Predictors Only Correlated with X

Consider the true model

$$Y_i = \beta X_i + \delta_1^Y U_{1i} + \delta_2^Y U_{2i} + \epsilon_i^Y,$$

where $i = 1, \ldots, 1000$, and $\epsilon_i^Y$ are independent $N(0, 1)$. The vectors $(X_i, U_{1i}, U_{2i}, U_{3i}, U_{4i})$ are independent normal vectors with mean zero and covariance matrix $\Sigma = (\sigma_{kl})_{5 \times 5}$, where $\sigma_{kk} = 1$, $k = 1, \ldots, 5$, $\sigma_{12} = \sigma_{14} = \sigma_{21} = \sigma_{41} = \rho$, $\sigma_{15} = \sigma_{51} = \rho/2$, and all other $\sigma_{kl}$'s are equal to zero. Under this scenario, $U_3$ and $U_4$ are two predictors that are correlated only with $X$ and not with $Y$ (given $X$). The set of potential confounders $U$ includes $U_1, \ldots, U_4$ as well as 49 additional independent $N(0, 1)$ random variables. In our simulation, $\rho$ is set to 0.6 and $\beta = \delta_1^Y = \delta_2^Y = 0.1$. Five hundred independent data sets were generated. We applied odds priors for both BAC and TBAC, and chose the value of the dependence parameter $\omega$ to be 2, 4, 10 or $\infty$. The results are summarized in Web Table 1.

[Web Table 1 about here.]

For both BAC and TBAC, the standard errors of the estimates increase as $\omega$ increases. Because the data include two predictors that are correlated with $X$ but not with $Y$ (given $X$), increasing $\omega$ assigns higher probability to including them in the outcome model, which yields larger standard errors. On the other hand, the bias decreases as $\omega$ increases. Because the data include $U_1$, a confounder strongly correlated with $X$ but weakly correlated with $Y$, increasing $\omega$ assigns higher probability to including this predictor in the outcome model, which yields less bias. In terms of MSE, which balances bias and variation, having $\omega$ less than infinity yields smaller MSEs. In terms of the coverage probability of the 95% CIs, however, a small $\omega$, such as $\omega = 2$, provides lower coverage than desired. Interestingly, $\omega = 10$ has the same coverage probability as $\omega = \infty$ but a much smaller MSE, and is therefore a better choice under this simulation scenario.

Web Appendix C: Simulation Results when the Exposure Model is Misspecified

Consider the same true outcome model as in the first simulation scenario in the paper:

$$Y_i = \beta X_i + \delta_1^Y U_{1i} + \delta_2^Y U_{2i} + \epsilon_i^Y,$$

where $i = 1, \ldots, 1000$, and $\epsilon_i^Y$ are independent $N(0, 1)$. The set of potential confounders $U$ includes $U_1$, $U_2$ as well as 49 additional random variables. All these potential confounders are independent $N(0, 1)$. In this scenario, the exposure $X$ is modeled as a non-linear function of $U_1$:

$$X_i = \delta_1^X U_{1i}^3 + \epsilon_i^X,$$

where $\epsilon_i^X$ are independent $N(0, 0.5)$. In our simulation, we set $\beta = \delta_1^Y = \delta_2^Y = 0.1$ and $\delta_1^X = 0.7$. We generated 500 data sets and applied BAC and TBAC with $\omega = \infty$. The results are summarized in Web Table 2.

[Web Table 2 about here.]

Estimates from BAC and TBAC are very similar to each other, and both are close to the results obtained from the true model. In this simulation scenario, both methods are robust to the misspecification of the exposure model.

Web Appendix D: Simulation Results for Comparing TBAC vs. BAC

Consider the true model

$$Y_i = \beta X_i + \delta_1^Y U_{1i} + \delta_2^Y U_{2i} + \delta_3^Y U_{3i} + \delta_4^Y U_{4i} + \epsilon_i^Y,$$

where $i = 1, \ldots, 100$, and $\epsilon_i^Y$ are independent $N(0, 1)$. The set of potential confounders $U$ consists of $U_{1i}, \ldots, U_{4i}$, which are independent $N(0, 1)$ random variables. $X$ is modeled by

$$X_i = \delta_1^X U_{1i} + \delta_2^X U_{2i} + \delta_3^X U_{3i} + \delta_4^X U_{4i} + \epsilon_i^X,$$

where $\epsilon_i^X$ are independent $N(0, \cdot)$ [variance value missing]. In our simulation, we set $\beta = \delta_1^X = \delta_2^X = \delta_1^Y = \delta_3^Y = 0.1$, $\delta_3^X = \delta_4^X = 0.6$ and $\delta_2^Y = \delta_4^Y = 2$. Five hundred independent data sets were generated, and BAC and TBAC with $\omega = \infty$ were applied.

We first compared the marginal posterior distributions of $\alpha^X$ (Web Table 3). For both BAC and TBAC, all the posterior weight is assigned to models containing $U_3$ and $U_4$, since these two predictors are strongly correlated with $X$. TBAC assigns equal weights to $\alpha^X = (0, 1, 1, 1)$ and $\alpha^X = (1, 0, 1, 1)$, since $U_1$ and $U_2$ have the same correlation coefficient with $X$. In contrast, BAC assigns a much higher weight to $\alpha^X = (0, 1, 1, 1)$ than to $\alpha^X = (1, 0, 1, 1)$. This is a result of the feedback effect, since $U_2$ is strongly correlated with $Y$ while $U_1$ is weakly correlated with $Y$.

[Web Table 3 about here.]

We next compared the marginal posterior distributions of $\alpha^Y$ (Web Table 4). For both BAC and TBAC, the posterior weights are concentrated on models containing $U_2$, $U_3$ and $U_4$, since these three predictors are highly correlated with $X$, with $Y$, or with both. Large weights are assigned to $\alpha^Y = (0, 1, 1, 1)$, the model not containing $U_1$, since $U_1$ is only weakly correlated with both $X$ and $Y$. Compared to TBAC, BAC assigns more weight to $\alpha^Y = (0, 1, 1, 1)$. By accounting for the feedback effect and jointly modeling the exposure and the outcome, BAC tends to assign higher weights to more parsimonious models.

[Web Table 4 about here.]

Finally, we compared the estimation of the exposure effect, $\beta$. As shown in Web Table 5, the estimates from the two methods are very similar.

[Web Table 5 about here.]

Web Appendix E: Simulation Results for Comparing the MSEs from BAC and TBAC versus that from BMA

We used the same models as in the two simulation scenarios in the paper, but considered a smaller sample size of 100. For each simulation scenario, we generated 500 replications. The dependence parameter $\omega$ in both BAC and TBAC is set to $\infty$. The estimation results are summarized in Web Table 6. When the sample size is 100, the MSE from BMA is lower than those from BAC and TBAC in simulation scenario one but higher in scenario two. When the sample size increases to 1000, as shown in the paper, the MSEs of BAC and TBAC are lower in both scenarios.

[Web Table 6 about here.]

References

Bernardo, J. M. and Smith, A. F. M. (2000). Bayesian Theory. John Wiley & Sons, England.

Madigan, D. and York, J. (1995). Bayesian graphical models for discrete data. International Statistical Review 63.

Raftery, A. E., Madigan, D., and Hoeting, J. A. (1997). Bayesian model averaging for linear regression models. Journal of the American Statistical Association 92.

Web Table 1. Comparison of estimates of $\beta$ from BAC and TBAC using odds priors. BIAS is the difference between the mean of the estimates of $\beta$ and the true value, SEE is the mean of the standard error estimates, SSE is the standard error of the estimates of $\beta$, MSE is the mean square error, and CP is the coverage probability of the 95% confidence interval or credible interval.
Columns: Method, BIAS, SEE, SSE, MSE, CP.
Rows: True model; BAC with $\omega = 2$, $\omega = 4$, $\omega = 10$, $\omega = \infty$; TBAC with $\omega = 2$, $\omega = 4$, $\omega = 10$, $\omega = \infty$.
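The data-generating design of Web Appendix B, on which Web Table 1 is based, can be illustrated with a short sketch. This is not the authors' code. It assumes the symmetric reading of the covariance matrix (correlation $\rho$ between $X$ and each of $U_1$ and $U_3$, and $\rho/2$ between $X$ and $U_4$) and builds $X$ from its conditional distribution given the $U$'s; the function name `simulate_appendix_b` and its arguments are hypothetical.

```python
import math
import random


def simulate_appendix_b(n=1000, rho=0.6, beta=0.1, delta_y=(0.1, 0.1),
                        n_extra=49, seed=0):
    """Generate one data set under the Web Appendix B design (a sketch).

    Returns (x, y, u): exposure values, outcome values, and one row of
    potential confounders per observation (U1..U4, then n_extra noise columns).
    """
    # Residual variance of X given (U1, U3, U4), chosen so that Var(X) = 1.
    resid_var = 1.0 - (rho ** 2 + rho ** 2 + (rho / 2) ** 2)
    if resid_var <= 0:
        raise ValueError("rho is too large for a valid covariance matrix")
    resid_sd = math.sqrt(resid_var)

    rng = random.Random(seed)
    x, y, u = [], [], []
    for _ in range(n):
        # All potential confounders are independent N(0, 1).
        row = [rng.gauss(0.0, 1.0) for _ in range(4 + n_extra)]
        u1, u2, u3, u4 = row[:4]
        # Cov(X, U1) = Cov(X, U3) = rho, Cov(X, U4) = rho / 2, Cov(X, U2) = 0.
        xi = rho * u1 + rho * u3 + (rho / 2) * u4 + resid_sd * rng.gauss(0.0, 1.0)
        # Only U1 and U2 enter the outcome model; U3 and U4 predict X only.
        yi = beta * xi + delta_y[0] * u1 + delta_y[1] * u2 + rng.gauss(0.0, 1.0)
        x.append(xi)
        y.append(yi)
        u.append(row)
    return x, y, u
```

With $\rho = 0.6$ the residual variance of $X$ is $1 - (0.36 + 0.36 + 0.09) = 0.19$, so the stated covariance matrix is positive definite. Calling the function 500 times with different seeds would mimic the five hundred replicate data sets.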

Web Table 2. Comparison of estimates of $\beta$ from BAC and TBAC when the exposure model is misspecified. The dependence parameter $\omega$ in both BAC and TBAC is set to $\infty$.
Columns: Method, BIAS, SEE, SSE, MSE, CP.
Rows: True model; BAC; TBAC.
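The misspecified-exposure design of Web Appendix C, which underlies Web Table 2, might be simulated along the following lines. The sketch assumes that the second argument of $N(0, 0.5)$ is a variance rather than a standard deviation; the name `simulate_appendix_c` and its parameters are hypothetical and not taken from the paper.

```python
import math
import random


def simulate_appendix_c(n=1000, beta=0.1, delta_y=(0.1, 0.1), delta_x1=0.7,
                        eps_x_var=0.5, n_extra=49, seed=0):
    """Generate one data set with a cubic exposure model (a sketch).

    eps_x_var is treated as the variance of the exposure error; that
    reading of N(0, 0.5) is an assumption.
    """
    rng = random.Random(seed)
    eps_x_sd = math.sqrt(eps_x_var)
    x, y, u = [], [], []
    for _ in range(n):
        # U1, U2 and the extra columns are independent N(0, 1).
        row = [rng.gauss(0.0, 1.0) for _ in range(2 + n_extra)]
        u1, u2 = row[0], row[1]
        # The exposure depends on U1 through its cube, so a linear
        # exposure regression on U is misspecified.
        xi = delta_x1 * u1 ** 3 + rng.gauss(0.0, eps_x_sd)
        # The outcome model is linear in X, U1 and U2.
        yi = beta * xi + delta_y[0] * u1 + delta_y[1] * u2 + rng.gauss(0.0, 1.0)
        x.append(xi)
        y.append(yi)
        u.append(row)
    return x, y, u
```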

Web Table 3. Comparison of marginal posterior distributions of $\alpha^X$. The dependence parameter $\omega$ in both BAC and TBAC is set to $\infty$.
Columns: Model, $P(\alpha^X \mid D)$ from BAC, $P(\alpha^X \mid D)$ from TBAC.
Rows: (0,0,1,1); (0,1,1,1); (1,0,1,1); (1,1,1,1).
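Marginal posterior model weights such as those in Web Table 3 are typically estimated by the relative frequency with which each inclusion vector is visited by the Markov chain. The sketch below shows that counting step, together with the TBAC pooling rule from Web Appendix A, in which the number of second-stage draws for each $\alpha^X$ equals its frequency in the first-stage chain. The function names and the callback `draw_given_alpha_x` are hypothetical placeholders for whatever second-stage sampler is used.

```python
from collections import Counter


def model_frequencies(chain):
    """Relative frequency of each inclusion vector visited by an MCMC chain."""
    counts = Counter(tuple(alpha) for alpha in chain)
    total = sum(counts.values())
    return {model: count / total for model, count in counts.items()}


def pool_second_stage(chain, draw_given_alpha_x):
    """Pool second-stage draws across exposure models (TBAC-style sketch).

    draw_given_alpha_x(alpha_x, size) is a user-supplied function that
    returns `size` draws of (alpha_Y, beta) conditional on alpha_x.
    """
    counts = Counter(tuple(alpha) for alpha in chain)
    pooled = []
    for alpha_x, count in counts.items():
        # Each alpha_x contributes as many draws as it appears in the chain.
        pooled.extend(draw_given_alpha_x(alpha_x, count))
    return pooled


# Toy first-stage chain over four candidate confounders.
toy_chain = [(0, 1, 1, 1), (0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 1, 1)]
toy_weights = model_frequencies(toy_chain)
```

In the toy chain, the model (0, 1, 1, 1) appears in two of four iterations, so its estimated weight is 0.5.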

Web Table 4. Comparison of marginal posterior distributions of $\alpha^Y$. The dependence parameter $\omega$ in both BAC and TBAC is set to $\infty$.
Columns: Model, $P(\alpha^Y \mid D)$ from BAC, $P(\alpha^Y \mid D)$ from TBAC.
Rows: (0,1,1,1); (1,1,1,1).

Web Table 5. Comparison of estimates of $\beta$ from BAC and TBAC. The dependence parameter $\omega$ in both BAC and TBAC is set to $\infty$.
Columns: Model, BIAS, SEE, SSE, MSE, CP.
Rows: MLE from model (0,1,1,1); MLE from model (1,1,1,1); BAC; TBAC.

Web Table 6. Comparison of MSEs from BAC, TBAC and BMA. The data were generated from the same models as in the two simulation scenarios in the paper, but with sample size 100. The dependence parameter $\omega$ in both BAC and TBAC is set to $\infty$.
Columns: Simulation Scenario, Method, BIAS, SEE, SSE, MSE, CP.
Rows: Scenario One: BAC, TBAC, BMA; Scenario Two: BAC, TBAC, BMA.
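The summary columns reported in the Web Tables (BIAS, SEE, SSE, MSE and CP) could be computed from replicate results roughly as follows. The sketch assumes that each replicate yields a point estimate and a standard error, and that coverage is judged with a normal-approximation interval (estimate plus or minus 1.96 standard errors); the paper may instead use posterior credible intervals, so the CP line is an approximation. The name `summarize_estimates` is hypothetical.

```python
from statistics import mean, stdev


def summarize_estimates(estimates, std_errors, true_beta, z=1.96):
    """Summarize replicate estimates of beta (a sketch of the table columns)."""
    estimates = list(estimates)
    std_errors = list(std_errors)
    # BIAS: mean of the estimates minus the true value.
    bias = mean(estimates) - true_beta
    # SEE: mean of the estimated standard errors.
    see = mean(std_errors)
    # SSE: empirical standard deviation of the estimates across replicates.
    sse = stdev(estimates)
    # MSE: mean squared deviation of the estimates from the true value.
    mse = mean([(b - true_beta) ** 2 for b in estimates])
    # CP: share of replicates whose interval covers the true value
    # (normal-approximation interval; an assumption of this sketch).
    covered = [1.0 if b - z * s <= true_beta <= b + z * s else 0.0
               for b, s in zip(estimates, std_errors)]
    cp = mean(covered)
    return {"BIAS": bias, "SEE": see, "SSE": sse, "MSE": mse, "CP": cp}


# Two made-up replicates, purely to show the call.
toy_summary = summarize_estimates([0.0, 0.2], [0.1, 0.1], true_beta=0.1)
```

Comparing SEE with SSE indicates whether the estimated standard errors track the actual sampling variability, which is how the two columns are usually read together.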

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

More information

Fraternity & Sorority Academic Report Spring 2016

Fraternity & Sorority Academic Report Spring 2016 Fraternity & Sorority Academic Report Organization Overall GPA Triangle 17-17 1 Delta Chi 88 12 100 2 Alpha Epsilon Pi 77 3 80 3 Alpha Delta Chi 28 4 32 4 Alpha Delta Pi 190-190 4 Phi Gamma Delta 85 3

More information

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

More information

Fraternity & Sorority Academic Report Fall 2015

Fraternity & Sorority Academic Report Fall 2015 Fraternity & Sorority Academic Report Organization Lambda Upsilon Lambda 1-1 1 Delta Chi 77 19 96 2 Alpha Delta Chi 30 1 31 3 Alpha Delta Pi 134 62 196 4 Alpha Sigma Phi 37 13 50 5 Sigma Alpha Epsilon

More information

Bayesian Statistics in One Hour. Patrick Lam

Bayesian Statistics in One Hour. Patrick Lam Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

Markov Chain Monte Carlo Simulation Made Simple

Markov Chain Monte Carlo Simulation Made Simple Markov Chain Monte Carlo Simulation Made Simple Alastair Smith Department of Politics New York University April2,2003 1 Markov Chain Monte Carlo (MCMC) simualtion is a powerful technique to perform numerical

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

e = random error, assumed to be normally distributed with mean 0 and standard deviation σ 1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

More information

Models for Count Data With Overdispersion

Models for Count Data With Overdispersion Models for Count Data With Overdispersion Germán Rodríguez November 6, 2013 Abstract This addendum to the WWS 509 notes covers extra-poisson variation and the negative binomial model, with brief appearances

More information

Bayesian Effect Estimation Accounting for Adjustment Uncertainty

Bayesian Effect Estimation Accounting for Adjustment Uncertainty Bayesian Effect Estimation Accounting for Adjustment Uncertainty Chi Wang, 1,2, Giovanni Parmigiani, 3,4 and Francesca Dominici 4 1 Markey Cancer Center, University of Kentucky, Lexington, KY 40536, USA

More information

The University of Kansas

The University of Kansas All Greek Summary Rank Chapter Name Total Membership Chapter GPA 1 Beta Theta Pi 3.57 2 Chi Omega 3.42 3 Kappa Alpha Theta 3.36 4 Kappa Kappa Gamma 3.28 *5 Pi Beta Phi 3.27 *5 Gamma Phi Beta 3.27 *7 Alpha

More information

Lab 8: Introduction to WinBUGS

Lab 8: Introduction to WinBUGS 40.656 Lab 8 008 Lab 8: Introduction to WinBUGS Goals:. Introduce the concepts of Bayesian data analysis.. Learn the basic syntax of WinBUGS. 3. Learn the basics of using WinBUGS in a simple example. Next

More information

Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

More information

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples

More information

Coefficient of Determination

Coefficient of Determination Coefficient of Determination The coefficient of determination R 2 (or sometimes r 2 ) is another measure of how well the least squares equation ŷ = b 0 + b 1 x performs as a predictor of y. R 2 is computed

More information

Imputing Missing Data using SAS

Imputing Missing Data using SAS ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are

More information

problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved

problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved 4 Data Issues 4.1 Truncated Regression population model y i = x i β + ε i, ε i N(0, σ 2 ) given a random sample, {y i, x i } N i=1, then OLS is consistent and efficient problem arises when only a non-random

More information

The University of Kansas

The University of Kansas Fall 2011 Scholarship Report All Greek Summary Rank Chapter Name Chapter GPA 1 Beta Theta Pi 3.57 2 Chi Omega 3.42 3 Kappa Alpha Theta 3.36 *4 Gamma Phi Beta 3.28 4 Kappa Kappa Gamma 3.28 6 Pi Beta Phi

More information

A Basic Introduction to Missing Data

A Basic Introduction to Missing Data John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

PS 271B: Quantitative Methods II. Lecture Notes

PS 271B: Quantitative Methods II. Lecture Notes PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.

More information

Modeling the Distribution of Environmental Radon Levels in Iowa: Combining Multiple Sources of Spatially Misaligned Data

Modeling the Distribution of Environmental Radon Levels in Iowa: Combining Multiple Sources of Spatially Misaligned Data Modeling the Distribution of Environmental Radon Levels in Iowa: Combining Multiple Sources of Spatially Misaligned Data Brian J. Smith, Ph.D. The University of Iowa Joint Statistical Meetings August 10,

More information

Bayesian Methods. 1 The Joint Posterior Distribution

Bayesian Methods. 1 The Joint Posterior Distribution Bayesian Methods Every variable in a linear model is a random variable derived from a distribution function. A fixed factor becomes a random variable with possibly a uniform distribution going from a lower

More information

Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

More information

Extreme Value Modeling for Detection and Attribution of Climate Extremes

Extreme Value Modeling for Detection and Attribution of Climate Extremes Extreme Value Modeling for Detection and Attribution of Climate Extremes Jun Yan, Yujing Jiang Joint work with Zhuo Wang, Xuebin Zhang Department of Statistics, University of Connecticut February 2, 2016

More information

11. Time series and dynamic linear models

11. Time series and dynamic linear models 11. Time series and dynamic linear models Objective To introduce the Bayesian approach to the modeling and forecasting of time series. Recommended reading West, M. and Harrison, J. (1997). models, (2 nd

More information

Introduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.

Introduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures. Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference is used frequently used concerns tests of hypotheses.

More information

Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic

More information

Lecture 7 Linear Regression Diagnostics

Lecture 7 Linear Regression Diagnostics Lecture 7 Linear Regression Diagnostics BIOST 515 January 27, 2004 BIOST 515, Lecture 6 Major assumptions 1. The relationship between the outcomes and the predictors is (approximately) linear. 2. The error

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

More information

Bayesian Model Averaging CRM in Phase I Clinical Trials

Bayesian Model Averaging CRM in Phase I Clinical Trials M.D. Anderson Cancer Center 1 Bayesian Model Averaging CRM in Phase I Clinical Trials Department of Biostatistics U. T. M. D. Anderson Cancer Center Houston, TX Joint work with Guosheng Yin M.D. Anderson

More information

1 Prior Probability and Posterior Probability

1 Prior Probability and Posterior Probability Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which

More information

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

University of Maryland Fraternity & Sorority Life Spring 2015 Academic Report

University of Maryland Fraternity & Sorority Life Spring 2015 Academic Report University of Maryland Fraternity & Sorority Life Academic Report Academic and Population Statistics Population: # of Students: # of New Members: Avg. Size: Avg. GPA: % of the Undergraduate Population

More information

Validation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT

Validation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT Validation of Software for Bayesian Models using Posterior Quantiles Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT Abstract We present a simulation-based method designed to establish that software

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

Yiming Peng, Department of Statistics. February 12, 2013

Yiming Peng, Department of Statistics. February 12, 2013 Regression Analysis Using JMP Yiming Peng, Department of Statistics February 12, 2013 2 Presentation and Data http://www.lisa.stat.vt.edu Short Courses Regression Analysis Using JMP Download Data to Desktop

More information

Comparison of Estimation Methods for Complex Survey Data Analysis

Comparison of Estimation Methods for Complex Survey Data Analysis Comparison of Estimation Methods for Complex Survey Data Analysis Tihomir Asparouhov 1 Muthen & Muthen Bengt Muthen 2 UCLA 1 Tihomir Asparouhov, Muthen & Muthen, 3463 Stoner Ave. Los Angeles, CA 90066.

More information

P (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i )

P (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i ) Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =

More information

Statistics in Geophysics: Linear Regression II

Statistics in Geophysics: Linear Regression II Statistics in Geophysics: Linear Regression II Steffen Unkel Department of Statistics Ludwig-Maximilians-University Munich, Germany Winter Term 2013/14 1/28 Model definition Suppose we have the following

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Sample Size Calculation for Longitudinal Studies

Sample Size Calculation for Longitudinal Studies Sample Size Calculation for Longitudinal Studies Phil Schumm Department of Health Studies University of Chicago August 23, 2004 (Supported by National Institute on Aging grant P01 AG18911-01A1) Introduction

More information

Estimation of σ 2, the variance of ɛ

Estimation of σ 2, the variance of ɛ Estimation of σ 2, the variance of ɛ The variance of the errors σ 2 indicates how much observations deviate from the fitted surface. If σ 2 is small, parameters β 0, β 1,..., β k will be reliably estimated

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Chris Slaughter, DrPH. GI Research Conference June 19, 2008

Chris Slaughter, DrPH. GI Research Conference June 19, 2008 Chris Slaughter, DrPH Assistant Professor, Department of Biostatistics Vanderbilt University School of Medicine GI Research Conference June 19, 2008 Outline 1 2 3 Factors that Impact Power 4 5 6 Conclusions

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

Illustration (and the use of HLM)

Illustration (and the use of HLM) Illustration (and the use of HLM) Chapter 4 1 Measurement Incorporated HLM Workshop The Illustration Data Now we cover the example. In doing so we does the use of the software HLM. In addition, we will

More information

Generalized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component)
