Applications of R Software in Bayesian Data Analysis


Article

International Journal of Information Science and System, 2012, 1(1): 7-23
Journal homepage: ISSN: Florida, USA

Nageena Nazir*, Athar Ali Khan, A. H. Mir and Showkat Maqbool
Division of Agricultural Statistics, Sher-e-Kashmir University of Agricultural Sciences & Technology Kashmir, Shalimar Srinagar
* To whom correspondence should be addressed: nazir.nageena@gmail.com

Article history: Received 15 May 2012, Received in revised form 29 May 2012, Accepted 29 May 2012, Published 30 May

Abstract: Bayesian statistics is an approach to statistics that formally combines prior information with the data, and Bayes' theorem provides the formal basis for using both sources of information. Bayesian analysis is the study of the features of the posterior density. R software is used here to explore these features from both numeric and graphic viewpoints, with emphasis on graphical features throughout. Bayesian analyses are covered for linear regression, designed experiments, mixed effects models and logistic regression. The simulation approach to Bayesian analysis was found to be the most useful.

Keywords: R software, Bayesian Data Analysis

1. Introduction

Bayesian statistics is an approach to statistics that formally makes use of prior information, and Bayes' theorem provides the basis for using this information in a formal manner. When significant prior information is available, the Bayesian approach shows how to use it sensibly. This is not possible with most non-Bayesian approaches. In the Bayesian approach the parameter of interest is treated as random and the data as fixed, in contrast to the frequentist approach, where the parameter is treated as fixed and the data as random.
The business of statistics is to provide information or conclusions about uncertain quantities. The language of uncertainty is probability, specifically conditional probability, and the Bayesian approach uses this language consistently to address uncertainty. Bayes' theorem states that

$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}$$

or equivalently

$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta),$$

that is, posterior ∝ likelihood × prior.

Bayesian statistics is a reasonable alternative for moderate and especially for small sample sizes, where non-Bayesian procedures do not work (e.g., Berger 1985, page 125).

Data analysis is indispensable in any agricultural research. Many software packages have been developed for it; the most common are SAS, SPSS, Minitab, S-PLUS and R. In the present study, R was used for statistical and graphical analyses. It is an integrated suite of software for data manipulation, calculation and graphical display, with a large number of functions for data analysis and its own programming language, which is effective and simple. In this study, Bayesian analyses are covered for linear regression, designed experiments, mixed effects models and logistic regression. The simulation approach to Bayesian analysis was found to be the most useful.

2. Material and Methods

In the present paper, R software is applied to Bayesian methods of agricultural data analysis. This includes summary features of the data, namely the empirical mean, standard deviation, standard error of the mean and quantiles; the posterior density of each variable is also plotted. Functions available in R and in the MCMCpack package are used to illustrate both the analytical and the graphical viewpoints. Existing data are used for illustration. Concepts of Bayesian methods and their R implementations are addressed in each section.

3. Bayesian Analysis of Linear Regression Model

The analysis of a simple regression model is illustrated here; multiple regression models can be handled along similar lines.

Example: wormy fruits

The percentage of wormy fruits attacked by codling moth larvae is greater on apple trees bearing a small crop. The regressor x is the size of the crop (hundreds of fruits) and the response variable y is the percentage of wormy fruits (e.g., Snedecor and Cochran 1989, page 162). The data frame wormyfruits consists of 12 rows and 2 columns, named fruitsize and wormypercent for x and y, respectively.

(Data listing with columns fruitsize and wormypercent.)

Fit a Bayesian linear model for the data.

# Look into the data graphically
> x11(width=4, height=4) # To define height and width of Fig.
> plot(wormypercent~fruitsize, data=wormyfruits) # Output is reported in Figure 1.

Figure 1: Plot of wormypercent against fruitsize. This plot clearly suggests that a simple linear regression model can be fitted.

We shall use MCMCregress of MCMCpack to analyze this model.

> library(MCMCpack)
> M6 <- MCMCregress(wormypercent~fruitsize, data=wormyfruits)
> summary(M6)
Iterations = 1001:11000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000
(1). Empirical mean and standard deviation for each variable, plus standard error of the mean:
             Mean   SD   Naive SE   Time-series SE
(Intercept)
fruitsize
sigma2
(2). Quantiles for each variable:
             2.5%   25%   50%   75%   97.5%
(Intercept)
fruitsize
sigma2

This numeric summary clearly shows that both the intercept and the regression coefficient are statistically significant. A graphic summary can be obtained as well. To plot the posterior densities of the regression coefficients, we use the function plot as:

> plot(M6, trace=FALSE)

Output is reported in Figure 2.
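The kind of posterior simulation summarized above can be sketched in base R. This is a minimal illustration, not MCMCpack's implementation: it assumes the standard non-informative prior p(β, σ²) ∝ 1/σ² (the default prior used by MCMCregress may differ), and it uses synthetic data generated inside the block, since the wormyfruits values are not reproduced here. The names x, y, n_draws and post_summary are hypothetical.

```r
# Sketch: direct posterior simulation for simple linear regression,
# ASSUMING the non-informative prior p(beta, sigma^2) proportional to 1/sigma^2.
set.seed(1)

# Synthetic data for illustration only (NOT the wormyfruits data)
n <- 12
x <- seq(5, 60, length.out = n)
y <- 60 - 0.8 * x + rnorm(n, sd = 5)

X <- cbind(1, x)                                  # design matrix
XtX_inv <- solve(crossprod(X))                    # (X'X)^{-1}
beta_hat <- drop(XtX_inv %*% crossprod(X, y))     # least-squares estimate
df <- n - ncol(X)
s2 <- sum((y - drop(X %*% beta_hat))^2) / df      # residual variance

n_draws <- 10000
# sigma^2 | y follows a scaled inverse chi-square(df, s2) distribution
sigma2 <- df * s2 / rchisq(n_draws, df)
# beta | sigma^2, y ~ Normal(beta_hat, sigma^2 * (X'X)^{-1})
L <- t(chol(XtX_inv))                             # L %*% t(L) equals XtX_inv
z <- matrix(rnorm(2 * n_draws), nrow = 2)
beta <- beta_hat + (L %*% z) * rep(sqrt(sigma2), each = 2)

# Numeric summary in the spirit of summary() on the fitted object
draws <- cbind("(Intercept)" = beta[1, ], x = beta[2, ], sigma2 = sigma2)
post_summary <- t(apply(draws, 2, function(d)
  c(Mean = mean(d), SD = sd(d), quantile(d, c(0.025, 0.5, 0.975)))))
print(round(post_summary, 3))
```

Under this assumed prior no Markov chain is needed, because the posterior factorizes into two standard distributions; plotting density(draws[, "x"]) would give the analogue of one panel of Figure 2.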

Figure 2: Posterior densities of (Intercept), fruitsize and sigma2. It is evident from this figure that all the required information is contained in the posterior densities of the parameters $\beta_0$, $\beta_1$ and $\sigma^2$ of the model

$$\text{wormypercent} = \beta_0 + \beta_1\,\text{fruitsize} + \text{error}$$

It may be noted that the likelihood is Normal and the prior is non-informative.

4. Bayesian Analysis of Designed Experiments

4.1. Bayesian Analysis of One Way Data

The analysis of variance technique is commonly used to analyze data generated in an experiment. Its Bayesian parallel is discussed here.

Example: fat data

The fat absorption data involve 4 types of fat, each replicated 6 times. The purpose of the study was to examine the absorption of different fats in doughnuts. Details of the data are available in Snedecor and Cochran (1989), page 218.

(Data table: rows Fat1 to Fat4, columns Replication R1 to R6.)

A data frame fatdata has been created for Bayesian modeling. Fit the model as:

> M7 <- MCMCregress(absorption~Fat, data=fatdata)

Print the summary of results as:

> summary(M7)
Iterations = 1001:11000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000
(1). Empirical mean and standard deviation for each variable, plus standard error of the mean:
             Mean   SD   Naive SE   Time-series SE
(Intercept)
FatFat2
FatFat3
FatFat4
sigma2
(2). Quantiles for each variable:
             2.5%   25%   50%   75%   97.5%
(Intercept)
FatFat2
FatFat3
FatFat4
sigma2

It is evident from this output that, with Fat1 as the baseline, Fat2 differs significantly from Fat1, whereas Fat3 and Fat4 do not differ significantly from Fat1. The graphic features of the Bayesian analysis show the same; the graphic output is reported in Figure 3.
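The reading "differs significantly from the baseline" used above can be made concrete as a check on whether a central 95% posterior interval excludes zero. The sketch below shows one common convention rather than a rule stated by the authors; the helper name excludes_zero and the draws are hypothetical, generated only for illustration.

```r
# Sketch: flag parameters whose central posterior interval excludes zero.
# 'draws' is any numeric matrix of posterior draws
# (rows = iterations, columns = parameters).
excludes_zero <- function(draws, level = 0.95) {
  alpha <- (1 - level) / 2
  ci <- apply(draws, 2, quantile, probs = c(alpha, 1 - alpha))
  data.frame(lower = ci[1, ],
             upper = ci[2, ],
             excludes_zero = ci[1, ] > 0 | ci[2, ] < 0)
}

# Synthetic draws for illustration only (NOT results from fatdata)
set.seed(2)
fake_draws <- cbind(effectA = rnorm(5000, mean = 3,   sd = 1),
                    effectB = rnorm(5000, mean = 0.2, sd = 1))
res <- excludes_zero(fake_draws)
print(res)
```

Applied to a fitted object, the same check would read the 2.5% and 97.5% columns of the quantile table: a treatment contrast is flagged only when both limits lie on the same side of zero.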

> plot(M7, trace=FALSE)

Figure 3: Posterior summaries of MCMCregress for fatdata (posterior densities of (Intercept), FatFat2, FatFat3, FatFat4 and sigma2).

This is the Bayesian counterpart of analysis of variance for one-way data.

4.2. Bayesian Analysis of Factorial Experiments

Example: cowpea data

Snedecor and Cochran (1989), page 308, report data in which Variety and Spacing are two factors with 3 levels each and 4 Replications. The response is the Yield of cowpea hay (lb/100 morgen plot). The design is a factorial Randomized Block Design (RBD). Details of the data are as under:

Table 1: Data on yield of cowpea, by Variety (V1, V2, V3), Spacing (S1, S2, S3) and Replication (R1 to R4).

To get the Bayesian analysis of these data we use the function MCMCregress of MCMCpack. A data frame cowpea is constructed for Bayesian modeling. This data frame contains 36 rows and 4 columns: Replication, Spacing, Variety and yield. The model is fitted as:

> M8 <- MCMCregress(yield~Variety*Spacing, data=cowpea)
> summary(M8)
Iterations = 1001:11000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000
(1). Empirical mean and standard deviation for each variable, plus standard error of the mean:
                      Mean   SD   Naive SE   Time-series SE
(Intercept)
VarietyV2
VarietyV3
SpacingS2
SpacingS3
VarietyV2:SpacingS2
VarietyV3:SpacingS2
VarietyV2:SpacingS3
VarietyV3:SpacingS3
sigma2
(2). Quantiles for each variable:
                      2.5%   25%   50%   75%   97.5%
(Intercept)
VarietyV2
VarietyV3
SpacingS2
SpacingS3
VarietyV2:SpacingS2
VarietyV3:SpacingS2
VarietyV2:SpacingS3
VarietyV3:SpacingS3
sigma2
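The coefficient names in the output above follow from R's default treatment (dummy) coding, in which the first level of each factor serves as the baseline. The sketch below builds only the design layout described in the text (3 varieties, 3 spacings, 4 replications) and inspects the resulting model matrix; no yield values are used, and the object name design is an assumption made for illustration.

```r
# Sketch: the design layout of the factorial experiment, without any yields.
design <- expand.grid(Replication = factor(paste0("R", 1:4)),
                      Spacing     = factor(paste0("S", 1:3)),
                      Variety     = factor(paste0("V", 1:3)))

# Model matrix for main effects plus interaction under treatment coding
X <- model.matrix(~ Variety * Spacing, data = design)

print(dim(X))        # rows = plots, columns = regression coefficients
print(colnames(X))   # names of the coefficients in the fitted model

# Rows belonging to the baseline cell V1, S1 involve the intercept only
baseline_rows <- design$Variety == "V1" & design$Spacing == "S1"
print(unique(X[baseline_rows, , drop = FALSE]))
```

Each non-intercept column is therefore a contrast with the V1, S1 cell, which is why the interpretation of the output is phrased relative to V1 and S1.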

Figure 4: Posterior summaries of cowpea data generated in a factorial experiment (posterior densities of (Intercept), VarietyV2, VarietyV3, SpacingS2, SpacingS3 and VarietyV2:SpacingS2).

It is evident from these outputs that, with V1 and S1 as the baseline, varieties V2 and V3 differ significantly from V1. Similarly, S3 differs significantly from S1, whereas S2 does not. The interaction V1S1 is the baseline for testing interactions, and only V2S3 differs significantly from V1S1, whereas V2S2, V3S2 and V3S3 do not. Posterior densities of the interactions V3S2, V2S3 and V3S3 are not reported here.

5. Bayesian Analysis of Logistic Regression Model

Example: radiotherapy data

The data object radiotherapy consists of data taken from Mendenhall et al. (1989): Radiotherapy and Oncology 16 (see also Tanner 1996, page 28). The radiotherapy data frame contains data on the radiotherapy of 24 patients, in which rows represent patients and columns represent Days, the number of days of radiotherapy received by each patient, and Response, the absence (1) or presence (0) of disease at a site 3 years after treatment. These data do not come from the agricultural sciences; however, data of this type are quite common in agricultural sciences too. They are introduced here only to illustrate Bayesian logistic regression.

(Data listing with columns Days and Response.)

The model for the data is the logistic regression model

$$\log\left(\frac{p_i}{1 - p_i}\right) = \alpha + \beta x_i \qquad (1)$$

where $x_i$ represents the covariate for the ith patient and $p_i$ represents the corresponding probability of success (no disease).

This model specifies that the log-odds of success are linearly related to the number of days the subject received radiotherapy. The intercept α represents the log-odds of success for 0 days, while the slope β represents the change in the log-odds of success for every unit increase in the covariate. Thus from model (1) the probability of success $p_i$ can be written as

$$p_i(x_i) = \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)}$$

The logit model is fitted to the radiotherapy data using the function MCMClogit of MCMCpack.

> M9 <- MCMClogit(Response~Days, data=radiotherapy)
The Metropolis acceptance rate for beta was
> summary(M9)
Iterations = 1001:11000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000
(1). Empirical mean and standard deviation for each variable, plus standard error of the mean:
             Mean   SD   Naive SE   Time-series SE
(Intercept)
Days
(2). Quantiles for each variable:
             2.5%   25%   50%   75%   97.5%
(Intercept)
Days

To get the graphic summary of the Bayesian analysis:

> plot(M9, trace=FALSE) # Output is reported in Figure 5.
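Model (1) and its inverse can be checked numerically. The sketch below codes the inverse-logit formula from the text and compares it with base R's plogis; the values of alpha, beta and days are hypothetical placeholders chosen for illustration, not estimates from the radiotherapy data.

```r
# Sketch: the logit link of model (1) and its inverse.
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

# Hypothetical parameter values (NOT estimates from the radiotherapy data)
alpha <- 1
beta  <- -0.05
days  <- c(10, 30, 50)

eta <- alpha + beta * days          # linear predictor = log-odds of success
p   <- inv_logit(eta)               # probability of success for each value of days
log_odds <- log(p / (1 - p))        # applying the logit recovers the linear predictor

print(cbind(days, eta, p, log_odds))
print(all.equal(p, plogis(eta)))    # plogis is R's built-in inverse logit
print(exp(beta))                    # multiplicative change in the odds per extra day
```

With a negative slope the success probability falls as days increases, and exp(beta) gives the corresponding odds ratio per unit of the covariate.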

Figure 5: Posterior summary of the logistic regression model fitted to the radiotherapy data discussed above (posterior densities of (Intercept) and Days). This figure clearly indicates that Days of therapy are significantly related to the probability of emergence of disease.

6. Bayesian Analysis of Mixed Effects Model (Hierarchical Bayes analysis)

It is well known that mixed effects models lack theoretical foundations and that the Bayesian approach provides them (see, e.g., Lindley and Smith, 1972, for a detailed discussion). Kass and Steffey (1989) use the terms common effect and unit specific effects for fixed and random effects, respectively. In terms of priors, non-informative priors stand for fixed effects and informative priors for random effects. In the Bayesian spirit, however, every effect is random. A practical implementation of this analysis is available in the lme4 package of R.

Example: coagulation

Effect of diet on coagulation time (seconds) for blood drawn from 24 animals randomly allocated to four different diets (Gelman et al., 1995, page 274; Box, Hunter and Hunter, 1978).

(Data table with columns Diet, Coagulation time and number of observations; rows A, B, C and D.)

A data frame coagulation contains the information needed for the analysis. This data frame contains 24 rows and two columns, diet and coagulation time. The Bayesian analysis of the data can be carried out in R in the same spirit as in the earlier examples.

> library(lattice)
> print(dotplot(diet~coag.time, data=coagulation, xlab="Coagulation time (seconds)", ylab="Diet"))

Figure 6: Dot plot of coagulation data (Diet A to D against Coagulation time in seconds). This figure suggests a random effect for the intercept.

Fitting the model using the lmer function of the lme4 package:

> library(lme4)
> M10 <- lmer(coag.time~1+(1|diet), data=coagulation)
> summary(M10)
Linear mixed-effects model fit by REML
Formula: coag.time ~ 1 + (1 | diet)
Data: coagulation
AIC   BIC   logLik   MLdeviance   REMLdeviance
Random effects:
Groups     Name          Variance   Std.Dev.
diet       (Intercept)
Residual
number of obs: 24, groups: diet, 4
Fixed effects:
             Estimate   Std. Error   t value
(Intercept)

6.1. Simulations from M10, a Posterior Fitted by lmer

An in-depth Bayesian analysis of these data can be made using the simulation tools available in R. For example, to simulate 2000 observations from the fitted object M10 we use the function mcmcsamp as:

> M10.mcmc <- mcmcsamp(M10, n=2000, deviance=TRUE)
> summary(M10.mcmc)
Iterations = 1:2000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 2000
(1). Empirical mean and standard deviation for each variable, plus standard error of the mean:
                 Mean   SD   Naive SE   Time-series SE
(Intercept)
log(sigma^2)
log(diet.(in))
deviance
(2). Quantiles for each variable:
                 2.5%   25%   50%   75%   97.5%
(Intercept)
log(sigma^2)
log(diet.(in))
deviance

> plot(M10.mcmc) # To get graphic summaries reported in Figure 7.
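Because the summary above reports the variance parameters on the log scale, it can help to map the draws back to standard deviations before interpreting them. The sketch below does this for a vector of draws of log(sigma^2); the draws are synthetic stand-ins, since the actual mcmcsamp output is not reproduced here (and mcmcsamp itself has been dropped from later lme4 releases), and the name log_sigma2 is hypothetical.

```r
# Sketch: back-transforming posterior draws of log(sigma^2) to sigma.
set.seed(3)

# Synthetic draws for illustration only (NOT output of mcmcsamp)
log_sigma2 <- rnorm(2000, mean = log(6), sd = 0.4)

# sigma = sqrt(exp(log sigma^2)) = exp(log sigma^2 / 2)
sigma <- exp(log_sigma2 / 2)

probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
# type = 1 picks observed order statistics, so no interpolation is involved
q_log   <- quantile(log_sigma2, probs, type = 1)
q_sigma <- quantile(sigma, probs, type = 1)

# Quantiles carry over through a monotone transformation ...
print(all.equal(unname(q_sigma), unname(exp(q_log / 2))))
# ... but means do not: the mean of sigma exceeds the back-transformed mean
print(c(mean_sigma = mean(sigma), backtransformed_mean = exp(mean(log_sigma2) / 2)))

# Naive standard error of the posterior mean, ignoring autocorrelation
naive_se <- sd(sigma) / sqrt(length(sigma))
print(naive_se)
```

Reading interval limits off the quantile table and exponentiating them is therefore safe, whereas posterior means should be recomputed from the transformed draws.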

Figure 7: Trace and posterior density plots of (Intercept), log(sigma^2), log(diet.(in)) and deviance (N = 2000 each). It is evident from these plots that, except for the Intercept, none of the posterior densities can be approximated by a Normal distribution, a common approach used by non-Bayesians.

7. Conclusion

It is clear from this study that the Bayesian approach to agricultural data analysis is a rich and useful tool. It provides an in-depth study of features of the data that are otherwise hidden and cannot be explored using other techniques. Moreover, R software has the power and efficiency to deal with both the numeric and the graphic features of agricultural data, and its simulation tools are more powerful than those of any other statistical package. The future of data analysis lies with the Bayesian approach and R.

References

[1] Box, G. E. P., Hunter, W. G. and Hunter, J. S. (1978): Statistics for Experimenters. John Wiley.
[2] Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (1995): Bayesian Data Analysis. Chapman and Hall.
[3] Kass, R. E. and Steffey, D. (1989): Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models). J. Amer. Statist. Assoc., 84.
[4] Lindley, D. V. and Smith, A. F. M. (1972): Bayes estimates for the linear model (with discussion). J. R. Statist. Soc. Ser. B, 34.
[5] R Development Core Team (2007): R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
[6] Snedecor, G. W. and Cochran, W. G. (1989): Statistical Methods, 8th edition. Iowa State University Press, Ames, Iowa.
[7] Tanner, M. A. (1996): Tools for Statistical Inference. Springer-Verlag.
[8] Venables, W. N. and Ripley, B. D. (2002): Modern Applied Statistics with S. Springer, New York.


More information
