Applications of R Software in Bayesian Data Analysis


Article

International Journal of Information Science and System, 2012, 1(1): 7-23
Florida, USA

Nageena Nazir*, Athar Ali Khan, A. H. Mir and Showkat Maqbool

Division of Agricultural Statistics, Sher-e-Kashmir University of Agricultural Sciences & Technology Kashmir, Shalimar, Srinagar

* To whom correspondence should be addressed.

Article history: Received 15 May 2012, Received in revised form 29 May 2012, Accepted 29 May 2012, Published 30 May 2012.

Abstract: Bayesian statistics is an approach to statistics that formally combines prior information with the data, and Bayes' theorem provides the formal basis for using both sources of information. Bayesian analysis is the study of different features of the posterior density. R software is used to explore these features from a numeric as well as a graphic viewpoint, with emphasis on graphical features throughout. In this study, Bayesian analyses are covered for linear regression, analysis of designed experiments, analysis of mixed effects models and logistic regression. The simulation approach to Bayesian analysis was found to be the most useful one.

Keywords: R software, Bayesian Data Analysis

1. Introduction

Bayesian statistics is an approach to statistics that formally seeks to use prior information, and Bayes' theorem provides the basis for using this information in a formal manner. When significant prior information is available, the Bayesian approach shows how to utilize it sensibly. This is not possible with most non-Bayesian approaches. In the Bayesian approach the parameter of interest is treated as random and the data as fixed, in contrast to the frequentist approach, where the parameter is treated as fixed and the data as random. The business of statistics is to provide information or conclusions about uncertain quantities.
The language of uncertainty is probability, specifically conditional probability, and the Bayesian approach consistently uses this language to address uncertainty. Bayes' theorem states that

$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)},$$

where $\theta$ denotes the parameter and $y$ the data, or equivalently

$$\underbrace{p(\theta \mid y)}_{\text{posterior}} \;\propto\; \underbrace{p(y \mid \theta)}_{\text{likelihood}} \; \underbrace{p(\theta)}_{\text{prior}}.$$

Bayesian methods are a reasonable and often excellent alternative for moderate and especially for small sample sizes, where non-Bayesian procedures may not work (e.g., Berger 1985, page 125).

Data analysis is indispensable in any agricultural research. A large number of software packages have been developed, the most common among them being SAS, SPSS, Minitab, S-PLUS and R. In the present study, R software was used for statistical and graphical analyses. It is an integrated suite of software for data manipulation, calculation and graphical display. It has a large number of functions for data analysis and its own programming language, which is simple and very effective. In this study, Bayesian analyses are covered for linear regression, analysis of designed experiments, analysis of mixed effects models and logistic regression. The simulation approach to Bayesian analysis was found to be the most useful one.

2. Material and Methods

In the present paper, R software is applied to study Bayesian methods of agricultural data analysis. This includes summary features of the posterior, that is, the empirical mean, standard deviation, standard error of the mean and quantiles; the posterior density of each variable is also plotted. Functions available in R and in the MCMCpack package are used to illustrate the analytical as well as the graphical viewpoint. Existing data are used for illustration. Concepts of Bayesian methods and their R implementations are addressed in each section.

3. Bayesian Analysis of Linear Regression Model

The analysis of a simple regression model is illustrated here; multiple regression models can be handled along the same lines.

Example: wormy fruits

The percentage of wormy fruits attacked by codling moth larvae is greater on apple trees bearing a small crop.
The regressor x is the size of the crop (hundreds of fruits) and the response variable y is the percentage of wormy fruits (e.g., Snedecor and Cochran 1989, page 162). The data frame wormyfruits consists of 12 rows and 2 columns, named fruitsize and wormypercent for x and y, respectively.
[Data table with columns fruitsize and wormypercent; the numeric values were not preserved.]

Fit a Bayesian linear model for the data.

> # Look at the data graphically
> x11(width = 4, height = 4)   # set the width and height of the figure window
> plot(wormypercent ~ fruitsize, data = wormyfruits)   # output is reported in Figure 1

[Figure 1: scatter plot of wormypercent (vertical axis) against fruitsize (horizontal axis).]

Figure 1: This plot clearly suggests that a simple linear regression model can be fitted.

We shall use the function MCMCregress of MCMCpack to analyze this model.
> library(MCMCpack)
> M6 <- MCMCregress(wormypercent ~ fruitsize, data = wormyfruits)
> summary(M6)

Iterations = 1001:11000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

(1). Empirical mean and standard deviation for each variable, plus standard error of the mean:

Rows: (Intercept), fruitsize, sigma2
Columns: Mean, SD, Naive SE, Time-series SE
[numeric values not preserved]

(2). Quantiles for each variable:

Rows: (Intercept), fruitsize, sigma2
Columns: 2.5%, 25%, 50%, 75%, 97.5%
[numeric values not preserved]

This is the numeric summary, which clearly shows that both the intercept and the regression coefficient are statistically significant. A graphic summary can be obtained as well. To plot the posterior densities of the regression coefficients, we use the function plot as:

> plot(M6, trace = FALSE)

Output is reported in Figure 2.
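The columns of such a numeric summary can be reproduced by hand from any vector of posterior draws. The following is a minimal sketch in base R, not the authors' code: the vector draws is a synthetic stand-in generated with rnorm (an assumption made only for illustration, since the original MCMC output is not available here), and all object names are hypothetical. It computes the empirical mean, the standard deviation, the naive standard error (which ignores autocorrelation in the chain) and the five reported quantiles, and then checks whether the central 95% interval excludes zero.

```r
# Synthetic stand-in for 10000 posterior draws of one parameter.
# Assumption: real draws would come from an MCMC fit, not from rnorm().
set.seed(1)
draws <- rnorm(10000, mean = 2, sd = 0.5)

# Empirical mean and standard deviation of the draws
post_mean <- mean(draws)
post_sd <- sd(draws)

# Naive standard error of the mean: treats the draws as independent
naive_se <- post_sd / sqrt(length(draws))

# The five quantiles reported in the summary output
post_q <- quantile(draws, probs = c(0.025, 0.25, 0.5, 0.75, 0.975))

# Does the central 95% posterior interval exclude zero?
interval_excludes_zero <- post_q[["2.5%"]] > 0 || post_q[["97.5%"]] < 0

print(c(mean = post_mean, sd = post_sd, naive_se = naive_se))
print(post_q)
print(interval_excludes_zero)
```

With a real fit, a single column of the fitted object (for instance the fruitsize column of M6) would take the place of draws.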
[Figure 2: posterior density plots of (Intercept), fruitsize and sigma2.]

Figure 2: It is evident from this figure that all the required information is contained in the posterior densities of the parameters $\beta_0$, $\beta_1$ and $\sigma^2$ of the model

$$\text{wormypercent} = \beta_0 + \beta_1\,\text{fruitsize} + \text{error}.$$

It may be noted that the likelihood is Normal and the prior is noninformative.

4. Bayesian Analysis of Designed Experiments

4.1. Bayesian Analysis of One Way Data

The analysis of variance technique is commonly used to analyze data generated in an experiment. Its Bayesian parallel is discussed here.

Example: fat data

In the fat absorption data, 4 types of fat are used to study fat absorption patterns, and each fat is replicated 6 times. The purpose of the study was to examine the absorption of different fats in doughnuts. Details of the data are available in Snedecor and Cochran (1989), page 218.

[Data table: rows Fat1, Fat2, Fat3 and Fat4; columns Replication R1 to R6; the numeric values were not preserved.]

A data frame fatdata has been created for Bayesian modeling. Fit the model as:

> M7 <- MCMCregress(absorption ~ Fat, data = fatdata)

Print the summary of results as:

> summary(M7)

Iterations = 1001:11000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

(1). Empirical mean and standard deviation for each variable, plus standard error of the mean:

Rows: (Intercept), FatFat2, FatFat3, FatFat4, sigma2
Columns: Mean, SD, Naive SE, Time-series SE
[numeric values not preserved]

(2). Quantiles for each variable:

Rows: (Intercept), FatFat2, FatFat3, FatFat4, sigma2
Columns: 2.5%, 25%, 50%, 75%, 97.5%
[numeric values not preserved]

It is evident from this output that, keeping Fat1 as the baseline, Fat2 differs significantly from Fat1, whereas Fat3 and Fat4 do not differ significantly from Fat1. The graphic features of the Bayesian analysis show the same; the graphic output is reported in Figure 3.
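Coefficient names such as FatFat3 come from R's default treatment coding of a factor, in which the first level serves as the baseline and is absorbed into the intercept. The following base R sketch illustrates this under stated assumptions: the factor is named Fat with levels Fat1 to Fat4 and six replicates each, following the layout described in the text, and no response values are used or invented.

```r
# Factor with 4 fat types, each replicated 6 times (layout as described in the text)
Fat <- factor(rep(c("Fat1", "Fat2", "Fat3", "Fat4"), each = 6))

# Design matrix under R's default treatment contrasts:
# the first level (Fat1) is the baseline and is absorbed into the intercept
X <- model.matrix(~ Fat)

# Column names correspond to the coefficients of a fit such as absorption ~ Fat
coef_names <- colnames(X)
n_rows <- nrow(X)

print(coef_names)
print(n_rows)
```

Each non-baseline coefficient is therefore a contrast with Fat1, which is why the summary is read as a comparison of Fat2, Fat3 and Fat4 against Fat1.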
> plot(M7, trace = FALSE)

[Figure 3: posterior density plots of (Intercept), FatFat2, FatFat3, FatFat4 and sigma2.]

Figure 3: Posterior summaries of MCMCregress for fatdata. This is the Bayesian counterpart of the analysis of variance for one way data.

4.2. Bayesian Analysis of Factorial Experiments

Example: cowpea data

Snedecor and Cochran (1989), page 308, report data in which Variety and Spacing, each at 3 levels, are the two factors, with 4 Replications. The response is the yield of cowpea hay (lb/100 morgen plot). The design is a factorial Randomized Block Design (RBD). Details of the data are as under:
Table 1: Data on yield of cowpea

[Table 1: rows are the combinations of Variety (V1, V2, V3) and Spacing (S1, S2, S3); columns are Replications R1 to R4; the numeric values were not preserved.]

To obtain the Bayesian analysis of these data we use the function MCMCregress of MCMCpack. A data frame cowpea is constructed for Bayesian modeling. This data frame contains 36 rows and 4 columns: Replication, Spacing, Variety and yield. The model is fitted as:

> M8 <- MCMCregress(yield ~ Variety * Spacing, data = cowpea)
> summary(M8)

Iterations = 1001:11000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

(1). Empirical mean and standard deviation for each variable, plus standard error of the mean:

Rows: (Intercept), VarietyV2, VarietyV3, SpacingS2, SpacingS3, VarietyV2:SpacingS2, VarietyV3:SpacingS2, VarietyV2:SpacingS3, VarietyV3:SpacingS3, sigma2
Columns: Mean, SD, Naive SE, Time-series SE
[numeric values not preserved]

(2). Quantiles for each variable:

Rows: (Intercept), VarietyV2, VarietyV3, SpacingS2, SpacingS3, VarietyV2:SpacingS2, VarietyV3:SpacingS2, VarietyV2:SpacingS3, VarietyV3:SpacingS3, sigma2
Columns: 2.5%, 25%, 50%, 75%, 97.5%
[numeric values not preserved]
[Figure 4: posterior density plots of (Intercept), VarietyV2, VarietyV3, SpacingS2, SpacingS3 and VarietyV2:SpacingS2.]

Figure 4: Posterior summaries of cowpea data generated in a factorial experiment.

It is evident from these outputs that if V1 and S1 are kept as the baseline, then varieties V2 and V3 differ significantly from V1. Similarly, S3 differs significantly from S1, whereas S2 does not. The combination V1S1 is the baseline for testing interactions, and only V2S3 differs significantly from V1S1, whereas V2S2, V3S2 and V3S3 do not. Posterior densities of the interactions V3S2, V2S3 and V3S3 are not reported here.

5. Bayesian Analysis of Logistic Regression Model

Example: radiotherapy data

The data object radiotherapy consists of data taken from Mendenhall et al. (1989): Radiotherapy and Oncology 16 (see also Tanner 1996, page 28). The radiotherapy data frame contains radiotherapy data for 24 patients, in which rows represent patients and columns represent Days, the number of days of radiotherapy received by each patient, and Response, the absence (1) or presence (0) of disease at a site 3 years after treatment. These data do not come from the agricultural sciences; however, data of this type are quite common in agricultural sciences too, and they are introduced here solely to illustrate Bayesian logistic regression.

[Data table with columns Days and Response; the numeric values were not preserved.]

The model for the data is the logistic regression model

$$\log\!\left(\frac{p_i}{1 - p_i}\right) = \alpha + \beta x_i, \qquad (1)$$

where $x_i$ represents the covariate for the $i$th patient and $p_i$ represents the corresponding probability of success (no disease).
This model specifies that the log-odds of success is linearly related to the number of days the subject received radiotherapy. The intercept $\alpha$ represents the log-odds of success for 0 days, while the slope $\beta$ represents the change in the log-odds of success for every unit increase in the covariate. Thus, from model (1), the probability of success $p_i$ can be written as

$$p_i(x_i) = \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)}.$$

The logit model for the radiotherapy data is fitted using the function MCMClogit of MCMCpack.

> M9 <- MCMClogit(Response ~ Days, data = radiotherapy)

The Metropolis acceptance rate for beta was [value not preserved]

> summary(M9)

Iterations = 1001:11000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

(1). Empirical mean and standard deviation for each variable, plus standard error of the mean:

Rows: (Intercept), Days
Columns: Mean, SD, Naive SE, Time-series SE
[numeric values not preserved]

(2). Quantiles for each variable:

Rows: (Intercept), Days
Columns: 2.5%, 25%, 50%, 75%, 97.5%
[numeric values not preserved]

To get the graphic summary of the Bayesian analysis:

> plot(M9, trace = FALSE)   # output is reported in Figure 5
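The relation between the log-odds in model (1) and the probability of success can be checked numerically. The following base R sketch is an illustration only: the values of alpha and beta and the vector days are hypothetical and are not estimates from the radiotherapy data, and the helper inv_logit is a name introduced here.

```r
# Inverse-logit: converts a log-odds value into a probability
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

# Hypothetical coefficient values, for illustration only
# (assumption: these are NOT estimates from the paper)
alpha <- 3.0
beta <- -0.08

# Hypothetical numbers of treatment days
days <- c(0, 20, 40, 60)

# Probability of success implied by model (1) at each value of days
p_success <- inv_logit(alpha + beta * days)

print(round(p_success, 3))
```

Base R's plogis computes the same quantity. With a negative slope, as assumed in this sketch, the implied probability of success decreases as the number of days increases.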
[Figure 5: posterior density plots of (Intercept) and Days.]

Figure 5: Posterior summary of the logistic regression model fitted to the radiotherapy data discussed above. This figure clearly indicates that Days of therapy is significantly related to the probability of emergence of disease.

6. Bayesian Analysis of Mixed Effects Model (Hierarchical Bayes analysis)

Mixed effects models are often said to lack a firm theoretical foundation, and the Bayesian approach provides one (see Lindley and Smith, 1972, for a detailed discussion). Kass and Steffey (1989) use the terms common effects and unit-specific effects for fixed and random effects, respectively. In terms of priors, noninformative priors stand for fixed effects and informative priors for random effects. However, in the Bayesian spirit every effect is random. A practical implementation of this analysis is available in the lme4 package of R.

Example: coagulation

Effect of diet on coagulation time (seconds) for blood drawn from 24 animals randomly allocated to four different diets (Gelman et al., 1995, page 274; Box, Hunter and Hunter, 1978).

[Data table: rows are Diets A, B, C and D; columns are coagulation time and number of observations; the numeric values were not preserved.]
A data frame coagulation contains the information needed for the analysis. This data frame contains 24 rows and two columns, diet and coagulation time. Bayesian analysis of the data can be carried out in R in the same spirit as in the earlier examples.

> library(lattice)
> print(dotplot(diet ~ coag.time, data = coagulation, xlab = "Coagulation time (seconds)", ylab = "Diet"))

[Figure 6: dot plot with Diet (A, B, C, D) on the vertical axis and Coagulation time (seconds) on the horizontal axis.]

Figure 6: Dot plot of coagulation data. This figure suggests a random effect for the intercept.

The model is fitted using the lmer function of the lme4 package:

> library(lme4)
> M10 <- lmer(coag.time ~ 1 + (1 | diet), data = coagulation)
> summary(M10)

Linear mixed-effects model fit by REML
Formula: coag.time ~ 1 + (1 | diet)
Data: coagulation
AIC, BIC, logLik, MLdeviance, REMLdeviance: [numeric values not preserved]
Random effects:
Groups (Name): diet (Intercept), Residual; columns Variance, Std.Dev. [numeric values not preserved]
number of obs: 24, groups: diet, 4
Fixed effects:
(Intercept): columns Estimate, Std. Error, t value [numeric values not preserved]
6.1. Simulations from the Posterior of M10 Fitted by lmer

An in-depth Bayesian analysis of these data can be made using the simulation tools available in R. For example, to draw 2000 samples from the posterior of the fitted object M10 we use the function mcmcsamp as:

> M10.mcmc <- mcmcsamp(M10, n = 2000, deviance = TRUE)
> summary(M10.mcmc)

Iterations = 1:2000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 2000

(1). Empirical mean and standard deviation for each variable, plus standard error of the mean:

Rows: (Intercept), log(sigma^2), log(diet.(In)), deviance
Columns: Mean, SD, Naive SE, Time-series SE
[numeric values not preserved]

(2). Quantiles for each variable:

Rows: (Intercept), log(sigma^2), log(diet.(In)), deviance
Columns: 2.5%, 25%, 50%, 75%, 97.5%
[numeric values not preserved]

> plot(M10.mcmc)   # graphic summaries are reported in Figure 7
[Figure 7: trace plots (against Iterations) and density plots (N = 2000) of (Intercept), log(sigma^2), log(diet.(In)) and deviance.]

Figure 7: It is evident from the above plots of the posterior densities that, except for the intercept, none of the posterior densities can be approximated by a Normal distribution, a common approach used by non-Bayesians.

7. Conclusion

This study shows that the Bayesian approach to agricultural data analysis is a rich and useful tool. It provides an in-depth study of features of the data that otherwise remain hidden and are difficult to explore with other techniques. Moreover, R software has the power and efficiency to deal with the numeric as well as the graphic features of agricultural data, and its simulation tools are particularly powerful. In the authors' view, the future of data analysis lies with the Bayesian approach and R.
References

[1] Box, G. E. P., Hunter, W. G. and Hunter, J. S. (1978). Statistics for Experimenters. John Wiley.
[2] Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (1995). Bayesian Data Analysis. Chapman and Hall.
[3] Kass, R. E. and Steffey, D. (1989). Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models). J. Amer. Statist. Assoc., 84.
[4] Lindley, D. V. and Smith, A. F. M. (1972). Bayes estimates for the linear model (with discussion). J. R. Statist. Soc. Ser. B, 34.
[5] R Development Core Team (2007). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
[6] Snedecor, G. W. and Cochran, W. G. (1989). Statistical Methods, 8th edition. Iowa State University Press, Ames, Iowa.
[7] Tanner, M. A. (1996). Tools for Statistical Inference. Springer-Verlag.
[8] Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S-PLUS. Springer, New York.
More informationWednesday PM. Multiple regression. Multiple regression in SPSS. Presentation of AM results Multiple linear regression. Logistic regression
Wednesday PM Presentation of AM results Multiple linear regression Simultaneous Stepwise Hierarchical Logistic regression Multiple regression Multiple regression extends simple linear regression to consider
More informationSAS R IML (Introduction at the Master s Level)
SAS R IML (Introduction at the Master s Level) Anton Bekkerman, Ph.D., Montana State University, Bozeman, MT ABSTRACT Most graduatelevel statistics and econometrics programs require a more advanced knowledge
More informationBasic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
More informationIntroduction to Bayesian Analysis Using SAS R Software
Introduction to Bayesian Analysis Using SAS R Software Joseph G. Ibrahim Department of Biostatistics University of North Carolina Introduction to Bayesian statistics Outline 1 Introduction to Bayesian
More informationSAS Syntax and Output for Data Manipulation:
Psyc 944 Example 5 page 1 Practice with Fixed and Random Effects of Time in Modeling WithinPerson Change The models for this example come from Hoffman (in preparation) chapter 5. We will be examining
More informationVisualization of Complex Survey Data: Regression Diagnostics
Visualization of Complex Survey Data: Regression Diagnostics Susan Hinkins 1, Edward Mulrow, Fritz Scheuren 3 1 NORC at the University of Chicago, 11 South 5th Ave, Bozeman MT 59715 NORC at the University
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationDATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
More informationMixedeffects regression and eyetracking data
Mixedeffects regression and eyetracking data Lecture 2 of advanced regression methods for linguists Martijn Wieling and Jacolien van Rij Seminar für Sprachwissenschaft University of Tübingen LOT Summer
More informationGetting Correct Results from PROC REG
Getting Correct Results from PROC REG Nathaniel Derby, Statis Pro Data Analytics, Seattle, WA ABSTRACT PROC REG, SAS s implementation of linear regression, is often used to fit a line without checking
More informationAnalyzing Clinical Trial Data via the Bayesian Multiple Logistic Random Effects Model
Analyzing Clinical Trial Data via the Bayesian Multiple Logistic Random Effects Model Bartolucci, A.A 1, Singh, K.P 2 and Bae, S.J 2 1 Dept. of Biostatistics, University of Alabama at Birmingham, Birmingham,
More informationFull Factorial Design of Experiments
Full Factorial Design of Experiments 0 Module Objectives Module Objectives By the end of this module, the participant will: Generate a full factorial design Look for factor interactions Develop coded orthogonal
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationANOVA. February 12, 2015
ANOVA February 12, 2015 1 ANOVA models Last time, we discussed the use of categorical variables in multivariate regression. Often, these are encoded as indicator columns in the design matrix. In [1]: %%R
More information1 Prior Probability and Posterior Probability
Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which
More informationIntroduction to Hierarchical Linear Modeling with R
Introduction to Hierarchical Linear Modeling with R 5 10 15 20 25 5 10 15 20 25 13 14 15 16 40 30 20 10 0 40 30 20 10 9 10 11 1210 SCIENCE 010 5 6 7 8 40 30 20 10 010 40 1 2 3 4 30 20 10 010 5 10 15
More informationGamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
More informationDongfeng Li. Autumn 2010
Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis
More informationNote on the EM Algorithm in Linear Regression Model
International Mathematical Forum 4 2009 no. 38 18831889 Note on the M Algorithm in Linear Regression Model JiXia Wang and Yu Miao College of Mathematics and Information Science Henan Normal University
More informationStatistics 104: Section 6!
Page 1 Statistics 104: Section 6! TF: Deirdre (say: Deardra) Bloome Email: dbloome@fas.harvard.edu Section Times Thursday 2pm3pm in SC 109, Thursday 5pm6pm in SC 705 Office Hours: Thursday 6pm7pm SC
More information" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models  part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK2800 Kgs. Lyngby
More informationSimulation Exercises to Reinforce the Foundations of Statistical Thinking in Online Classes
Simulation Exercises to Reinforce the Foundations of Statistical Thinking in Online Classes Simcha Pollack, Ph.D. St. John s University Tobin College of Business Queens, NY, 11439 pollacks@stjohns.edu
More informationVI. Introduction to Logistic Regression
VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models
More informationMSwM examples. Jose A. SanchezEspigares, Alberto LopezMoreno Dept. of Statistics and Operations Research UPCBarcelonaTech.
MSwM examples Jose A. SanchezEspigares, Alberto LopezMoreno Dept. of Statistics and Operations Research UPCBarcelonaTech February 24, 2014 Abstract Two examples are described to illustrate the use of
More informationTechnology StepbyStep Using StatCrunch
Technology StepbyStep Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate
More informationTime Series Analysis
Time Series Analysis hm@imm.dtu.dk Informatics and Mathematical Modelling Technical University of Denmark DK2800 Kgs. Lyngby 1 Outline of the lecture Identification of univariate time series models, cont.:
More informationPackage EstCRM. July 13, 2015
Version 1.4 Date 2015711 Package EstCRM July 13, 2015 Title Calibrating Parameters for the Samejima's Continuous IRT Model Author Cengiz Zopluoglu Maintainer Cengiz Zopluoglu
More informationLecture 7 Linear Regression Diagnostics
Lecture 7 Linear Regression Diagnostics BIOST 515 January 27, 2004 BIOST 515, Lecture 6 Major assumptions 1. The relationship between the outcomes and the predictors is (approximately) linear. 2. The error
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More information2. Linear regression with multiple regressors
2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measuresoffit in multiple regression Assumptions
More informationParallelization Strategies for Multicore Data Analysis
Parallelization Strategies for Multicore Data Analysis WeiChen Chen 1 Russell Zaretzki 2 1 University of Tennessee, Dept of EEB 2 University of Tennessee, Dept. Statistics, Operations, and Management
More informationCentre for Central Banking Studies
Centre for Central Banking Studies Technical Handbook No. 4 Applied Bayesian econometrics for central bankers Andrew Blake and Haroon Mumtaz CCBS Technical Handbook No. 4 Applied Bayesian econometrics
More informationCS 688 Pattern Recognition Lecture 4. Linear Models for Classification
CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(
More informationExploratory Data Analysis
Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction
More informationRegression Analysis. Pekka Tolonen
Regression Analysis Pekka Tolonen Outline of Topics Simple linear regression: the form and estimation Hypothesis testing and statistical significance Empirical application: the capital asset pricing model
More informationInterpretation of Somers D under four simple models
Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms
More informationAspects in Development of Statistic Data Analysis in Romanian Sanitary System
Aspects in Development of Statistic Data Analysis in Romanian Sanitary System DANA SIMIAN 1, CORINA SIMIAN 1, OANA DANCIU 2, LAVINIA DANCIU 3 1 Faculty of Sciences University Lucian Blaga Sibiu Str. Ion
More informationParametric fractional imputation for missing data analysis
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Biometrika (????,??,?, pp. 1 14 C???? Biometrika Trust Printed in
More information