Applications of R Software in Bayesian Data Analysis



Article

International Journal of Information Science and System, 2012, 1(1): 7-23
Journal homepage: www.modernscientificpress.com/journals/ijinfosci.aspx
ISSN: 2168-5754, Florida, USA

Applications of R Software in Bayesian Data Analysis

Nageena Nazir*, Athar Ali Khan, A. H. Mir and Showkat Maqbool

Division of Agricultural Statistics, Sher-e-Kashmir University of Agricultural Sciences & Technology Kashmir, Shalimar, Srinagar-191121

* To whom correspondence should be addressed. E-mail: nazir.nageena@gmail.com

Article history: Received 15 May 2012, Received in revised form 29 May 2012, Accepted 29 May 2012, Published 30 May 2012.

Abstract: Bayesian statistics is an approach to statistics that formally combines prior information with the data, and Bayes' Theorem provides the basis for making use of both sources of information in a formal manner. Bayesian analysis is the study of different features of the posterior density. R software is used here to explore these features from both numeric and graphic viewpoints, with proper emphasis given to graphical features throughout. In this study, Bayesian analyses are presented for linear regression, designed experiments, mixed effects models and logistic regression. The simulation approach to Bayesian analysis was found to be the most useful one.

Keywords: R software, Bayesian Data Analysis

1. Introduction

Bayesian statistics is an approach to statistics that formally seeks to use prior information, and Bayes' Theorem provides the basis for making use of this information in a formal manner. When significant prior information is available, the Bayesian approach shows how to utilize it sensibly; this is not possible with most non-Bayesian approaches. In the Bayesian approach the parameter of interest is treated as random and the data as fixed, in contrast to the frequentist approach, where the parameter is treated as fixed and the data as random.
The business of statistics is to provide information or conclusions about uncertain quantities. The language of uncertainty is probability, and only the Bayesian approach, through conditional probability, uses this language consistently to address uncertainty. Bayes' Theorem states that

p(θ | y) ∝ p(y | θ) p(θ),

or equivalently, posterior ∝ likelihood × prior.

Bayesian statistics is an excellent alternative, being more reasonable for moderate and especially for small sample sizes, where non-Bayesian procedures do not work well (e.g., Berger 1985, page 125). Data analysis is indispensable in any agricultural research. A large number of software packages have been developed, the most common among them being SAS, SPSS, Minitab, S-PLUS and R. In the present study, R software was used for statistical and graphical analyses. R provides an integrated suite of tools for data manipulation, calculation and graphical display, a large number of functions for data analysis, and its own programming language, which is very effective and simple. In this study, Bayesian analyses are presented for linear regression, designed experiments, mixed effects models and logistic regression. The simulation approach to Bayesian analysis was found to be the most useful one.

2. Material and Methods

In the present paper, R software is applied to study Bayesian methods for agricultural data analysis. This includes summary features of the data, that is, the empirical mean, standard deviation, standard error of the mean and quantiles; the posterior density of each variable is also plotted. Functions available in R and in the MCMCpack package of R are used to illustrate both the analytical and the graphical viewpoints. Existing data are used for the purpose of illustration. Concepts of Bayesian methods and their R implementations are addressed in each section.

3. Bayesian Analysis of Linear Regression Model

Analysis of the simple regression model is illustrated here; multiple regression models can be handled along similar lines.

Example: wormy fruits

The percentage of wormy fruits attacked by codling moth larvae is greater on apple trees bearing a small crop.
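As a minimal illustration of this proportionality (a sketch assuming only base R, with hypothetical data), the posterior of a Normal mean under a flat prior can be computed on a grid:

```r
# Grid approximation of p(theta | y) proportional to p(y | theta) * p(theta).
# Hypothetical data, assumed Normal(theta, sd = 1); flat prior on theta.
y     <- c(4.8, 5.1, 5.3, 4.9, 5.4)
theta <- seq(0, 10, by = 0.001)                 # grid of parameter values
prior <- rep(1, length(theta))                  # non-informative (flat) prior
lik   <- sapply(theta, function(t) prod(dnorm(y, mean = t, sd = 1)))
post  <- lik * prior                            # unnormalised posterior
post  <- post / sum(post * 0.001)               # normalise to a density
post.mean <- sum(theta * post * 0.001)          # posterior mean
# Under a flat prior, post.mean essentially reproduces mean(y)
```

With an informative prior (e.g., dnorm(theta, 5, 0.5) in place of the flat prior) the posterior mean shifts toward the prior mean, which is the sense in which Bayes' Theorem combines both sources of information.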
The regressor x is the size of the crop (hundreds of fruits) and the response variable y is the percentage of wormy fruits (e.g., Snedecor and Cochran 1989, page 162). The data frame wormyfruits consists of 12 rows and 2 columns, with column names fruitsize and wormypercent for x and y, respectively.

fruitsize wormypercent
        8           59
        6           58
       11           56
       22           53
       14           50
       17           45
       18           43
       24           42
       19           39
       23           38
       26           30
       40           27

Fit a Bayesian linear model for the data. First, look into the data graphically:

> x11(width=4, height=4)  # to define the width and height of the figure
> plot(wormypercent~fruitsize, data=wormyfruits)  # output is reported in Figure 1

[Figure 1: Scatter plot of wormypercent against fruitsize.] This plot clearly suggests that a simple linear regression model can be fitted. We shall use MCMCregress of MCMCpack to analyze this model.

> library(MCMCpack)
> M6 <- MCMCregress(wormypercent~fruitsize, data=wormyfruits)
> summary(M6)
Iterations = 1001:11000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

(1). Empirical mean and standard deviation for each variable, plus standard error of the mean:

              Mean      SD  Naive SE Time-series SE
(Intercept) 64.297  4.0469 0.040469       0.036846
fruitsize   -1.016  0.1936 0.001936       0.001757
sigma2      34.180 19.7964 0.197964       0.246161

(2). Quantiles for each variable:

              2.5%     25%     50%     75%   97.5%
(Intercept) 56.36  61.782  64.299  66.7742 72.5564
fruitsize   -1.41  -1.133  -1.014  -0.8949 -0.6341
sigma2      13.58  21.713  29.227  40.2910 83.4581

This numeric summary clearly shows that both the intercept and the regression coefficient are statistically significant. A graphic summary is also available; to plot the posterior densities of the regression coefficients, we use the function plot as:

> plot(M6, trace=FALSE)

Output is reported in Figure 2.
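As a quick cross-check (a sketch assuming only base R; the data frame is reconstructed by hand from the table above), the classical least-squares fit gives essentially the same point estimates as the posterior means, since the prior is non-informative:

```r
# Classical least-squares fit of the wormyfruits data, for comparison
# with the posterior means reported by MCMCregress above.
wormyfruits <- data.frame(
  fruitsize    = c(8, 6, 11, 22, 14, 17, 18, 24, 19, 23, 26, 40),
  wormypercent = c(59, 58, 56, 53, 50, 45, 43, 42, 39, 38, 30, 27)
)
fit <- lm(wormypercent ~ fruitsize, data = wormyfruits)
round(coef(fit), 3)
# Intercept about 64.247 and slope about -1.013, close to the
# posterior means 64.297 and -1.016 reported above
```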

[Figure 2: Posterior densities of (Intercept), fruitsize and sigma2 for the wormyfruits model (N = 10000 simulations each).] It is evident from this figure that all the required information is contained in the posterior densities for the parameters β0, β1 and σ² of the model

wormypercent = β0 + β1 fruitsize + error

It may be noted that the likelihood is Normal and the prior is non-informative.

4. Bayesian Analysis of Designed Experiments

4.1. Bayesian Analysis of One Way Data

The analysis of variance technique is commonly used to analyze data generated in an experiment; its Bayesian parallel is discussed here.

Example: fat data

Fat absorption data in which 4 types of fats were used to study fat absorption patterns, each fat replicated 6 times. The purpose of the study was to see the absorption of different fats in doughnuts. Details of the data are available in Snedecor and Cochran (1989), page 218.

      Replication
Fat   R1  R2  R3  R4  R5  R6
----------------------------
Fat1  64  72  68  77  56  95
Fat2  78  91  97  82  85  77
Fat3  75  93  78  71  63  76

Fat4  55  66  49  64  70  68

A data frame fatdata has been created for the Bayesian modeling. Fit the model as:

> M7 <- MCMCregress(absorption~Fat, data=fatdata)

Print the summary of results as:

> summary(M7)
Iterations = 1001:11000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

(1). Empirical mean and standard deviation for each variable, plus standard error of the mean:

              Mean     SD Naive SE Time-series SE
(Intercept) 71.984  4.359  0.04359        0.04314
FatFat2     13.027  6.197  0.06197        0.05058
FatFat3      4.006  6.103  0.06103        0.05917
FatFat4     -9.955  6.089  0.06089        0.06808
sigma2     111.596 39.254  0.39254        0.52445

(2). Quantiles for each variable:

               2.5%      25%     50%     75%   97.5%
(Intercept)  63.569  69.1487  71.977  74.764  80.760
FatFat2       0.705   9.0827  13.033  17.082  25.153
FatFat3      -8.175   0.1061   4.066   7.890  16.093
FatFat4     -22.146 -13.8333  -9.938  -5.923   1.908
sigma2       58.664  84.1986 104.026 130.268 208.026

It is evident from this output that, keeping Fat1 as the baseline, Fat2 differs significantly from Fat1, whereas Fat3 and Fat4 do not differ significantly from Fat1. This is also evidenced in the graphic features of the Bayesian analysis, reported in Figure 3.
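As a cross-check (a sketch assuming only base R; fatdata is reconstructed by hand from the table above), the classical one-way fit reproduces these posterior means almost exactly, again because the prior is non-informative:

```r
# Classical least-squares fit of the fat absorption data, for comparison
# with the MCMCregress posterior means above.
fatdata <- data.frame(
  Fat = factor(rep(c("Fat1", "Fat2", "Fat3", "Fat4"), each = 6)),
  absorption = c(64, 72, 68, 77, 56, 95,
                 78, 91, 97, 82, 85, 77,
                 75, 93, 78, 71, 63, 76,
                 55, 66, 49, 64, 70, 68)
)
fit <- lm(absorption ~ Fat, data = fatdata)
coef(fit)
# (Intercept) 72, FatFat2 13, FatFat3 4, FatFat4 -10: the group-mean
# contrasts, matching the posterior means 71.98, 13.03, 4.01, -9.96
```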

> plot(M7, trace=FALSE)

[Figure 3: Posterior summaries of MCMCregress for fatdata: densities of (Intercept), FatFat2, FatFat3, FatFat4 and sigma2 (N = 10000 simulations each).] This is the Bayesian counterpart of the analysis of variance for one way data.

4.2. Bayesian Analysis of Factorial Experiments

Example: cowpea data

A data set is reported in Snedecor and Cochran (1989), page 308, in which 3 levels of Variety and 3 levels of Spacing are the two factors, with 4 Replications. The response is the yield of cowpea hay (lb/100 morgen plot). The design is a factorial Randomized Block Design (RBD). Details of the data are as under:

Table 1: Data on yield of cowpea

                    Replication
Variety  Spacing  R1  R2  R3  R4
--------------------------------
V1       S1       56  45  43  46
         S2       60  50  45  48
         S3       66  57  50  50
V2       S1       65  61  60  63
         S2       60  58  56  60
         S3       53  53  48  55
V3       S1       60  61  50  53
         S2       62  68  67  60
         S3       73  77  77  65

To get the Bayesian analysis of these data we use the function MCMCregress of MCMCpack. A data frame cowpea is constructed for the Bayesian modeling; it contains 36 rows and 4 columns: Replication, Spacing, Variety and yield. The model is fitted as:

> M8 <- MCMCregress(yield~Variety*Spacing, data=cowpea)
> summary(M8)
Iterations = 1001:11000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

(1). Empirical mean and standard deviation for each variable, plus standard error of the mean:

              Mean    SD Naive SE Time-series SE
(Intercept) 47.504 2.596  0.02596        0.02602
VarietyV2   14.696 3.650  0.03650        0.03774
VarietyV3    8.470 3.683  0.03683        0.03156
SpacingS2    3.244 3.691  0.03691        0.03641
SpacingS3    8.267 3.660  0.03660        0.04597

VarietyV2:SpacingS2  -6.958 5.244  0.05244        0.05448
VarietyV3:SpacingS2   5.051 5.260  0.05260        0.05498
VarietyV2:SpacingS3 -18.212 5.145  0.05145        0.05924
VarietyV3:SpacingS3   8.762 5.214  0.05214        0.05472
sigma2               27.226 8.070  0.08070        0.10528

(2). Quantiles for each variable:

                        2.5%      25%     50%     75%   97.5%
(Intercept)          42.4758  45.7993  47.515  49.196  52.772
VarietyV2             7.5450  12.2917  14.664  17.098  21.846
VarietyV3             1.2622   6.0279   8.506  10.906  15.753
SpacingS2            -4.1768   0.8687   3.280   5.619  10.484
SpacingS3             0.9234   5.9211   8.283  10.700  15.322
VarietyV2:SpacingS2 -17.5002 -10.4468  -6.931  -3.504   3.302
VarietyV3:SpacingS2  -5.2450   1.5982   5.074   8.469  15.479
VarietyV2:SpacingS3 -28.3504 -21.5851 -18.259 -14.780  -8.096
VarietyV3:SpacingS3  -1.3725   5.3133   8.658  12.183  19.064
sigma2               15.6872  21.5315  25.809  31.258  47.066
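As before, since the prior is non-informative, a classical fit reproduces these posterior means (a sketch assuming only base R; cowpea is reconstructed by hand from Table 1):

```r
# Classical factorial fit of the cowpea data, for comparison with the
# MCMCregress posterior means of the main effects and interactions.
cowpea <- data.frame(
  Variety = factor(rep(c("V1", "V2", "V3"), each = 12)),
  Spacing = factor(rep(rep(c("S1", "S2", "S3"), each = 4), times = 3)),
  yield   = c(56, 45, 43, 46,  60, 50, 45, 48,  66, 57, 50, 50,
              65, 61, 60, 63,  60, 58, 56, 60,  53, 53, 48, 55,
              60, 61, 50, 53,  62, 68, 67, 60,  73, 77, 77, 65)
)
fit <- lm(yield ~ Variety * Spacing, data = cowpea)
round(coef(fit), 2)
# e.g. VarietyV2:SpacingS3 = -18.25, matching the posterior mean -18.212
```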

[Figure 4: Posterior summaries of the cowpea data generated in a factorial experiment: densities of (Intercept), VarietyV2, VarietyV3, SpacingS2, SpacingS3 and VarietyV2:SpacingS2 (N = 10000 simulations each).] It is evident from these outputs that, with V1 and S1 kept as the baseline, varieties V2 and V3 differ significantly from V1. Similarly, S3 differs significantly from S1, whereas S2 does not. The interaction V1S1 is, of course, the baseline for testing interactions, and only V2S3 differs significantly from V1S1, whereas V2S2, V3S2 and V3S3 do not. Posterior densities of the interactions V3S2, V2S3 and V3S3 are not reported here.

5. Bayesian Analysis of Logistic Regression Model

Example: radiotherapy data

The data object radiotherapy consists of data taken from Mendenhall et al. (1989): Radiotherapy and Oncology 16, 275-282 (see also Tanner 1996, page 28). The radiotherapy data frame contains data on 24 patients; rows represent patients and the columns are Days, the number of days of radiotherapy received by each patient, and Response, absence (1) or presence (0) of disease at a site 3 years after treatment. These data have no reference to the agricultural sciences; however, such types of

data are quite common in agricultural sciences too, and the data are introduced here solely to illustrate Bayesian logistic regression.

Days Response
 21     1
 24     1
 25     1
 26     1
 28     1
 31     1
 33     1
 34     1
 35     1
 37     1
 43     1
 49     1
 51     1
 55     1
 25     0
 29     0
 43     0
 44     0
 46     0
 46     0
 51     0
 55     0
 56     0
 58     0

The model for the data is the logistic regression model

log(p_i / (1 - p_i)) = α + β x_i        (1)

where x_i represents the covariate (Days) for the ith patient and p_i represents the corresponding probability of success (no disease).

This model specifies that the log-odds of success is linearly related to the number of days of radiotherapy the subject received. The intercept α represents the log-odds of success for 0 days, while the slope β represents the change in the log-odds of success for every unit increase in the covariate. Thus from model (1) the probability of success p_i can be written as

p_i(x_i) = exp(α + β x_i) / (1 + exp(α + β x_i))

Fit the logistic model for the radiotherapy data using the function MCMClogit of MCMCpack:

> M9 <- MCMClogit(Response~Days, data=radiotherapy)
The Metropolis acceptance rate for beta was 0.53945
> summary(M9)
Iterations = 1001:11000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

(1). Empirical mean and standard deviation for each variable, plus standard error of the mean:

                Mean      SD  Naive SE Time-series SE
(Intercept)  4.34877 1.98478 0.0198478       0.054296
Days        -0.09796 0.04664 0.0004664       0.001277

(2). Quantiles for each variable:

               2.5%     25%     50%      75%    97.5%
(Intercept)  0.6595  2.9172  4.2581  5.65795  8.51977
Days        -0.1936 -0.1288 -0.0968 -0.06571 -0.01275

To get a graphic summary of the Bayesian analysis:

> plot(M9, trace=FALSE)  # output is reported in Figure 5
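Plugging the posterior means into the success-probability formula gives fitted probabilities at any number of days (a sketch assuming only base R; plogis(z) computes exp(z)/(1 + exp(z))):

```r
# Fitted probability of success (no disease) at the posterior means
# of the intercept and slope reported above.
alpha <- 4.34877    # posterior mean of (Intercept)
beta  <- -0.09796   # posterior mean of Days
p <- function(x) plogis(alpha + beta * x)   # exp(.)/(1 + exp(.))
round(p(30), 3)   # about 0.804 after 30 days of radiotherapy
round(p(55), 3)   # about 0.261 after 55 days
```

The declining fitted probability with Days matches the negative posterior mean of the slope.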

[Figure 5: Posterior densities of (Intercept) and Days for the logistic regression model fitted to the radiotherapy data (N = 10000 simulations each).] This figure clearly indicates that Days of therapy is significantly related to the probability of emergence of the disease.

6. Bayesian Analysis of Mixed Effects Model (Hierarchical Bayes Analysis)

It is a well-known fact that mixed effects models lack theoretical foundations, and the Bayesian approach provides these grounds (see Lindley and Smith, 1972, for a detailed discussion). Kass and Steffey (1989) use the terms common effects and unit-specific effects for fixed and random effects, respectively. In terms of priors, non-informative priors stand for fixed effects and informative priors for random effects; in the Bayesian spirit, however, every effect is random. A practical implementation of this analysis is available in the lme4 package of R.

Example: coagulation

Effect of diet on coagulation time (seconds) for blood drawn from 24 animals randomly allocated to four different diets (Gelman et al., 1995, page 274; Box, Hunter and Hunter, 1978).

Diet  Coagulation time               Number of observations
A     62 60 63 59                    4
B     63 67 71 64 65 66              6
C     68 66 71 67 68 68              6
D     56 62 60 61 63 64 63 59        8

A data frame coagulation contains the information needed for the analysis; it has 24 rows and two columns, diet and coag.time. Bayesian analysis of these data can be made using R in the same spirit as in the earlier examples.

> print(dotplot(diet~coag.time, data=coagulation, xlab="Coagulation time (seconds)", ylab="Diet"))

[Figure 6: Dot plot of the coagulation data.] This figure suggests a random effect for the intercept. Fitting the model using the lmer function of the lme4 package:

> M10 <- lmer(coag.time~1+(1|diet), data=coagulation)
> summary(M10)
Linear mixed-effects model fit by REML
Formula: coag.time ~ 1 + (1|diet)
   Data: coagulation
  AIC   BIC logLik MLdeviance REMLdeviance
119.8 122.1 -57.89      118.8        115.8
Random effects:
 Groups   Name        Variance Std.Dev.
 diet     (Intercept) 11.6915  3.4193
 Residual              5.5994  2.3663
number of obs: 24, groups: diet, 4

Fixed effects:
            Estimate Std. Error t value
(Intercept)    64.01       1.78   35.96
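As a quick cross-check (a sketch assuming only base R; coagulation is reconstructed by hand from the table above), the per-diet means can be computed directly:

```r
# Per-diet mean coagulation times, reconstructed from the data table.
coagulation <- data.frame(
  diet = factor(rep(c("A", "B", "C", "D"), times = c(4, 6, 6, 8))),
  coag.time = c(62, 60, 63, 59,
                63, 67, 71, 64, 65, 66,
                68, 66, 71, 67, 68, 68,
                56, 62, 60, 61, 63, 64, 63, 59)
)
tapply(coagulation$coag.time, coagulation$diet, mean)
#  A  B  C  D
# 61 66 68 61
```

The unweighted mean of the four diet means is 64, consistent with the fixed-effect intercept 64.01 estimated by lmer.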

6.1. Simulations from the Posterior of M10 Fitted by lmer

An in-depth Bayesian analysis of these data can be made using the simulation tools available in R. For example, to simulate 2000 observations from the fitted object M10 we use the function mcmcsamp as:

> M10.mcmc <- mcmcsamp(M10, n=2000, deviance=TRUE)
> summary(M10.mcmc)
Iterations = 1:2000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 2000

(1). Empirical mean and standard deviation for each variable, plus standard error of the mean:

                  Mean     SD Naive SE Time-series SE
(Intercept)     63.919 2.7592 0.061697       0.074906
log(sigma^2)     1.787 0.3217 0.007193       0.009917
log(diet.(in))   2.706 1.0420 0.023299       0.035873
Deviance       122.134 2.9526 0.066023       0.090958

(2). Quantiles for each variable:

                  2.5%     25%     50%     75%   97.5%
(Intercept)     58.429  62.633  63.960  65.265  69.150
log(sigma^2)     1.209   1.557   1.777   1.981   2.488
log(diet.(in))   1.021   1.988   2.567   3.266   5.161
Deviance       118.866 119.956 121.234 123.405 129.842

> plot(M10.mcmc)  # to get the graphic summaries reported in Figure 7

[Figure 7: Trace and density plots of (Intercept), log(sigma^2), log(diet.(in)) and deviance (N = 2000 simulations each).] It is evident from these plots of the posterior densities that, except for the Intercept, none can be approximated by a Normal approximation, a common approach used by non-Bayesians.

7. Conclusion

It is clear from this study that the Bayesian approach to agricultural data analysis is a very rich and useful tool. It provides an in-depth study of different features of the data which are otherwise hidden and cannot be explored using other techniques. Moreover, R software has the power and efficiency to deal with both the numeric and the graphic features of agricultural data, and its simulation tools are more powerful than those of any other statistical package. The future of data analysis lies with the Bayesian approach and R.

References

[1] Box, G. E. P., Hunter, W. G., and Hunter, J. S. (1978): Statistics for Experimenters. John Wiley.
[2] Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (1995): Bayesian Data Analysis. Chapman and Hall.
[3] Kass, R. E. and Steffey, D. (1989): Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models). J. Amer. Statist. Assoc. 84: 717-726.
[4] Lindley, D. V. and Smith, A. F. M. (1972): Bayes estimates for the linear model (with discussion). J. R. Statist. Soc. Ser. B 34: 1-41.
[5] R Development Core Team (2007): R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.r-project.org.
[6] Snedecor, G. W. and Cochran, W. G. (1989): Statistical Methods, 8th edition. Iowa State University Press, Ames, Iowa.
[7] Tanner, M. A. (1996): Tools for Statistical Inference. Springer-Verlag.
[8] Venables, W. N. and Ripley, B. D. (2002): Modern Applied Statistics with S. Springer, New York.