Bayesian Methods for the Social and Behavioral Sciences
Jeff Gill, Harvard University
2007 ICPSR First Session: June 25-July 20, 9-11 AM
Email: jgill@iq.harvard.edu
TA: Yu-Sung Su (ys463@columbia.edu)

Course Description: Workshop, 3 credits.

This course introduces the theoretical and applied foundations of Bayesian statistical analysis in a manner accessible to social and behavioral scientists. The Bayesian paradigm is ideally suited to the type of data analysis required of social scientists because it recognizes the mobility of population parameters, incorporates the prior knowledge that researchers possess, and updates estimates as new data are observed. The course will introduce the basic principles of Bayesian statistics to students in the social and behavioral sciences without requiring an extensive background in mathematical statistics. Most of the examples will be drawn from sociology, political science, economics, marketing, psychology, public policy, and anthropology. The prerequisites for this course are a linear regression course and knowledge of matrix algebra. The emphasis will be on applying the principles to actual data-analytic problems of interest to participants rather than on textbook examples. The course will make extensive use of software that is in the public domain yet high in quality.

The course covers basic topics such as setting up a probability model, conditioning on observed data, and the essential ideas behind likelihood inference and prediction. The fundamentals of Bayesian statistics are reviewed, including Bayes Law and prior and posterior distributions, as well as summarizing the model and checking sensitivity to its assumptions. Practical applications will be developed with a variety of parametric forms, including so-called non-informative prior densities.
All of the fundamental Bayesian simulation techniques will be reviewed, including numerical integration, importance sampling, the EM algorithm, and the primary Markov chain Monte Carlo algorithms: Gibbs sampling and Metropolis-Hastings.

Note: this course is part of the new track of advanced courses that includes Maximum Likelihood; Linear, Nonlinear, and Regression III: Advanced Methods (Modern Regression); and the special seminar on using R. These courses are linked and integrated around the theme of providing cutting-edge methodological training to relatively advanced graduate students.

All computational work will be done in R and WinBUGS. LaTeX is preferred for the homework assignments.

Office Hours:
- Instructor: Monday-Friday, 1:00-2:00, and by appointment, HN-306.
- Yu-Sung: Monday-Friday, 2:30-4:30.
Required Texts:
- Jeff Gill, Bayesian Methods for the Social and Behavioral Sciences, Second Edition. Chapman and Hall (2007). IMPORTANT NOTE: do not purchase the first edition. The second edition will be available for purchase in manuscript form from ISR; details will be given the first day of class.
- John Fox, An R and S-Plus Companion to Applied Regression. Sage (2002).

Software Used:

This course will make extensive use of extremely high-quality, fully featured software for Bayesian and classical statistical analysis. Furthermore, the software is free, so you may load it on your home computers after you return from the summer program. For the purposes of this seminar, all of the required statistical software has already been loaded onto the ICPSR machines and tested, so there is nothing you have to do. These packages are currently in use at universities and research centers throughout the world.

R. The official description: R is `GNU S', a language and environment for statistical computing and graphics. R is similar to the award-winning S system, which was developed at Bell Laboratories by John Chambers et al. It provides a wide variety of statistical and graphical techniques (linear and nonlinear modeling, statistical tests, time series analysis, classification, clustering, ...). R is designed as a true computer language with control-flow constructions for iteration and alternation, and it allows users to add functionality by defining new functions. For computationally intensive tasks, C, C++, and Fortran code can be linked and called at run time.

Some free manuals and guides for R (do not print these in the ICPSR lab; use Acrobat Reader to view them on screen):
- ``Data Analysis and Graphics Using R'' by John Maindonald: http://wwwmaths.anu.edu.au/~johnm/r/usingr.pdf. The data sets and scripts are also available at his homepage: http://room.anu.edu.au/~johnm.
- ``R for Beginners / R pour les débutants'' by Emmanuel Paradis, an introductory exposition, in English: http://lib.stat.cmu.edu/r/cran/doc/contrib/rdebuts_en.pdf, and in French: http://lib.stat.cmu.edu/r/cran/doc/contrib/rdebuts_fr.pdf.
- ``Kickstarting R'' by Jim Lemon, a short introduction: http://lib.stat.cmu.edu/r/cran/doc/contrib/kickstart.tar.gz (gzipped TAR) and http://lib.stat.cmu.edu/r/cran/doc/contrib/kickstart.zip (zipped).
- ``Notes on the use of R for psychology experiments and questionnaires'' by Jonathan Baron and Yuelin Li: http://lib.stat.cmu.edu/r/cran/doc/contrib/rpsych.htm (HTML) and http://lib.stat.cmu.edu/r/cran/doc/contrib/rpsych.pdf (PDF).
Since R is ``not unlike'' S, you can buy any S-Plus book and it will largely correspond. My S-Plus/R help page, http://psblade.ucdavis.edu/s-language.help.html, has multiple links to S-Plus resources and a near-complete bibliography of published books featuring or using S-Plus.

BUGS (Bayesian inference Using Gibbs Sampling). The official description: Bayesian inference Using Gibbs Sampling is a piece of computer software for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo (MCMC) methods. It grew from a statistical research project at the MRC Biostatistics Unit but is now developed jointly with the Imperial College School of Medicine at St Mary's, London. The Classic BUGS program uses a text-based model description and a command-line interface, and versions are available for the major computer platforms. A Windows version, WinBUGS, has the option of a graphical user interface and provides on-line monitoring and convergence diagnostics. CODA is a suite of S-Plus/R functions for convergence diagnostics. We will use WinBUGS this summer. The programs are reasonably easy to use and come with a range of examples. Considerable caution is needed in their use, however, since the software is not perfect and MCMC is inherently less robust than analytic statistical methods. There is no built-in protection against misuse.

Course Content: (problem set assignments subject to change; see class announcements)

WEEK 1: INTRODUCTION, BACKGROUND, AND BASICS OF BAYESIAN INFERENCE

A. Introductory Topics:
- Bayesian Statistics in Perspective
- Motivation and Justification
- Why Are We Uncertain about Probability?
- Bayes Law and Conditional Inference
- A Scientific Approach to Social and Behavioral Data Analysis
- Computing Topic: Simple Gibbs Sampling in R

- Gill (2007), Chapter 1.

Extended reading:
- Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (1995). Bayesian Data Analysis. New York: Chapman & Hall. Chapter 1.
- Bayes, Thomas. (1763). ``An Essay Towards Solving a Problem in the Doctrine of Chances.'' Philosophical Transactions of the Royal Society of London 53, 370-418.

Homework: text problems 1.1, 1.2, 1.3, 1.4, 1.6 (due Friday, June 29).

B. Math-Stat Review Topics:
- Likelihood Theory and Estimation
- The Generalized Linear Model
- Defining the Link Function
- Deviance Residuals
- Numerical Maximum Likelihood
- Newton-Raphson and Root Finding
- Iterative Weighted Least Squares
- Quasi-Likelihood
- Computing Topic: GLMs in R

- Gill (2007), Appendix A.
- Fahrmeir, L. and Tutz, G. (2001). Multivariate Statistical Modeling Based on Generalized Linear Models. Second Edition. New York: Springer. Chapter 2.
- McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. Second Edition. New York: Chapman & Hall. Chapters 1-2.
- Fahrmeir, L. and Kaufmann, H. (1985). ``Consistency and Asymptotic Normality of the Maximum Likelihood Estimator in Generalized Linear Models.'' Annals of Statistics 13, 342-368.
- Nelder, J. A. and Wedderburn, R. W. M. (1972). ``Generalized Linear Models.'' Journal of the Royal Statistical Society, Series A 135, 370-384.

Homework: text problems 2.1, 2.2, 2.3, 2.4, 2.8, 2.9, 2.11 (due Monday, July 2).

C. Bayesian Mechanics:
- The Bayesian Setup
- Philosophical and Mechanical Issues
- Combining Likelihood and Priors
- Bayesian ``Learning''
- Computing Topic: Bayesian GLMs in R

- Gill (2007), Chapter 2.
- Edwards, W., Lindman, H., and Savage, L. J. (1963). ``Bayesian Statistical Inference for Psychological Research.'' Psychological Review 70, 193-242.
- Western, B. (1999). ``Bayesian Methods for Sociologists: An Introduction.'' Sociological Methods & Research 28, 7-34.
- Diaconis, P. and Freedman, D. A. (1986). ``On the Consistency of Bayes Estimates.'' Annals of Statistics 14, 1-67.
- Hyndman, R. J. (1996). ``Computing and Graphing Highest Density Regions.'' The American Statistician 50, 120-126.

Homework: text problems 3.1, 3.2, 3.3, 3.7, 3.9 (due Monday, July 9).

D. Weekly paper for discussion: Efron, B. (1986). ``Why Isn't Everyone a Bayesian?'' The American Statistician 40, 1-5.
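The Week 1 computing topic, simple Gibbs sampling, can be previewed with the classic bivariate normal example popularized by Casella and George (1992): each coordinate is drawn in turn from its full conditional distribution. The course itself works in R; the sketch below is a minimal Python translation for illustration only, with function and variable names of my own choosing.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_iter=5000, seed=42):
    """Gibbs sampler for a bivariate normal with zero means, unit
    variances, and correlation rho. Each full conditional is
    x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0                      # arbitrary starting values
    cond_sd = math.sqrt(1.0 - rho ** 2)  # conditional standard deviation
    draws = []
    for _ in range(n_iter):
        x = rng.gauss(rho * y, cond_sd)  # draw x given current y
        y = rng.gauss(rho * x, cond_sd)  # draw y given the new x
        draws.append((x, y))
    return draws

draws = gibbs_bivariate_normal(rho=0.8)
burned = draws[500:]                     # discard a burn-in period
mean_x = sum(d[0] for d in burned) / len(burned)
print(round(mean_x, 2))                  # should be near the true mean, 0
```

With rho = 0.8 the sample moments of the retained draws approximate the target: the mean of each coordinate is near 0 and the sample cross-moment is near 0.8.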
WEEK 2: CONVENTIONAL SOCIAL SCIENCE MODEL APPLICATIONS AND ISSUES (lab on Wednesday)

A. The Bayesian Normal Model
- The Normal Model with Variance Known
- The Normal Model with Mean Known
- The Multivariate Normal Model When mu and sigma Are Both Unknown
- Computing Topic: Bayesian Normal Models in R

- Gill (2007), Chapters 3 and 4.
- Broemeling, L. D. (1985). Bayesian Analysis of Linear Models. New York: Marcel Dekker.
- Geweke, J. (1993). ``Bayesian Treatment of the Independent Student-t Linear Model.'' Journal of Applied Econometrics 8, S19-S40.
- Lindley, D. V. and Smith, A. F. M. (1972). ``Bayes Estimates for the Linear Model.'' Journal of the Royal Statistical Society, Series B 34, 1-41.
- Zellner, A. (1971). An Introduction to Bayesian Inference in Econometrics. New York: Wiley & Sons.

Homework: text problems 4.1, 4.2, 4.3, 4.4, 4.7, plus the computer assignment for week 2.

B. The Bayesian Prior
- Conjugacy
- The Exponential Family Form
- Noninformative Priors
- Uniform Priors
- Jeffreys Prior
- Reference Priors
- Proper Versus Improper Priors
- Elicited Priors
- Computing Topic: Specifying Different Priors in R

- Gill (2007), Chapter 5.
- Bernardo, J. M. (1979). ``Reference Posterior Distributions for Bayesian Inference.'' Journal of the Royal Statistical Society, Series B 41, 113-147.
- Diaconis, P. and Ylvisaker, D. (1979). ``Conjugate Priors for Exponential Families.'' Annals of Statistics 7, 269-281.
- Garthwaite, P. H. and Dickey, J. M. (1988). ``Quantifying Expert Opinion in Linear Regression Problems.'' Journal of the Royal Statistical Society, Series B 50, 462-474.
- Kass, R. E. and Wasserman, L. (1996). ``The Selection of Prior Distributions by Formal Rules.'' Journal of the American Statistical Association 91, 1343-1370.
- Villegas, C. (1977). ``On the Representation of Ignorance.'' Journal of the American Statistical Association 72, 651-654.

Homework: text problems 5.1, 5.2, 5.3, 5.4.
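Conjugacy, the first topic in the prior section above, means the posterior stays in the same distributional family as the prior, so Bayesian updating reduces to updating hyperparameters. The course works in R; the snippet below is an illustrative Python sketch of the standard beta-binomial conjugate pair (my own example, not one from the text): a Beta(a, b) prior combined with binomial data yields a Beta(a + successes, b + failures) posterior.

```python
def beta_binomial_update(a, b, successes, failures):
    """Conjugate update for binomial data under a Beta(a, b) prior:
    the posterior is Beta(a + successes, b + failures)."""
    return a + successes, b + failures

# A flat Beta(1, 1) prior updated with 7 successes in 10 trials
a_post, b_post = beta_binomial_update(1, 1, 7, 3)
post_mean = a_post / (a_post + b_post)
print(a_post, b_post, round(post_mean, 3))  # 8 4 0.667
```

The posterior mean, 8/12, sits between the prior mean (1/2) and the sample proportion (7/10), illustrating how the prior shrinks the estimate, an effect that fades as the data grow.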
C. Assessing Model Quality
- The Bayesian Linear Regression Model
- Sensitivity Analysis
- Local Sensitivity Analysis
- Global Sensitivity Analysis
- Robustness Evaluation
- Basic Linear Modeling Robustness
- Bayesian Linear Outlier Detection
- Bayesian Specification Robustness
- The Posterior Predictive Distribution
- Computing Topic: Using R to Test Model Quality

- Gill (2007), Chapter 6.
- Berger, J. and Berliner, L. M. (1986). ``Robust Bayes and Empirical Bayes Analysis with epsilon-contaminated Priors.'' The Annals of Statistics 14, 461-486.
- Box, G. E. P. and Tiao, G. C. (1968). ``A Bayesian Approach to Some Outlier Problems.'' Biometrika 55, 119-129.
- Cook, D. R. and Weisberg, S. (1982). Residuals and Influence in Regression. New York: Chapman & Hall.
- Delampady, M. and Dey, D. K. (1994). ``Bayesian Robustness for Multiparameter Problems.'' Journal of Statistical Planning and Inference 50, 375-382.
- Gelfand, A. E. and Dey, D. K. (1991). ``On Bayesian Robustness of Contaminated Classes of Priors.'' Statistics & Decisions 9, 63-80.
- Lange, K. L., Little, R. J. A., and Taylor, J. M. G. (1989). ``Robust Statistical Modeling Using the t Distribution.'' Journal of the American Statistical Association 84, 881-896.

Homework: text problems 6.1, 6.2, 6.8.

D. Bayes Factors and Bayesian Hypothesis Testing
- Evidence Through Posterior Descriptions
- The Bayesian Approximation to Frequentist Hypothesis Testing
- One-Sided Testing
- Two-Sided Testing
- Bayesian Decision Theory
- The Bayes Factor as Evidence
- The Bayes Factor for a Two-Sided Test
- Local Bayes Factor
- Intrinsic Bayes Factor
- Partial Bayes Factor
- Fractional Bayes Factor
- The Laplace Approximation
- Computing Topic: Formal Model Tests in R
- Gill (2007), Chapter 7.
- Berger, J. O. and Mortera, J. (1999). ``Default Bayes Factors for Nonnested Hypothesis Testing.'' Journal of the American Statistical Association 94, 542-554.
- Kass, R. E. (1993). ``Bayes Factors in Practice.'' The Statistician 42, 551-560.
- Kass, R. E. and Raftery, A. E. (1995). ``Bayes Factors.'' Journal of the American Statistical Association 90, 773-795.
- Raftery, A. E. (1996). ``Hypothesis Testing and Model Selection.'' In Markov Chain Monte Carlo in Practice, W. R. Gilks, S. Richardson, and D. J. Spiegelhalter (eds.). New York: Chapman & Hall, pp. 163-188.

Homework: text problems 7.1, 7.4, 7.8.

E. Weekly paper for discussion: Berk, R. A., Campbell, A., Klap, R., and Western, B. (1992). ``The Deterrent Effect of Arrest in Incidents of Domestic Violence: A Bayesian Analysis of Four Field Experiments.'' American Sociological Review 57, 698-708. Presented by volunteer group.

WEEK 3: POSTERIOR CALCULATION WITH BAYESIAN STOCHASTIC SIMULATION (lab on Wednesday, problem 5.10)

A. Basic Monte Carlo Integration
- Rejection Sampling
- Classical Numerical Integration
- Importance Sampling/Sampling Importance Resampling
- Mode Finding and the EM Algorithm
- Convergence of the EM Algorithm
- EM for Exponential Families
- Computing Topic: Introduction to WinBUGS for MCMC Estimation

- Gill (2007), Chapter 8.
- Carlin, B. P. and Chib, S. (1995). ``Bayesian Model Choice via Markov Chain Monte Carlo Methods.'' Journal of the Royal Statistical Society, Series B 57, 473-484.
- Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). ``Maximum Likelihood from Incomplete Data via the EM Algorithm.'' Journal of the Royal Statistical Society, Series B 39, 1-38.
- Kennedy, W. J. and Gentle, J. E. (1980). Statistical Computing. New York: Marcel Dekker.
- Metropolis, N. and Ulam, S. (1949). ``The Monte Carlo Method.'' Journal of the American Statistical Association 44, 335-341.
- Mooney, C. Z. (1997). Monte Carlo Simulation. Thousand Oaks, CA: Sage.
- Rubin, D. B. (1987). ``A Noniterative Sampling/Importance Resampling Alternative to the Data Augmentation Algorithm for Creating a Few Imputations When Fractions of Missing Information Are Modest: The SIR Algorithm.'' Discussion of Tanner and Wong (1987). Journal of the American Statistical Association 82, 543-546.

Homework: text problems 8.1, 8.2, 8.7, 8.9, plus the computer assignment for week 3.

B. The Theory and Practice of Markov Chain Monte Carlo
- General Properties of Markov Chains
- The Chapman-Kolmogorov Equations
- Marginal Distributions
- Stationarity
- Ergodicity
- The Gibbs Sampler
- The Metropolis-Hastings Algorithm
- Data Augmentation
- Random Number Generation
- Computing Topic: Running and Monitoring Markov Chains in WinBUGS

- Gill (2007), Chapter 9.
- Casella, G. and George, E. I. (1992). ``Explaining the Gibbs Sampler.'' The American Statistician 46, 167-174.
- Gelfand, A. E. and Smith, A. F. M. (1990). ``Sampling-Based Approaches to Calculating Marginal Densities.'' Journal of the American Statistical Association 85, 389-409.
- Geman, S. and Geman, D. (1984). ``Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images.'' IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721-741.
- Geyer, C. J. (1992). ``Practical Markov Chain Monte Carlo.'' Statistical Science 7, 473-511.
- Hastings, W. K. (1970). ``Monte Carlo Sampling Methods Using Markov Chains and Their Applications.'' Biometrika 57, 97-109.
- Jackman, S. (2000). ``Estimation and Inference via Bayesian Simulation: An Introduction to Markov Chain Monte Carlo.'' American Journal of Political Science 44, 375-404.
- Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). ``Equation of State Calculations by Fast Computing Machines.'' Journal of Chemical Physics 21, 1087-1091.
- Peskun, P. H. (1973). ``Optimum Monte Carlo Sampling Using Markov Chains.'' Biometrika 60, 607-612.
- Tierney, L. (1994). ``Markov Chains for Exploring Posterior Distributions.'' Annals of Statistics 22, 1701-1728.

Homework: text problems 9.1, 9.2, 9.5.
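The Metropolis-Hastings algorithm covered in this unit can be illustrated with its simplest special case, random-walk Metropolis: with a symmetric proposal, the Hastings correction cancels and the acceptance probability is just the ratio of target densities at the proposed and current points. The course uses WinBUGS and R; below is an illustrative Python sketch (my own toy setup) targeting a standard normal so that the result is easy to check.

```python
import math
import random

def metropolis_standard_normal(n_iter=20000, step=1.0, seed=1):
    """Random-walk Metropolis targeting the standard normal density.
    Proposals are uniform and symmetric around the current state, so
    the acceptance ratio reduces to the target density ratio."""
    rng = random.Random(seed)
    x = 0.0
    chain = []
    for _ in range(n_iter):
        proposal = x + rng.uniform(-step, step)
        # log of the N(0, 1) density ratio: -(proposal^2 - x^2) / 2
        log_alpha = -(proposal ** 2 - x ** 2) / 2.0
        if math.log(rng.random()) < log_alpha:
            x = proposal        # accept the move
        chain.append(x)         # record the current state either way

    return chain

chain = metropolis_standard_normal()
m = sum(chain) / len(chain)
v = sum((c - m) ** 2 for c in chain) / len(chain)
print(round(m, 2), round(v, 2))  # mean near 0, variance near 1
```

Rejected proposals still contribute a copy of the current state to the chain; dropping them would bias the sample, a detail the convergence material in Week 4 returns to.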
C. Weekly paper for discussion: Quinn, K. M., Martin, A. D., and Whitford, A. B. (1999). ``Voter Choice in Multi-Party Democracies: A Test of Competing Theories and Models.'' American Journal of Political Science 43, 1231-1247. Presented by volunteer group.

WEEK 4: HIERARCHICAL MODELS AND MCMC DIAGNOSTICS (lab on Wednesday and Thursday, no class on Friday)

A. Bayesian Hierarchical Models
- The Basic Structure of the BHM
- A Poisson-Gamma Hierarchical Model in Detail
- The General Role of Priors and Hyperpriors
- Exchangeability
- The General Bayesian Hierarchical Linear Model
- Computing Topic: Specifying Hierarchical Models in WinBUGS

- Gill (2007), Chapter 10.
- Albert, J. H. (1988). ``Computational Methods Using a Bayesian Hierarchical Generalized Linear Model.'' Journal of the American Statistical Association 83, 1037-1044.
- Bryk, A. S. and Raudenbush, S. W. (1992). Hierarchical Linear Models. Beverly Hills: Sage.
- Cohen, J., Nagin, D., Wallstrom, G., and Wasserman, L. (1998). ``Hierarchical Bayesian Analysis of Arrest Rates.'' Journal of the American Statistical Association 93, 1260-1270.

Homework: text problems 10.1, 10.2, 10.4, plus the computer assignment for week 4.

B. The Convergence and Behavior of Markov Chains
- Autocorrelation
- Graphical Techniques for Demonstrating Nonconvergence
- Empirical Diagnostics
- Customized Diagnostics
- Why We Shouldn't Worry Too Much About Stationarity
- Mixing and Acceleration
- Simulated Annealing
- Rao-Blackwellizing for Improved Variance Estimation
- The Slice Sampler
- Computing Topic: Diagnostics in WinBUGS and the R Suites CODA and BOA

- Gill (2007), Chapter 11.
- Cowles, M. K., Roberts, G. O., and Rosenthal, J. S. (1999). ``Possible Biases Induced by MCMC Convergence Diagnostics.'' Journal of Statistical Computation and Simulation 64, 87-104.
- Gelfand, A. E. and Sahu, S. K. (1994). ``On Markov Chain Monte Carlo Acceleration.'' Journal of Computational and Graphical Statistics 3, 261-276.
- Gelman, A. and Rubin, D. B. (1992). ``Inference from Iterative Simulation Using Multiple Sequences.'' Statistical Science 7, 457-511.
- Geyer, C. J. (1992). ``Practical Markov Chain Monte Carlo.'' Statistical Science 7, 473-511.
- Zellner, A. and Min, C.-K. (1995). ``Gibbs Sampler Convergence Criteria.'' Journal of the American Statistical Association 90, 921-927.

Homework: text problems 11.1, 11.2, 11.4.

C. Weekly paper for discussion: Western, B. (1998). ``Causal Heterogeneity in Comparative Research: A Bayesian Hierarchical Modeling Approach.'' American Journal of Political Science 42, 1233-1259. Presented by volunteer group.
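The empirical diagnostics covered in Week 4B include the multiple-sequence diagnostic of Gelman and Rubin (1992), which compares between-chain and within-chain variability; CODA and BOA provide it in R. As a preview, here is an illustrative Python sketch of the basic potential scale reduction factor, without the degrees-of-freedom correction used in some implementations:

```python
import random

def gelman_rubin(chains):
    """Basic potential scale reduction factor (R-hat) for m chains of
    equal length n, following Gelman and Rubin (1992): values near 1
    are consistent with convergence; larger values are not."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and mean within-chain variance W
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * W + B / n   # pooled posterior variance estimate
    return (var_hat / W) ** 0.5

rng = random.Random(7)
# Three chains already drawing from the same N(0, 1) target: R-hat near 1
chains = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(3)]
print(round(gelman_rubin(chains), 2))
```

Shifting the chains apart (for example, adding a different constant to each) drives R-hat well above 1, which is how the diagnostic flags chains that have not yet mixed over the same region of the posterior.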