
Big Data need Big Model

Andrew Gelman, Bob Carpenter, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, Allen Riddell, ...

Department of Statistics, Columbia University, New York (and other places)

10 Nov 2014


Ordered probit

    data {
      int<lower=2> K;
      int<lower=0> N;
      int<lower=1> D;
      int<lower=1,upper=K> y[N];
      row_vector[D] x[N];
    }
    parameters {
      vector[D] beta;
      ordered[K-1] c;
    }
    model {
      vector[K] theta;
      for (n in 1:N) {
        real eta;
        eta <- x[n] * beta;
        theta[1] <- 1 - Phi(eta - c[1]);
        for (k in 2:(K-1))
          theta[k] <- Phi(eta - c[k-1]) - Phi(eta - c[k]);
        theta[K] <- Phi(eta - c[K-1]);
        y[n] ~ categorical(theta);
      }
    }

Measurement error model

    data {
      ...
      real x_meas[N];     // measurement of x
      real<lower=0> tau;  // measurement noise
    }
    parameters {
      real x[N];          // unknown true value
      real mu_x;          // prior location
      real sigma_x;       // prior scale
      ...
    }
    model {
      x ~ normal(mu_x, sigma_x);            // prior
      x_meas ~ normal(x, tau);              // measurement model
      y ~ normal(alpha + beta * x, sigma);
      ...
    }

Stan overview

- Fit open-ended Bayesian models (minimal example below)
- Specify log posterior density in C++
- Code a distribution once, then use it everywhere
- Hamiltonian No-U-Turn sampler
- Autodiff
- Runs from R, Python, Matlab, Julia; postprocessing
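To make the modeling language concrete, here is a minimal sketch (illustrative, not from the slides) of a complete Stan program for simple linear regression; the same file runs unchanged from any of the interfaces listed above:

    data {
      int<lower=0> N;       // number of observations
      vector[N] x;          // predictor
      vector[N] y;          // outcome
    }
    parameters {
      real alpha;           // intercept
      real beta;            // slope
      real<lower=0> sigma;  // residual scale
    }
    model {
      y ~ normal(alpha + beta * x, sigma);  // vectorized likelihood
    }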

People

- Stan core (15)
- Research collaborators (30)
- Developers (100)
- User community (1000)
- Users (10000)

Funding

- National Science Foundation
- Institute for Education Sciences
- Department of Energy
- Novartis
- YouGov

Roles of Stan

- Bayesian inference for unsophisticated users (alternative to Stata, Bugs, etc.)
- Bayesian inference for sophisticated users (alternative to programming it yourself)
- Fast and scalable gradient computation
- Environment for developing new algorithms


"This week, the New York Times and CBS News published a story using, in part, information from a non-probability, opt-in survey, sparking concern among many in the polling community. In general, these methods have little grounding in theory, and the results can vary widely based on the particular method used." Michael Link, President, American Association for Public Opinion Research


Xbox estimates, adjusting for demographics


Karl Rove, Wall Street Journal, 7 Oct: "Mr. Romney's bounce is significant." Nate Silver, New York Times, 6 Oct: "Mr. Romney has not only improved his own standing but also taken voters away from Mr. Obama's column."

Xbox estimates, adjusting for demographics and partisanship

Jimmy Carter Republicans and George W. Bush Democrats


Toxicology

Earth science

Lots of other applications: astronomy, ecology, linguistics, epidemiology, soil science, ...

Steps of Bayesian data analysis

- Model building
- Inference
- Model checking (posterior predictive sketch below)
- Model understanding and improvement
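Model checking typically compares replications drawn from the fitted model against the observed data. A minimal sketch (illustrative, not from the slides) of how replicated data might be generated for the regression program above, using Stan's generated quantities block:

    generated quantities {
      vector[N] y_rep;   // replicated data for posterior predictive checks
      for (n in 1:N)
        y_rep[n] <- normal_rng(alpha + beta * x[n], sigma);  // simulate from the fitted model
    }

Comparing y_rep to y, visually or through test statistics, is the basic posterior predictive check.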

Background on Bayesian computation

- Point estimates and standard errors
- Hierarchical models
- Posterior simulation
- Markov chain Monte Carlo (Gibbs sampler and Metropolis algorithm; acceptance rule below)
- Hamiltonian Monte Carlo (see below)
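As background for the two samplers just listed (standard definitions, not from the slides): the Metropolis algorithm accepts a symmetric proposal θ* from the current draw θ^(t) with probability

    \alpha = \min\left(1, \; \frac{p(\theta^{\ast} \mid y)}{p(\theta^{(t)} \mid y)}\right),

while Hamiltonian Monte Carlo augments θ with an auxiliary momentum r ~ N(0, M) and simulates trajectories under the Hamiltonian

    H(\theta, r) = -\log p(\theta \mid y) + \tfrac{1}{2}\, r^{\top} M^{-1} r,

which is what lets it make distant proposals that are still accepted with high probability.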

Solving problems

- Problem: Gibbs too slow, Metropolis too problem-specific
  Solution: Hamiltonian Monte Carlo
- Problem: Interpreters too slow, won't scale
  Solution: Compilation
- Problem: Need gradients of log posterior for HMC
  Solution: Reverse-mode algorithmic differentiation
- Problem: Existing algo-diff slow, limited, unextensible
  Solution: Our own algo-diff
- Problem: Algo-diff requires fully templated functions
  Solution: Our own density library, Eigen linear algebra

Radford Neal (2011) on Hamiltonian Monte Carlo: "One practical impediment to the use of Hamiltonian Monte Carlo is the need to select suitable values for the leapfrog stepsize, ε, and the number of leapfrog steps L ... Tuning HMC will usually require preliminary runs with trial values for ε and L ... Unfortunately, preliminary runs can be misleading ..."
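For reference, the leapfrog update that ε and L control (standard form, with mass matrix M; not spelled out in the slides):

    r^{(t+\epsilon/2)} = r^{(t)} + \tfrac{\epsilon}{2}\, \nabla_{\theta} \log p(\theta^{(t)} \mid y)
    \theta^{(t+\epsilon)} = \theta^{(t)} + \epsilon\, M^{-1} r^{(t+\epsilon/2)}
    r^{(t+\epsilon)} = r^{(t+\epsilon/2)} + \tfrac{\epsilon}{2}\, \nabla_{\theta} \log p(\theta^{(t+\epsilon)} \mid y)

One proposal consists of L such steps, so the total simulation length is t = εL.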

The No-U-Turn Sampler

- Created by Matt Hoffman
- Runs the HMC steps until the trajectory starts to turn around (bends at an angle > 180°; criterion below)
- Computationally efficient
- Requires no tuning of the number of steps
- Complications to preserve detailed balance
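The stopping rule, as given in the Hoffman and Gelman paper: writing (θ⁻, r⁻) and (θ⁺, r⁺) for the leftmost and rightmost states of the trajectory, extension stops when

    (\theta^{+} - \theta^{-}) \cdot r^{-} < 0 \quad\text{or}\quad (\theta^{+} - \theta^{-}) \cdot r^{+} < 0,

that is, when simulating further in either direction would start to bring the two endpoints closer together.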

NUTS example trajectory (Hoffman and Gelman)

[Figure 2 of Hoffman and Gelman: example of a trajectory generated during one iteration of NUTS. The blue ellipse is a contour of the target distribution; the black open circles are the positions θ traced out by the leapfrog integrator (the set of visited states B); the black solid circle is the starting position; the red solid circles are positions excluded from the set C of candidate next samples because their joint probability is below the slice variable u.]

- Blue ellipse is contour of target distribution
- Initial position at black solid circle
- Arrows indicate a U-turn in momentum

NUTS vs. Gibbs and Metropolis

[Figure 7 of Hoffman and Gelman: samples generated by random-walk Metropolis, Gibbs sampling, and NUTS, compared against 1,000 independent draws from a highly correlated 250-dimensional distribution; only the first two dimensions are shown.]

- Two dimensions of highly correlated 250-dim distribution
- 1M samples from Metropolis, 1M from Gibbs (thinned to 1K)
- 1K samples from NUTS, 1K independent draws

NUTS vs. Basic HMC

- 250-D normal and logistic regression models
- Vertical axis shows effective number of simulation draws (big is good)
- (Left) NUTS; (right) HMC with increasing simulation length t = εL

NUTS vs. Basic HMC II

- Hierarchical logistic regression and stochastic volatility
- Simulation time is step size ε times number of steps L
- NUTS can beat optimally tuned HMC

Solving more problems in Stan

- Problem: Need ease of use of BUGS
  Solution: Compile directed graphical model language
- Problem: Need to tune parameters for HMC
  Solution: Auto-tuning, adaptation
- Problem: Efficient up-to-proportion density calcs
  Solution: Density template metaprogramming
- Problem: Limited error checking, recovery
  Solution: Static model typing, informative exceptions
- Problem: Poor boundary behavior
  Solution: Calculate limits (e.g., lim_{x -> 0} x log x = 0)
- Problem: Restrictive licensing (e.g., closed, GPL, etc.)
  Solution: Open-source, BSD license

New stuff: Differential equation models

Simple harmonic oscillator:

    dz_1/dt = z_2
    dz_2/dt = -z_1 - θ z_2

with observations (y_1, y_2)_t, t = 1, ..., T:

    y_{1t} ~ N(z_1(t), σ_1^2)
    y_{2t} ~ N(z_2(t), σ_2^2)

Given data (y_1, y_2)_t, t = 1, ..., T, estimate the initial state (y_1, y_2)_{t=0} and the parameter θ.

Stan program: 1

    functions {
      real[] sho(real t, real[] y, real[] theta,
                 real[] x_r, int[] x_i) {
        real dydt[2];
        dydt[1] <- y[2];
        dydt[2] <- -y[1] - theta[1] * y[2];
        return dydt;
      }
    }
    data {
      int<lower=1> T;
      real y[T,2];
      real t0;
      real ts[T];
    }
    transformed data {
      real x_r[0];
      int x_i[0];
    }

Stan program: 2

    parameters {
      real y0[2];
      vector<lower=0>[2] sigma;
      real theta[1];
    }
    model {
      real z[T,2];
      sigma ~ cauchy(0, 2.5);
      theta ~ normal(0, 1);
      y0 ~ normal(0, 1);
      z <- integrate_ode(sho, y0, t0, ts, theta, x_r, x_i);
      for (t in 1:T)
        y[t] ~ normal(z[t], sigma);
    }

Stan output

Run RStan with data simulated from θ = 0.15, y_0 = (1, 0), and σ = 0.1:

    Inference for Stan model: sho.
    4 chains, each with iter=2000; warmup=1000; thin=1;
    post-warmup draws per chain=1000, total post-warmup draws=4000.

              mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
    y0[1]     1.05    0.00 0.09  0.87  0.98  1.05  1.10  1.23  1172    1
    y0[2]    -0.06    0.00 0.06 -0.18 -0.10 -0.06 -0.02  0.06  1524    1
    sigma[1]  0.14    0.00 0.04  0.08  0.11  0.13  0.16  0.25  1354    1
    sigma[2]  0.11    0.00 0.03  0.06  0.08  0.10  0.12  0.18  1697    1
    theta[1]  0.15    0.00 0.04  0.08  0.13  0.15  0.17  0.22  1112    1
    lp__     28.97    0.06 1.80 24.55 27.95 29.37 30.29 31.35   992    1

Big Data, Big Model, Scalable Computing

[Figure: seconds per sample (log scale, roughly 10^-3 to 10^0) against total number of ratings (10^2 to 10^6), with curves for 10, 100, and 1000 items.]

Thinking about scalability

Hierarchical item response model:

                                              Stan              JAGS
    # items  # raters  # groups     # data    time    memory    time    memory
         20     2,000       100     40,000    :02m     16MB     :03m     220MB
         40     8,000       200    320,000    :16m     92MB     :40m    1400MB
         80    32,000       400  2,560,000  4h:10m    580MB     :??m      ?MB

Also, Stan generated 4x the effective sample size per iteration.
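The benchmark model itself is not shown in the slides; as a rough illustration, a minimal Rasch-type item response sketch in Stan (hypothetical names and priors, omitting the group level used in the benchmark):

    data {
      int<lower=1> N;                // number of ratings
      int<lower=1> I;                // number of items
      int<lower=1> J;                // number of raters
      int<lower=1,upper=I> item[N];  // item for each rating
      int<lower=1,upper=J> rater[N]; // rater for each rating
      int<lower=0,upper=1> y[N];     // binary rating
    }
    parameters {
      vector[J] alpha;               // rater abilities
      vector[I] beta;                // item difficulties
      real<lower=0> sigma_alpha;     // hierarchical scale for abilities
    }
    model {
      alpha ~ normal(0, sigma_alpha);  // hierarchical prior
      beta ~ normal(0, 1);
      sigma_alpha ~ cauchy(0, 2.5);
      for (n in 1:N)
        y[n] ~ bernoulli_logit(alpha[rater[n]] - beta[item[n]]);
    }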

Future work

- Programming
  - Faster gradients and higher-order derivatives
  - Functions
- Statistical algorithms
  - Riemannian Hamiltonian Monte Carlo
  - (Penalized) MLE
  - (Penalized) marginal MLE
  - Black-box variational Bayes
  - Data partitioning and expectation propagation