# Bayesian Statistics: Indian Buffet Process

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY August 2012 Reference: Most of the material in this note is taken from: Griffiths, T. L. & Ghahramani, Z. (2005). Infinite latent feature models and the Indian buffet process. Gatsby Unit Technical Report GCNU-TR Introduction A common goal of unsupervised learning is to discover the latent variables responsible for generating the observed properties of a set of objects. For example, factor analysis attempts to find a set of latent variables (or factors) that explain the correlations among the observed variables. A problem with factor analysis, however, is that the user has to specify the number of latent variables when using this technique. Should one use 5, 10, or 15 latent variables (or some other number)? There are at least two ways one can address this question. One way is by performing model selection. Another way is to use a Bayesian nonparametric method. In general, Bayesian nonparametric methods grow the number of parameters as the size and complexity of the data set grow. An important example of a Bayesian nonparametric method is the Indian Buffet Process. (Another important example is the Dirichlet Process Mixture Model; see the Computational Cognition Cheat Sheet on Dirichlet Processes.) An Indian buffet process (IBP) is a stochastic process that provides a probability distribution over equivalence classes of binary matrices of bounded rows and potentially infinite columns. Adoption of the IBP as a prior distribution in Bayesian modeling has led to successful Bayesian nonparametric models of human cognition including models of latent feature learning (Austerweil & Griffiths, 2009), causal induction (Wood, Griffiths, & Ghahramani, 2006), similarity judgements (Navarro & Griffiths, 2008), and the acquisition of multisensory representations (Yildirim & Jacobs, 2012). Two aspects of the Indian buffet process that it shares with other Bayesian nonparametric methods underlie its successful application in modeling human cognition. First, it defines a probability distribution over very rich, combinatoric structures. Indeed, depending on the goal of the modeler, one can interpret binary matrices as feature ownership tables, adjacency matrices representing graphs, or other complex, structured representations. Second, under IBP, binary matrices can grow or shrink with more data, effectively letting models adapt to the complexity of the observations. For example, a binary matrix is often thought of as a feature ownership matrix in which there is a row for each object, and a column for each possible feature. An entry of 1 at element (i, k) of the matrix means that object i posseses feature k. Otherwise this entry is 0. Importantly, the user does not need to specify the number of columns (that is, the number of features). Instead, the number of 1

2 columns can grow with the size and complexity of the data set. That is, a priori, IBP due to its stochastic process foundations has support over feature ownership matrices with any number of columns. Upon observing data, a posteriori, IBP concentrates its mass over a subset of binary matrices with finitely many columns via probabilistic inference (assuming the use of a finite data set). Because IBP lends itself to Bayesian inference, the posterior IBP will maintain the variability regarding the exact number of features arising from the observations, thereby solving (or at least alleviating) the difficult problems of model selection and model comparison. As stated above, IBP does not make assumptions about the exact number of latent features underlying a finite set of observed objects, although it does make other assumptions about these units (Griffiths & Ghahramani, 2005, 2006). It assumes that the latent features are binary. Thus, an object either does or does not possess a feature. It also assumes that latent features are statistically independent, meaning that knowledge that an object possesses one feature does not provide information about whether it possesses other features. Lastly, it assumes that the latent features are a finite subset of an unbounded or infinite set of features. More formally, the probability of a binary matrix Z under IBP can be written as the following: α K K p(z) = 2 N 1 h=1 K h! exp{ αh (N m k )!(m k 1)! N} (1) N! k=1 where N is the number of objects, K is the number of multisensory features, K h is the number of features with history h (the history of a feature is the matrix column for that feature interpreted as a binary number), H N is the N th harmonic number, m k is the number of objects with feature k, and α is a variable influencing the number of features (a derivation of this equation can be found in Griffiths & Ghahramani, 2005). We denote Z IBP(α), meaning that Z is distributed according to the IBP with parameter α. Below, we provide intuition for α, the only parameter of the process. 2. Constructing the Indian Buffet Process There are three well studied constructions for the Indian buffet process constructions based on an implicit representation, a finite representation, and an explicit stick-breaking representation. Here we mainly study the implicit representation before we briefly introduce the other two constructions. An implicit representation: In the field of Bayesian nonparametric models (and the study of stochastic processes, in general), implicit representations are a way of constructing models which exploit de Finetti s Theorem. If a sequence of observations, X 1,..., X n, are exchangeable (i.e., its probability distribution is invariant to permutations of the variables in the sequence), then by de Finetti s theorem there is a stochastic element (also known as the mixing distribution) that underlies these observations. For such cases, one can choose to work only with the observations, X i s, (e.g., in defining conditional distributions) and thereby address the underlying stochastic process implicitly. A well know example of such a representation is the Chinese Restaurant Process which implies the Dirichlet Process as 2

3 Figure 1: Simulations of the Indian buffet process with different α values. The binary matrices depict samples from the IBP with α = 1 resulting in a total of 3 dishes sampled (Left), with α = 3 resulting in 12 dishes sampled (Middle), and with α = 5 resulting in 18 dishes sampled (Right). In all simulations, the number of customers (i.e., the number of rows) was fixed to 15. White pixels are 1 and black pixels are 0. its mixing distribution (Aldous, 1985; also see the Computational Cognition Cheat Sheet on Dirichlet Processes.). Griffiths and Ghahramani (2005, 2006) introduced the following culinary metaphor that results in an IBP while keeping the mixing distribution implicit. In this metaphor, customers (observations) sequentially arrive at an Indian buffet (hence the name) with an apparently infinite number of dishes (features). The first customer arrives and samples the first Poisson(α) dishes, with α > 0 being the only parameter of the process. The i th customer samples one of the already sampled dished with probability proportional to the popularity of that dish prior to her arrival (that is, proportional to n i,k where n i i,k is the number of previous customers who sampled dish k). When she is done sampling dishes previously sampled, customer i further samples Poisson( α ) number of new dishes. This process continues until all N customers i visit the buffet. We can represent the outcome of this process in a binary matrix Z where the rows of the matrix are customers (observations) and the columns are dishes (features) (z i,k is 1 if observation i possesses feature k). The probability distribution over binary matrices induced by this process is indeed as expressed in Equation 1. To get a better intuition for the parameter α, we simulated samples from IBP with three different α values. As illustrated in Figure 1, the smaller the α, the smaller the number of features with i z i,k > 0. In other words, for equal number of observations, the parameter α influences how likely it is that multiple observations will share the same features. Based on this property of the process, α is called the concentration parameter (similar to the α parameter in Dirichlet Processes). Thibaux and Jordan (2007) showed that Beta processes are the underlying mixing distribution for IBP. (Similar to the relationship between the Chinese Restaurant Process and the Dirichlet Processes.) We do not provide details of this derivation here. However, we note that the availability of the underlying mixing distribution is advantageous in at least two ways: (1) one can devise new constructions (e.g., stick-breaking constructions) of the 3

4 same stochastic process which can lead to new and more efficient inference techniques; and (2) it can lead to derivations of more sophisticated models from the same family, such as hierarchical extensions of IBP. Finally, another way of constructing Bayesian nonparametric methods is by first defining a finite latent feature model, and then take the limit of this model as the number of latent features goes to infinity (see Griffiths & Ghahramani, 2005, 2006). Conditional distributions for Inference: We need to derive the prior conditional distributions to perform Gibbs sampling for the IBP. In doing so, we exploit the exchangeability of IBP. For non-singleton features (i.e., features possessed by at least one other observation), we imagine that the i th observation is the last observation, allowing us to write the conditional distribution in the following way: P (z ik = 1 z i,k ) = n i,k N where z i,k is the feature assignments for column k except for observation i, and n i,k is the number of observations that possesses feature k excluding observation i. 3. Infinite Linear-Gaussian Model Now that we have a prior over binary matrices, we put it in action by outlining a model of unsupervised latent feature learning. Here we consider a simple linear-gaussian model (Griffiths & Ghahramani, 2005, 2006). Model: An important goal of the model is to find a set of latent features, denoted Z, explaining a set of observations, denoted X. Let Z be a binary feature ownership matrix, where z i,k = 1 indicates that object i possesses feature k. Let X be real-valued observations (e.g., X i,j is the value of feature j for object i). The problem of inferring Z from X can be solved via Bayes rule: p(x Z) p(z) p(z X) = (3) Z p(x Z ) p(z ) where p(z) is the prior probability of the feature ownership matrix, and p(x Z) is the likelihood of the observed objects, given the features. We assign IBP as a prior over binary feature ownership matrices, P (Z), which is described above. We now describe the likelihood function. The likelihood function is based on a linear-gaussian model. Let z i be the feature values for object i, and let x i be the observed values for object i. Then x i is drawn from a Gaussian distribution whose mean is a linear function of the features, z i W, and whose covariance matrix equals σx 2 I, where W is a weight matrix (the weight matrix itself is drawn from zeromean Gaussian distribution with covariance σw 2 I). Given these assumptions, the likelihood for a feature matrix is: p(x Z, W, σ 2 X) = 1 1 exp{ (2πσX 2 )ND/2 2σ 2 X (2) tr((x ZW ) T (X ZW ))} (4) where D is the dimensionality of X, and tr( ) denotes the trace operator. A graphical model representation of the model can be found in Figure 2. 4

5 Figure 2: A graphical model for the infinite linear Gaussian model. Inference: Exact inference in the model is computationally intractable and, thus, approximate inference must be performed using Markov chain Monte Carlo (MCMC) sampling methods (Gelman et al., 1995; Gilks, Richardson, & Spiegelhalter, 1996). We describe the MCMC sampling algorithm of Griffiths and Ghahramani (2005). A Matlab implementation of this sampling algorithm can be found on the Computational Cognition Cheat Sheet website. This inference algorithm is a combination of Gibbs and Metropolis-Hastings (MH) steps. We take Gibbs steps when sampling the cells in the feature ownership matrix. For all i with n i,k > 0 (non-singleton features), the conditional distribution can be computed using Equation 4 (the likelihood) and Equation 2 (the prior). To sample the number of new features for observation i, we compute a truncated distribution for a limited number of proposals (e.g., letting the number of new features range from 0 up to 6) using Equation 4 (the likelihood) and the prior over new features, Poisson( α ). Then we sample the number N of new features for observation i from this truncated distribution (after normalizing it). We take Gibbs steps when sampling α. The hyper-prior over the α parameter is a vague Gamma distribution, α Gamma(1, 1). The resulting posterior conditional for α is still a Gamma distribution, α Z G(1 + K +, 1 + N i=1 H i) (Görur, Jaekel, & Rasmussen, 2006). We take MH steps when sampling σ X. Proposals are generated by perturbing the current value by ɛ Uniform( 0.05, 0.05). We compute the acceptance function by computing the likelihood (Equation 4) twice, once with the current value (the denominator of the acceptance function) and once with the proposed value (the nominator of the acceptance function). Simulations: We applied the model to a synthetic data set of 100 images, denoted X. Each image x i was a real-valued vector of length 6 6 = 36. Each image was a superposition of a subset of four base images. These base images corresponded to the rows of the weight matrix W. For each row of the binary feature matrix, z i, each entry was set to 1 with probability 0.5 and set to 0 otherwise. Then we generated each x i by adding white noise with variance σ 2 x = 0.5 to the linear combination of the weight matrix based on the binary feature matrix, z i W. Figure 3 illustrates this process: the four base images (top row), binary feature vectors for a subsample, and the resulting images from the data set (bottom row). 5

6 Figure 3: (Top row) Four basis images underlying the data set. These binary images of length 6 6 = 36 constitute each of the four rows of the weight matrix. (Bottom row) Binary vectors of length 4 at the titles are four rows from the feature matrix, Z. Present features are denoted 1. Corresponding images from the dataset are depicted. Each image is a linear combination of the present features and white noise Figure 4: (Left) The posterior distribution of the number of features based on the samples of the MCMC chain. (Right) The expected posterior weight matrix computed in the following way: E[W Z, X] = (Z T Z + σ2 X σ 2 A I) 1 Z T X. In our Gibbs sampler, we initialized α = 1, and initialized Z to a random draw from IBP(α). We ran our simulation for 1000 steps. We discarded the first 200 samples as burnin. We thinned the remaining 800 samples by retaining only every 10 th sample. Our results are based on this remaining 80 samples. (We ran many other chains initialized differently. Results reported were typical across all simulations.) Figure 4 illustrates the results. The left panel in Figure 4 illustrates the distribution of the number of latent features discovered by the model. Clearly, the model captured that there were 4 latent features underlying the observations. The four images on the right panel in Figure 4 illustrate the weights for the four most frequent features across the chain. Notice how remarkably close the latent features discovered by the model are to the weights used to generate the data in the first place (top row of Figure 3). Aldous, D. (1985). Exchangeability and related topics. In Saint-Flour, XIII 1983 (pp ). Berlin: Springer. École d été de probabilités de 6

7 Austerweil, J. L. & Griffiths, T. L. (2009). Analyzing human feature learning as nonparametric Bayesian inference. In D. Koller, Y. Bengio, D. Schuurmans, & L. Bottou (Eds.), Advances in Neural Information Processing Systems 21. Cambridge, MA: MIT Press. Gelman, A., Carlin, J. B., Stern. H. S., & Rubin, D. B. (1995). Bayesian Data Analysis. London: Chapman and Hall. Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (1996). Markov Chain Monte Carlo in Practice. London: Chapman and Hall. Görur, D., Jaekel, F., & Rasmussen, C. E. (2006). A choice model with infinitely many features. Proceedings of the International Conference on Machine Learning. Griffiths, T. L. & Ghahramani, Z. (2005). Infinite latent feature models and the Indian buffet process. Gatsby Unit Technical Report GCNU-TR Griffiths, T. L. & Ghahramani, Z. (2006). Infinite latent feature models and the Indian buffet process. In B. Schölkopf, J. Platt, & T. Hofmann (Eds.), Advances in Neural Information Processing Systems 18. Cambridge, MA: MIT Press. Navarro, D. J. & Griffiths, T. L. (2008). Latent features in similarity judgments: A nonparametric Bayesian approach. Neural Computation, 20, Thibaux, R. & Jordan, M. I. (2007). Hierarchical beta processes and the Indian buffet process. Proceedings of the Tenth Conference on Artificial Intelligence and Statistics (AISTATS). Wood, F., Griffiths, T. L., & Ghahramani, Z. (2006). A non-parametric Bayesian method for inferring hidden causes. Proceedings of the 2006 conference on Uncertainty in Artificial Intelligence. Yildirim, I. & Jacobs, R. A. (2012). A rational analysis of the acquisition of multisensory representations. Cognitive Science, 36,

### Nonparametric Factor Analysis with Beta Process Priors

Nonparametric Factor Analysis with Beta Process Priors John Paisley Lawrence Carin Department of Electrical & Computer Engineering Duke University, Durham, NC 7708 jwp4@ee.duke.edu lcarin@ee.duke.edu Abstract

### The Chinese Restaurant Process

COS 597C: Bayesian nonparametrics Lecturer: David Blei Lecture # 1 Scribes: Peter Frazier, Indraneel Mukherjee September 21, 2007 In this first lecture, we begin by introducing the Chinese Restaurant Process.

### Dirichlet Processes A gentle tutorial

Dirichlet Processes A gentle tutorial SELECT Lab Meeting October 14, 2008 Khalid El-Arini Motivation We are given a data set, and are told that it was generated from a mixture of Gaussian distributions.

### Dirichlet Process. Yee Whye Teh, University College London

Dirichlet Process Yee Whye Teh, University College London Related keywords: Bayesian nonparametrics, stochastic processes, clustering, infinite mixture model, Blackwell-MacQueen urn scheme, Chinese restaurant

### Stick-breaking Construction for the Indian Buffet Process

Stick-breaking Construction for the Indian Buffet Process Yee Whye Teh Gatsby Computational Neuroscience Unit University College London 17 Queen Square London WC1N 3AR, UK ywteh@gatsby.ucl.ac.uk Dilan

### Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com

Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian

### HT2015: SC4 Statistical Data Mining and Machine Learning

HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric

### Dirichlet Process Gaussian Mixture Models: Choice of the Base Distribution

Görür D, Rasmussen CE. Dirichlet process Gaussian mixture models: Choice of the base distribution. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 5(4): 615 66 July 010/DOI 10.1007/s11390-010-1051-1 Dirichlet

### Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

### Bayesian Factorization Machines

Bayesian Factorization Machines Christoph Freudenthaler, Lars Schmidt-Thieme Information Systems & Machine Learning Lab University of Hildesheim 31141 Hildesheim {freudenthaler, schmidt-thieme}@ismll.de

### The Exponential Family

The Exponential Family David M. Blei Columbia University November 3, 2015 Definition A probability density in the exponential family has this form where p.x j / D h.x/ expf > t.x/ a./g; (1) is the natural

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

### Gaussian Processes to Speed up Hamiltonian Monte Carlo

Gaussian Processes to Speed up Hamiltonian Monte Carlo Matthieu Lê Murray, Iain http://videolectures.net/mlss09uk_murray_mcmc/ Rasmussen, Carl Edward. "Gaussian processes to speed up hybrid Monte Carlo

### A Bayesian Antidote Against Strategy Sprawl

A Bayesian Antidote Against Strategy Sprawl Benjamin Scheibehenne (benjamin.scheibehenne@unibas.ch) University of Basel, Missionsstrasse 62a 4055 Basel, Switzerland & Jörg Rieskamp (joerg.rieskamp@unibas.ch)

### Lecture 6: The Bayesian Approach

Lecture 6: The Bayesian Approach What Did We Do Up to Now? We are given a model Log-linear model, Markov network, Bayesian network, etc. This model induces a distribution P(X) Learning: estimate a set

### Markov Chain Monte Carlo Simulation Made Simple

Markov Chain Monte Carlo Simulation Made Simple Alastair Smith Department of Politics New York University April2,2003 1 Markov Chain Monte Carlo (MCMC) simualtion is a powerful technique to perform numerical

### 1. LINEAR EQUATIONS. A linear equation in n unknowns x 1, x 2,, x n is an equation of the form

1. LINEAR EQUATIONS A linear equation in n unknowns x 1, x 2,, x n is an equation of the form a 1 x 1 + a 2 x 2 + + a n x n = b, where a 1, a 2,..., a n, b are given real numbers. For example, with x and

### Bayesian networks - Time-series models - Apache Spark & Scala

Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly

### EC 6310: Advanced Econometric Theory

EC 6310: Advanced Econometric Theory July 2008 Slides for Lecture on Bayesian Computation in the Nonlinear Regression Model Gary Koop, University of Strathclyde 1 Summary Readings: Chapter 5 of textbook.

### The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

### 11. Time series and dynamic linear models

11. Time series and dynamic linear models Objective To introduce the Bayesian approach to the modeling and forecasting of time series. Recommended reading West, M. and Harrison, J. (1997). models, (2 nd

### Introduction to Markov Chain Monte Carlo

Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data

Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data (Oxford) in collaboration with: Minjie Xu, Jun Zhu, Bo Zhang (Tsinghua) Balaji Lakshminarayanan (Gatsby) Bayesian

### Topic models for Sentiment analysis: A Literature Survey

Topic models for Sentiment analysis: A Literature Survey Nikhilkumar Jadhav 123050033 June 26, 2014 In this report, we present the work done so far in the field of sentiment analysis using topic models.

### Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

### Thomas L. Griffiths. Department of Psychology. University of California, Berkeley. Adam N. Sanborn. Department of Psychological and Brain Sciences

Categorization as nonparametric Bayesian density estimation 1 Running head: CATEGORIZATION AS NONPARAMETRIC BAYESIAN DENSITY ESTIMATION Categorization as nonparametric Bayesian density estimation Thomas

### Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach

Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach Refik Soyer * Department of Management Science The George Washington University M. Murat Tarimcilar Department of Management Science

### Artificial Intelligence Mar 27, Bayesian Networks 1 P (T D)P (D) + P (T D)P ( D) =

Artificial Intelligence 15-381 Mar 27, 2007 Bayesian Networks 1 Recap of last lecture Probability: precise representation of uncertainty Probability theory: optimal updating of knowledge based on new information

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague. http://ida.felk.cvut.

Machine Learning and Data Analysis overview Jiří Kléma Department of Cybernetics, Czech Technical University in Prague http://ida.felk.cvut.cz psyllabus Lecture Lecturer Content 1. J. Kléma Introduction,

### Chapter 14 Managing Operational Risks with Bayesian Networks

Chapter 14 Managing Operational Risks with Bayesian Networks Carol Alexander This chapter introduces Bayesian belief and decision networks as quantitative management tools for operational risks. Bayesian

### Lecture 11: Graphical Models for Inference

Lecture 11: Graphical Models for Inference So far we have seen two graphical models that are used for inference - the Bayesian network and the Join tree. These two both represent the same joint probability

### Cell Phone based Activity Detection using Markov Logic Network

Cell Phone based Activity Detection using Markov Logic Network Somdeb Sarkhel sxs104721@utdallas.edu 1 Introduction Mobile devices are becoming increasingly sophisticated and the latest generation of smart

### Chenfeng Xiong (corresponding), University of Maryland, College Park (cxiong@umd.edu)

Paper Author (s) Chenfeng Xiong (corresponding), University of Maryland, College Park (cxiong@umd.edu) Lei Zhang, University of Maryland, College Park (lei@umd.edu) Paper Title & Number Dynamic Travel

### A Probabilistic Model for Online Document Clustering with Application to Novelty Detection

A Probabilistic Model for Online Document Clustering with Application to Novelty Detection Jian Zhang School of Computer Science Cargenie Mellon University Pittsburgh, PA 15213 jian.zhang@cs.cmu.edu Zoubin

### Modeling Transfer Learning in Human Categorization with the Hierarchical Dirichlet Process

Modeling Transfer Learning in Human Categorization with the Hierarchical Dirichlet Process Kevin R. Canini kevin@cs.berkeley.edu Computer Science Division, University of California, Berkeley, CA 94720

### Handling attrition and non-response in longitudinal data

Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

### arxiv:1410.4984v1 [cs.dc] 18 Oct 2014

Gaussian Process Models with Parallelization and GPU acceleration arxiv:1410.4984v1 [cs.dc] 18 Oct 2014 Zhenwen Dai Andreas Damianou James Hensman Neil Lawrence Department of Computer Science University

### Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data

Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data Neil D. Lawrence Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield,

### An Alternative Prior Process for Nonparametric Bayesian Clustering

An Alternative Prior Process for Nonparametric Bayesian Clustering Hanna M. Wallach Department of Computer Science University of Massachusetts Amherst Shane T. Jensen Department of Statistics The Wharton

### Microclustering: When the Cluster Sizes Grow Sublinearly with the Size of the Data Set

Microclustering: When the Cluster Sizes Grow Sublinearly with the Size of the Data Set Jeffrey W. Miller Brenda Betancourt Abbas Zaidi Hanna Wallach Rebecca C. Steorts Abstract Most generative models for

### Model-based Synthesis. Tony O Hagan

Model-based Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### Tutorial on Markov Chain Monte Carlo

Tutorial on Markov Chain Monte Carlo Kenneth M. Hanson Los Alamos National Laboratory Presented at the 29 th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Technology,

### A tutorial on Bayesian model selection. and on the BMSL Laplace approximation

A tutorial on Bayesian model selection and on the BMSL Laplace approximation Jean-Luc (schwartz@icp.inpg.fr) Institut de la Communication Parlée, CNRS UMR 5009, INPG-Université Stendhal INPG, 46 Av. Félix

### Validation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT

Validation of Software for Bayesian Models using Posterior Quantiles Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT Abstract We present a simulation-based method designed to establish that software

### Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

### Bayesian inference for population prediction of individuals without health insurance in Florida

Bayesian inference for population prediction of individuals without health insurance in Florida Neung Soo Ha 1 1 NISS 1 / 24 Outline Motivation Description of the Behavioral Risk Factor Surveillance System,

### Bayesian Statistics in One Hour. Patrick Lam

Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical

### Tensor Factorization for Multi-Relational Learning

Tensor Factorization for Multi-Relational Learning Maximilian Nickel 1 and Volker Tresp 2 1 Ludwig Maximilian University, Oettingenstr. 67, Munich, Germany nickel@dbs.ifi.lmu.de 2 Siemens AG, Corporate

### An Application of Inverse Reinforcement Learning to Medical Records of Diabetes Treatment

An Application of Inverse Reinforcement Learning to Medical Records of Diabetes Treatment Hideki Asoh 1, Masanori Shiro 1 Shotaro Akaho 1, Toshihiro Kamishima 1, Koiti Hasida 1, Eiji Aramaki 2, and Takahide

### Discovering Structure in Continuous Variables Using Bayesian Networks

In: "Advances in Neural Information Processing Systems 8," MIT Press, Cambridge MA, 1996. Discovering Structure in Continuous Variables Using Bayesian Networks Reimar Hofmann and Volker Tresp Siemens AG,

### When scientists decide to write a paper, one of the first

Colloquium Finding scientific topics Thomas L. Griffiths* and Mark Steyvers *Department of Psychology, Stanford University, Stanford, CA 94305; Department of Brain and Cognitive Sciences, Massachusetts

### Statistical Models in Data Mining

Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

### Counting the number of Sudoku s by importance sampling simulation

Counting the number of Sudoku s by importance sampling simulation Ad Ridder Vrije University, Amsterdam, Netherlands May 31, 2013 Abstract Stochastic simulation can be applied to estimate the number of

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

### Sample Size Calculation for Finding Unseen Species

Bayesian Analysis (2009) 4, Number 4, pp. 763 792 Sample Size Calculation for Finding Unseen Species Hongmei Zhang and Hal Stern Abstract. Estimation of the number of species extant in a geographic region

### Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu

Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of

### A crash course in probability and Naïve Bayes classification

Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s

### ABC methods for model choice in Gibbs random fields

ABC methods for model choice in Gibbs random fields Jean-Michel Marin Institut de Mathématiques et Modélisation Université Montpellier 2 Joint work with Aude Grelaud, Christian Robert, François Rodolphe,

### Supporting Online Material for

www.sciencemag.org/cgi/content/full/313/5786/504/dc1 Supporting Online Material for Reducing the Dimensionality of Data with Neural Networks G. E. Hinton* and R. R. Salakhutdinov *To whom correspondence

### Lecture 10: Sequential Data Models

CSC2515 Fall 2007 Introduction to Machine Learning Lecture 10: Sequential Data Models 1 Example: sequential data Until now, considered data to be i.i.d. Turn attention to sequential data Time-series: stock

### BayesX - Software for Bayesian Inference in Structured Additive Regression

BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich

### Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Models vs. Patterns Models A model is a high level, global description of a

### Computing with Finite and Infinite Networks

Computing with Finite and Infinite Networks Ole Winther Theoretical Physics, Lund University Sölvegatan 14 A, S-223 62 Lund, Sweden winther@nimis.thep.lu.se Abstract Using statistical mechanics results,

### A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data

A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data Faming Liang University of Florida August 9, 2015 Abstract MCMC methods have proven to be a very powerful tool for analyzing

### UW CSE Technical Report 03-06-01 Probabilistic Bilinear Models for Appearance-Based Vision

UW CSE Technical Report 03-06-01 Probabilistic Bilinear Models for Appearance-Based Vision D.B. Grimes A.P. Shon R.P.N. Rao Dept. of Computer Science and Engineering University of Washington Seattle, WA

### Efficient Bayesian Methods for Clustering

Efficient Bayesian Methods for Clustering Katherine Ann Heller B.S., Computer Science, Applied Mathematics and Statistics, State University of New York at Stony Brook, USA (2000) M.S., Computer Science,

### Solving Challenging Non-linear Regression Problems by Manipulating a Gaussian Distribution

Solving Challenging Non-linear Regression Problems by Manipulating a Gaussian Distribution Sheffield Gaussian Process Summer School, 214 Carl Edward Rasmussen Department of Engineering, University of Cambridge

### Combining information from different survey samples - a case study with data collected by world wide web and telephone

Combining information from different survey samples - a case study with data collected by world wide web and telephone Magne Aldrin Norwegian Computing Center P.O. Box 114 Blindern N-0314 Oslo Norway E-mail:

### Supplement to Regional climate model assessment. using statistical upscaling and downscaling techniques

Supplement to Regional climate model assessment using statistical upscaling and downscaling techniques Veronica J. Berrocal 1,2, Peter F. Craigmile 3, Peter Guttorp 4,5 1 Department of Biostatistics, University

### Machine Learning Overview

Machine Learning Overview Sargur N. Srihari University at Buffalo, State University of New York USA 1 Outline 1. What is Machine Learning (ML)? 1. As a scientific Discipline 2. As an area of Computer Science/AI

### CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,

### Latent Dirichlet Markov Allocation for Sentiment Analysis

Latent Dirichlet Markov Allocation for Sentiment Analysis Ayoub Bagheri Isfahan University of Technology, Isfahan, Iran Intelligent Database, Data Mining and Bioinformatics Lab, Electrical and Computer

### Marketing Mix Modelling and Big Data P. M Cain

1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored

### Statistical Machine Learning from Data

Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

### Bayesian Image Super-Resolution

Bayesian Image Super-Resolution Michael E. Tipping and Christopher M. Bishop Microsoft Research, Cambridge, U.K..................................................................... Published as: Bayesian

### A credibility method for profitable cross-selling of insurance products

Submitted to Annals of Actuarial Science manuscript 2 A credibility method for profitable cross-selling of insurance products Fredrik Thuring Faculty of Actuarial Science and Insurance, Cass Business School,

10-601 Machine Learning http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html Course data All up-to-date info is on the course web page: http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html

### Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures

Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures Peter Orbanz and Daniel M. Roy Abstract. The natural habitat of most Bayesian methods is data represented by exchangeable sequences

### Gaussian Conjugate Prior Cheat Sheet

Gaussian Conjugate Prior Cheat Sheet Tom SF Haines 1 Purpose This document contains notes on how to handle the multivariate Gaussian 1 in a Bayesian setting. It focuses on the conjugate prior, its Bayesian

### A non-parametric two-stage Bayesian model using Dirichlet distribution

Safety and Reliability Bedford & van Gelder (eds) 2003 Swets & Zeitlinger, Lisse, ISBN 90 5809 551 7 A non-parametric two-stage Bayesian model using Dirichlet distribution C. Bunea, R.M. Cooke & T.A. Mazzuchi

### THE USE OF STATISTICAL DISTRIBUTIONS TO MODEL CLAIMS IN MOTOR INSURANCE

THE USE OF STATISTICAL DISTRIBUTIONS TO MODEL CLAIMS IN MOTOR INSURANCE Batsirai Winmore Mazviona 1 Tafadzwa Chiduza 2 ABSTRACT In general insurance, companies need to use data on claims gathered from

### More details on the inputs, functionality, and output can be found below.

Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing

### The Optimality of Naive Bayes

The Optimality of Naive Bayes Harry Zhang Faculty of Computer Science University of New Brunswick Fredericton, New Brunswick, Canada email: hzhang@unbca E3B 5A3 Abstract Naive Bayes is one of the most

### Sampling for Bayesian computation with large datasets

Sampling for Bayesian computation with large datasets Zaiying Huang Andrew Gelman April 27, 2005 Abstract Multilevel models are extremely useful in handling large hierarchical datasets. However, computation

### Centre for Central Banking Studies

Centre for Central Banking Studies Technical Handbook No. 4 Applied Bayesian econometrics for central bankers Andrew Blake and Haroon Mumtaz CCBS Technical Handbook No. 4 Applied Bayesian econometrics

### Probabilistic Modelling, Machine Learning, and the Information Revolution

Probabilistic Modelling, Machine Learning, and the Information Revolution Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/

### Master s thesis tutorial: part III

for the Autonomous Compliant Research group Tinne De Laet, Wilm Decré, Diederik Verscheure Katholieke Universiteit Leuven, Department of Mechanical Engineering, PMA Division 30 oktober 2006 Outline General

### Methods of Data Analysis Working with probability distributions

Methods of Data Analysis Working with probability distributions Week 4 1 Motivation One of the key problems in non-parametric data analysis is to create a good model of a generating probability distribution,

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### PS 271B: Quantitative Methods II. Lecture Notes

PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.

### Selection Sampling from Large Data sets for Targeted Inference in Mixture Modeling

Selection Sampling from Large Data sets for Targeted Inference in Mixture Modeling Ioanna Manolopoulou, Cliburn Chan and Mike West December 29, 2009 Abstract One of the challenges of Markov chain Monte

### Non-parametric Bayesian Methods

Non-parametric Bayesian Methods Uncertainty in Artificial Intelligence Tutorial July 25 Zoubin Ghahramani Gatsby Computational Neuroscience Unit 1 University College London, UK Center for Automated Learning

### Probabilistic Matrix Factorization

Probabilistic Matrix Factorization Ruslan Salakhutdinov and Andriy Mnih Department of Computer Science, University of Toronto 6 King s College Rd, M5S 3G4, Canada {rsalakhu,amnih}@cs.toronto.edu Abstract

### Preferences in college applications a nonparametric Bayesian analysis of top-10 rankings

Preferences in college applications a nonparametric Bayesian analysis of top-0 rankings Alnur Ali Microsoft Redmond, WA alnurali@microsoft.com Marina Meilă Dept of Statistics, University of Washington

### Inference on Phase-type Models via MCMC

Inference on Phase-type Models via MCMC with application to networks of repairable redundant systems Louis JM Aslett and Simon P Wilson Trinity College Dublin 28 th June 202 Toy Example : Redundant Repairable