The Chinese Restaurant Process


COS 597C: Bayesian nonparametrics
Lecturer: David Blei
Lecture #1
Scribes: Peter Frazier, Indraneel Mukherjee
September 21, 2007

In this first lecture, we begin by introducing the Chinese Restaurant Process. After a brief review of finite mixture models, we describe the Chinese Restaurant Process mixture, where the latent variables are distributed according to a Chinese Restaurant Process. We end by noting a connection between Dirichlet processes and the Chinese Restaurant Process mixture.

The Chinese Restaurant Process

We will define a distribution on the space of partitions of the positive integers, $\mathbb{N}$. This induces a distribution on the partitions of the first $n$ integers, for every $n \in \mathbb{N}$.

Imagine a restaurant with countably infinitely many tables, labelled 1, 2, .... Customers walk in and sit down at some table. The tables are chosen according to the following random process.

1. The first customer always chooses the first table.
2. The $n$th customer chooses the first unoccupied table with probability $\frac{\alpha}{n-1+\alpha}$, and an occupied table with probability $\frac{c}{n-1+\alpha}$, where $c$ is the number of people sitting at that table.

In the above, $\alpha$ is a scalar parameter of the process. One can check that the above does define a probability distribution: the occupancies of the occupied tables sum to $n-1$, so the probabilities sum to $\frac{\alpha + (n-1)}{n-1+\alpha} = 1$. Let $k_n$ denote the number of different tables occupied after $n$ customers have walked in. Then $1 \le k_n \le n$, and it follows from the above description that precisely tables $1, \ldots, k_n$ are occupied.

Example

A possible arrangement of 10 customers is shown in Figure 1. Denote by $z_i$ the table occupied by customer $i$. The probability of this arrangement is

\[
\begin{aligned}
\Pr(z_1, \ldots, z_{10}) &= \Pr(z_1) \Pr(z_2 \mid z_1) \cdots \Pr(z_{10} \mid z_1, \ldots, z_9) \\
&= \frac{\alpha}{\alpha} \cdot \frac{\alpha}{1+\alpha} \cdot \frac{1}{2+\alpha} \cdot \frac{\alpha}{3+\alpha} \cdot \frac{1}{4+\alpha} \cdot \frac{1}{5+\alpha} \cdot \frac{2}{6+\alpha} \cdot \frac{2}{7+\alpha} \cdot \frac{2}{8+\alpha} \cdot \frac{3}{9+\alpha}
\end{aligned}
\]

Figure 1: The Chinese restaurant process. Circles represent tables and the numbers around them are the customers sitting at that table. [Tables pictured: (1, 3, 8), (2, 5, 9, 10), (4, 6, 7), ...]

We can make the following observations:

1. The probability of a seating is invariant under permutations. Permuting the customers permutes the numerators in the above computation, while the denominators remain the same. This property is known as exchangeability.

2. Any seating arrangement creates a partition. For example, the above seating arrangement partitions customers 1, ..., 10 into the following three groups: (1, 3, 8), (2, 5, 9, 10), (4, 6, 7). Exchangeability now implies that two seating arrangements whose partitions consist of the same number of components with identical sizes have the same probability. For instance, any seating arrangement of ten customers in which three tables are occupied, with three customers at each of two tables and the remaining four at the third, has the same probability as the seating in our example. Thus we could define a distribution on the space of all partitions of the integer $n$, where $n$ is the total number of customers. The number of such partitions is given by the partition function $p(n)$, which has no simple closed form. Asymptotically, $p(n) = \exp(O(\sqrt{n}))$.

3. The expected number of occupied tables for $n$ customers grows logarithmically. In particular,

\[
E[k_n \mid \alpha] = O(\alpha \log n).
\]

This can be seen as follows. Let $X_i$ be the indicator random variable for the event that the $i$th customer starts a new table. The probability of this happening is $\Pr[X_i = 1] = \frac{\alpha}{i-1+\alpha}$. Since $k_n = \sum_{i=1}^{n} X_i$, by linearity of expectation

\[
E[k_n \mid \alpha] = \sum_{i=1}^{n} \frac{\alpha}{\alpha+i-1},
\]

which is $O(\alpha H_n)$, where $H_n = \sum_{i=1}^{n} 1/i$ is the $n$th harmonic number and $H_n = O(\log n)$.

By taking the limit as $n$ goes to infinity, we could perhaps define a distribution on the space of partitions of all the natural numbers. However, technical difficulties arise when dealing with distributions over infinite sequences, and appropriate sigma algebras have to be chosen. Chapters 1-4 of [5], and the lecture notes from ORFE 551 (sent to the course mailing list), are recommended for reading up on basic measure theory.

Review: Finite mixture models

Finite mixture models are latent variable models. To model data via finite mixtures of distributions, the basic steps are:

1. Posit hidden random variables.
2. Construct a joint distribution of observed and hidden random variables.
3. Compute the posterior distribution of the hidden variables given the observed variables.

Examples of latent variable models include Gaussian mixtures, Kalman filters, factor analysis, hidden Markov models, etc. We review the Gaussian mixture model. A Gaussian mixture with $K$ components, for fixed $K$, can be described by the following generative process:

1. Choose cluster proportions $\pi \sim \mathrm{Dir}(\alpha)$, where $\mathrm{Dir}(\alpha)$ is a Dirichlet prior, with parameter $\alpha$, over distributions over $K$ points.
2. Choose $K$ means $\mu_k \sim N(0, \sigma_\mu^2)$.

3. For each data point:
   (a) Choose cluster assignment $z \sim \mathrm{Mult}(\pi)$.
   (b) Choose $x \sim N(\mu_z, \sigma_x^2)$.

The above gives a model for the joint distribution $\Pr(\pi, \mu_{1:K}, z_{1:N}, x_{1:N} \mid \alpha, \sigma_\mu^2, \sigma_x^2)$. We will drop $\alpha, \sigma_\mu^2, \sigma_x^2$ from now on, and assume they are known and fixed. The posterior distribution given data $x_{1:N}$ is $\Pr(\pi, \mu_{1:K}, z_{1:N} \mid x_{1:N})$, and it has the following uses:

1. It decomposes the data over the latent space, thus revealing underlying structure when present.
2. It helps predict a new data point via the predictive distribution $\Pr(x \mid x_{1:N})$. For simplicity, we show how to calculate this quantity when the cluster proportions $\pi$ are fixed:

\[
\begin{aligned}
\Pr(x \mid x_{1:N}, \pi) &= \sum_z \int_{\mu_z} \Pr(x, z, \mu_z \mid x_{1:N}, \pi) \, d\mu_z \\
&= \sum_z \int_{\mu_z} \pi_z \Pr(x \mid \mu_z) \Pr(\mu_z \mid x_{1:N}) \, d\mu_z \\
&= \sum_z \pi_z \int_{\mu_z} \Pr(x \mid \mu_z) \Pr(\mu_z \mid x_{1:N}) \, d\mu_z
\end{aligned}
\]

We could compute $\Pr(\mu_z \mid x_{1:N})$ from the posterior $\Pr(\mu_{1:K}, z_{1:N} \mid x_{1:N}, \pi)$ by marginalizing out the $z_{1:N}$ and the $\mu_k$ for $k \ne z$. This would enable us to complete the above calculation and obtain the predictive distribution.

Chinese Restaurant Process Mixture

A Chinese restaurant process mixture is constructed by the following procedure:

1. Endow each table with a mean, $\mu_k \sim N(0, \sigma_\mu^2)$, $k = 1, 2, \ldots$.
2a. Customer $n$ sits down at table $z_n \sim \mathrm{CRP}(\alpha; z_1, \ldots, z_{n-1})$.

2b. A datapoint is drawn $x_n \sim N(\mu_{z_n}, \sigma_x^2)$.

Figure 2: The Chinese restaurant process mixture. Each table $k$ has a mean $\mu_k$ and customers sitting at it are distributed according to that mean. Tables 1 through 3 and customers 1 through 10 are pictured. [Tables pictured: (1, 3, 8) with $\mu_1$, (2, 5, 9, 10) with $\mu_2$, (4, 6, 7) with $\mu_3$, ...]

We will consider the posterior and predictive distributions. The hidden variables are the infinite collection of means, $\mu_1, \mu_2, \ldots$, and the cluster assignments $z_1, z_2, \ldots$. Consider the posterior on these hidden variables given the first $N$ datapoints,

\[
p(\mu_{1:N}, z_{1:N} \mid x_{1:N}, \theta),
\]

where we define $\theta = (\sigma_\mu^2, \sigma_x^2, \alpha)$ to contain the fixed parameters of the model. Note that we only need to care about the means of the first $N$ tables, because the rest will have posterior distribution $N(0, \sigma_\mu^2)$, unchanged from the prior. Similarly, we only need to care about the cluster assignments of the first $N$ customers, because the rest will have posterior equal to the prior.

The predictive distribution given the data and some additional hidden variables is

\[
p(x \mid x_{1:N}, \mu_{1:N+1}, z_{1:N}, \theta) = \sum_{z=1}^{1+K_N} p(z \mid z_{1:N}) \, p(x \mid z, \mu_z),
\]

where $K_N$ is the number of tables occupied by the first $N$ customers. These hidden variables may be integrated out against the posterior on the hidden variables to give the predictive distribution conditioned only on the data.

Note that, whereas in the finite mixture model the cluster proportions were modeled explicitly, here the cluster proportions are implicit in the $z$ variables. Also note that permuting the $x_{1:N}$ results in the same predictive distribution, so we have exchangeability here as we did earlier.

Why is exchangeability important? Having exchangeability is as though we drew a parameter from a prior and then drew data independently and

identically given that parameter. Thus, the data are independent conditioned on the parameter, but are not independent in general. This is weaker than assuming independence. Specifically, de Finetti's exchangeability theorem [1] states that the exchangeability of a random sequence $x_1, x_2, \ldots$ is equivalent to having a parameter $\theta$ drawn from a distribution $F(\cdot)$ and then choosing $x_n$ iid from the distribution implied by $\theta$. That is,

\[
\theta \sim F(\cdot), \qquad x_n \overset{\text{iid}}{\sim} \theta.
\]

We may apply this idea to the Chinese restaurant process mixture, which is really a distribution on distributions, or a distribution on the $(\mu_{z_1}, \mu_{z_2}, \ldots)$. The random means $\mu_{z_1}, \mu_{z_2}, \ldots$ are exchangeable, so their distribution may be expressed in the form given by de Finetti's exchangeability theorem. Their distribution is given by

\[
G \sim \mathrm{DirichletProcess}(\alpha G_0), \qquad \mu_{z_i} \overset{\text{iid}}{\sim} G.
\]

Here $G_0$ is the distribution of the $\mu$ on the reals, e.g., $N(0, \sigma_\mu^2)$. Note that we get repeated values when sampling the $z_i$ whenever a customer sits at an already occupied table. Then customer $i$ has $\mu_{z_i}$ identical to the $\mu_{z_j}$ of all customers $j$ previously seated at the table. In fact, $G$ is an almost surely discrete distribution on the reals with a countably infinite number of atoms.

The reading for next week, [2, 4], is on how to compute the posterior of the hidden parameters given the data, and also includes two new perspectives on the Chinese restaurant process. One perspective is the one just described, of the Chinese restaurant process as a Dirichlet process, and the other is as an infinite limit of finite mixture models. In the reading, focus on [4]. In addition, a good general reference on Bayesian statistics that may be helpful in the course is [3].

References

[1] Jose-M. Bernardo and Adrian F. M. Smith. Bayesian theory. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons Ltd., Chichester.

[2] Michael D. Escobar and Mike West. Bayesian density estimation and inference using mixtures. J. Amer. Statist. Assoc., 90(430).

[3] Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian data analysis. Texts in Statistical Science Series. Chapman & Hall/CRC, Boca Raton, FL, second edition.

[4] Radford M. Neal. Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Statist., 9(2).

[5] David Williams. Probability with martingales. Cambridge Mathematical Textbooks. Cambridge University Press, Cambridge.
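Numerical check of the seating rule

The seating rule and the three observations in the Chinese Restaurant Process section can be checked numerically. The following is a minimal Python sketch and is not part of the original lecture. The function names (`crp_sample`, `seating_probability`, `relabel`, `expected_tables`) are hypothetical, chosen for illustration. The sketch assumes tables are labelled in order of first occupancy, as in the notes, and it does not validate its input.

```python
import random
from fractions import Fraction


def crp_sample(n, alpha, rng):
    """Seat n customers by the CRP rule; return table labels z_1..z_n (1-based)."""
    counts = []  # counts[k] = number of customers at table k+1
    z = []
    for _ in range(n):
        # An occupied table has weight equal to its occupancy c;
        # the first unoccupied table has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # customer starts a new table
        else:
            counts[k] += 1     # customer joins occupied table k+1
        z.append(k + 1)
    return z


def seating_probability(z, alpha):
    """Pr(z_1..z_n) as the product of the sequential CRP probabilities."""
    counts = {}
    prob = Fraction(1)
    for i, table in enumerate(z):           # i customers are already seated
        numer = counts.get(table, alpha)    # alpha if the table is new, else c
        prob *= Fraction(numer) / (i + alpha)
        counts[table] = counts.get(table, 0) + 1
    return prob


def relabel(z):
    """Renumber tables in order of first appearance (assumed CRP labelling)."""
    first = {}
    return [first.setdefault(t, len(first) + 1) for t in z]


def expected_tables(n, alpha):
    """E[k_n | alpha] = sum over i = 1..n of alpha / (alpha + i - 1)."""
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    # Seating from Figure 1: tables (1, 3, 8), (2, 5, 9, 10), (4, 6, 7).
    z = [1, 2, 1, 3, 2, 3, 3, 1, 2, 2]
    alpha = Fraction(1)
    print(seating_probability(z, alpha))

    # Exchangeability: shuffle the customers, relabel tables, compare.
    rng = random.Random(0)
    shuffled = z[:]
    rng.shuffle(shuffled)
    print(seating_probability(relabel(shuffled), alpha))

    # Expected number of tables compared with the bound 1 + alpha * H_{n-1}.
    n, a = 1000, 2.0
    print(expected_tables(n, a), 1 + a * sum(1.0 / i for i in range(1, n)))
```

With $\alpha = 1$ the example seating has probability $24/10! = 1/151200$, and any reordering of the customers gives the same value after relabelling, as observation 1 states. Exact rational arithmetic (`Fraction`) is used so that this equality can be tested without floating-point tolerance.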


Applications of R Software in Bayesian Data Analysis

Article International Journal of Information Science and System, 2012, 1(1): 7-23 International Journal of Information Science and System Journal homepage: www.modernscientificpress.com/journals/ijinfosci.aspx

An Analysis of Rank Ordered Data

An Analysis of Rank Ordered Data Krishna P Paudel, Louisiana State University Biswo N Poudel, University of California, Berkeley Michael A. Dunn, Louisiana State University Mahesh Pandit, Louisiana State

Chapter 4: Non-Parametric Classification

Chapter 4: Non-Parametric Classification Introduction Density Estimation Parzen Windows Kn-Nearest Neighbor Density Estimation K-Nearest Neighbor (KNN) Decision Rule Gaussian Mixture Model A weighted combination

The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1].

Probability Theory Probability Spaces and Events Consider a random experiment with several possible outcomes. For example, we might roll a pair of dice, flip a coin three times, or choose a random real

Lecture 4: BK inequality 27th August and 6th September, 2007

CSL866: Percolation and Random Graphs IIT Delhi Amitabha Bagchi Scribe: Arindam Pal Lecture 4: BK inequality 27th August and 6th September, 2007 4. Preliminaries The FKG inequality allows us to lower bound

PROBABILITIES AND PROBABILITY DISTRIBUTIONS

Published in "Random Walks in Biology", 1983, Princeton University Press PROBABILITIES AND PROBABILITY DISTRIBUTIONS Howard C. Berg Table of Contents PROBABILITIES PROBABILITY DISTRIBUTIONS THE BINOMIAL

EE384Y Project Final Report Load Balancing and Switch Scheduling Xiangheng Liu Department of Electrical Engineering Stanford University, Stanford CA 94305 Email: liuxh@systems.stanford.edu Abstract Load

The Concept of Exchangeability and its Applications

Departament d Estadística i I.O., Universitat de València. Facultat de Matemàtiques, 46100 Burjassot, València, Spain. Tel. 34.6.363.6048, Fax 34.6.363.6048 (direct), 34.6.386.4735 (office) Internet: bernardo@uv.es,

P (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i )

Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =