# Probabilistic Graphical Models Homework 1: Due January 29, 2014 at 4 pm


## Transcription

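3. $\text{Season} \perp \text{Headache} \mid \text{Flu}$
4. $\text{Season} \perp \text{Headache} \mid \text{Flu}, \text{Dehydration}$
5. $\text{Season} \perp \text{Nausea} \mid \text{Dehydration}$
6. $\text{Season} \perp \text{Nausea} \mid \text{Dehydration}, \text{Headache}$
7. $\text{Flu} \perp \text{Dehydration}$
8. $\text{Flu} \perp \text{Dehydration} \mid \text{Season}, \text{Headache}$
9. $\text{Flu} \perp \text{Dehydration} \mid \text{Season}$
10. $\text{Flu} \perp \text{Dehydration} \mid \text{Season}, \text{Nausea}$
11. $\text{Chills} \perp \text{Nausea}$
12. $\text{Chills} \perp \text{Nausea} \mid \text{Headache}$

Part 2: Factorized Joint Distributions [4 points]

1. Using the directed model shown in Figure 1, write down the factorized form of the joint distribution over all of the variables, $P(S, F, D, C, H, N, Z)$.
2. Using the undirected model shown in Figure 2, write down the factorized form of the joint distribution over all of the variables, assuming the model is parameterized by one factor over each node and one over each edge in the graph.

| $P(S = \text{winter})$ | $P(S = \text{summer})$ |
|---|---|
|  |  |

|  | $P(F = \text{true} \mid S)$ | $P(F = \text{false} \mid S)$ |
|---|---|---|
| $S = \text{winter}$ |  |  |
| $S = \text{summer}$ |  |  |

|  | $P(D = \text{true} \mid S)$ | $P(D = \text{false} \mid S)$ |
|---|---|---|
| $S = \text{winter}$ |  |  |
| $S = \text{summer}$ |  |  |

|  | $P(C = \text{true} \mid F)$ | $P(C = \text{false} \mid F)$ |
|---|---|---|
| $F = \text{true}$ |  |  |
| $F = \text{false}$ |  |  |

|  | $P(H = \text{true} \mid F, D)$ | $P(H = \text{false} \mid F, D)$ |
|---|---|---|
| $F = \text{true}, D = \text{true}$ |  |  |
| $F = \text{true}, D = \text{false}$ |  |  |
| $F = \text{false}, D = \text{true}$ |  |  |
| $F = \text{false}, D = \text{false}$ |  |  |

|  | $P(Z = \text{true} \mid H)$ | $P(Z = \text{false} \mid H)$ |
|---|---|---|
| $H = \text{true}$ |  |  |
| $H = \text{false}$ |  |  |

|  | $P(N = \text{true} \mid D, Z)$ | $P(N = \text{false} \mid D, Z)$ |
|---|---|---|
| $D = \text{true}, Z = \text{true}$ |  |  |
| $D = \text{true}, Z = \text{false}$ |  |  |
| $D = \text{false}, Z = \text{true}$ |  |  |
| $D = \text{false}, Z = \text{false}$ |  |  |

Table 1: Conditional probability tables for the Bayesian network shown in Figure 1.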
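As a rough illustration of how a factorized joint distribution supports probability queries, the sketch below multiplies one conditional probability per variable and answers a query by brute-force enumeration. The parent sets are read off the conditioning variables in the Table 1 headers, which is an assumption because Figure 1 is not reproduced here. Every number in `P_TRUE` is made up for illustration and is not taken from the assignment; `joint` and `query` are hypothetical helper names.

```python
from itertools import product

# Parent sets, assumed from the conditioning variables in the Table 1 headers.
PARENTS = {
    "S": (), "F": ("S",), "D": ("S",), "C": ("F",),
    "H": ("F", "D"), "Z": ("H",), "N": ("D", "Z"),
}

# HYPOTHETICAL values of P(var = True | parents). S = True stands for winter.
P_TRUE = {
    "S": {(): 0.5},
    "F": {(True,): 0.4, (False,): 0.1},
    "D": {(True,): 0.1, (False,): 0.3},
    "C": {(True,): 0.8, (False,): 0.1},
    "H": {(True, True): 0.9, (True, False): 0.8,
          (False, True): 0.8, (False, False): 0.3},
    "Z": {(True,): 0.8, (False,): 0.2},
    "N": {(True, True): 0.9, (True, False): 0.8,
          (False, True): 0.6, (False, False): 0.2},
}


def joint(assign):
    """Probability of one full assignment: a product of one CPT entry per variable."""
    p = 1.0
    for var, parents in PARENTS.items():
        p_true = P_TRUE[var][tuple(assign[q] for q in parents)]
        p *= p_true if assign[var] else 1.0 - p_true
    return p


def query(target, evidence):
    """P(target = True | evidence), summing the joint over all consistent assignments."""
    names = list(PARENTS)
    num = den = 0.0
    for values in product([True, False], repeat=len(names)):
        assign = dict(zip(names, values))
        if any(assign[k] != v for k, v in evidence.items()):
            continue
        p = joint(assign)
        den += p
        if assign[target]:
            num += p
    return num / den


if __name__ == "__main__":
    print(query("F", {}))
    print(query("F", {"S": True, "H": True}))
```

Enumeration touches all $2^7$ assignments, which is fine for seven binary variables but does not scale; it is only meant to make the link between the factorization and a conditional query concrete.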

Part 3: Evaluating Probability Queries [7 points]

Assume you are given the conditional probability tables listed in Table 1 for the model shown in Figure 1. Evaluate each of the probability queries listed below, and show your calculations.

1. What is the probability that you have the flu, when no prior information is known?
2. What is the probability that you have the flu, given that it is winter?
3. What is the probability that you have the flu, given that it is winter and that you have a headache?
4. What is the probability that you have the flu, given that it is winter, you have a headache, and you know that you are dehydrated?
5. Does knowing you are dehydrated increase or decrease your likelihood of having the flu? Intuitively, does this make sense?

Part 4: Bayesian Networks vs. Markov Networks [2 points]

Now consider the undirected model shown in Figure 2.

1. Are there any differences between the set of marginal independencies encoded by the directed and undirected versions of this model? If not, state the full set of marginal independencies encoded by both models. If so, give one example of a difference.
2. Are there any differences between the set of conditional independencies encoded by the directed and undirected versions of this model? If so, give one example of a difference.

Figure 2: A Markov network that represents a joint distribution over the variables Season, Flu, Dehydration, Chills, Headache, Nausea, and Dizziness.

2 Bayesian Networks [25 points]

Part 1: Constructing Bayesian Networks [8 points]

In this problem you will construct your own Bayesian network (BN) for a few different modeling scenarios described as word problems. By standard convention, we will use shaded circles to represent observed quantities, clear circles to represent random variables, and uncircled symbols to represent distribution parameters.

In order to do this problem, you will first need to understand plate notation, which is a useful tool for drawing large BNs with many variables. Plates can be used to denote repeated sets of random variables. For example, suppose we have the following generative process:

- Draw $Y \sim \text{Normal}(\mu, \Sigma)$
- For $m = 1, \ldots, M$: draw $X_m \sim \text{Normal}(Y, \Sigma)$

This BN contains $M + 1$ random variables, which includes $M$ repeated variables $X_1, \ldots, X_M$ that all have $Y$ as a parent. In the BN, we draw the repeated variables by placing a box around a single node, with an index in the box describing the number of copies; we've drawn this in Figure 3.

Figure 3: An example of a Bayesian network drawn with plate notation.

For each of the modeling scenarios described below, draw a corresponding BN. Make sure to label your nodes using the variable names given below, and use plate notation if necessary.

1. (Gaussian Mixture Model). Suppose you want to model a set of clusters within a population of $N$ entities, $X_1, \ldots, X_N$. We assume there are $K$ clusters $\theta_1, \ldots, \theta_K$, and that each cluster represents a vector and a matrix, $\theta_k = \{\mu_k, \Sigma_k\}$. We also assume that each entity $X_n$ belongs to one cluster, and its membership is given by an assignment variable $Z_n \in \{1, \ldots, K\}$. Here's how the variables in the model relate. Each entity $X_n$ is drawn from a so-called mixture distribution, which in this case is a Gaussian distribution, based on its individual cluster assignment and the entire set of clusters, written $X_n \sim \text{Normal}(\mu_{Z_n}, \Sigma_{Z_n})$. Each cluster assignment $Z_n$ has a prior, given by $Z_n \sim \text{Categorical}(\beta)$. Finally, each cluster $\theta_k$ also has a prior, given by $\theta_k \sim \text{Normal-invWishart}(\mu_0, \lambda, \Phi, \nu) = \text{Normal}(\mu_0, \tfrac{1}{\lambda}\Sigma)\,\text{invWishart}(\Phi, \nu)$.
2. (Bayesian Logistic Regression). Suppose you want to model the underlying relationship between a set of $N$ input vectors $X_1, \ldots, X_N$ and a corresponding set of $N$ binary outcomes $Y_1, \ldots, Y_N$. We assume there is a single vector $\beta$ which dictates the relationship between each input vector and its associated output variable. In this model, each output is drawn with $Y_n \sim \text{Bernoulli}(\text{invLogit}(X_n \beta))$. Additionally, the vector $\beta$ has a prior, given by $\beta \sim \text{Normal}(\mu, \Sigma)$.

Part 2: Inference in Bayesian Networks [12 points]

In this problem you will derive formulas for inference tasks in Bayesian networks. Consider the Bayesian network given in Figure 4.
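Returning briefly to the plate-notation example from Part 1, the generative process (draw $Y$, then draw $M$ copies of $X_m$ that each depend on $Y$) can be sketched as ancestral sampling. This is a one-dimensional simplification chosen for brevity: `mu` and `sigma` are scalars, `sigma` is a standard deviation rather than the covariance $\Sigma$ of the text, and `sample_plate_model` is a hypothetical name.

```python
import random


def sample_plate_model(mu, sigma, M, seed=0):
    """Ancestral sampling for Y -> X_1, ..., X_M (1-D simplification).

    The parent Y is drawn first; each X_m inside the plate is then drawn
    conditionally on the sampled value of Y.
    """
    rng = random.Random(seed)
    y = rng.gauss(mu, sigma)                       # Y ~ Normal(mu, sigma^2)
    xs = [rng.gauss(y, sigma) for _ in range(M)]   # X_m ~ Normal(Y, sigma^2)
    return y, xs


if __name__ == "__main__":
    y, xs = sample_plate_model(mu=0.0, sigma=1.0, M=5)
    print(y, xs)
```

The loop over `range(M)` plays the role of the plate: one node template, repeated $M$ times with a shared parent.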

Figure 4: A Bayesian network over the variables $X_1, \ldots, X_6$. Note that $X_1$ is observed (which is denoted by the fact that it's shaded in) and the remaining variables are unobserved.

For each of the following questions, write down an expression involving the variables $X_1, \ldots, X_6$ that could be computed by directly plugging in their local conditional probability distributions.

First, give expressions for the following three posterior distributions over a particular variable given the observed evidence $X_1 = x_1$.

1. $P(X_2 = x_2 \mid X_1 = x_1)$
2. $P(X_3 = x_3 \mid X_1 = x_1)$
3. $P(X_5 = x_5 \mid X_1 = x_1)$

Second, give expressions for the following three conditional probability queries. Note that these types of expressions are useful for the inference algorithms that we'll learn later in the class.

4. $P(X_2 = x_2 \mid X_1 = x_1, X_3 = x_3, X_4 = x_4, X_5 = x_5, X_6 = x_6)$
5. $P(X_3 = x_3 \mid X_1 = x_1, X_2 = x_2, X_4 = x_4, X_5 = x_5, X_6 = x_6)$
6. $P(X_5 = x_5 \mid X_1 = x_1, X_2 = x_2, X_3 = x_3, X_4 = x_4, X_6 = x_6)$

Part 3: On Markov Blankets [5 points]

In this problem you will prove a key property of Markov blankets in Bayesian networks. Recall that the Markov blanket of a node in a BN consists of the node's children, parents, and coparents (i.e. the children's other parents). Also recall that there are four basic types of two-edge trails in a BN, which are illustrated in Figure 5: the causal trail (head-to-tail), evidential trail (tail-to-head), common cause (tail-to-tail), and common effect (head-to-head).

Figure 5: Illustration of the four basic types of two-edge trails in a BN (causal trail, evidential trail, common cause, common effect).

Using the four trail types, prove the following property of BNs: given its Markov blanket, a node in a Bayesian network is conditionally independent of every other set of nodes.
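To make the definition of a Markov blanket concrete, here is a small sketch that collects a node's parents, children, and coparents from a DAG stored as a parent map. The graph used in the example is hypothetical (it is not Figure 4), and `markov_blanket` is an illustrative helper name.

```python
def markov_blanket(node, parents):
    """Return the Markov blanket of `node` in a DAG.

    `parents` maps every node to the set of its parents. The blanket is the
    union of the node's parents, its children, and its children's other parents.
    """
    children = {c for c, ps in parents.items() if node in ps}
    coparents = {p for c in children for p in parents[c]}
    return (set(parents[node]) | children | coparents) - {node}


if __name__ == "__main__":
    # HYPOTHETICAL DAG: A -> C <- B, C -> D <- E
    dag = {"A": set(), "B": set(), "E": set(), "C": {"A", "B"}, "D": {"C", "E"}}
    print(sorted(markov_blanket("C", dag)))
    print(sorted(markov_blanket("A", dag)))
```

In this example graph, `E` enters the blanket of `C` only as a coparent through the common-effect trail `C -> D <- E`, which is the case the proof has to treat separately from the other three trail types.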

3 Restricted Boltzmann Machines [25 points]

Restricted Boltzmann Machines (RBMs) are a class of Markov networks that have been used in several applications, including image feature extraction, collaborative filtering, and recently in deep belief networks. An RBM is a bipartite Markov network consisting of a visible (observed) layer and a hidden layer, where each node is a binary random variable. One way to look at an RBM is that it models latent factors that can be learned from input features. For example, suppose we have samples of binary user ratings (like vs. dislike) on 5 movies: Finding Nemo ($V_1$), Avatar ($V_2$), Star Trek ($V_3$), Aladdin ($V_4$), and Frozen ($V_5$). We can construct the following RBM:

Figure 6: An example RBM with 5 visible units and 2 hidden units.

Here, the bottom layer consists of visible nodes $V_1, \ldots, V_5$ that are random variables representing the binary ratings for the 5 movies, and $H_1, H_2$ are two hidden units that represent latent factors to be learned during training (e.g., $H_1$ might be associated with Disney movies, and $H_2$ could represent the adventure genre). If we are using an RBM for image feature extraction, the visible layer could instead denote binary values associated with each pixel, and the hidden layer would represent the latent features. However, for this problem we will stick with the movie example. In the following questions, let $V = (V_1, \ldots, V_5)$ be a vector of ratings (e.g. the observation $v = (1, 0, 0, 0, 1)$ implies that a user likes only Finding Nemo and Frozen). Similarly, let $H = (H_1, H_2)$ be a vector of latent factors. Note that all the random variables are binary and take on states in $\{0, 1\}$.

The joint distribution of a configuration is given by

$$p(V = v, H = h) = \frac{1}{Z} e^{-E(v, h)} \tag{1}$$

where

$$E(v, h) = -\sum_{i,j} w_{ij} v_i h_j - \sum_i a_i v_i - \sum_j b_j h_j$$

is the energy function, $\{w_{ij}\}$, $\{a_i\}$, $\{b_j\}$ are model parameters, and

$$Z = Z(\{w_{ij}\}, \{a_i\}, \{b_j\}) = \sum_{v, h} e^{-E(v, h)}$$

is the partition function, where the summation runs over all joint assignments to $V$ and $H$.

1. [7 pts] Using Equation (1), show that $p(H \mid V)$, the distribution of the hidden units conditioned on all of the visible units, can be factorized as

$$p(H \mid V) = \prod_j p(H_j \mid V) \tag{2}$$

where

$$p(H_j = 1 \mid V = v) = \sigma\Big(b_j + \sum_i w_{ij} v_i\Big)$$

and $\sigma(s) = \frac{e^s}{1 + e^s}$ is the sigmoid function. Note that $p(H_j = 0 \mid V = v) = 1 - p(H_j = 1 \mid V = v)$.
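Before attempting the derivation, it can help to check Equation (2) numerically on a tiny model. The sketch below computes $p(H_j = 1 \mid V = v)$ once by brute force from the energy function (the partition function cancels in the conditional) and once with the sigmoid formula. The weights and biases are arbitrary values chosen for illustration, and all function names are hypothetical.

```python
import math
from itertools import product


def energy(v, h, w, a, b):
    """E(v, h) = -sum_ij w_ij v_i h_j - sum_i a_i v_i - sum_j b_j h_j."""
    pair = sum(w[i][j] * v[i] * h[j] for i in range(len(v)) for j in range(len(h)))
    return (-pair
            - sum(ai * vi for ai, vi in zip(a, v))
            - sum(bj * hj for bj, hj in zip(b, h)))


def cond_hidden_bruteforce(j, v, w, a, b):
    """p(H_j = 1 | V = v) by summing exp(-E) over all hidden configurations."""
    num = den = 0.0
    for h in product([0, 1], repeat=len(b)):
        weight = math.exp(-energy(v, h, w, a, b))
        den += weight
        if h[j] == 1:
            num += weight
    return num / den


def cond_hidden_sigmoid(j, v, w, b):
    """p(H_j = 1 | V = v) from the closed form sigma(b_j + sum_i w_ij v_i)."""
    s = b[j] + sum(w[i][j] * v[i] for i in range(len(v)))
    return 1.0 / (1.0 + math.exp(-s))


if __name__ == "__main__":
    # ARBITRARY illustrative parameters: 5 visible units, 2 hidden units.
    w = [[0.5, -0.2], [-0.3, 0.8], [0.1, 0.4], [0.9, -0.6], [0.7, 0.2]]
    a = [0.1, -0.1, 0.0, 0.2, -0.3]
    b = [-0.4, 0.3]
    v = (1, 0, 0, 0, 1)
    for j in range(2):
        print(cond_hidden_bruteforce(j, v, w, a, b), cond_hidden_sigmoid(j, v, w, b))
```

The two columns agree up to floating-point error for any parameter choice; this is a sanity check only and does not replace the algebraic argument the question asks for.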

2. [3 pts] Give the factorized form of $p(V \mid H)$, the distribution of the visible units conditioned on all of the hidden units. This should be similar to what's given in part 1, and so you may omit the derivation.
3. [2 pts] Can the marginal distribution over hidden units $p(H)$ be factorized? If yes, give the factorization. If not, give the form of $p(H)$ and briefly justify.
4. [4 pts] Based on your answers so far, does the distribution in Equation (1) respect the conditional independencies of Figure 6? Explain why or why not. Are there any independencies in Figure 6 that are not captured in Equation (1)?
5. [7 pts] We can use the log-likelihood of the visible units, $\log p(V = v)$, as the criterion to learn the model parameters $\{w_{ij}\}$, $\{a_i\}$, $\{b_j\}$. However, this maximization problem has no closed form solution. One popular technique for training this model is called contrastive divergence and uses an approximate gradient descent method. Compute the gradient of the log-likelihood objective with respect to $w_{ij}$ by showing the following:

$$\frac{\partial \log p(V = v)}{\partial w_{ij}} = \sum_h p(H = h \mid V = v)\, v_i h_j - \sum_{v, h} p(V = v, H = h)\, v_i h_j = E[V_i H_j \mid V = v] - E[V_i H_j]$$

where $E[V_i H_j \mid V = v]$ can be readily evaluated using Equation (2), but $E[V_i H_j]$ is tricky as the expectation is taken over not just $H_j$ but also $V_i$.

Hint 1: To save some writing, do not expand $E(v, h)$ until you have $\frac{\partial E(v, h)}{\partial w_{ij}}$.

Hint 2: The partition function, $Z$, is a function of $w_{ij}$.

6. [2 pts] After training, suppose $H_1 = 1$ corresponds to Disney movies, and $H_2 = 1$ corresponds to the adventure genre. Which $w_{ij}$ do you expect to be positive, where $i$ indexes the visible nodes and $j$ indexes the hidden nodes? List all of them.

4 Image Denoising [25 points]

This is a programming problem involving Markov networks (MNs) applied to the task of image denoising. Suppose we have an image consisting of a 2-dimensional array of pixels, where each pixel value $Z_i$ is binary, i.e. $Z_i \in \{+1, -1\}$. Assume now that we make a noisy copy of the image, where each pixel in the image is flipped with 10% probability. A pixel in this noisy image is denoted by $X_i$. We show the original image and the image with 10% noise in Figure 7. Given the observed array of noisy pixels, our goal is to recover the original array of pixels.

To solve this problem, we model the original image and noisy image with the following MN. We have a latent variable $Z_i$ for each noise-free pixel, and an observed variable $X_i$ for each noisy pixel. Each variable $Z_i$ has an edge leading to its immediate neighbors (to the $Z_i$ associated with pixels above, below, to the left, and to the right, when they exist). Additionally, each variable $Z_i$ has an edge leading to its associated observed pixel $X_i$. We illustrate this MN in Figure 8. Denote the full array of latent (noise-free) pixels as $Z$ and the full array of observed (noisy) pixels as $X$. We define the energy function for this model as

$$E(Z = z, X = x) = h \sum_i z_i - \beta \sum_{\{i,j\}} z_i z_j - \nu \sum_i z_i x_i \tag{3}$$

where the first and third summations are over the entire array of pixels, the second summation is over all pairs of latent variables connected by an edge, and $h \in \mathbb{R}$, $\beta \in \mathbb{R}^+$, and $\nu \in \mathbb{R}^+$ denote constants that must be chosen.
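As a reference point for Equation (3), the following sketch evaluates the total energy of a small grid stored as a list of rows with entries in $\{+1, -1\}$. It assumes the sign convention $h \sum_i z_i - \beta \sum_{\{i,j\}} z_i z_j - \nu \sum_i z_i x_i$, so that agreement between neighbors and between $z_i$ and $x_i$ lowers the energy; `total_energy` is a hypothetical name, and each neighbor pair is counted once.

```python
def total_energy(z, x, h, beta, nu):
    """Energy of Equation (3) for grids z (latent) and x (observed) of +1/-1 values."""
    rows, cols = len(z), len(z[0])
    unary = sum(z[r][c] for r in range(rows) for c in range(cols))
    data = sum(z[r][c] * x[r][c] for r in range(rows) for c in range(cols))
    pair = 0
    for r in range(rows):
        for c in range(cols):
            # Count each edge once: only look down and to the right.
            if r + 1 < rows:
                pair += z[r][c] * z[r + 1][c]
            if c + 1 < cols:
                pair += z[r][c] * z[r][c + 1]
    return h * unary - beta * pair - nu * data


if __name__ == "__main__":
    x = [[1, 1], [1, 1]]
    print(total_energy(x, x, h=0.0, beta=1.0, nu=1.0))
    print(total_energy([[-1, 1], [1, 1]], x, h=0.0, beta=1.0, nu=1.0))
```

Recomputing this full sum for every candidate pixel flip costs time proportional to the image size per flip, so a function like this is better suited to monitoring convergence once per pass than to choosing individual pixel values.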

Figure 7: The original binary image is shown on the left, and a noisy version of the image in which a randomly selected 10% of the pixels have been flipped is shown on the right.

Figure 8: Illustration of the Markov network for image denoising.

Using the binary image data saved in hw1 images.mat, your task will be to infer the true value of each pixel ($+1$ or $-1$) by optimizing the above energy function. To do this, initialize the $Z_i$'s to their noisy values, and then iterate through each $Z_i$ and check whether setting its value to $+1$ or $-1$ yields a lower energy (higher probability). Repeat this process, making passes through all of the pixels, until the total energy of the model has converged. You must specify values of the constants $h \in \mathbb{R}$, $\beta \in \mathbb{R}^+$, and $\nu \in \mathbb{R}^+$.

Report the error rate (fraction of pixels recovered incorrectly) that you achieve by comparing your denoised image to the original image that we provide, for three different settings of the three constants. Include a figure of your best denoised image in your writeup. Also make sure to submit a zipped copy of your code. The TAs will give a special prize to the student who is able to achieve the lowest error on this task.

Hint 1: When evaluating whether $+1$ or $-1$ is a better choice for a particular pixel $Z_i$, you do not need to evaluate the entire energy function, as this will be computationally very expensive. Instead, just compute the contribution by the terms that are affected by the value of $Z_i$.

Hint 2: If you'd like to try and compete to achieve the best performance, you can work to find good parameters, or even modify the algorithm in an intelligent way (be creative!). However, if you come up with a modified algorithm, you should separately report the new error rate you achieve, and also turn in a second .m file (placed in your zipped code directory) with the modified algorithm.
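The pixel-by-pixel procedure described above, a greedy coordinate-wise update often called iterated conditional modes, can be sketched as follows. The assignment asks for a `.m` file; this sketch uses Python purely for illustration, does not load the provided data file, and runs on a toy grid instead. It assumes the sign convention $h z_i - \beta z_i \sum_{j \in \text{nb}(i)} z_j - \nu z_i x_i$ for the terms that involve $z_i$, and the function names and parameter values are hypothetical.

```python
def local_energy(z, x, r, c, value, h, beta, nu):
    """Terms of the energy that depend on pixel (r, c) when it takes `value`."""
    rows, cols = len(z), len(z[0])
    neighbors = 0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            neighbors += z[rr][cc]
    return h * value - beta * value * neighbors - nu * value * x[r][c]


def icm_denoise(x, h, beta, nu, max_passes=20):
    """Start from the noisy image and flip pixels while doing so lowers the energy."""
    z = [row[:] for row in x]
    for _ in range(max_passes):
        changed = False
        for r in range(len(z)):
            for c in range(len(z[0])):
                cur = z[r][c]
                # Flip only on a strict decrease, so ties keep the current value.
                if (local_energy(z, x, r, c, -cur, h, beta, nu)
                        < local_energy(z, x, r, c, cur, h, beta, nu)):
                    z[r][c] = -cur
                    changed = True
        if not changed:  # no flip in a full pass: the energy has converged
            break
    return z


def error_rate(z, truth):
    """Fraction of pixels where z disagrees with the reference image."""
    total = sum(len(row) for row in truth)
    wrong = sum(a != b for zr, tr in zip(z, truth) for a, b in zip(zr, tr))
    return wrong / total


if __name__ == "__main__":
    truth = [[1] * 5 for _ in range(5)]
    noisy = [row[:] for row in truth]
    noisy[2][2] = -1  # one flipped pixel
    restored = icm_denoise(noisy, h=0.0, beta=1.0, nu=2.0)
    print(error_rate(noisy, truth), error_rate(restored, truth))
```

Because every accepted flip strictly lowers the total energy and there are finitely many configurations, the loop terminates, though only at a local minimum that depends on the visiting order and on the chosen constants.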

### Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

### Lecture 11: Graphical Models for Inference

Lecture 11: Graphical Models for Inference So far we have seen two graphical models that are used for inference - the Bayesian network and the Join tree. These two both represent the same joint probability

### Markov chains and Markov Random Fields (MRFs)

Markov chains and Markov Random Fields (MRFs) 1 Why Markov Models We discuss Markov models now. This is the simplest statistical model in which we don t assume that all variables are independent; we assume

### Question 2 Naïve Bayes (16 points)

Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

### Probabilistic Graphical Models

Probabilistic Graphical Models Raquel Urtasun and Tamir Hazan TTI Chicago April 4, 2011 Raquel Urtasun and Tamir Hazan (TTI-C) Graphical Models April 4, 2011 1 / 22 Bayesian Networks and independences

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### Machine Learning

Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 23, 2015 Today: Graphical models Bayes Nets: Representing distributions Conditional independencies

### Forecasting Trade Direction and Size of Future Contracts Using Deep Belief Network

Forecasting Trade Direction and Size of Future Contracts Using Deep Belief Network Anthony Lai (aslai), MK Li (lilemon), Foon Wang Pong (ppong) Abstract Algorithmic trading, high frequency trading (HFT)

### Lecture 6: The Bayesian Approach

Lecture 6: The Bayesian Approach What Did We Do Up to Now? We are given a model Log-linear model, Markov network, Bayesian network, etc. This model induces a distribution P(X) Learning: estimate a set

### Introduction to latent variable models

Introduction to latent variable models lecture 1 Francesco Bartolucci Department of Economics, Finance and Statistics University of Perugia, IT bart@stat.unipg.it Outline [2/24] Latent variables and their

### Generalized Linear Model Theory

Appendix B Generalized Linear Model Theory We describe the generalized linear model as formulated by Nelder and Wedderburn (1972), and discuss estimation of the parameters and tests of hypotheses. B.1

### Hidden Markov Models. and. Sequential Data

Hidden Markov Models and Sequential Data Sequential Data Often arise through measurement of time series Snowfall measurements on successive days in Buffalo Rainfall measurements in Chirrapunji Daily values

### The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

### Probabilistic Graphical Models Final Exam

Introduction to Probabilistic Graphical Models Final Exam, March 2007 1 Probabilistic Graphical Models Final Exam Please fill your name and I.D.: Name:... I.D.:... Duration: 3 hours. Guidelines: 1. The

### A crash course in probability and Naïve Bayes classification

Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s

### Final Exam, Spring 2007

10-701 Final Exam, Spring 2007 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 16 numbered pages in this exam (including this cover sheet). 3. You can use any material you brought:

### Graphical Models. CS 343: Artificial Intelligence Bayesian Networks. Raymond J. Mooney. Conditional Probability Tables.

Graphical Models CS 343: Artificial Intelligence Bayesian Networks Raymond J. Mooney University of Texas at Austin 1 If no assumption of independence is made, then an exponential number of parameters must

### Introduction to Deep Learning Variational Inference, Mean Field Theory

Introduction to Deep Learning Variational Inference, Mean Field Theory 1 Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Center for Visual Computing Ecole Centrale Paris Galen Group INRIA-Saclay Lecture 3: recap

### CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(

### Hidden Markov Model. Jia Li. Department of Statistics The Pennsylvania State University. Hidden Markov Model

Hidden Markov Model Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Hidden Markov Model Hidden Markov models have close connection with mixture models. A mixture model

### 10-810 /02-710 Computational Genomics. Clustering expression data

10-810 /02-710 Computational Genomics Clustering expression data What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally,

### Logic, Probability and Learning

Logic, Probability and Learning Luc De Raedt luc.deraedt@cs.kuleuven.be Overview Logic Learning Probabilistic Learning Probabilistic Logic Learning Closely following : Russell and Norvig, AI: a modern

### Hidden Markov Models Fundamentals

Hidden Markov Models Fundamentals Daniel Ramage CS229 Section Notes December, 2007 Abstract How can we apply machine learning to data that is represented as a sequence of observations over time? For instance,

### L10: Probability, statistics, and estimation theory

L10: Probability, statistics, and estimation theory Review of probability theory Bayes theorem Statistics and the Normal distribution Least Squares Error estimation Maximum Likelihood estimation Bayesian

10-601 Machine Learning http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html Course data All up-to-date info is on the course web page: http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html

### Lecture 2: Introduction to belief (Bayesian) networks

Lecture 2: Introduction to belief (Bayesian) networks Conditional independence What is a belief network? Independence maps (I-maps) January 7, 2008 1 COMP-526 Lecture 2 Recall from last time: Conditional

### Bayesian vs. Markov Networks

Bayesian vs. Markov Networks Le Song Machine Learning II: dvanced Topics CSE 8803ML, Spring 2012 Conditional Independence ssumptions Local Markov ssumption Global Markov ssumption 𝑋 𝑁𝑜𝑛𝑑𝑒𝑠𝑐𝑒𝑛𝑑𝑎𝑛𝑡𝑋 𝑃𝑎𝑋

### Machine Learning

Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 8, 2011 Today: Graphical models Bayes Nets: Representing distributions Conditional independencies

### Quantum Computation CMU BB, Fall 2015

Quantum Computation CMU 5-859BB, Fall 205 Homework 3 Due: Tuesday, Oct. 6, :59pm, email the pdf to pgarriso@andrew.cmu.edu Solve any 5 out of 6. [Swap test.] The CSWAP (controlled-swap) linear operator

### Text mining and topic models

Text mining and topic models Charles Elan elan@cs.ucsd.edu February 12, 2014 Text mining means the application of learning algorithms to documents consisting of words and sentences. Text mining tass include

### ML, MAP, and Bayesian The Holy Trinity of Parameter Estimation and Data Prediction. Avinash Kak Purdue University. January 4, :19am

ML, MAP, and Bayesian The Holy Trinity of Parameter Estimation and Data Prediction Avinash Kak Purdue University January 4, 2017 11:19am An RVL Tutorial Presentation originally presented in Summer 2008

### Bayesian network inference

Bayesian network inference Given: Query variables: X Evidence observed variables: E = e Unobserved variables: Y Goal: calculate some useful information about the query variables osterior Xe MA estimate

### Regime Switching Models: An Example for a Stock Market Index

Regime Switching Models: An Example for a Stock Market Index Erik Kole Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam April 2010 In this document, I discuss in detail

### Lecture 10: Sequential Data Models

CSC2515 Fall 2007 Introduction to Machine Learning Lecture 10: Sequential Data Models 1 Example: sequential data Until now, considered data to be i.i.d. Turn attention to sequential data Time-series: stock

### Robotics 2 Clustering & EM. Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard

Robotics 2 Clustering & EM Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard 1 Clustering (1) Common technique for statistical data analysis to detect structure (machine learning,

### CIS 391 Introduction to Artificial Intelligence Practice Midterm II With Solutions (From old CIS 521 exam)

CIS 391 Introduction to Artificial Intelligence Practice Midterm II With Solutions (From old CIS 521 exam) Problem Points Possible 1. Perceptrons and SVMs 20 2. Probabilistic Models 20 3. Markov Models

### CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /

### CSC 321 H1S Study Guide (Last update: April 3, 2016) Winter 2016

1. Suppose our training set and test set are the same. Why would this be a problem? 2. Why is it necessary to have both a test set and a validation set? 3. Images are generally represented as n m 3 arrays,

### Towards running complex models on big data

Towards running complex models on big data Working with all the genomes in the world without changing the model (too much) Daniel Lawson Heilbronn Institute, University of Bristol 2013 1 / 17 Motivation

### Introduction to Machine Learning

Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 5: Decision Theory & ROC Curves Gaussian ML Estimation Many figures courtesy Kevin Murphy s textbook,

### Lecture 9: Continuous

CSC2515 Fall 2007 Introduction to Machine Learning Lecture 9: Continuous Latent Variable Models 1 Example: continuous underlying variables What are the intrinsic latent dimensions in these two datasets?

### Mixture Models. Jia Li. Department of Statistics The Pennsylvania State University. Mixture Models

Mixture Models Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Clustering by Mixture Models General bacground on clustering Example method: -means Mixture model based

### Monotonicity Hints. Abstract

Monotonicity Hints Joseph Sill Computation and Neural Systems program California Institute of Technology email: joe@cs.caltech.edu Yaser S. Abu-Mostafa EE and CS Deptartments California Institute of Technology

### Math 312 Lecture Notes Markov Chains

Math 312 Lecture Notes Markov Chains Warren Weckesser Department of Mathematics Colgate University Updated, 30 April 2005 Markov Chains A (finite) Markov chain is a process with a finite number of states

### CHAPTER 6 NEURAL NETWORK BASED SURFACE ROUGHNESS ESTIMATION

CHAPTER 6 NEURAL NETWORK BASED SURFACE ROUGHNESS ESTIMATION 6.1. KNOWLEDGE REPRESENTATION The function of any representation scheme is to capture the essential features of a problem domain and make that

### Topic models for Sentiment analysis: A Literature Survey

Topic models for Sentiment analysis: A Literature Survey Nikhilkumar Jadhav 123050033 June 26, 2014 In this report, we present the work done so far in the field of sentiment analysis using topic models.

### Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Models vs. Patterns Models A model is a high level, global description of a

### Introduction to Logistic Regression

OpenStax-CNX module: m42090 1 Introduction to Logistic Regression Dan Calderon This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract Gives introduction

### Segmentation of 2D Gel Electrophoresis Spots Using a Markov Random Field

Segmentation of 2D Gel Electrophoresis Spots Using a Markov Random Field Christopher S. Hoeflich and Jason J. Corso csh7@cse.buffalo.edu, jcorso@cse.buffalo.edu Computer Science and Engineering University

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

### Markov Chains, part I

Markov Chains, part I December 8, 2010 1 Introduction A Markov Chain is a sequence of random variables X 0, X 1,, where each X i S, such that P(X i+1 = s i+1 X i = s i, X i 1 = s i 1,, X 0 = s 0 ) = P(X

### Estimating the Inverse Covariance Matrix of Independent Multivariate Normally Distributed Random Variables

Estimating the Inverse Covariance Matrix of Independent Multivariate Normally Distributed Random Variables Dominique Brunet, Hanne Kekkonen, Vitor Nunes, Iryna Sivak Fields-MITACS Thematic Program on Inverse

### Lecture 9. 1 Introduction. 2 Random Walks in Graphs. 1.1 How To Explore a Graph? CS-621 Theory Gems October 17, 2012

CS-62 Theory Gems October 7, 202 Lecture 9 Lecturer: Aleksander Mądry Scribes: Dorina Thanou, Xiaowen Dong Introduction Over the next couple of lectures, our focus will be on graphs. Graphs are one of

### 3. Markov property. Markov property for MRFs. Hammersley-Clifford theorem. Markov property for Bayesian networks. I-map, P-map, and chordal graphs

Markov property 3-3. Markov property Markov property for MRFs Hammersley-Clifford theorem Markov property for ayesian networks I-map, P-map, and chordal graphs Markov Chain X Y Z X = Z Y µ(x, Y, Z) = f(x,

### Linear Threshold Units

Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

### Bayesian Networks Chapter 14. Mausam (Slides by UW-AI faculty & David Page)

Bayesian Networks Chapter 14 Mausam (Slides by UW-AI faculty & David Page) Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential space for representation

### Model Selection in Undirected Graphical Models with the Elastic Net

Model Selection in Undirected Graphical Models with the Elastic Net Mihai Cucuringu, Jesús Puente, and David Shue August 16, 2010 Abstract Structure learning in random fields has attracted considerable

### Solving systems of linear equations over finite fields

Solving systems of linear equations over finite fields June 1, 2007 1 Introduction 2 3 4 Basic problem Introduction x 1 + x 2 + x 3 = 1 x 1 + x 2 = 1 x 1 + x 2 + x 4 = 1 x 2 + x 4 = 0 x 1 + x 3 + x 4 =

### CSCC11 Assignment 2: Probability, Estimation, and Classification

CSCC11 Assignment 2: Probability, Estimation, and Classification Written Part (5%): due Friday, October 7th, 11am In the following questions be sure to show all your work. Answers without derivations will

### Discovering Structure in Continuous Variables Using Bayesian Networks

In: "Advances in Neural Information Processing Systems 8," MIT Press, Cambridge MA, 1996. Discovering Structure in Continuous Variables Using Bayesian Networks Reimar Hofmann and Volker Tresp Siemens AG,

### Examples and Proofs of Inference in Junction Trees

Examples and Proofs of Inference in Junction Trees Peter Lucas LIAC, Leiden University February 4, 2016 1 Representation and notation Let P(V) be a joint probability distribution, where V stands for a

### MS&E 226: Small Data

MS&E 226: Small Data Lecture 16: Bayesian inference (v3) Ramesh Johari ramesh.johari@stanford.edu 1 / 35 Priors 2 / 35 Frequentist vs. Bayesian inference Frequentists treat the parameters as fixed (deterministic).

### Structured Learning and Prediction in Computer Vision. Contents

Foundations and Trends R in Computer Graphics and Vision Vol. 6, Nos. 3 4 (2010) 185 365 c 2011 S. Nowozin and C. H. Lampert DOI: 10.1561/0600000033 Structured Learning and Prediction in Computer Vision

### BINARY PRINCIPAL COMPONENT ANALYSIS IN THE NETFLIX COLLABORATIVE FILTERING TASK

BINARY PRINCIPAL COMPONENT ANALYSIS IN THE NETFLIX COLLABORATIVE FILTERING TASK László Kozma, Alexander Ilin, Tapani Raiko Helsinki University of Technology Adaptive Informatics Research Center P.O. Box

### Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 10, 2014 1 Principle of maximum likelihood Consider a family of probability distributions

### Solving Challenging Non-linear Regression Problems by Manipulating a Gaussian Distribution

Solving Challenging Non-linear Regression Problems by Manipulating a Gaussian Distribution Sheffield Gaussian Process Summer School, 214 Carl Edward Rasmussen Department of Engineering, University of Cambridge

### 1 A simple example. A short introduction to Bayesian statistics, part I Math 218, Mathematical Statistics D Joyce, Spring 2016

and P(B | X). In order to find those conditional probabilities, we'll use Bayes' formula. We can easily compute the reverse probabilities A short introduction to Bayesian statistics, part I Math 218, Mathematical

### Bayesian networks - Time-series models - Apache Spark & Scala

Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly

### Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com

Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian

### Clustering - example. Given some data x_i ∈ X Find a partitioning of the data into k disjunctive clusters Example: k-means clustering

Clustering - example Graph Mining and Graph Kernels Given some data x_i ∈ X Find a partitioning of the data into k disjunctive clusters Example: k-means clustering Clustering - example

### CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

Lecture 1 Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs2750/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

### EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

### Learning High-Order MRF Priors of Color Images

Julian J. McAuley (Presenting) National ICT Australia, Canberra, ACT 0200, Australia University of New South Wales, Sydney, NSW 2052, Australia j.mcauley@student.unsw.edu.au Tibério S. Caetano National

### Neural Networks. CAP5610 Machine Learning Instructor: Guo-Jun Qi

Neural Networks CAP5610 Machine Learning Instructor: Guo-Jun Qi Recap: linear classifier Logistic regression Maximizing the posterior distribution of class Y conditional on the input vector X Support vector

### Lecture 7: Approximation via Randomized Rounding

Lecture 7: Approximation via Randomized Rounding Often LPs return a fractional solution where the solution x, which is supposed to be in $\{0, 1\}^n$, is in $[0, 1]^n$ instead. There is a generic way of obtaining

### Lecture 4: Thresholding

Lecture 4: Thresholding © Bryan S. Morse, Brigham Young University, 1998–2000 Last modified on Wednesday, January 12, 2000 at 10:00 AM. Reading SH&B, Section 5.1 4.1 Introduction Segmentation involves

### Lecture 9: Bayesian Learning

Lecture 9: Bayesian Learning Cognitive Systems II - Machine Learning SS 2005 Part II: Special Aspects of Concept Learning Bayes' Theorem, MAP / ML hypotheses, Brute-force MAP learning, MDL principle, Bayes

### Simulation of Non-linear Stochastic Equation Systems

Simulation of Non-linear Stochastic Equation Systems Veit Köppen 1, Hans-J. Lenz 2 (Freie Universität Berlin) Abstract In this paper, we combine data generated by Monte-Carlo simulation, which is based

### The Exponential Family

The Exponential Family David M. Blei Columbia University November 3, 2015 Definition A probability density in the exponential family has this form $p(x \mid \eta) = h(x) \exp\{\eta^\top t(x) - a(\eta)\}$, (1) where $\eta$ is the natural

### MapReduce Approach to Collective Classification for Networks

MapReduce Approach to Collective Classification for Networks Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1 Wroclaw University of Technology, Wroclaw, Poland Faculty

### Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

### Programming Exercise 3: Multi-class Classification and Neural Networks

Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks

### Tutorial on variational approximation methods. Tommi S. Jaakkola MIT AI Lab

Tutorial on variational approximation methods Tommi S. Jaakkola MIT AI Lab tommi@ai.mit.edu Tutorial topics A bit of history Examples of variational methods A brief intro to graphical models Variational

### An Adaptive Index Structure for High-Dimensional Similarity Search

An Adaptive Index Structure for High-Dimensional Similarity Search Abstract A practical method for creating a high dimensional index structure that adapts to the data distribution and scales well with

### A Formalization of the Turing Test Evgeny Chutchev

A Formalization of the Turing Test Evgeny Chutchev 1. Introduction The Turing test was described by A. Turing in his paper [4] as follows: An interrogator questions both Turing machine and second participant

### Lecture 15: Time Series Modeling Steven Skiena. skiena

Lecture 15: Time Series Modeling Steven Skiena Department of Computer Science State University of New York Stony Brook, NY 11794–4400 http://www.cs.sunysb.edu/~skiena Modeling and Forecasting We seek to

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright © 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR'S PERMISSION* This is a

### Homework 15 Solutions

PROBLEM ONE (Trees) Homework 15 Solutions 1. Recall the definition of a tree: a tree is a connected, undirected graph which has no cycles. Which of the following definitions are equivalent to this definition

### CSC321 Introduction to Neural Networks and Machine Learning. Lecture 21 Using Boltzmann machines to initialize backpropagation.

CSC321 Introduction to Neural Networks and Machine Learning Lecture 21 Using Boltzmann machines to initialize backpropagation Geoffrey Hinton Some problems with backpropagation The amount of information

### Spectral graph theory

Spectral graph theory Uri Feige January 2010 1 Background With every graph (or digraph) one can associate several different matrices. We have already seen the vertex-edge incidence matrix, the Laplacian

### Outline. Probability. Random Variables. Joint and Marginal Distributions Conditional Distribution Product Rule, Chain Rule, Bayes Rule.

Probability 1 Probability Random Variables Outline Joint and Marginal Distributions Conditional Distribution Product Rule, Chain Rule, Bayes Rule Inference Independence 2 Inference in Ghostbusters A ghost

### Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

### Markov chains (cont'd)

6.867 Machine learning, lecture 19 (Jaakkola) 1 Lecture topics: Markov chains (cont'd) Hidden Markov Models Markov chains (cont'd) In the context of spectral clustering (last lecture) we discussed a random walk

### 3. The Junction Tree Algorithms

A Short Course on Graphical Models 3. The Junction Tree Algorithms Mark Paskin mark@paskin.org 1 Review: conditional independence Two random variables X and Y are independent (written X ⊥ Y) iff p_X(·)

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### Statistical Machine Translation: IBM Models 1 and 2

Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation

### Approximation Algorithms: LP Relaxation, Rounding, and Randomized Rounding Techniques. My T. Thai

Approximation Algorithms: LP Relaxation, Rounding, and Randomized Rounding Techniques My T. Thai 1 Overview An overview of LP relaxation and rounding method is as follows: 1. Formulate an optimization

### Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding