Classification: Naive Bayes and Logistic Regression

1 Classification: Naive Bayes and Logistic Regression. Natural Language Processing, Jordan Boyd-Graber, University of Colorado Boulder, September 17, 2014. Slides adapted from Hinrich Schütze and Lauren Hannah.

2 By the end of today... You'll be able to frame many standard NLP tasks as classification problems, apply logistic regression (given weights) to classify data, and learn naïve Bayes from data.

3 Classification: Outline. 1 Classification, 2 Logistic Regression, 3 Logistic Regression Example, 4 Motivating Naïve Bayes Example, 5 Naive Bayes Definition, 6 Wrapup.

4 Classification: Formal definition of classification. Given: a universe $X$ our examples can come from (e.g., English documents with a predefined vocabulary); examples are represented in this space (e.g., each document has some subset of the vocabulary; more in a second); a fixed set of classes $C = \{c_1, c_2, \dots, c_J\}$, where the classes are human-defined for the needs of an application (e.g., spam vs. ham); and a training set $D$ of labeled documents, with each labeled document $d \in X \times C$. Using a learning method or learning algorithm, we then wish to learn a classifier $\gamma$ that maps documents to classes: $\gamma: X \to C$.

10 Classification: Topic classification. [Figure: a class hierarchy with regions (UK, China), industries (poultry, coffee), and subject areas (elections, sports); training documents for each class contain characteristic terms (e.g., London, Olympics Beijing, bird flu, roasting beans, recount votes, diamond baseball), and the test document "first private Chinese airline" is assigned $\gamma(d') = \text{China}$.]

11 Classification: Examples of how search engines use classification. Standing queries (e.g., Google Alerts); language identification (classes: English vs. French, etc.); automatic detection of spam pages (spam vs. nonspam); automatic detection of sexually explicit content (sexually explicit vs. not); sentiment detection: is a movie or product review positive or negative (positive vs. negative); topic-specific or vertical search, restricting search to a vertical like "related to health" (relevant to vertical vs. not).

12 Classification: Classification methods: 1. Manual. Manual classification was used by Yahoo in the beginning of the web; also ODP and PubMed. Very accurate if the job is done by experts, and consistent when the problem size and team are small. But scaling manual classification is difficult and expensive, so we need automatic methods for classification.

13 Classification: Classification methods: 2. Rule-based. There are IDE-type development environments for writing very complex rules efficiently (e.g., Verity). Often: Boolean combinations (as in Google Alerts). Accuracy is very high if a rule has been carefully refined over time by a subject expert. But building and maintaining rule-based classification systems is expensive.

14 Classification: Classification methods: 3. Statistical/Probabilistic. As per our definition of the classification problem, we treat text classification as a learning problem: supervised learning of the classification function, and its application to classifying new documents. We will look at a couple of methods for doing this: Naive Bayes, logistic regression, SVMs, decision trees. No free lunch: this requires hand-classified training data, but the manual classification can be done by non-experts.

15 Logistic Regression: Outline. 1 Classification, 2 Logistic Regression, 3 Logistic Regression Example, 4 Motivating Naïve Bayes Example, 5 Naive Bayes Definition, 6 Wrapup.

16 Logistic Regression: Generative vs. Discriminative Models. Goal: given an observation $x$, compute the probability of its label $y$, $p(y \mid x)$. Naïve Bayes (later) uses Bayes rule to reverse the conditioning. What if we want to model $p(y \mid x)$ directly? We need a more general framework... That framework is called logistic regression. Logistic: a special mathematical function it uses. Regression: combines a weight vector with observations to create an answer. It is a more general cookbook for building conditional probability distributions, and naïve Bayes (later today) is a special case of logistic regression.

18 Logistic Regression: Definition. Weight vector $\beta_i$; observations $X_i$; bias $\beta_0$ (like the intercept in linear regression).
$$P(Y = 0 \mid X) = \frac{1}{1 + \exp\left[\beta_0 + \sum_i \beta_i X_i\right]} \quad (1) \qquad P(Y = 1 \mid X) = \frac{\exp\left[\beta_0 + \sum_i \beta_i X_i\right]}{1 + \exp\left[\beta_0 + \sum_i \beta_i X_i\right]} \quad (2)$$
For shorthand, we'll say that
$$P(Y = 0 \mid X) = \sigma\!\left(-\left(\beta_0 + \sum_i \beta_i X_i\right)\right) \quad (3) \qquad P(Y = 1 \mid X) = 1 - \sigma\!\left(-\left(\beta_0 + \sum_i \beta_i X_i\right)\right) \quad (4)$$
where $\sigma(z) = \frac{1}{1 + \exp[-z]}$.

19 Logistic Regression: What's this exp? Exponential: $\exp[x]$ is shorthand for $e^x$; $e$ is a special number, about 2.71828; $e^x$ is the limit of the compound interest formula as compounds become infinitely small; it's the function whose derivative is itself. Logistic: the logistic function is $\sigma(z) = \frac{1}{1 + e^{-z}}$; it looks like an S, is always between 0 and 1, and allows us to model probabilities (different from linear regression).
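
A minimal sketch of these definitions in Python (the function and variable names are mine, not from the slides): the class-1 probability is the logistic function applied to the bias plus the weighted sum of the features.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def p_y_equals_1(features, weights, bias):
    """P(Y = 1 | X) for binary logistic regression.

    `features` maps feature name -> value (e.g., a count); `weights` maps
    feature name -> beta_i; `bias` is beta_0.  P(Y = 0 | X) is 1 minus this.
    """
    z = bias + sum(weights.get(f, 0.0) * x for f, x in features.items())
    return sigmoid(z)
```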

21 Logistic Regression Example: Outline. 1 Classification, 2 Logistic Regression, 3 Logistic Regression Example, 4 Motivating Naïve Bayes Example, 5 Naive Bayes Definition, 6 Wrapup.

22 Logistic Regression Example. We have a weight for each of the features viagra, mother, work, and nigeria, plus a bias $\beta_0 = 0.1$. What does $Y = 1$ mean?
Example 1: the empty document, $X = \{\}$. $P(Y = 0) = \frac{1}{1 + \exp[0.1]} = 0.48$ and $P(Y = 1) = \frac{\exp[0.1]}{1 + \exp[0.1]} = 0.52$. The bias $\beta_0$ encodes the prior probability of a class.
Example 2: $X = \{\text{Mother}, \text{Nigeria}\}$. $P(Y = 0) = \frac{1}{1 + \exp[\beta_0 + \beta_{\text{mother}} + \beta_{\text{nigeria}}]} = 0.11$ and $P(Y = 1) = \frac{\exp[\beta_0 + \beta_{\text{mother}} + \beta_{\text{nigeria}}]}{1 + \exp[\beta_0 + \beta_{\text{mother}} + \beta_{\text{nigeria}}]} = 0.89$. Include the bias, and sum the weights of the other features that are present.
Example 3: $X = \{\text{Mother}, \text{Work}, \text{Viagra}, \text{Mother}\}$. $P(Y = 0) = \frac{1}{1 + \exp[\beta_0 + 2\beta_{\text{mother}} + \beta_{\text{work}} + \beta_{\text{viagra}}]} = 0.60$ and $P(Y = 1) = 1 - 0.60 = 0.40$. Multiply each feature's presence (its count) by its weight.
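
The arithmetic behind these examples can be sketched in a few lines of Python. The bias 0.1 is visible above; the other weights below are illustrative placeholders chosen so that the three probabilities match the slides, not values quoted from them.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

bias = 0.1   # visible in the examples above
# Illustrative weights (assumed for this sketch, not quoted from the slide table)
weights = {"viagra": 2.0, "mother": -1.0, "work": -0.5, "nigeria": 3.0}

def p_y1(doc):
    """P(Y = 1 | X): bias plus the weight of every feature occurrence, squashed."""
    return sigmoid(bias + sum(weights[w] for w in doc))

for doc in ([], ["mother", "nigeria"], ["mother", "work", "viagra", "mother"]):
    p1 = p_y1(doc)
    print(doc, "P(Y=0) = %.2f, P(Y=1) = %.2f" % (1 - p1, p1))
# Reproduces 0.48/0.52, 0.11/0.89, and 0.60/0.40 for examples 1-3.
```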

31 Logistic Regression Example: How is Logistic Regression Used? Given a set of weights $\vec{\beta}$, we know how to compute the conditional likelihood $P(y \mid \beta, x)$. To train, find the set of weights $\vec{\beta}$ that maximizes the conditional likelihood on training data (where $y$ is known). Logistic regression is a subset of a more general class of methods called maximum entropy models (next week). Intuition: a higher weight means that this feature is a good indicator that this is the class you want for this observation. Naïve Bayes is a special case of logistic regression that uses Bayes rule and conditional probabilities to set these weights.
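
Learning the weights is next week's topic; as a rough sketch only (not the procedure from the slides), maximizing the conditional likelihood can be done by gradient ascent on the log-likelihood, one example at a time. The names below are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(weights, bias, features, label, lr=0.1):
    """One stochastic gradient-ascent step on log P(y | x, beta) for binary
    logistic regression.  `features` maps feature name -> value, `label` is 0/1."""
    z = bias + sum(weights.get(f, 0.0) * x for f, x in features.items())
    error = label - sigmoid(z)          # d/dz of the conditional log-likelihood
    bias += lr * error
    for f, x in features.items():
        weights[f] = weights.get(f, 0.0) + lr * error * x
    return weights, bias
```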

33 Motivating Naïve Bayes Example: Outline. 1 Classification, 2 Logistic Regression, 3 Logistic Regression Example, 4 Motivating Naïve Bayes Example, 5 Naive Bayes Definition, 6 Wrapup.

34 Motivating Naïve Bayes Example: A Classification Problem. Suppose that I have two coins, C1 and C2. Now suppose I pull a coin out of my pocket, flip it a bunch of times, record the coin and the outcomes (a sequence of 0s and 1s), and repeat many times: four recorded sequences come from C1 and three from C2. Now suppose I am given a new sequence, 0 0 1; which coin is it from?

35 Motivating Naïve Bayes Example: A Classification Problem. This problem has particular challenges: a different number of covariates for each observation, and the number of covariates can be large. However, there is some structure: it is easy to get $P(C_1) = 4/7$ and $P(C_2) = 3/7$, and also easy to get $P(X_i = 1 \mid C_1) = 12/16$ and $P(X_i = 1 \mid C_2) = 6/18$. By conditional independence, $P(X = 010 \mid C_1) = P(X_1 = 0 \mid C_1)\,P(X_2 = 1 \mid C_1)\,P(X_3 = 0 \mid C_1)$. Can we use these to get $P(C_1 \mid X = 001)$?

38 Motivating Naïve Bayes Example: A Classification Problem. Summary: we have $P(\text{data} \mid \text{class})$ and want $P(\text{class} \mid \text{data})$. Solution: Bayes rule!
$$P(\text{class} \mid \text{data}) = \frac{P(\text{data} \mid \text{class})\,P(\text{class})}{P(\text{data})} = \frac{P(\text{data} \mid \text{class})\,P(\text{class})}{\sum_{\text{class}'=1}^{C} P(\text{data} \mid \text{class}')\,P(\text{class}')}$$
To compute this, we need to estimate $P(\text{data} \mid \text{class})$ and $P(\text{class})$ for all classes.
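
Putting the coin example together in Python, using the estimates quoted above ($P(C_1) = 4/7$, $P(X_i = 1 \mid C_1) = 12/16$, $P(X_i = 1 \mid C_2) = 6/18$); the function names are illustrative, not from the slides.

```python
# Posterior over coins for a new flip sequence, via Bayes rule and the
# conditional-independence assumption.  Estimates are those quoted above.
priors = {"C1": 4 / 7, "C2": 3 / 7}
p_heads = {"C1": 12 / 16, "C2": 6 / 18}

def likelihood(seq, coin):
    """P(sequence | coin): flips are assumed independent given the coin."""
    p = 1.0
    for flip in seq:
        p *= p_heads[coin] if flip == 1 else 1 - p_heads[coin]
    return p

def posterior(seq):
    """P(coin | sequence), normalizing over both coins."""
    joint = {c: likelihood(seq, c) * priors[c] for c in priors}
    total = sum(joint.values())
    return {c: joint[c] / total for c in joint}

print(posterior([0, 0, 1]))  # which coin most likely produced 0 0 1?
```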

39 Motivating Naïve Bayes Example: Naive Bayes Classifier. This works because the coin flips are independent given the coin parameter. What about this case: we want to identify the type of fruit given a set of features: color, shape, and size. Color: red, green, yellow, or orange (discrete). Shape: round, oval, or long+skinny (discrete). Size: diameter in inches (continuous).

40 Motivating Naïve Bayes Example: Naive Bayes Classifier. Conditioned on the type of fruit, these features are not necessarily independent: given the category apple, the color green has a higher probability given size < 2: $P(\text{green} \mid \text{size} < 2, \text{apple}) > P(\text{green} \mid \text{apple})$.

41 Motivating Naïve Bayes Example: Naive Bayes Classifier. Using the chain rule,
$$P(\text{apple} \mid \text{green}, \text{round}, \text{size}=2) = \frac{P(\text{green}, \text{round}, \text{size}=2 \mid \text{apple})\,P(\text{apple})}{\sum_{\text{fruits } j} P(\text{green}, \text{round}, \text{size}=2 \mid \text{fruit}_j)\,P(\text{fruit}_j)}$$
$$\propto P(\text{green} \mid \text{round}, \text{size}=2, \text{apple})\,P(\text{round} \mid \text{size}=2, \text{apple})\,P(\text{size}=2 \mid \text{apple})\,P(\text{apple})$$
But computing conditional probabilities is hard! There are many combinations of (color, shape, size) for each fruit.

42 Motivating Naïve Bayes Example: Naive Bayes Classifier. Idea: assume conditional independence for all features given the class:
$$P(\text{green} \mid \text{round}, \text{size}=2, \text{apple}) = P(\text{green} \mid \text{apple})$$
$$P(\text{round} \mid \text{green}, \text{size}=2, \text{apple}) = P(\text{round} \mid \text{apple})$$
$$P(\text{size}=2 \mid \text{green}, \text{round}, \text{apple}) = P(\text{size}=2 \mid \text{apple})$$

43 Naive Bayes Definition: Outline. 1 Classification, 2 Logistic Regression, 3 Logistic Regression Example, 4 Motivating Naïve Bayes Example, 5 Naive Bayes Definition, 6 Wrapup.

44 Naive Bayes Definition: The Naive Bayes classifier. The Naive Bayes classifier is a probabilistic classifier. We compute the probability of a document $d$ being in a class $c$ as follows:
$$P(c \mid d) \propto P(c) \prod_{1 \le i \le n_d} P(w_i \mid c)$$
Here $n_d$ is the length of the document (number of tokens). $P(w_i \mid c)$ is the conditional probability of term $w_i$ occurring in a document of class $c$; think of $P(w_i \mid c)$ as a measure of how much evidence $w_i$ contributes that $c$ is the correct class. $P(c)$ is the prior probability of $c$: if a document's terms do not provide clear evidence for one class vs. another, we choose the $c$ with the higher $P(c)$.

47 Naive Bayes Definition: Maximum a posteriori class. Our goal is to find the best class. The best class in Naive Bayes classification is the most likely, or maximum a posteriori (MAP), class $c_{\text{map}}$:
$$c_{\text{map}} = \arg\max_{c_j \in C} \hat{P}(c_j \mid d) = \arg\max_{c_j \in C} \hat{P}(c_j) \prod_{1 \le i \le n_d} \hat{P}(w_i \mid c_j)$$
We write $\hat{P}$ for $P$ since these values are estimates from the training set.
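
The MAP decision rule is a one-liner once the estimates are available. A minimal sketch, assuming dictionaries `priors` (class -> $\hat{P}(c)$) and `cond_prob` ((word, class) -> $\hat{P}(w \mid c)$); these names are my own, not the slides'.

```python
from math import prod

def map_class(doc_tokens, priors, cond_prob):
    """argmax_c  P_hat(c) * prod_i P_hat(w_i | c)  over the classes in `priors`."""
    def score(c):
        return priors[c] * prod(cond_prob[(w, c)] for w in doc_tokens)
    return max(priors, key=score)
```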

48 Naive Bayes Definition: Naive Bayes Classifier. Why conditional independence? Estimating multivariate functions (like $P(X_1, \dots, X_m \mid Y)$) is mathematically hard, while estimating univariate ones (like $P(X_i \mid Y)$) is easier: we need less data to fit univariate functions well, and univariate estimators differ much less than multivariate estimators (low variance)... but they may end up finding the wrong values (more bias).

49 Naive Bayes Definition: Naïve Bayes conditional independence assumption. To reduce the number of parameters to a manageable size, recall the Naive Bayes conditional independence assumption:
$$P(d \mid c_j) = P(\langle w_1, \dots, w_{n_d} \rangle \mid c_j) = \prod_{1 \le i \le n_d} P(X_i = w_i \mid c_j)$$
We assume that the probability of observing the conjunction of attributes is equal to the product of the individual probabilities $P(X_i = w_i \mid c_j)$. Our estimates for these priors and conditional probabilities:
$$\hat{P}(c_j) = \frac{N_c + 1}{N + |C|} \qquad \hat{P}(w \mid c) = \frac{T_{cw} + 1}{\left(\sum_{w' \in V} T_{cw'}\right) + |V|}$$
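
A short sketch of these add-one smoothed estimates in Python; the counting quantities ($N$, $N_c$, $T_{cw}$) follow the formulas above, while the function and argument names are my own.

```python
from collections import Counter

def train_naive_bayes(labeled_docs, vocab):
    """Estimate P_hat(c) and P_hat(w | c) with add-one smoothing.

    `labeled_docs` is a list of (tokens, class) pairs; `vocab` is the set V.
    """
    classes = {c for _, c in labeled_docs}
    n_docs = len(labeled_docs)                            # N
    docs_per_class = Counter(c for _, c in labeled_docs)  # N_c
    token_counts = {c: Counter() for c in classes}        # T_cw
    for tokens, c in labeled_docs:
        token_counts[c].update(t for t in tokens if t in vocab)

    priors = {c: (docs_per_class[c] + 1) / (n_docs + len(classes))
              for c in classes}
    cond_prob = {}
    for c in classes:
        total = sum(token_counts[c].values())
        for w in vocab:
            cond_prob[(w, c)] = (token_counts[c][w] + 1) / (total + len(vocab))
    return priors, cond_prob
```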

50 Naive Bayes Definition: Implementation Detail: Taking the log. Multiplying lots of small probabilities can result in floating point underflow. From last time: lg is the logarithm base 2 and ln is the logarithm base $e$:
$$\lg x = a \Leftrightarrow 2^a = x \qquad \ln x = a \Leftrightarrow e^a = x \quad (5)$$
Since $\ln(xy) = \ln(x) + \ln(y)$, we can sum log probabilities instead of multiplying probabilities, and since ln is a monotonic function, the class with the highest score does not change. So what we usually compute in practice is:
$$c_{\text{map}} = \arg\max_{c_j \in C} \left[\hat{P}(c_j) \prod_{1 \le i \le n_d} \hat{P}(w_i \mid c_j)\right] = \arg\max_{c_j \in C} \left[\ln \hat{P}(c_j) + \sum_{1 \le i \le n_d} \ln \hat{P}(w_i \mid c_j)\right]$$
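
The earlier MAP sketch, rewritten to score in log space (same hypothetical data structures as before):

```python
import math

def map_class_log(doc_tokens, priors, cond_prob):
    """Same argmax as map_class, but summing logs to avoid underflow."""
    def log_score(c):
        return math.log(priors[c]) + sum(math.log(cond_prob[(w, c)])
                                         for w in doc_tokens)
    return max(priors, key=log_score)
```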

53 Wrapup: Outline. 1 Classification, 2 Logistic Regression, 3 Logistic Regression Example, 4 Motivating Naïve Bayes Example, 5 Naive Bayes Definition, 6 Wrapup.

54 Wrapup: Equivalence of Naïve Bayes and Logistic Regression. Consider Naïve Bayes and logistic regression with two classes, (+) and (-). Naïve Bayes compares $\hat{P}(c_+) \prod_i \hat{P}(w_i \mid c_+)$ against $\hat{P}(c_-) \prod_i \hat{P}(w_i \mid c_-)$; logistic regression computes $P(c_- \mid X) = \frac{1}{1 + \exp\left(w_0 + \sum_i w_i X_i\right)}$ and $P(c_+ \mid X) = \frac{\exp\left(w_0 + \sum_i w_i X_i\right)}{1 + \exp\left(w_0 + \sum_i w_i X_i\right)}$. These are actually the same if
$$w_0 = \ln \frac{p(c_+)}{1 - p(c_+)} + \sum_j \ln \frac{1 - P(w_j \mid c_+)}{1 - P(w_j \mid c_-)} \qquad \text{and} \qquad w_j = \ln \frac{P(w_j \mid c_+)\left(1 - P(w_j \mid c_-)\right)}{P(w_j \mid c_-)\left(1 - P(w_j \mid c_+)\right)}$$

55 Wrapup: Contrasting Naïve Bayes and Logistic Regression. Naïve Bayes is easier. Naïve Bayes is better on smaller datasets, logistic regression is better on medium-sized datasets, and on huge datasets it doesn't really matter (data always win). The optional reading by Ng and Jordan has proofs and experiments. Logistic regression allows arbitrary features (the biggest difference!). You don't need to memorize (or work through) the previous slide; just understand that naïve Bayes is a special case of logistic regression.

57 Wrapup: In class.


61 Wrapup: Next time... Maximum entropy: the mathematical foundations of logistic regression, how to learn the best setting of weights, and extracting features from words.
