Consistent Binary Classification with Generalized Performance Metrics
|
|
|
- Ashley Kelly
- 10 years ago
- Views:
Transcription
1 Consistent Binary Classification with Generalized Performance Metrics Nagarajan Natarajan Joint work with Oluwasanmi Koyejo, Pradeep Ravikumar and Inderjit Dhillon UT Austin Nov 4, 2014
2 Problem and Motivation (1/3) State-of-the-art understanding of optimal decision making and consistent algorithms for binary classification is limited. It is well-known that accuracy (0-1 loss) is maximized (minimized) by thresholding P(Y = 1 x) at 0.5. Such a characterization is lacking for many utility measures used in practice.
3 Problem and Motivation (2/3) Most performance measures are based on the four fundamental population quantities: Examples include F β, Jaccard coefficient, and other cost-sensitive measures.
4 Problem and Motivation (3/3) Goals: 1. Develop a general framework for analyzing performance measures 2. Characterize optimal decision functions for a large family of utility measures 3. Develop efficient, and provably consistent, algorithms for maximizing measures in practice
5 A Family of Generalized Performance Metrics (1/3) Let θ : X {0, 1} denote a classifier, and P be a fixed unknown distribution over labeled data X {0, 1}. We define the following ratio family of performance metrics: L(θ, P) = a 0 + a 11 TP + a 10 FP + a 01 FN + a 00 TN b 0 + b 11 TP + b 10 FP + b 01 FN + b 00 TN where a 0, b 0, a ij, b ij, i, j {0, 1} are non-negative constants and: TP := TP(θ, P) = P(Y = 1, θ = 1), FP := FP(θ, P) = P(Y = 0, θ = 1), FN := FN(θ, P) = P(Y = 1, θ = 0), TN := TN(θ, P) = P(Y = 0, θ = 0).
6 A Family of Generalized Performance Metrics (2/3) Example metrics in this family: AM = F β = Jaccard Coefficient = Weighted Accuracy = (1 π)tp + πtn 2π(1 π) (1 + β 2 )TP (1 + β 2 )TP + β 2 FN + FP TP TP + FN + FP w 1 TP + w 2 TN w 1 TP + w 2 TN + w 3 FP + w 4 FN
7 A Family of Generalized Performance Metrics (3/3) Let γ(θ) := P(θ = 1) and π := P(Y = 1). Observing that: FP = γ(θ) TP, FN = π TP(θ), TN = 1 γ(θ) π + TP we get the following equivalent, simpler representation of the family: L(θ, P) = c 0 + c 1 TP + c 2 γ(θ) d 0 + d 1 TP + d 2 γ(θ), for certain constants c 0, c 1, c 2, d 0, d 1, d 2.
8 Optimal Classifier (1/2) Optimal (Bayes) decision function for a given metric L is: θ = arg max θ Θ L(θ, P). Main Result 1. Given a performance metric L, or equivalently, the constants {c 0, c 1, c 2 } and {d 0, d 1, d 2 }, let L := L(θ ) and let: δ = d 2L c 2 c 1 d 1 L The Bayes classifier θ takes the form θ (x) = sign(p(y = 1 x) δ ).
9 Optimal Classifier (2/2) Implication: Optimal decision function for a metric in our family can be found among the thresholded classifiers: θ arg max L(I (P(Y = 1 x) δ), P), δ (0,1) where I (P(Y = 1 x) δ) is the classifier that thresholds the conditional at δ.
10 Recovered and New Results (1/2)
11 Recovered and New Results (2/2) Simulated results showing η(x) := P(Y = 1 x), optimal threshold δ and Bayes classifier θ F 1 Weighted Accuracy η(x) δ =0.34 θ η(x) δ =0.50 θ TP 2TP +FP +FN x TP +2TN 2TP +FP +FN +2TN x
12 Maximizing L in Practice (1/3) Given iid sample (X i, Y i ), i = 1, 2,..., n, we would want to maximize the empirical measure: L n (θ) = c 1TP n (θ) + c 2 γ n (θ) + c 0 d 1 TP n (θ) + d 2 γ n (θ) + d 0, where TP n (θ) = 1 n n i=1 θ(x i)y i and γ n (θ) = 1 n n i=1 θ(x i). However, maximizing L n (θ) is often NP-hard. Main Result 1 suggests two simple procedures for estimating θ from training data
13 Maximizing L in Practice (2/3) Algorithm 1: Two-Step EUM Input: Training examples S = {X i, Y i } n i=1 and utility measure L. 1. Split the training data S into two sets S 1 and S Estimate η(x) := P(Y = 1 x) using S 1, define θ δ = sign(ˆη(x) δ) 3. Compute δ = arg max δ (0,1) L n ( θ δ ) on S Return: θ δ 1-d optimization in Step 3 can be done efficiently L n changes only on O(n) discrete thresholds
14 Maximizing L in Practice (3/3) The second method is based on minimizing a surrogate l of the weighted 0-1 loss: Algorithm 2: Weighted ERM l δ (t, y) = (1 δ)1 {y=1} l(t, 1) + δ1 {y=0} l(t, 0). Input: Training examples S = {X i, Y i } n i=1, prediction function class Φ {φ : X R} and utility measure L. 1. Split the training data S into two sets S 1 and S Compute δ = arg max δ (0,1) L n ( θ δ ) on S 2. Sub-algorithm: θ δ (x) := sign( φ δ (x)) where 1 S1 φ δ (x) = arg min φ Φ S 1 i=1 l δ(φ(x i ), Y i ). 3. Return: θ δ
15 Consistency of Empirical Estimation (1/2) For consistency w.r.t. L metric, we need estimated θ to satisfy L L( θ) p 0. Theorem (Uniform convergence of L n ). Consider the function class of all thresholded decisions Θ = {I (φ(x) δ) δ (0, 1)} for a [0, 1]-valued function φ : X [0, 1]. For sufficiently large n that is a function of constants associated with L, ɛ and ρ, with prob. at least 1 ρ, sup L n (θ) L(θ) < ɛ. θ Θ
16 Consistency of Empirical Estimation (2/2) Main Result 2. If the estimate η(x) satisfies η(x) p η(x), Algorithm 1 is L-consistent. Main Result 3. Let l : R : [0, ) be a classification-calibrated convex (margin) loss and let l δ be the corresponding weighted loss for a given δ used in Algorithm 2. Then, Algorithm 2 is L-consistent.
17 Experimental Results Evaluate Algorithms 1 and 2 on two metrics, F 1 and Weighted Accuracy 2(TP+TN) 2(TP+TN)+FP+FN. Compare the two algorithms with standard ERM (regularized logistic regression). On datasets listed below: 1. Letters: 26 classes (English alphabet), instances 2. Scene: 6 classes (scene types), 2230 images 3. Web Page: 2 classes (spam/non-spam), pages 4. Image: 2 classes, 2068 images 5. Spambase: 2 classes (spam/non-spam), s
18 Experimental Results: F 1
19 Experimental Results: Weighted Accuracy
20 Open Problems & Future Directions There exist other utility metrics that are not in our family, but have similar thresholded optimal classifiers (Check out Poster??!) Raises the question Identify/characterize the entire family of utility metrics with simple optimal decision functions Develop surrogate theory for L Obtain convergence rates for L(ˆθ) p L(θ ) as ˆθ p θ Multi-label classification setting: Can extend the definition L in more than one way! Do similar results hold in this setting?
Consistent Binary Classification with Generalized Performance Metrics
Consistent Binary Classification with Generalized Performance Metrics Oluwasanmi Koyejo Department of Psychology, Stanford University [email protected] Pradeep Ravikumar Department of Computer Science,
Performance Metrics. number of mistakes total number of observations. err = p.1/1
p.1/1 Performance Metrics The simplest performance metric is the model error defined as the number of mistakes the model makes on a data set divided by the number of observations in the data set, err =
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
Bilinear Prediction Using Low-Rank Models
Bilinear Prediction Using Low-Rank Models Inderjit S. Dhillon Dept of Computer Science UT Austin 26th International Conference on Algorithmic Learning Theory Banff, Canada Oct 6, 2015 Joint work with C-J.
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Predict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, [email protected] Department of Electrical Engineering, Stanford University Abstract Given two persons
MAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS
MAXIMIZING RETURN ON DIRET MARKETING AMPAIGNS IN OMMERIAL BANKING S 229 Project: Final Report Oleksandra Onosova INTRODUTION Recent innovations in cloud computing and unified communications have made a
T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577
T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or
Simple and efficient online algorithms for real world applications
Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,
Polarization codes and the rate of polarization
Polarization codes and the rate of polarization Erdal Arıkan, Emre Telatar Bilkent U., EPFL Sept 10, 2008 Channel Polarization Given a binary input DMC W, i.i.d. uniformly distributed inputs (X 1,...,
Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: [email protected] Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
EMPIRICAL RISK MINIMIZATION FOR CAR INSURANCE DATA
EMPIRICAL RISK MINIMIZATION FOR CAR INSURANCE DATA Andreas Christmann Department of Mathematics homepages.vub.ac.be/ achristm Talk: ULB, Sciences Actuarielles, 17/NOV/2006 Contents 1. Project: Motor vehicle
Interactive Machine Learning. Maria-Florina Balcan
Interactive Machine Learning Maria-Florina Balcan Machine Learning Image Classification Document Categorization Speech Recognition Protein Classification Branch Prediction Fraud Detection Spam Detection
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati [email protected], [email protected]
Semi-Supervised Support Vector Machines and Application to Spam Filtering
Semi-Supervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery
Bootstrapping Big Data
Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm
Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm Martin Hlosta, Rostislav Stríž, Jan Kupčík, Jaroslav Zendulka, and Tomáš Hruška A. Imbalanced Data Classification
A Network Flow Approach in Cloud Computing
1 A Network Flow Approach in Cloud Computing Soheil Feizi, Amy Zhang, Muriel Médard RLE at MIT Abstract In this paper, by using network flow principles, we propose algorithms to address various challenges
Linear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
Introduction to Online Learning Theory
Introduction to Online Learning Theory Wojciech Kot lowski Institute of Computing Science, Poznań University of Technology IDSS, 04.06.2013 1 / 53 Outline 1 Example: Online (Stochastic) Gradient Descent
Big Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Going For Large Scale Going For Large Scale 1
DUOL: A Double Updating Approach for Online Learning
: A Double Updating Approach for Online Learning Peilin Zhao School of Comp. Eng. Nanyang Tech. University Singapore 69798 [email protected] Steven C.H. Hoi School of Comp. Eng. Nanyang Tech. University
Message-passing sequential detection of multiple change points in networks
Message-passing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal
Big Data - Lecture 1 Optimization reminders
Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Schedule Introduction Major issues Examples Mathematics
Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S
Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard
10-601. Machine Learning. http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html
10-601 Machine Learning http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html Course data All up-to-date info is on the course web page: http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html
How performance metrics depend on the traffic demand in large cellular networks
How performance metrics depend on the traffic demand in large cellular networks B. B laszczyszyn (Inria/ENS) and M. K. Karray (Orange) Based on joint works [1, 2, 3] with M. Jovanovic (Orange) Presented
Towards better accuracy for Spam predictions
Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 [email protected] Abstract Spam identification is crucial
Introduction to Detection Theory
Introduction to Detection Theory Reading: Ch. 3 in Kay-II. Notes by Prof. Don Johnson on detection theory, see http://www.ece.rice.edu/~dhj/courses/elec531/notes5.pdf. Ch. 10 in Wasserman. EE 527, Detection
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
L3: Statistical Modeling with Hadoop
L3: Statistical Modeling with Hadoop Feng Li [email protected] School of Statistics and Mathematics Central University of Finance and Economics Revision: December 10, 2014 Today we are going to learn...
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Lecture 15 - ROC, AUC & Lift Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk [email protected] Tom Kelsey ID5059-17-AUC
Introduction to Markov Chain Monte Carlo
Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem
Probabilistic Methods for Time-Series Analysis
Probabilistic Methods for Time-Series Analysis 2 Contents 1 Analysis of Changepoint Models 1 1.1 Introduction................................ 1 1.1.1 Model and Notation....................... 2 1.1.2 Example:
Week 1: Introduction to Online Learning
Week 1: Introduction to Online Learning 1 Introduction This is written based on Prediction, Learning, and Games (ISBN: 2184189 / -21-8418-9 Cesa-Bianchi, Nicolo; Lugosi, Gabor 1.1 A Gentle Start Consider
SVM Ensemble Model for Investment Prediction
19 SVM Ensemble Model for Investment Prediction Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore Siji T. Mathew, Research Scholar, Christ University, Dept of
More Data Mining with Weka
More Data Mining with Weka Class 5 Lesson 1 Simple neural networks Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz Lesson 5.1: Simple neural networks Class
Microsoft Azure Machine learning Algorithms
Microsoft Azure Machine learning Algorithms Tomaž KAŠTRUN @tomaz_tsql [email protected] http://tomaztsql.wordpress.com Our Sponsors Speaker info https://tomaztsql.wordpress.com Agenda Focus on explanation
Lecture 6: Logistic Regression
Lecture 6: CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 13, 2011 Outline Outline Classification task Data : X = [x 1,..., x m]: a n m matrix of data points in R n. y { 1,
Probabilistic Linear Classification: Logistic Regression. Piyush Rai IIT Kanpur
Probabilistic Linear Classification: Logistic Regression Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 18, 2016 Probabilistic Machine Learning (CS772A) Probabilistic Linear Classification:
Classification with Hybrid Generative/Discriminative Models
Classification with Hybrid Generative/Discriminative Models Rajat Raina, Yirong Shen, Andrew Y. Ng Computer Science Department Stanford University Stanford, CA 94305 Andrew McCallum Department of Computer
Performance Measures in Data Mining
Performance Measures in Data Mining Common Performance Measures used in Data Mining and Machine Learning Approaches L. Richter J.M. Cejuela Department of Computer Science Technische Universität München
Practical Graph Mining with R. 5. Link Analysis
Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities
CSC574 - Computer and Network Security Module: Intrusion Detection
CSC574 - Computer and Network Security Module: Intrusion Detection Prof. William Enck Spring 2013 1 Intrusion An authorized action... that exploits a vulnerability... that causes a compromise... and thus
6.231 Dynamic Programming and Stochastic Control Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.231 Dynamic Programming and Stochastic Control Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 6.231
Projektgruppe. Categorization of text documents via classification
Projektgruppe Steffen Beringer Categorization of text documents via classification 4. Juni 2010 Content Motivation Text categorization Classification in the machine learning Document indexing Construction
Machine Learning in Spam Filtering
Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov [email protected] Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.
Lecture 2: The SVM classifier
Lecture 2: The SVM classifier C19 Machine Learning Hilary 2015 A. Zisserman Review of linear classifiers Linear separability Perceptron Support Vector Machine (SVM) classifier Wide margin Cost function
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Reject Inference in Credit Scoring. Jie-Men Mok
Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business
Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu
Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of
Data Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
Lecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.
Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features
Categorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
Bayesian logistic betting strategy against probability forecasting. Akimichi Takemura, Univ. Tokyo. November 12, 2012
Bayesian logistic betting strategy against probability forecasting Akimichi Takemura, Univ. Tokyo (joint with Masayuki Kumon, Jing Li and Kei Takeuchi) November 12, 2012 arxiv:1204.3496. To appear in Stochastic
CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.
CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes
Introduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.
On the Path to an Ideal ROC Curve: Considering Cost Asymmetry in Learning Classifiers
On the Path to an Ideal ROC Curve: Considering Cost Asymmetry in Learning Classifiers Francis R. Bach Computer Science Division University of California Berkeley, CA 9472 [email protected] Abstract
Machine-Learning for Big Data: Sampling and Distributed On-Line Algorithms. Stéphan Clémençon
Machine-Learning for Big Data: Sampling and Distributed On-Line Algorithms Stéphan Clémençon LTCI UMR CNRS No. 5141 - Telecom ParisTech - Journée Traitement de Masses de Données du Laboratoire JL Lions
Introducing diversity among the models of multi-label classification ensemble
Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and
Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
CSCI567 Machine Learning (Fall 2014)
CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /
large-scale machine learning revisited Léon Bottou Microsoft Research (NYC)
large-scale machine learning revisited Léon Bottou Microsoft Research (NYC) 1 three frequent ideas in machine learning. independent and identically distributed data This experimental paradigm has driven
Airport Planning and Design. Excel Solver
Airport Planning and Design Excel Solver Dr. Antonio A. Trani Professor of Civil and Environmental Engineering Virginia Polytechnic Institute and State University Blacksburg, Virginia Spring 2012 1 of
Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang
Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental
Specification of the Bayesian CRM: Model and Sample Size. Ken Cheung Department of Biostatistics, Columbia University
Specification of the Bayesian CRM: Model and Sample Size Ken Cheung Department of Biostatistics, Columbia University Phase I Dose Finding Consider a set of K doses with labels d 1, d 2,, d K Study objective:
Prediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
Using Graph Theory to Analyze Gene Network Coherence
Using Graph Theory to Analyze Gene Network Coherence Francisco A. Gómez-Vela [email protected] Norberto Díaz-Díaz [email protected] José A. Lagares José A. Sánchez Jesús S. Aguilar 1 Outlines Introduction Proposed
Big Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
Linear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set
Overview Evaluation Connectionist and Statistical Language Processing Frank Keller [email protected] Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification
One-sided Support Vector Regression for Multiclass Cost-sensitive Classification
One-sided Support Vector Regression for Multiclass Cost-sensitive Classification Han-Hsing Tu [email protected] Hsuan-Tien Lin [email protected] Department of Computer Science and Information
Large-Scale Similarity and Distance Metric Learning
Large-Scale Similarity and Distance Metric Learning Aurélien Bellet Télécom ParisTech Joint work with K. Liu, Y. Shi and F. Sha (USC), S. Clémençon and I. Colin (Télécom ParisTech) Séminaire Criteo March
Problem Set 6 - Solutions
ECO573 Financial Economics Problem Set 6 - Solutions 1. Debt Restructuring CAPM. a Before refinancing the stoc the asset have the same beta: β a = β e = 1.2. After restructuring the company has the same
Single item inventory control under periodic review and a minimum order quantity
Single item inventory control under periodic review and a minimum order quantity G. P. Kiesmüller, A.G. de Kok, S. Dabia Faculty of Technology Management, Technische Universiteit Eindhoven, P.O. Box 513,
Online Semi-Supervised Learning
Online Semi-Supervised Learning Andrew B. Goldberg, Ming Li, Xiaojin Zhu [email protected] Computer Sciences University of Wisconsin Madison Xiaojin Zhu (Univ. Wisconsin-Madison) Online Semi-Supervised
AIMS Big data. AIMS Big data. Outline. Outline. Lecture 5: Structured-output learning January 7, 2015 Andrea Vedaldi
AMS Big data AMS Big data Lecture 5: Structured-output learning January 7, 5 Andrea Vedaldi. Discriminative learning. Discriminative learning 3. Hashing and kernel maps 4. Learning representations 5. Structured-output
Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
Lecture 2: August 29. Linear Programming (part I)
10-725: Convex Optimization Fall 2013 Lecture 2: August 29 Lecturer: Barnabás Póczos Scribes: Samrachana Adhikari, Mattia Ciollaro, Fabrizio Lecci Note: LaTeX template courtesy of UC Berkeley EECS dept.
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
STA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! [email protected]! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
Tweaking Naïve Bayes classifier for intelligent spam detection
682 Tweaking Naïve Bayes classifier for intelligent spam detection Ankita Raturi 1 and Sunil Pranit Lal 2 1 University of California, Irvine, CA 92697, USA. [email protected] 2 School of Computing, Information
A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT
New Mathematics and Natural Computation Vol. 1, No. 2 (2005) 295 303 c World Scientific Publishing Company A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA:
Less naive Bayes spam detection
Less naive Bayes spam detection Hongming Yang Eindhoven University of Technology Dept. EE, Rm PT 3.27, P.O.Box 53, 5600MB Eindhoven The Netherlands. E-mail:[email protected] also CoSiNe Connectivity Systems
CHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
Un point de vue bayésien pour des algorithmes de bandit plus performants
Un point de vue bayésien pour des algorithmes de bandit plus performants Emilie Kaufmann, Telecom ParisTech Rencontre des Jeunes Statisticiens, Aussois, 28 août 2013 Emilie Kaufmann (Telecom ParisTech)
Question 2 Naïve Bayes (16 points)
Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the
Natural Language Processing. Today. Logistic Regression Models. Lecture 13 10/6/2015. Jim Martin. Multinomial Logistic Regression
Natural Language Processing Lecture 13 10/6/2015 Jim Martin Today Multinomial Logistic Regression Aka log-linear models or maximum entropy (maxent) Components of the model Learning the parameters 10/1/15
Evaluation & Validation: Credibility: Evaluating what has been learned
Evaluation & Validation: Credibility: Evaluating what has been learned How predictive is a learned model? How can we evaluate a model Test the model Statistical tests Considerations in evaluating a Model
