Semi-Supervised Support Vector Machines and Application to Spam Filtering

1 Semi-Supervised Support Vector Machines and Application to Spam Filtering. Alexander Zien, Empirical Inference Department (Bernhard Schölkopf), Max Planck Institute for Biological Cybernetics. ECML 2006 Discovery Challenge.

2 Outline
1 Introduction
2 Training an S³VM: Why It Matters; Some S³VM Training Methods; Gradient-Based Optimization; The Continuation S³VM
3 Overview of SSL: Assumptions of SSL; A Crude Overview of SSL; Combining Methods
4 Application to Spam Filtering: Naive Application; Proper Model Selection
5 Conclusions

3 Find a linear classification boundary.

4 (figure only)

5 ... not robust w.r.t. input noise!

6 SVM: maximum margin classifier

$\min_{w,b}\ \underbrace{\tfrac{1}{2}\, w^\top w}_{\text{regularizer}}$  s.t.  $y_i (w^\top x_i + b) \ge 1$

7 S³VM (TSVM): semi-supervised (transductive) SVM

$\min_{w,b,(y_j)}\ \underbrace{\tfrac{1}{2}\, w^\top w}_{\text{regularizer}}$  s.t.  $y_i (w^\top x_i + b) \ge 1$,  $y_j (w^\top x_j + b) \ge 1$

8 Soft-margin S³VM

$\min_{w,b,(y_j),(\xi_k)}\ \tfrac{1}{2}\, w^\top w + C \sum_i \xi_i + C^* \sum_j \xi_j^*$
s.t.  $y_i (w^\top x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$;  $y_j (w^\top x_j + b) \ge 1 - \xi_j^*$, $\xi_j^* \ge 0$

9 Two Moons toy data: easy for humans (0% error), hard for S³VMs!

  S³VM optimization method         test error   objective value
  Branch & Bound (global min.)     0.0%         7.81
  CCCP (local min.)                64.0%
  S³VM light (local min.)          66.2%
  ∇S³VM (local min.)               59.3%
  cS³VM (local min.)               45.7%

The objective function is good for SSL; try to find better local minima!

10 Mixed Integer Programming [Bennett, Demiriz; NIPS 1998]

(soft-margin S³VM problem, slide 8)

global optimum found by standard optimization packages (e.g. CPLEX)
combinatorial & NP-hard! only works for small-sized problems

11 S³VM light [T. Joachims; ICML 1999]

(soft-margin S³VM problem, slide 8)

train SVM on labeled points, predict the $y_j$'s
in prediction, always make sure that

  $\dfrac{\#\{y_j = +1\}}{\#\text{unlabeled points}} = \dfrac{\#\{y_i = +1\}}{\#\text{labeled points}}$   (*)

with stepwise increasing C* do:
1 train SVM on all points, using labels $(y_i), (y_j)$
2 predict new $y_j$'s s.t. balancing constraint (*)
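To make the alternation concrete, here is a minimal Python sketch of the S³VM light loop, assuming scikit-learn's SVC. It simplifies Joachims' pairwise label-switching inner loop to a direct re-thresholding under the balancing constraint (*); function names and the C* schedule are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def s3vm_light(X_lab, y_lab, X_unl, C=1.0, C_star_final=1.0, n_steps=5):
    """Sketch of the S3VM-light alternation (Joachims, 1999)."""
    # Step 0: supervised SVM on the labeled points only.
    svm = SVC(kernel="linear", C=C).fit(X_lab, y_lab)

    # Balancing constraint (*): keep the fraction of +1 predictions
    # on the unlabeled set equal to the fraction of +1 labels.
    n_pos = int(round(np.mean(y_lab == 1) * len(X_unl)))

    def balanced_labels(scores):
        # Assign +1 to the n_pos unlabeled points with largest output.
        y = -np.ones(len(scores), dtype=int)
        y[np.argsort(-scores)[:n_pos]] = 1
        return y

    y_unl = balanced_labels(svm.decision_function(X_unl))

    # Alternate: retrain on all points with the guessed labels,
    # stepwise increasing the weight C* of the unlabeled part.
    for C_star in np.geomspace(1e-3, C_star_final, n_steps):
        X_all = np.vstack([X_lab, X_unl])
        y_all = np.concatenate([y_lab, y_unl])
        w = np.concatenate([np.full(len(y_lab), C),
                            np.full(len(y_unl), C_star)])
        svm = SVC(kernel="linear", C=1.0).fit(X_all, y_all, sample_weight=w)
        y_unl = balanced_labels(svm.decision_function(X_unl))
    return svm, y_unl
```

Here sample weights play the role of per-point C values, so the unlabeled part can be weighted up gradually.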

12 (soft-margin S³VM problem, slide 8)

The balancing constraint (*) is required to avoid degenerate solutions, e.g. all unlabeled points being assigned to a single class!

13 Effective Loss Functions (soft-margin S³VM problem, slide 8)

At the optimum, the slack variables equal the losses:

  $\xi_i = \max\{\, 1 - y_i (w^\top x_i + b),\ 0 \,\}$
  $\xi_j^* = \min_{y_j \in \{+1,-1\}} \max\{\, 1 - y_j (w^\top x_j + b),\ 0 \,\} = \max\{\, 1 - |w^\top x_j + b|,\ 0 \,\}$

(plots: loss $\xi_i$ vs. $y_i (w^\top x_i + b)$; loss $\xi_j^*$ vs. $w^\top x_j + b$)

14 Resolving the Constraints (soft-margin S³VM problem, slide 8) yields the unconstrained objective

  $\tfrac{1}{2}\, w^\top w + C \sum_i \ell_l\big(y_i (w^\top x_i + b)\big) + C^* \sum_j \ell_u\big(w^\top x_j + b\big)$

(plots: loss functions $\ell_l$ and $\ell_u$)
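In code, the resolved objective is just a function of (w, b); a minimal numpy sketch (names illustrative):

```python
import numpy as np

def hinge(t):                  # l_l(t) = max(1 - t, 0): labeled points
    return np.maximum(1.0 - t, 0.0)

def symmetric_hinge(t):        # l_u(t) = max(1 - |t|, 0): unlabeled points
    return np.maximum(1.0 - np.abs(t), 0.0)

def s3vm_objective(w, b, X_lab, y_lab, X_unl, C, C_star):
    """The resolved, unconstrained S3VM objective from this slide."""
    return (0.5 * w @ w
            + C * hinge(y_lab * (X_lab @ w + b)).sum()
            + C_star * symmetric_hinge(X_unl @ w + b).sum())
```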

15 CCCP-S³VM [R. Collobert et al.; ICML 2006] (objective from slide 14)

CCCP: Concave-Convex Procedure; objective = convex function + concave function.
Starting from the SVM solution, iterate:
1 approximate the concave part by a linear function at the given point
2 solve the resulting convex problem

[Fung, Mangasarian; 1999]: similar approach, restricted to linear S³VMs

16 S³VM as an Unconstrained Differentiable Optimization Problem (objective from slide 14)

(plots: original loss functions $\ell_l$, $\ell_u$ vs. smooth loss functions)

17 ∇S³VM [Chapelle, Zien; AISTATS 2005]: simply do gradient descent on the (smoothed) objective from slide 14, thereby stepwise increasing C*!

cS³VM [Chapelle et al.; ICML 2006]: ... in more detail on the next slides!
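A hedged sketch of the gradient-based approach using scipy: squared hinge as a smooth $\ell_l$ and $\exp(-3t^2)$ as a smooth stand-in for $\ell_u$ (the exact surrogates and constants in the paper may differ), with C* increased stepwise. Balancing is omitted here; slides 18 and 19 show how to fold it in.

```python
import numpy as np
from scipy.optimize import minimize

def grad_s3vm(X_lab, y_lab, X_unl, C=1.0, C_star=1.0, n_anneal=5):
    """Sketch of gradient-based S3VM training with smooth losses."""
    d = X_lab.shape[1]

    def objective(params, c_star):
        w, b = params[:d], params[d]
        t_lab = y_lab * (X_lab @ w + b)      # labeled margins
        t_unl = X_unl @ w + b                # unlabeled outputs
        return (0.5 * w @ w
                + C * (np.maximum(1 - t_lab, 0) ** 2).sum()
                + c_star * np.exp(-3 * t_unl ** 2).sum())

    params = np.zeros(d + 1)
    # Stepwise increase the unlabeled weight C*, warm-starting each run.
    for c_star in np.geomspace(C_star * 1e-2, C_star, n_anneal):
        params = minimize(objective, params, args=(c_star,),
                          method="BFGS").x
    return params[:d], params[d]
```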

18 Hard Balancing Constraint (objective from slide 14)

S³VM light constraint:  $\dfrac{\#\{y_j = +1\}}{\#\text{unlabeled points}} = \dfrac{\#\{y_i = +1\}}{\#\text{labeled points}}$

equivalent constraint:

  $\underbrace{\tfrac{1}{m} \sum_j \operatorname{sign}(w^\top x_j + b)}_{\text{average prediction}} = \underbrace{\tfrac{1}{n} \sum_i y_i}_{\text{average label}}$

19 Making the Balancing Constraint Linear

hard / non-linear:  $\underbrace{\tfrac{1}{m} \sum_j \operatorname{sign}(w^\top x_j + b)}_{\text{average prediction}} = \underbrace{\tfrac{1}{n} \sum_i y_i}_{\text{average label}}$
soft / linear:      $\underbrace{\tfrac{1}{m} \sum_j (w^\top x_j + b)}_{\text{mean output on unlabeled points}} = \underbrace{\tfrac{1}{n} \sum_i y_i}_{\text{average label}}$

Implementing the linear soft balancing: center the unlabeled data, $\sum_j x_j = 0$; this just fixes b, leaving unconstrained optimization over w (see the sketch below)!
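The centering trick in a few lines of numpy (a sketch; the helper name is illustrative):

```python
import numpy as np

def apply_soft_balancing(X_unl, y_lab):
    """Linear soft balancing from this slide: after centering the
    unlabeled data (sum_j x_j = 0), the mean output on unlabeled points
    is just b, so the constraint fixes b to the average label and the
    optimization becomes unconstrained over w alone."""
    X_unl_centered = X_unl - X_unl.mean(axis=0)
    b = y_lab.mean()           # average label, per the constraint
    return X_unl_centered, b
```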

20 The Continuation Method in a Nutshell

Procedure:
1 smooth the function until it is convex
2 find the minimum
3 track the minimum while decreasing the amount of smoothing

(illustration)

21 Smoothing the S³VM Objective $f(\cdot)$

Convolution of $f(\cdot)$ with a Gaussian of width $\gamma/2$:

  $f_\gamma(w) = (\pi\gamma)^{-d/2} \int f(w - t)\, \exp(-\|t\|^2 / \gamma)\, dt$

Closed-form solution!

Smoothing sequence: choose $\gamma_0 > \gamma_1 > \dots > \gamma_{p-1} > \gamma_p = 0$;
choose $\gamma_0$ such that $f_{\gamma_0}(\cdot)$ is convex;
choose $\gamma_{p-1}$ such that $f_{\gamma_{p-1}}(\cdot) \approx f_{\gamma_p}(\cdot) = f(\cdot)$;
p = 10 steps (equidistant on log scale) are sufficient.
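A sketch of the continuation loop: the paper exploits the closed-form convolution, while this generic version approximates $f_\gamma$ by Monte-Carlo sampling from the equivalent Gaussian $N(0, (\gamma/2) I)$; all constants and names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def smoothed(f, gamma, dim, n_samples=64):
    """Monte-Carlo stand-in for the Gaussian convolution on this slide:
    f_gamma(w) = E[f(w - t)] with t ~ N(0, (gamma/2) I), which matches
    (pi*gamma)^(-d/2) * integral f(w - t) exp(-||t||^2 / gamma) dt."""
    t = np.random.default_rng(0).normal(scale=np.sqrt(gamma / 2),
                                        size=(n_samples, dim))
    return lambda w: np.mean([f(w - ti) for ti in t])

def continuation_minimize(f, w0, gamma0=100.0, p=10):
    """Continuation: minimize the heavily smoothed (convex) objective,
    then track the minimizer along a log-equidistant schedule
    gamma_0 > ... > gamma_{p-1}, ending at gamma_p = 0 (the original f)."""
    w = np.asarray(w0, dtype=float)
    for gamma in np.geomspace(gamma0, 1e-3, p):
        w = minimize(smoothed(f, gamma, w.size), w, method="Nelder-Mead").x
    return minimize(f, w, method="Nelder-Mead").x
```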

22 Handling Non-Linearity

Consider a non-linear map $\Phi(x)$ with kernel $k(x_i, x_j) = \Phi(x_i)^\top \Phi(x_j)$.
Representer theorem: the S³VM solution lies in the span E of the data points, $E := \operatorname{span}\{\Phi(x_i)\} \cong \mathbb{R}^{n+m}$.

Implementation:
1 expand basis vectors $v_i$ of E: $v_i = \sum_k A_{ik} \Phi(x_k)$
2 orthonormality gives $(A^\top A)^{-1} = K$; solve for A, e.g. by KPCA or Cholesky
3 project the data onto the basis $V = (v_j)_j$: $\tilde{x}_i = V \Phi(x_i) = (AK)_{\cdot i}$
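One way to realize step 2 with a Cholesky factor, as a sketch (assuming a precomputed kernel matrix K):

```python
import numpy as np

def kernel_to_features(K):
    """Explicit coordinates in the span of the data (this slide):
    with K = L L^T (Cholesky), the rows of L have pairwise inner
    products equal to K, so they realize the Phi(x_i) in R^(n+m).
    This corresponds to choosing A = inv(L) above, since then
    A^T A = inv(L L^T) = inv(K), i.e. (A^T A)^(-1) = K."""
    n = K.shape[0]
    L = np.linalg.cholesky(K + 1e-10 * np.eye(n))  # jitter keeps K PSD
    return L  # row i = coordinates of Phi(x_i); run a linear S3VM on them
```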

23 Comparison of S³VM Optimization Methods [Chapelle, Chi, Zien; ICML 2006]

(results figure) averaged over splits (and pairs of classes); fixed hyperparameters (close to hard margin); similar results for other hyperparameter settings

24 Why would unlabeled data be useful at all? Uniform data do not help.

25 (repeat of slide 24, with illustration)

26 Cluster Assumption Points in the same cluster are likely to be of the same class. Algorithmic idea: Low Density Separation

27 Manifold Assumption: The data lie on (or close to) a low-dimensional manifold.

[images from "The Geometric Basis of Semi-Supervised Learning", Sindhwani, Belkin, Niyogi, in: Semi-Supervised Learning, Chapelle, Schölkopf, Zien (eds.)]

Algorithmic idea: use a Nearest-Neighbor Graph

28 Assumption: Independent Views Exist

There exist subsets of features, called views, each of which is: independent of the others given the class; sufficient for classification.

(figure: data plotted as view 1 vs. view 2)

Algorithmic idea: Co-Training

29 Assumption / Approach / Example Algorithm

Cluster Assumption / Low Density Separation: S³VM; Entropy Regularization; Data-Dependent Regularization; ...
Manifold Assumption / Graph-based Methods: build a weighted graph $(w_{kl})$; minimize $\sum_{kl} w_{kl} (y_k - y_l)^2$ over $(y_j)$; relax $y_j$ to be real, giving a QP (sketched below)
Independent Views / Co-Training: train two predictors $y_j^{(1)}, y_j^{(2)}$; couple the objectives by adding $\sum_j \big( y_j^{(1)} - y_j^{(2)} \big)^2$
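A minimal numpy sketch of the graph-based row of this table: with labeled values fixed, the relaxed quadratic is minimized by a harmonic solution of a linear system in the graph Laplacian (function and variable names are illustrative).

```python
import numpy as np

def graph_ssl(W, y_lab, lab_idx, unl_idx):
    """Minimize sum_{kl} w_kl (y_k - y_l)^2 over real-valued y on the
    unlabeled nodes, with y fixed to +/-1 on the labeled nodes.
    Setting the gradient to zero gives the harmonic solution
    y_u = -inv(L_uu) @ L_ul @ y_l, where L = D - W is the Laplacian."""
    L = np.diag(W.sum(axis=1)) - W
    L_uu = L[np.ix_(unl_idx, unl_idx)]
    L_ul = L[np.ix_(unl_idx, lab_idx)]
    y_unl = np.linalg.solve(L_uu, -L_ul @ y_lab)
    return np.sign(y_unl)      # threshold the relaxed real values
```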

30 Discriminative Learning (diagnostic paradigm): model $p(y \mid x)$ (or just the boundary $\{\, x : p(y \mid x) = \tfrac{1}{2} \,\}$); examples: S³VM, graph-based methods.

Generative Learning (sampling paradigm): model $p(x \mid y)$; predict via Bayes' rule:

  $p(y \mid x) = \dfrac{p(y)\, p(x \mid y)}{\sum_{y'} p(y')\, p(x \mid y')}$

Unlabeled data pose a missing-data problem; the EM algorithm (expectation-maximization) is a natural tool; successful for text data [Nigam et al.; Machine Learning, 2000].

31 SSL Book: Semi-Supervised Learning, MIT Press, September 2006; edited by O. Chapelle, B. Schölkopf, A. Zien. Contains many state-of-the-art algorithms by top researchers and an extensive SSL benchmark. Online material: sample chapters, benchmark data, more information.

32 SSL Book Text Benchmark

(table: error [%] and AUC [%] at l = 10 and l = 100 labeled points, for 1-NN, SVM, MVU + 1-NN, LEM + 1-NN, QC + CMN, Discrete Reg., TSVM, SGT, Cluster-Kernel, LDS, Laplacian RLS)

33 Combining S³VM with a Graph-based Regularizer: LapSVM [1] modifies the kernel using the graph, then trains an SVM; the combination with S³VM is even better [2]. (figure: MNIST, 3 vs 5)

[1] "Beyond the Point Cloud"; Sindhwani, Niyogi, Belkin; ICML 2005
[2] "A Continuation Method for S³VM"; Chapelle, Chi, Zien; ICML 2006

Combining S³VM with Co-Training: "SSL for Structured Output Variables"; Brefeld, Scheffer; ICML 2006

34 (soft-margin S³VM problem, slide 8)

How to set C? Data fitting, $y_i (w^\top x_i) \approx 1$, versus regularization, $\min \|w\|^2$: since $w^\top x_i = O(1)$, we get $\|w\|^2 \approx \operatorname{Var}[x]^{-1}$. Balancing the influence of the two terms, $\|w\|^2 \approx C\, \xi_i$, gives $C \approx \operatorname{Var}[x]^{-1}$.

How to set C*? E.g. $C^* = C$, or scale so that $C^* \cdot \#\text{unlabeled points} = \lambda \cdot C \cdot \#\text{labeled points}$.
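The heuristics on this slide as a small sketch; the C* scaling is a reconstruction of the garbled formula above, and λ and all names are illustrative.

```python
import numpy as np

def guess_hyperparameters(X_lab, X_unl, lam=1.0):
    """Heuristics from this slide: C ~ 1/Var[x] balances regularizer
    and data-fitting term; C* is scaled so that the unlabeled part has
    lam times the total weight of the labeled part, i.e.
    C* * (#unlabeled) = lam * C * (#labeled)."""
    C = 1.0 / np.var(np.vstack([X_lab, X_unl]))
    C_star = lam * C * len(X_lab) / len(X_unl)
    return C, C_star
```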

35 Naive Application

Transductive setting on each user/inbox: use the inbox of the given user as unlabeled data; test data = unlabeled data.
Guess the model: $\operatorname{Var}[x] \approx 1$, so set $C = 1$, $C^* = C$; linear kernel.

Results: AUC (rank) [rank in unofficial list]

  method        task A            task B
  S³VM light    94.53% (4) [6]    92.34% (2) [4]
  ∇S³VM         96.72% (1) [3]    93.74% (2) [4]
  cS³VM         96.01% (1) [3]    93.56% (2) [4]

36 Model selection: $C \in \{10^{-2}, 10^{-1}, 10^{0}, 10^{+1}, \dots\}$, $C^* \in \{10^{-2}, 10^{-1}, 10^{0}, 10^{+1}, \dots\} \cdot C$; cross-validation (3-fold for task A; 5-fold for task B).

Results: AUC for cS³VM

  model                          task A    task B
  C = C* = 1 (guessed model)     96.01%    93.56%
  model selection                89.31%    90.09%

A significant drop in accuracy! CV relies on the i.i.d. assumption, that the data are independent and identically distributed; mail inboxes violate it.

37 Take-Home Messages

S³VM implements low-density separation (margin maximization); the optimization technique matters (non-convex objective); it works well for text classification (texts form clusters); S³VM-based hybrids may be even better. For spam filtering, further methods are needed to cope with non-i.i.d. situations (mail inboxes)!

Thank you!
