Foundations of Machine Learning: On-Line Learning. Mehryar Mohri, Courant Institute and Google Research. mohri@cims.nyu.edu


1 Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research

2 Motivation
PAC learning: distribution fixed over time (training and test); IID assumption.
On-line learning: no distributional assumption; worst-case (adversarial) analysis; training and test are mixed.
Performance measures: mistake model, regret.

3 This Lecture
Prediction with expert advice.
Linear classification.

4 General On-Line Setting
For $t = 1$ to $T$:
  receive instance $x_t \in X$;
  predict $\widehat{y}_t \in Y$;
  receive label $y_t \in Y$;
  incur loss $L(\widehat{y}_t, y_t)$.
Classification: $Y = \{0, 1\}$, $L(y, y') = |y' - y|$.
Regression: $Y \subseteq \mathbb{R}$, $L(y, y') = (y' - y)^2$.
Objective: minimize the total loss $\sum_{t=1}^{T} L(\widehat{y}_t, y_t)$.
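
To fix ideas, here is a minimal Python sketch of this protocol; the names `stream`, `predict`, and `update` are illustrative placeholders, not part of the lecture.

```python
# A minimal sketch of the general on-line protocol: the learner commits to a
# prediction before seeing the label, then incurs a loss and may adapt.

def squared_loss(y_hat, y):
    """Regression loss L(y', y) = (y' - y)^2."""
    return (y_hat - y) ** 2

def run_online(stream, predict, update, loss=squared_loss):
    """Run the protocol: receive x_t, predict, receive y_t, incur loss, adapt."""
    total_loss = 0.0
    for x_t, y_t in stream:            # instances and labels revealed one at a time
        y_hat = predict(x_t)           # prediction made before y_t is revealed
        total_loss += loss(y_hat, y_t)
        update(x_t, y_t)               # the learner may adapt once the label is known
    return total_loss
```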

5 Prediction with Expert Advice
For $t = 1$ to $T$:
  receive instance $x_t \in X$ and advice $y_{t,i} \in Y$, $i \in [1, N]$;
  predict $\widehat{y}_t \in Y$;
  receive label $y_t \in Y$;
  incur loss $L(\widehat{y}_t, y_t)$.
Objective: minimize the regret, i.e., the difference between the total loss incurred and that of the best expert:
$\mathrm{Regret}(T) = \sum_{t=1}^{T} L(\widehat{y}_t, y_t) - \min_{i=1}^{N} \sum_{t=1}^{T} L(y_{t,i}, y_t)$.
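
As a small illustration (assuming the per-round losses have already been computed), the regret can be evaluated directly from the learner's losses and the experts' losses:

```python
import numpy as np

def regret(learner_losses, expert_losses):
    """Regret(T) = sum_t L(y_hat_t, y_t) - min_i sum_t L(y_{t,i}, y_t).

    learner_losses: shape (T,),  loss incurred by the learner at each round.
    expert_losses:  shape (T, N), loss of each of the N experts at each round.
    """
    return float(np.sum(learner_losses) - np.min(np.sum(expert_losses, axis=0)))
```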

6 Mistake Bound Model
Definition: the maximum number of mistakes a learning algorithm $L$ makes to learn a concept $c$ is defined by
$M_L(c) = \max_{x_1, \ldots, x_T} |\mathrm{mistakes}(L, c)|$.
Definition: for any concept class $C$, the maximum number of mistakes a learning algorithm $L$ makes is
$M_L(C) = \max_{c \in C} M_L(c)$.
A mistake bound is a bound $M$ on $M_L(C)$.

7 Halving Algorithm (see (Mitchell, 1997))
Halving(H)
  H_1 ← H
  for t ← 1 to T do
    Receive(x_t)
    ŷ_t ← MajorityVote(H_t, x_t)
    Receive(y_t)
    if ŷ_t ≠ y_t then
      H_{t+1} ← {c ∈ H_t : c(x_t) = y_t}
  return H_{T+1}
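
A minimal Python sketch of the Halving algorithm, assuming the hypotheses are given as callables x -> {0, 1} (an illustrative interface, not from the slides):

```python
def halving(hypotheses, stream):
    """Halving algorithm: predict by majority vote over the current version
    space; on a mistake, keep only the hypotheses consistent with the label."""
    H = list(hypotheses)
    mistakes = 0
    for x_t, y_t in stream:
        votes_for_1 = sum(h(x_t) for h in H)
        y_hat = 1 if 2 * votes_for_1 >= len(H) else 0     # majority vote
        if y_hat != y_t:
            mistakes += 1
            H = [h for h in H if h(x_t) == y_t]           # at least half of H is removed
    return H, mistakes
```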

8 Halving Algorithm - Bound (Littlestone, 1988)
Theorem: let $H$ be a finite hypothesis set. Then,
$M_{\mathrm{Halving}(H)} \le \log_2 |H|$.
Proof: at each mistake, the hypothesis set is reduced at least by half.

9 VC Dimension Lower Bound (Littlestone, 1988)
Theorem: let $\mathrm{opt}(H)$ be the optimal mistake bound for $H$. Then,
$\mathrm{VCdim}(H) \le \mathrm{opt}(H) \le M_{\mathrm{Halving}(H)} \le \log_2 |H|$.
Proof: for a fully shattered set, form a complete binary tree of the mistakes with height $\mathrm{VCdim}(H)$.

10 Weighted Majority Algorithm (Littlestone and Warmuth, 1988)
Here $y_t, y_{t,i} \in \{0, 1\}$ and $\beta \in [0, 1)$.
Weighted-Majority(N experts)
  for i ← 1 to N do
    w_{1,i} ← 1
  for t ← 1 to T do
    Receive(x_t)
    ŷ_t ← 1 if Σ_{i: y_{t,i}=1} w_{t,i} ≥ Σ_{i: y_{t,i}=0} w_{t,i}, else 0    (weighted majority vote)
    Receive(y_t)
    if ŷ_t ≠ y_t then
      for i ← 1 to N do
        if y_{t,i} ≠ y_t then
          w_{t+1,i} ← β w_{t,i}
        else w_{t+1,i} ← w_{t,i}
  return w_{T+1}
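
A short sketch of the Weighted Majority update in Python (binary advice, mistake-driven updates); the array shapes are assumptions made for the example:

```python
import numpy as np

def weighted_majority(expert_preds, labels, beta=0.5):
    """Weighted Majority: expert_preds has shape (T, N) with entries in {0, 1},
    labels has shape (T,); beta in [0, 1) down-weights experts that erred."""
    T, N = expert_preds.shape
    w = np.ones(N)
    mistakes = 0
    for t in range(T):
        advice = expert_preds[t]
        y_hat = 1 if w[advice == 1].sum() >= w[advice == 0].sum() else 0
        if y_hat != labels[t]:
            mistakes += 1
            w[advice != labels[t]] *= beta     # penalize the mistaken experts
    return w, mistakes
```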

11 Weighted Majority - Bound
Theorem: let $m_t$ be the number of mistakes made by the WM algorithm up to time $t$, and $m_t^*$ that of the best expert. Then, for all $t$,
$m_t \le \dfrac{\log N + m_t^* \log \frac{1}{\beta}}{\log \frac{2}{1 + \beta}}$.
Thus, $m_t \le O(\log N) + \text{constant} \times \text{(mistakes of best expert)}$.
Realizable case: $m_t \le O(\log N)$.
Halving algorithm: $\beta = 0$.

12 Weighted Majority - Proof
Potential: $\Phi_t = \sum_{i=1}^{N} w_{t,i}$.
Upper bound: after each error,
$\Phi_{t+1} \le \big[\tfrac{1}{2} + \tfrac{1}{2}\beta\big] \Phi_t = \tfrac{1+\beta}{2}\, \Phi_t$.
Thus, $\Phi_t \le \big(\tfrac{1+\beta}{2}\big)^{m_t} N$.
Lower bound: for any expert $i$, $\Phi_t \ge w_{t,i} = \beta^{m_{t,i}}$.
Comparison: $\beta^{m_t^*} \le \big(\tfrac{1+\beta}{2}\big)^{m_t} N$, which gives
$m_t^* \log \beta \le \log N + m_t \log \tfrac{1+\beta}{2}$, i.e.,
$m_t \log \tfrac{2}{1+\beta} \le \log N + m_t^* \log \tfrac{1}{\beta}$.

13 Weighted Majority - Notes
Advantage: remarkable bound requiring no assumption.
Disadvantage: no deterministic algorithm can achieve a regret $R_T = o(T)$ with the binary loss.
Better guarantee with randomized WM.
Better guarantee for WM with convex losses.

14 Exponential Weighted Average
Algorithm:
  weight update: $w_{t+1,i} \leftarrow w_{t,i}\, e^{-\eta L(y_{t,i}, y_t)} = e^{-\eta L_{t,i}}$, where $L_{t,i}$ is the total loss incurred by expert $i$ up to time $t$;
  prediction: $\widehat{y}_t = \dfrac{\sum_{i=1}^{N} w_{t,i}\, y_{t,i}}{\sum_{i=1}^{N} w_{t,i}}$.
Theorem: assume that $L$ is convex in its first argument and takes values in $[0, 1]$. Then, for any $\eta > 0$ and any sequence $y_1, \ldots, y_T \in Y$, the regret at $T$ satisfies
$\mathrm{Regret}(T) \le \dfrac{\log N}{\eta} + \dfrac{\eta T}{8}$.
For $\eta = \sqrt{8 \log N / T}$, $\mathrm{Regret}(T) \le \sqrt{(T/2) \log N}$.
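
A sketch of the exponentially weighted average forecaster for a bounded convex loss; the squared loss below and the array shapes are assumptions made for the example:

```python
import numpy as np

def exp_weighted_average(expert_preds, labels, eta, loss=lambda p, y: (p - y) ** 2):
    """Exponentially weighted average forecaster.

    expert_preds: shape (T, N), real-valued expert predictions.
    labels:       shape (T,).
    eta:          learning rate; the theorem suggests sqrt(8 log(N) / T) when T is known.
    """
    T, N = expert_preds.shape
    cum_loss = np.zeros(N)                               # L_{t,i}: cumulative loss of expert i
    learner_loss = 0.0
    for t in range(T):
        w = np.exp(-eta * (cum_loss - cum_loss.min()))   # shifted for numerical stability
        y_hat = float(np.dot(w, expert_preds[t]) / w.sum())
        learner_loss += loss(y_hat, labels[t])
        cum_loss += np.array([loss(p, labels[t]) for p in expert_preds[t]])
    return learner_loss - cum_loss.min()                 # regret against the best expert
```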

15 Exponential Weighted Avg - Proof
Potential: $\Phi_t = \log \sum_{i=1}^{N} w_{t,i}$.
Upper bound:
$\Phi_t - \Phi_{t-1}
= \log \dfrac{\sum_{i=1}^{N} w_{t-1,i}\, e^{-\eta L(y_{t,i}, y_t)}}{\sum_{i=1}^{N} w_{t-1,i}}
= \log \mathbb{E}_{w_{t-1}}\big[e^{-\eta L(y_{t,i}, y_t)}\big]$
$= \log \mathbb{E}_{w_{t-1}}\Big[\exp\Big(-\eta\big(L(y_{t,i}, y_t) - \mathbb{E}_{w_{t-1}}[L(y_{t,i}, y_t)]\big) - \eta\, \mathbb{E}_{w_{t-1}}[L(y_{t,i}, y_t)]\Big)\Big]$
$\le \dfrac{\eta^2}{8} - \eta\, \mathbb{E}_{w_{t-1}}[L(y_{t,i}, y_t)]$  (Hoeffding's inequality)
$\le \dfrac{\eta^2}{8} - \eta\, L\big(\mathbb{E}_{w_{t-1}}[y_{t,i}], y_t\big)$  (convexity of $L$ in its first argument)
$= \dfrac{\eta^2}{8} - \eta\, L(\widehat{y}_t, y_t)$.

16 Exponential Weighted Avg - Proof
Upper bound: summing up the inequalities yields
$\Phi_T - \Phi_0 \le -\eta \sum_{t=1}^{T} L(\widehat{y}_t, y_t) + \dfrac{\eta^2 T}{8}$.
Lower bound:
$\Phi_T - \Phi_0 = \log \sum_{i=1}^{N} e^{-\eta L_{T,i}} - \log N \ge \log \max_{i=1}^{N} e^{-\eta L_{T,i}} - \log N = -\eta \min_{i=1}^{N} L_{T,i} - \log N$.
Comparison:
$-\eta \min_{i=1}^{N} L_{T,i} - \log N \le -\eta \sum_{t=1}^{T} L(\widehat{y}_t, y_t) + \dfrac{\eta^2 T}{8}$, thus
$\sum_{t=1}^{T} L(\widehat{y}_t, y_t) \le \min_{i=1}^{N} L_{T,i} + \dfrac{\log N}{\eta} + \dfrac{\eta T}{8}$.

17 Exponential Weighted Avg - Notes
Advantage: the bound on the regret per round is of the form $R_T / T = O\big(\sqrt{\log(N)/T}\big)$.
Disadvantage: the choice of $\eta$ requires knowledge of the horizon $T$.

18 Doubling Trick
Idea: divide time into periods $[2^k, 2^{k+1} - 1]$ of length $2^k$ with $k = 0, \ldots, n$, $T \ge 2^n - 1$, and choose $\eta_k = \sqrt{\dfrac{8 \log N}{2^k}}$ in each period.
Theorem: with the same assumptions as before, for any $T$, the following holds:
$\mathrm{Regret}(T) \le \dfrac{\sqrt{2}}{\sqrt{2} - 1} \sqrt{(T/2) \log N} + \sqrt{\log N / 2}$.
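
A schematic of the doubling-trick schedule: restart the forecaster on each period of length $2^k$ with the learning rate tuned to that length. `run_period` is a hypothetical callable standing in for one run of the previous algorithm on a period:

```python
import math

def doubling_trick(T, N, run_period):
    """Run an expert algorithm on periods I_k of length 2^k, k = 0, 1, ...,
    with eta_k = sqrt(8 log(N) / 2^k); run_period(start, length, eta) is assumed
    to return the loss incurred on that period."""
    total_loss, t, k = 0.0, 0, 0
    while t < T:
        length = min(2 ** k, T - t)              # last period may be truncated
        eta_k = math.sqrt(8.0 * math.log(N) / 2 ** k)
        total_loss += run_period(t, length, eta_k)
        t += length
        k += 1
    return total_loss
```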

19 Doubling Trick - Proof
By the previous theorem, for any period $I_k = [2^k, 2^{k+1} - 1]$,
$L_{I_k} - \min_{i=1}^{N} L_{I_k, i} \le \sqrt{(2^k / 2) \log N}$.
Thus, with $L_T = \sum_{k=0}^{n} L_{I_k}$,
$L_T \le \sum_{k=0}^{n} \min_{i=1}^{N} L_{I_k, i} + \sum_{k=0}^{n} \sqrt{2^k (\log N)/2}
\le \min_{i=1}^{N} L_{T,i} + \sqrt{(\log N)/2} \sum_{k=0}^{n} 2^{k/2}$.
Since $\sum_{k=0}^{n} 2^{k/2} = \dfrac{2^{(n+1)/2} - 1}{\sqrt{2} - 1} \le \dfrac{\sqrt{2}\sqrt{T + 1} - 1}{\sqrt{2} - 1} \le \dfrac{\sqrt{2}}{\sqrt{2} - 1}\sqrt{T} + 1$,
we obtain $L_T - \min_{i=1}^{N} L_{T,i} \le \dfrac{\sqrt{2}}{\sqrt{2} - 1}\sqrt{(T/2) \log N} + \sqrt{(\log N)/2}$.

20 Notes
Doubling trick used in a variety of other contexts and proofs.
More general method, learning parameter as a function of time: $\eta_t = \sqrt{(8 \log N)/t}$. Constant factor improvement:
$\mathrm{Regret}(T) \le 2\sqrt{(T/2) \log N} + \sqrt{(1/8) \log N}$.

21 This Lecture
Prediction with expert advice.
Linear classification.

22 Perceptron Algorithm (Rosenblatt, 1958)
Perceptron(w_0)
  w_1 ← w_0    (typically w_0 = 0)
  for t ← 1 to T do
    Receive(x_t)
    ŷ_t ← sgn(w_t · x_t)
    Receive(y_t)
    if ŷ_t ≠ y_t then
      w_{t+1} ← w_t + y_t x_t    (more generally w_t + η y_t x_t, η > 0)
    else w_{t+1} ← w_t
  return w_{T+1}
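
A compact Python sketch of the on-line Perceptron (labels in {-1, +1}; the shapes are assumptions made for the example):

```python
import numpy as np

def perceptron(X, y, eta=1.0):
    """On-line Perceptron: X has shape (T, N), y has entries in {-1, +1}."""
    T, N = X.shape
    w = np.zeros(N)                                   # w_0 = 0
    updates = 0
    for t in range(T):
        y_hat = 1 if np.dot(w, X[t]) > 0 else -1     # sgn(w_t . x_t), with sgn(0) = -1 here
        if y_hat != y[t]:
            w = w + eta * y[t] * X[t]                # update only on a mistake
            updates += 1
    return w, updates
```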

23 Separating Hyperplane
[Figure: separating hyperplane $w \cdot x = 0$ with margin $\rho$, and the case with margin errors; the signed distance of a point $x_i$ to the hyperplane is $\dfrac{y_i (w \cdot x_i)}{\|w\|}$.]

24 Perceptron = Stochastic Gradient Descent
Objective function: convex but not differentiable,
$F(w) = \dfrac{1}{T} \sum_{t=1}^{T} \max\big(0, -y_t (w \cdot x_t)\big) = \mathbb{E}_{x \sim \widehat{D}}[f(w, x)]$,
with $f(w, x) = \max\big(0, -y (w \cdot x)\big)$ and $\widehat{D}$ the empirical distribution.
Stochastic gradient: for each $x_t$, the update is
$w_{t+1} \leftarrow w_t - \eta \nabla_w f(w_t, x_t)$ if $f$ is differentiable at $w_t$, and $w_{t+1} \leftarrow w_t$ otherwise,
where $\eta > 0$ is a learning rate parameter.
Here:
$w_{t+1} \leftarrow w_t + \eta\, y_t x_t$ if $y_t (w_t \cdot x_t) < 0$, and $w_{t+1} \leftarrow w_t$ otherwise.
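
In this view, a single round is one stochastic subgradient step on $f(w, x) = \max(0, -y (w \cdot x))$; a minimal sketch:

```python
import numpy as np

def perceptron_sgd_step(w, x_t, y_t, eta=1.0):
    """One stochastic (sub)gradient step on f(w, x) = max(0, -y (w . x)).
    Where f is positive, grad_w f = -y x and the step is w + eta y x;
    where f = 0 (correctly classified, or at the kink), w is left unchanged."""
    if y_t * np.dot(w, x_t) < 0:
        return w + eta * y_t * x_t
    return w
```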

25 Perceptron Algorithm - Bound (Novikoff, 1962)
Theorem: assume that $\|x_t\| \le R$ for all $t \in [1, T]$ and that, for some $\rho > 0$ and $v \in \mathbb{R}^N$, for all $t \in [1, T]$,
$\dfrac{y_t (v \cdot x_t)}{\|v\|} \ge \rho$.
Then, the number of mistakes made by the Perceptron algorithm is bounded by $R^2 / \rho^2$.
Proof: let $I$ be the set of rounds at which there is an update and let $M$ be the total number of updates.

26 Summing up the assumption inequalities gives:
$M \rho \le \dfrac{v \cdot \sum_{t \in I} y_t x_t}{\|v\|}
= \dfrac{v \cdot \sum_{t \in I} (w_{t+1} - w_t)}{\|v\|}$  (definition of updates)
$= \dfrac{v \cdot w_{T+1}}{\|v\|}
\le \|w_{T+1}\|$  (Cauchy-Schwarz inequality)
$= \|w_{t_m} + y_{t_m} x_{t_m}\|$  ($t_m$ largest $t$ in $I$)
$= \big(\|w_{t_m}\|^2 + \|x_{t_m}\|^2 + 2 y_{t_m} w_{t_m} \cdot x_{t_m}\big)^{1/2}$  (last term $\le 0$ at an update)
$\le \big(\|w_{t_m}\|^2 + R^2\big)^{1/2}
\le \big(M R^2\big)^{1/2} = \sqrt{M}\, R$  (applying the same argument to the previous $t$'s in $I$).
Comparing the two sides gives $M \le R^2 / \rho^2$.

27 Notes:
Bound independent of the dimension, and tight.
Convergence can be slow for a small margin: it can be in $\Omega(2^N)$.
Among the many variants: voted perceptron algorithm. Predict according to
$\mathrm{sgn}\Big(\sum_{t \in I} c_t\, \mathrm{sgn}(w_t \cdot x)\Big)$,
where $c_t$ is the number of iterations $w_t$ survives.
$\{x_t : t \in I\}$ are the support vectors for the perceptron algorithm.
Non-separable case: the algorithm does not converge.

28 Perceptron - Leave-One-Out Analysis
Theorem: let $h_S$ be the hypothesis returned by the perceptron algorithm for a sample $S$ and let $M(S)$ be the number of updates defining $h_S$. Then,
$\mathbb{E}_{S \sim D^m}[R(h_S)] \le \mathbb{E}_{S \sim D^{m+1}}\!\left[\dfrac{\min\big(M(S),\, R_{m+1}^2 / \rho_{m+1}^2\big)}{m + 1}\right]$.
Proof: let $S \sim D^{m+1}$ be a linearly separable sample and let $x \in S$. If $h_{S - \{x\}}$ misclassifies $x$, then $x$ must be a support vector for $h_S$ (update at $x$). Thus,
$\widehat{R}_{\mathrm{loo}}(\text{perceptron}) \le \dfrac{M(S)}{m + 1}$.

29 SVMs - Leave-One-Out Analysis (Vapnik, 1995)
Theorem: let $h_S$ be the optimal hyperplane for a sample $S$ and let $N_{SV}(S)$ be the number of support vectors defining $h_S$. Then,
$\mathbb{E}_{S \sim D^m}[R(h_S)] \le \mathbb{E}_{S \sim D^{m+1}}\!\left[\dfrac{\min\big(N_{SV}(S),\, R_{m+1}^2 / \rho_{m+1}^2\big)}{m + 1}\right]$.
Proof: one part proven in lecture 4. The other part is due to $\alpha_i \ge 1/R_{m+1}^2$ for $x_i$ misclassified by the SVM.

30 Comparison
Bounds on the expected error, not high-probability statements.
Leave-one-out bounds are not sufficient to distinguish SVMs from the perceptron algorithm. Note however:
  the same maximum margin $\rho_{m+1}$ can be used in both;
  but the radius $R_{m+1}$ of the support vectors is different.
Difference: margin distribution.

31 Non-Separable Case - L1 Bound (MM and Rostamizadeh, 2013)
Theorem: let $I$ denote the set of rounds at which the Perceptron algorithm makes an update when processing $x_1, \ldots, x_T$, and let $M_T = |I|$. Then,
$M_T \le \inf_{\rho > 0,\, \|u\|_2 \le 1} \left[ \sum_{t \in I} \Big(1 - \tfrac{y_t (u \cdot x_t)}{\rho}\Big)_{+} + \tfrac{1}{\rho} \sqrt{\textstyle\sum_{t \in I} \|x_t\|^2} \right]$.
When $\|x_t\| \le R$ for all $t \in I$, this implies
$M_T \le \inf_{\rho > 0,\, \|u\|_2 \le 1} \left[ \tfrac{R}{\rho} + \sqrt{L_\rho(u)} \right]^2$,
where $L_\rho(u) = \sum_{t \in I} \Big(1 - \tfrac{y_t (u \cdot x_t)}{\rho}\Big)_{+}$.

32 Proof: for any $t \in I$, $1 - \tfrac{y_t (u \cdot x_t)}{\rho} \le \big(1 - \tfrac{y_t (u \cdot x_t)}{\rho}\big)_{+}$. Summing up these inequalities for $t \in I$ yields
$M_T \le L_\rho(u) + \tfrac{1}{\rho} \sum_{t \in I} y_t (u \cdot x_t)$;
upper-bounding $\sum_{t \in I} y_t (u \cdot x_t)$ as in the proof for the separable case shows the first inequality.
The second inequality is obtained by solving $M_T \le L_\rho(u) + \tfrac{R}{\rho} \sqrt{M_T}$, which gives
$\sqrt{M_T} \le \dfrac{\tfrac{R}{\rho} + \sqrt{\tfrac{R^2}{\rho^2} + 4 L_\rho(u)}}{2} \le \tfrac{R}{\rho} + \sqrt{L_\rho(u)}$.

33 Non-Separable Case - L2 Bound (Freund and Schapire, 1998; MM and Rostamizadeh, 2013)
Theorem: let $I$ denote the set of rounds at which the Perceptron algorithm makes an update when processing $x_1, \ldots, x_T$, and let $M_T = |I|$. Then,
$M_T \le \inf_{\rho > 0,\, \|u\|_2 \le 1} \left[ \dfrac{\|L_\rho(u)\|_2 + \sqrt{\|L_\rho(u)\|_2^2 + \tfrac{4}{\rho}\sqrt{\sum_{t \in I} \|x_t\|_2^2}}}{2} \right]^2$.
When $\|x_t\| \le R$ for all $t \in I$, this implies
$M_T \le \inf_{\rho > 0,\, \|u\|_2 \le 1} \left[ \tfrac{R}{\rho} + \|L_\rho(u)\|_2 \right]^2$,
where $L_\rho(u) = \Big( \big(1 - \tfrac{y_t (u \cdot x_t)}{\rho}\big)_{+} \Big)_{t \in I}$.

34 Proof: reduce the problem to the separable case in a higher dimension. Let $l_t = \big(1 - \tfrac{y_t (u \cdot x_t)}{\rho}\big)_{+}$ for $t \in [1, T]$.
Mapping (similar to the trivial mapping): each $x_t = (x_{t,1}, \ldots, x_{t,N})^\top$ is mapped to
$x'_t = (x_{t,1}, \ldots, x_{t,N}, 0, \ldots, 0, \Delta, 0, \ldots, 0)^\top$,
with $\Delta$ placed in the $(N + t)$th component, and $u$ is mapped to
$u' = \Big( \tfrac{u_1}{Z}, \ldots, \tfrac{u_N}{Z}, \tfrac{y_1 l_1 \rho}{\Delta Z}, \ldots, \tfrac{y_T l_T \rho}{\Delta Z} \Big)^\top$,
with $Z = \sqrt{1 + \tfrac{\rho^2 \|L_\rho(u)\|_2^2}{\Delta^2}}$, so that $\|u'\| = 1$.

35 Observe that the Perceptron algorithm makes the same predictions and makes updates at the same rounds when processing $x'_1, \ldots, x'_T$. For any $t \in I$,
$y_t (u' \cdot x'_t) = \dfrac{y_t (u \cdot x_t)}{Z} + \dfrac{y_t^2\, l_t\, \rho}{Z} = \dfrac{1}{Z} \Big( y_t (u \cdot x_t) + \big[\rho - y_t (u \cdot x_t)\big]_{+} \Big) \ge \dfrac{\rho}{Z}$.
Summing up and using the proof in the separable case yields:
$\dfrac{M_T\, \rho}{Z} \le \sum_{t \in I} y_t (u' \cdot x'_t) \le \sqrt{\sum_{t \in I} \|x'_t\|^2} = \sqrt{\sum_{t \in I} \|x_t\|^2 + M_T \Delta^2}$.

36 The inequality can be rewritten as
$M_T^2 \rho^2 \le Z^2 \big( r^2 + M_T \Delta^2 \big) = r^2 + \dfrac{\rho^2 r^2 \|L_\rho(u)\|_2^2}{\Delta^2} + M_T \Delta^2 + M_T \rho^2 \|L_\rho(u)\|_2^2$,
where $r = \sqrt{\sum_{t \in I} \|x_t\|_2^2}$. Selecting $\Delta$ to minimize the bound gives $\Delta^2 = \dfrac{\rho\, r\, \|L_\rho(u)\|_2}{\sqrt{M_T}}$ and leads to
$M_T^2 \rho^2 \le r^2 + 2 \sqrt{M_T}\, \rho\, r\, \|L_\rho(u)\|_2 + M_T \rho^2 \|L_\rho(u)\|_2^2 = \big( r + \sqrt{M_T}\, \rho\, \|L_\rho(u)\|_2 \big)^2$.
Solving the second-degree inequality $M_T - \sqrt{M_T}\, \|L_\rho(u)\|_2 - \tfrac{r}{\rho} \le 0$ yields directly the first statement. The second one results from replacing $r$ with $\sqrt{M_T}\, R$.

37 Dual Perceptron Algorithm
Dual-Perceptron(α_0)
  α ← α_0    (typically α_0 = 0)
  for t ← 1 to T do
    Receive(x_t)
    ŷ_t ← sgn(Σ_{s=1}^{T} α_s y_s (x_s · x_t))
    Receive(y_t)
    if ŷ_t ≠ y_t then
      α_t ← α_t + 1
  return α

38 Kernel Perceptron Algorithm (Aizerman et al., 1964)
K is a PDS kernel.
Kernel-Perceptron(α_0)
  α ← α_0    (typically α_0 = 0)
  for t ← 1 to T do
    Receive(x_t)
    ŷ_t ← sgn(Σ_{s=1}^{T} α_s y_s K(x_s, x_t))
    Receive(y_t)
    if ŷ_t ≠ y_t then
      α_t ← α_t + 1
  return α
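
A sketch of the kernel Perceptron in Python, with the inner product as a default PDS kernel; precomputing the Gram matrix is a convenience for the example, not part of the algorithm:

```python
import numpy as np

def kernel_perceptron(X, y, kernel=np.dot):
    """Kernel Perceptron: X has shape (T, N), y has entries in {-1, +1};
    alpha[s] counts the updates made on point s (dual coefficients)."""
    T = len(X)
    K = np.array([[kernel(X[s], X[t]) for t in range(T)] for s in range(T)])
    alpha = np.zeros(T)
    for t in range(T):
        score = np.sum(alpha * y * K[:, t])      # sum_s alpha_s y_s K(x_s, x_t)
        y_hat = 1 if score > 0 else -1
        if y_hat != y[t]:
            alpha[t] += 1                        # update on a mistake
    return alpha
```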

39 Winnow Algorithm (Littlestone, 1988)
Here $y_t \in \{-1, +1\}$.
Winnow(η)
  w_1 ← (1/N, …, 1/N)
  for t ← 1 to T do
    Receive(x_t)
    ŷ_t ← sgn(w_t · x_t)
    Receive(y_t)
    if ŷ_t ≠ y_t then
      Z_t ← Σ_{i=1}^{N} w_{t,i} exp(η y_t x_{t,i})
      for i ← 1 to N do
        w_{t+1,i} ← w_{t,i} exp(η y_t x_{t,i}) / Z_t
    else w_{t+1} ← w_t
  return w_{T+1}
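
A sketch of the Winnow update in Python (multiplicative weights with normalization); the array shapes and labels in {-1, +1} follow the pseudocode above:

```python
import numpy as np

def winnow(X, y, eta):
    """Winnow: X has shape (T, N), y has entries in {-1, +1}, eta > 0."""
    T, N = X.shape
    w = np.full(N, 1.0 / N)                      # w_1 = (1/N, ..., 1/N)
    mistakes = 0
    for t in range(T):
        y_hat = 1 if np.dot(w, X[t]) > 0 else -1
        if y_hat != y[t]:
            mistakes += 1
            w = w * np.exp(eta * y[t] * X[t])    # multiplicative update
            w = w / w.sum()                      # normalize by Z_t
    return w, mistakes
```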

40 Notes
Winnow = weighted majority: for $y_{t,i} = x_{t,i} \in \{-1, +1\}$, $\mathrm{sgn}(w_t \cdot x_t)$ coincides with the majority vote; multiplying the weights of correct or incorrect experts by $e^{\eta}$ or $e^{-\eta}$ is equivalent to multiplying the weights of incorrect ones by $\beta = e^{-2\eta}$.
Relationships with other algorithms: e.g., boosting and Perceptron (Winnow and Perceptron can be viewed as special instances of a general family).

41 Winnow Algorithm - Bound
Theorem: assume that $\|x_t\|_\infty \le R_\infty$ for all $t \in [1, T]$ and that, for some $\rho_\infty > 0$ and $v \in \mathbb{R}^N$, $v \ge 0$, for all $t \in [1, T]$,
$\dfrac{y_t (v \cdot x_t)}{\|v\|_1} \ge \rho_\infty$.
Then, the number of mistakes made by the Winnow algorithm is bounded by $2 (R_\infty^2 / \rho_\infty^2) \log N$.
Proof: let $I$ be the set of rounds at which there is an update and let $M$ be the total number of updates.

42 Winnow Algorithm - Bound
Potential (relative entropy): $\Phi_t = \sum_{i=1}^{N} \dfrac{v_i}{\|v\|_1} \log \dfrac{v_i / \|v\|_1}{w_{t,i}}$.
Upper bound: for each $t$ in $I$,
$\Phi_{t+1} - \Phi_t = \sum_{i=1}^{N} \dfrac{v_i}{\|v\|_1} \log \dfrac{w_{t,i}}{w_{t+1,i}}
= \sum_{i=1}^{N} \dfrac{v_i}{\|v\|_1} \log \dfrac{Z_t}{\exp(\eta y_t x_{t,i})}
= \log Z_t - \eta\, \dfrac{y_t (v \cdot x_t)}{\|v\|_1}
\le \log Z_t - \eta \rho_\infty$,
and, by Hoeffding's inequality,
$\log Z_t = \log \sum_{i=1}^{N} w_{t,i} \exp(\eta y_t x_{t,i}) = \log \mathbb{E}_{w_t}\big[\exp(\eta y_t x_t)\big] \le \dfrac{\eta^2 (2 R_\infty)^2}{8} + \eta\, y_t (w_t \cdot x_t) \le \dfrac{\eta^2 R_\infty^2}{2}$,
since $y_t (w_t \cdot x_t) \le 0$ at an update. Thus, $\Phi_{t+1} - \Phi_t \le \dfrac{\eta^2 R_\infty^2}{2} - \eta \rho_\infty$.

43 Winnow Algorithm - Bound
Upper bound: summing up the inequalities yields
$\Phi_{T+1} - \Phi_1 \le M \Big( \dfrac{\eta^2 R_\infty^2}{2} - \eta \rho_\infty \Big)$.
Lower bound: note that
$\Phi_1 = \sum_{i=1}^{N} \dfrac{v_i}{\|v\|_1} \log \dfrac{v_i / \|v\|_1}{1/N} = \log N + \sum_{i=1}^{N} \dfrac{v_i}{\|v\|_1} \log \dfrac{v_i}{\|v\|_1} \le \log N$,
and, for all $t$, $\Phi_t \ge 0$ (property of the relative entropy). Thus,
$\Phi_{T+1} - \Phi_1 \ge 0 - \log N = -\log N$.
Comparison: we obtain $-\log N \le M \big( \tfrac{\eta^2 R_\infty^2}{2} - \eta \rho_\infty \big)$, that is $M \big( \eta \rho_\infty - \tfrac{\eta^2 R_\infty^2}{2} \big) \le \log N$. For $\eta = \rho_\infty / R_\infty^2$,
$M \le 2 \log N \cdot \dfrac{R_\infty^2}{\rho_\infty^2}$.

44 Notes
Comparison with the perceptron bound:
  dual norms: $\|\cdot\|_\infty$ vs. $\|\cdot\|_2$ for $x_t$, and $\|\cdot\|_1$ vs. $\|\cdot\|_2$ for $v$;
  similar bounds with different norms;
  each is advantageous in different cases: the Winnow bound is favorable when a sparse set of experts can predict well. For example, if $x_t \in \{\pm 1\}^N$ and $v = e_1$, the bounds compare as $\log N$ vs. $N$. The Perceptron bound is favorable in the opposite situation.

45 Conclusion
On-line learning:
  wide and fast-growing literature;
  many related topics, e.g., game theory, text compression, convex optimization;
  online-to-batch bounds and techniques;
  online versions of batch algorithms, e.g., regression algorithms (see regression lecture).

46 References
Aizerman, M. A., Braverman, E. M., and Rozonoer, L. I. (1964). Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25.
Nicolò Cesa-Bianchi, Alex Conconi, and Claudio Gentile. On the Generalization Ability of On-Line Learning Algorithms. IEEE Transactions on Information Theory, 50(9), 2004.
Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
Yoav Freund and Robert Schapire. Large margin classification using the perceptron algorithm. In Proceedings of COLT 1998. ACM Press, 1998.
Nick Littlestone. From On-Line to Batch Learning. COLT 1989.
Nick Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm. Machine Learning, 2, 1988.

47 References
Nick Littlestone and Manfred K. Warmuth. The Weighted Majority Algorithm. FOCS 1989.
Tom Mitchell. Machine Learning. McGraw Hill, 1997.
Mehryar Mohri and Afshin Rostamizadeh. Perceptron Mistake Bounds. arXiv, 2013.
Novikoff, A. B. (1962). On convergence proofs on perceptrons. Symposium on the Mathematical Theory of Automata, 12. Polytechnic Institute of Brooklyn.
Rosenblatt, Frank. The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, 65(6), 1958.
Vladimir N. Vapnik. Statistical Learning Theory. Wiley-Interscience, New York, 1998.
