Simple and efficient online algorithms for real world applications


1 Simple and efficient online algorithms for real world applications. Università degli Studi di Milano, Milano, Italy. Centro de Visión por Computador.
4 Something about me
PhD in Robotics at LIRA-Lab, University of Genova.
PostDoc in Machine Learning.
I like theoretically motivated algorithms. But they must work well too! ;)
6 Outline: OM-2, Projectron, BBQ.
13 What do they have in common?
Spam filtering: minimize the number of wrongly classified mails, while training.
Shifting distributions: the Learner must track the best classifier.
Learning with limited feedback: selecting publicity banners.
Active learning: asking for specific samples to label.
Interactive learning: the agent and the human are learning at the same time.
These problems cannot be solved with standard batch learning! But they can in the online learning framework!
14 Learning in an artificial agent We have the Learner and the Teacher. The Learner observes examples in a sequence of rounds, and constructs the classification function incrementally.
15 Learning in an artificial agent Given an input, the Learner predicts its label.
16 Learning in an artificial agent Then the Teacher reveals the true label.
17 Learning in an artificial agent The Learner compares its prediction with the true label and updates its knowledge. The aim of the Learner is to minimize the number of mistakes.
18 Machine learning point of view on online learning
The Teacher is a black box: it can choose the examples arbitrarily; in the worst case the choice can be adversarial!
There is no IID assumption on the data!
Useful to model interaction between the user and the algorithm, or nonstationary data.
Performance is measured while training: there is no separate test set.
Update after each sample: efficiency of the update is important.
See [Cesa-Bianchi & Lugosi, 2006] for a full introduction to the theory of online learning.
19 Formal setting
Learning goes on in a sequence of T rounds.
Instances: x_t ∈ X (images, sounds, etc.).
Labels: y_t ∈ Y (labels, numbers, structured outputs, etc.).
Prediction rule: f_t(x) = ŷ; for binary classification, ŷ = sign(⟨w, x⟩).
Loss: ℓ(ŷ, y) ∈ ℝ₊.
22 Regret bounds
The learner must minimize its loss on the T observed samples, compared to the loss of the best fixed predictor:
Σ_{t=1}^T ℓ(ŷ_t, y_t) ≤ min_f Σ_{t=1}^T ℓ(f(x_t), y_t) + R_T
We want the regret R_T to be small: a small regret indicates that the performance of the learner is not too far from that of the best fixed classifier.
23 A loss for everyone
Mistake: ℓ_01(w, x, y) := 1(y ≠ sign(⟨w, x⟩)).
Hinge: ℓ_hinge(w, x, y) := max(0, 1 − y⟨w, x⟩).
Logistic regression: ℓ_logreg(w, x, y) := log(1 + exp(−y⟨w, x⟩)).
Exponential loss, etc.
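All three losses above depend on the data only through the margin y⟨w, x⟩; a minimal Python sketch, with illustrative function names:

```python
import math

def loss_01(margin):
    # Mistake loss: 1 if the prediction is wrong (here, margin <= 0
    # counts the zero-margin tie as a mistake, an illustrative convention).
    return 1.0 if margin <= 0.0 else 0.0

def loss_hinge(margin):
    # Convex upper bound on the mistake loss.
    return max(0.0, 1.0 - margin)

def loss_logreg(margin):
    # Smooth convex upper bound (up to scaling) on the mistake loss.
    return math.log(1.0 + math.exp(-margin))
```

Both convex losses upper-bound the mistake loss at every margin, which is what makes them usable surrogates in the bounds that follow.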
24 Let's start from the Perceptron
for t = 1, 2, …, T do
  Receive new instance x_t
  Predict ŷ_t = sign(⟨w, x_t⟩)
  Receive label y_t
  if y_t ≠ ŷ_t then w = w + y_t x_t
end for
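The loop above maps almost line by line to code. A minimal NumPy sketch; the toy data, the multi-pass option, and the tie-breaking convention sign(0) = +1 are illustrative choices, not part of the slides:

```python
import numpy as np

def perceptron(X, y, passes=1):
    """Online Perceptron over the stream (x_t, y_t), labels in {-1, +1}.
    Returns the final weight vector and the number of online mistakes."""
    w = np.zeros(X.shape[1])
    mistakes = 0
    for _ in range(passes):
        for x_t, y_t in zip(X, y):
            y_hat = 1.0 if np.dot(w, x_t) >= 0 else -1.0  # predict
            if y_hat != y_t:                              # mistake: update
                w = w + y_t * x_t
                mistakes += 1
    return w, mistakes

# Toy separable stream: the label is the sign of the first coordinate.
X = np.array([[1.0, 0.2], [-1.0, 0.5], [2.0, -0.3], [-2.0, -0.1]])
y = np.array([1.0, -1.0, 1.0, -1.0])
w, m = perceptron(X, y, passes=2)
```

On this separable toy stream the update fires only once, consistent with the finite mistake bound of the next slide.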
27 The mistake bound of the Perceptron
Let (x_1, y_1), …, (x_T, y_T) be any sequence of instance-label pairs, with y_t ∈ {−1, +1} and ‖x_t‖ ≤ R. The number of mistakes of the Perceptron is bounded by
min_u ( L + ‖u‖² R² + ‖u‖ R √L ),  where L = Σ_{i=1}^T ℓ_hinge(u, x_i, y_i)
and the terms beyond L constitute the regret R_T.
If the problem is linearly separable, the maximum number of mistakes is R² ‖u‖², regardless of the ordering of the samples!
30 Pros and cons of the Perceptron
Pros :) Very efficient! It can be easily trained on huge datasets. Theoretical guarantee on the maximum number of mistakes.
Cons :( Linear hyperplanes only. Binary classification only.
Let's generalize the Perceptron!
31 Nonlinear classifiers using kernels! Suppose we transform the inputs through a nonlinear map φ(x) into a feature space. A linear classifier in the feature space results in a nonlinear classifier in the input space. We can do even better: the algorithms only need access to inner products ⟨φ(x_1), φ(x_2)⟩, so if we have a function K(x_1, x_2) = ⟨φ(x_1), φ(x_2)⟩ we do not need φ() at all!
32 Kernel Perceptron
for t = 1, 2, …, T do
  Receive new instance x_t
  Predict ŷ_t = sign( Σ_{x_i ∈ S} y_i K(x_i, x_t) )
  Receive label y_t
  if y_t ≠ ŷ_t then add x_t to the support set S
end for
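A minimal sketch of the kernel Perceptron; the Gaussian kernel and the XOR toy stream are illustrative choices, picked because XOR is not linearly separable in the input space:

```python
import numpy as np

def gaussian_kernel(a, b, gamma=1.0):
    # K(a, b) = exp(-gamma * ||a - b||^2)
    return float(np.exp(-gamma * np.sum((np.asarray(a) - np.asarray(b)) ** 2)))

def kernel_perceptron(X, y, kernel, epochs=1):
    support, coef = [], []          # support set S and signed coefficients y_i
    mistakes = 0
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            score = sum(c * kernel(x_i, x_t) for x_i, c in zip(support, coef))
            y_hat = 1.0 if score >= 0 else -1.0
            if y_hat != y_t:        # mistake: store x_t in the support set
                support.append(x_t)
                coef.append(y_t)
                mistakes += 1
    return support, coef, mistakes

def predict(x, support, coef, kernel):
    score = sum(c * kernel(x_i, x) for x_i, c in zip(support, coef))
    return 1.0 if score >= 0 else -1.0

# XOR: not linearly separable, but separable with a Gaussian kernel.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, -1.0])
S, c, m = kernel_perceptron(X, y, gaussian_kernel, epochs=2)
```

On this stream all four mistakes happen in the first epoch, and the resulting four-element support set classifies XOR correctly, which also shows why the support set can keep growing on harder data.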
33 Follow The Regularized Leader The Perceptron is just a particular case of a more general algorithm: the follow the regularized leader (FTRL) algorithm. In FTRL, at each time step we predict with the approximate batch solution, using a linearization of the losses instead of the true loss functions. Regret bounds come almost for free! See [Kakade et al., 2009] for FTRL and [Orabona & Crammer, 2010] for an even more general algorithm.
35 The general algorithm
for t = 1, 2, …, T do
  Receive new instance x_t
  Predict ŷ_t
  Receive label y_t
  θ = θ − η_t ∇ℓ(w, x_t, y_t)
  w = ∇g_t(θ)
end for
Suppose g_t(θ) = ½‖θ‖², so ∇g_t(θ) = θ. Use the hinge loss, whose subgradient on a mistake is ∇ℓ(w, x_t, y_t) = −y_t x_t, and set η_t = 1 on mistakes, 0 otherwise. We recovered the Perceptron algorithm!
F. Orabona and K. Crammer. New Adaptive Algorithms for Online Classification. In Neural Information Processing Systems (NIPS), 2010.
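To make the recovery of the Perceptron concrete, here is a sketch of the general scheme with g_t(θ) = ½‖θ‖² (so ∇g_t is the identity) and the hinge subgradient; the toy data are illustrative:

```python
import numpy as np

def general_online(X, y, grad_g):
    """General scheme: keep a dual vector theta, predict with w = grad_g(theta),
    and move theta against the loss subgradient."""
    theta = np.zeros(X.shape[1])
    for x_t, y_t in zip(X, y):
        w = grad_g(theta)
        y_hat = 1.0 if np.dot(w, x_t) >= 0 else -1.0
        if y_hat != y_t:
            # Hinge-loss subgradient on a mistake is -y_t * x_t,
            # and eta_t = 1 on mistakes, 0 otherwise:
            theta = theta + y_t * x_t
    return grad_g(theta)

# With g(theta) = 0.5 * ||theta||^2, grad_g is the identity map,
# and the scheme is exactly the Perceptron.
X = np.array([[1.0, 0.2], [-1.0, 0.5], [2.0, -0.3], [-2.0, -0.1]])
y = np.array([1.0, -1.0, 1.0, -1.0])
w = general_online(X, y, lambda theta: theta)
```

Swapping in a different `grad_g` (the gradient of the Fenchel dual of another regularizer) changes the geometry of the updates without touching the rest of the loop.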
36 The Recipe Create the online algorithm that best suits your needs! Define a task: this will define a (convex) loss function. Define which characteristics you would like your solution to have: this will define a (strongly convex) regularizer. Compute the gradient of the loss. Compute the gradient of the Fenchel dual of the regularizer. Just code it!
37 Some examples
Multiclass/multilabel classification, M classes, M different classifiers w_i:
ℓ(w̄, x, Y) = max(0, 1 + max_{y ∉ Y} ⟨w_y, x⟩ − min_{y ∈ Y} ⟨w_y, x⟩)
Do you prefer sparse classifiers? Use the regularizer ‖w‖_p² with 1 < p ≤ 2.
K different kernels? Use the regularizer ‖[‖w_1‖_2, …, ‖w_K‖_2]‖_p² with 1 < p ≤ 2.
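As an example of the last two recipe steps, the p-norm regularizer ½‖w‖_p² has Fenchel dual ½‖θ‖_q² with 1/p + 1/q = 1, and its gradient has a closed form; a sketch, assuming q ≥ 2 (i.e. 1 < p ≤ 2):

```python
import numpy as np

def grad_dual_pnorm(theta, q):
    """Gradient of g*(theta) = 0.5 * ||theta||_q^2, the Fenchel dual of the
    regularizer 0.5 * ||w||_p^2 with 1/p + 1/q = 1. Componentwise:
    sign(theta_j) * |theta_j|^(q-1) / ||theta||_q^(q-2)."""
    norm_q = np.linalg.norm(theta, ord=q)
    if norm_q == 0.0:
        return np.zeros_like(theta)
    return np.sign(theta) * np.abs(theta) ** (q - 1) / norm_q ** (q - 2)

theta = np.array([3.0, -4.0])
w2 = grad_dual_pnorm(theta, 2)   # q = 2 recovers the Euclidean (identity) map
w4 = grad_dual_pnorm(theta, 4)   # q = 4 pairs with p = 4/3
```

A useful sanity check on the closed form is the duality identity ‖∇½‖θ‖_q²‖_p = ‖θ‖_q; for q = 2 the map is simply the identity.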
39 Batch solutions with online algorithms What most people do when they want a batch solution: they stop the online algorithm and use the last solution found. This is wrong: the last solution can be arbitrarily bad! If the data are IID you can simply use the averaged solution [Cesa-Bianchi et al., 2004]. Alternative way: several epochs of training, until convergence.
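The averaged solution costs one extra vector of memory; a sketch on top of the Perceptron (averaging the post-update iterates is one common convention; the toy data are illustrative):

```python
import numpy as np

def averaged_perceptron(X, y):
    """One Perceptron pass that also returns the average of the iterates;
    with IID data, the averaged vector is the one to deploy as a batch
    solution, rather than the last iterate."""
    w = np.zeros(X.shape[1])
    w_sum = np.zeros(X.shape[1])
    for x_t, y_t in zip(X, y):
        y_hat = 1.0 if np.dot(w, x_t) >= 0 else -1.0
        if y_hat != y_t:
            w = w + y_t * x_t
        w_sum += w               # accumulate the post-update iterate
    return w, w_sum / len(X)

X = np.array([[1.0, 0.2], [-1.0, 0.5], [2.0, -0.3], [-2.0, -0.1]])
y = np.array([1.0, -1.0, 1.0, -1.0])
w_last, w_avg = averaged_perceptron(X, y)
```

The average smooths out the jumps of the online iterates, which is exactly why the last iterate alone can be arbitrarily bad while the average is not.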
40 Outline: OM-2, Projectron, BBQ.
42 Multi Kernel Learning online?
We have F different features (cues), e.g. color, shape, etc.: Cue 1 → w_1, Cue 2 → w_2, …, Cue F → w_F, combined by a sum Σ into the prediction.
Solution: the Online Multi-class Multi-kernel learning algorithm (OM-2).
L. Jie, F. Orabona, M. Fornoni, B. Caputo, and N. Cesa-Bianchi. OM-2: An Online Multi-class Multi-kernel Learning Algorithm. In Proc. of the 4th IEEE Online Learning for Computer Vision Workshop (in CVPR 2010).
43 OM-2: pseudocode
Input: q
Initialize: θ_1 = 0, w_1 = 0
for t = 1, 2, …, T do
  Receive new instance x_t
  Predict ŷ_t = argmax_{y=1,…,M} ⟨w_t, φ(x_t, y)⟩
  Receive label y_t
  z_t = φ(x_t, y_t) − φ(x_t, ŷ_t)
  if ℓ(w_t, x_t, y_t) > 0 then η_t = min{ (1 − ⟨w_t, z_t⟩) / ‖z_t‖²_{2,q}, 1 } else η_t = 0
  θ_{t+1} = θ_t + η_t z_t
  w^j_{t+1} = (‖θ^j_{t+1}‖_2 / ‖θ_{t+1}‖_{2,q})^{q−2} θ^j_{t+1},  j = 1, …, F
end for
Code available!
44 Experiments We compared OM-2 to OMCL [Jie et al., ACCV 2009] and to PA-I [Crammer et al., JMLR 2006], using the best feature and the sum of the kernels. We also used SILP [Sonnenburg et al., JMLR 2006], a state-of-the-art batch MKL solver. We used Caltech-101 with 39 different kernels, as in [Gehler and Nowozin, ICCV 2009].
45 Caltech-101: online performance
[Plot: online average training error rate vs. number of training samples, on Caltech-101 (5 splits, 30 training examples per category); curves for OM-2 (p=1.01), OMCL, PA-I (average kernel), and PA-I (best kernel).]
Best results with p = 1.01. OM-2 achieves the best performance among the online algorithms.
46 Caltech-101: batch performance
[Plot: classification rate on test data vs. number of epochs, on Caltech-101 (5 splits, 30 training examples per category); curves for OM-2 (p=1.01), OMCL, PA-I (average kernel), PA-I (best kernel), and SILP.]
The Matlab implementation of OM-2 takes 45 minutes; SILP takes more than 2 hours. The performance advantage of OM-2 over SILP is due to the fact that OM-2 is based on a native multiclass formulation.
50 Learning with bounded complexity
Perceptron-like algorithms will never stop updating the solution if the problem is not linearly separable.
On the other hand, if we use kernels, sooner or later the computer will run out of memory...
Is it possible to bound the complexity of the learner?
Yes! Just update with a noisy version of the gradient: if the new vector can be well approximated with the old ones, just update the coefficients of the old ones.
F. Orabona, J. Keshet, and B. Caputo. Bounded Kernel-Based Online Learning. Journal of Machine Learning Research, 2009.
51 The Projectron algorithm
for t = 1, 2, …, T do
  Receive new instance x_t
  Predict ŷ_t = sign(⟨w, x_t⟩)
  Receive label y_t
  if y_t ≠ ŷ_t then
    w′ = w + y_t x_t
    w″ = w + y_t P(x_t), where P(x_t) is the projection of x_t onto the span of the support set
    if δ_t = ‖w′ − w″‖ ≤ η then w = w″ else w = w′
  end if
end for
It is possible to calculate the projection even when using kernels. The algorithm has a mistake bound and a bounded memory growth. Code available!
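The distance δ_t and the projection coefficients can be computed from kernel values alone, by solving a linear system in the Gram matrix of the support set; a sketch (the tiny ridge term is only a numerical safeguard, not part of the algorithm):

```python
import numpy as np

def projection_error(K_S, k_t, k_tt, ridge=1e-10):
    """Distance between phi(x_t) and its projection onto
    span{phi(x_i) : x_i in S}, from kernel values only.
    K_S: Gram matrix of the support set; k_t: vector of K(x_i, x_t);
    k_tt: K(x_t, x_t). Returns (delta, projection coefficients d)."""
    d = np.linalg.solve(K_S + ridge * np.eye(len(K_S)), k_t)
    delta_sq = max(float(k_tt - k_t @ d), 0.0)   # ||phi(x_t) - P(x_t)||^2
    return float(np.sqrt(delta_sq)), d

# Linear kernel in R^2, support set S = {e1}:
K_S = np.array([[1.0]])
delta_new, _ = projection_error(K_S, np.array([0.0]), 1.0)  # x_t = e2: not in span
delta_old, d = projection_error(K_S, np.array([1.0]), 1.0)  # x_t = e1: in span
```

When δ_t ≤ η, the coefficients d say how to fold y_t into the existing support vectors instead of growing the support set, which is what keeps the memory bounded.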
52 Average online error vs. budget size
[Plot: number of mistakes (%) vs. size of the support set, on Adult9 (a9a, 123 features, Gaussian kernel); curves for RBP, Forgetron, plain Perceptron, PA-I, and Projectron.]
53 Robot navigation & phoneme recognition
Place recognition: IDOL2 database, 5 rooms, CRFH features.
Phoneme recognition: subset of the TIMIT corpus, 55 phonemes, MFCC + Δ + ΔΔ features.
54 Place recognition
[Plot: accuracy on the test set vs. training step, on IDOL2 with the exp-χ² kernel; curves for the Perceptron and for Projectron++ with η₁, η₂, η₃.]
55 Place recognition
[Plot: size of the support set vs. training step, on IDOL2 with the exp-χ² kernel; curves for the Perceptron and for Projectron++ with η₁, η₂, η₃.]
56 Phoneme recognition
[Plot: number of mistakes (%) vs. size of the support set, Gaussian kernel; curves for Projectron++ and PA-I.]
57 Phoneme recognition
[Plot: error on the test set (%) vs. size of the support set, Gaussian kernel; curves for Projectron++ and PA-I.]
59 Semisupervised online learning
We are given N training samples and a regression problem, {x_i, y_i}_{i=1}^N, y_i ∈ [−1, 1].
Obtaining the output for a given sample can be expensive, so we want to ask for as few labels as possible.
We assume the outputs y_t are realizations of random variables Y_t such that E[Y_t] = u^⊤ x_t for all t, where u ∈ ℝ^d is a fixed and unknown vector with ‖u‖ = 1.
The order of the data is still adversarial, but the outputs are generated by a stochastic source.
Solution: the Bound on Bias Query algorithm (BBQ).
N. Cesa-Bianchi, C. Gentile, and F. Orabona. Robust Bounds for Classification via Selective Sampling. In Proc. of the International Conference on Machine Learning (ICML), 2009.
60 The Parametric BBQ algorithm
Parameters: 0 < ε, δ < 1
for t = 1, 2, …, T do
  Receive new instance x_t
  Predict ŷ_t = w^⊤ x_t
  r = x_t^⊤ (I + S_{t−1} S_{t−1}^⊤ + x_t x_t^⊤)^{−1} x_t
  q = ‖S_{t−1}^⊤ (I + S_{t−1} S_{t−1}^⊤ + x_t x_t^⊤)^{−1} x_t‖
  s = ‖(I + S_{t−1} S_{t−1}^⊤ + x_t x_t^⊤)^{−1} x_t‖
  if [ε − r − s]₊ < q √(2 ln (t(t+1)/(2δ))) then
    Query label y_t
    Update w with a regularized least squares
    S_t = [S_{t−1}, x_t]
  end if
end for
The algorithm follows the general framework. Every time a query is not issued, the predicted output is far from the correct one by at most ε.
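A deliberately simplified selective-sampling sketch in the same spirit; this is NOT the exact Parametric BBQ rule (it keeps only the uncertainty term r and drops the q, s, and logarithmic terms), and the function name and toy stream are illustrative:

```python
import numpy as np

def selective_rls(X, labels, eps=0.1):
    """Maintain a regularized least-squares predictor and query a label only
    when the uncertainty r_t = x_t^T (I + A + x_t x_t^T)^{-1} x_t exceeds eps,
    where A accumulates the outer products of the queried instances."""
    d = X.shape[1]
    A = np.zeros((d, d))      # sum of x x^T over queried instances
    b = np.zeros(d)           # sum of y x over queried instances
    queries = 0
    for x_t, y_t in zip(X, labels):
        M = np.eye(d) + A + np.outer(x_t, x_t)
        r = float(x_t @ np.linalg.solve(M, x_t))
        if r > eps:           # predictor still uncertain here: ask for the label
            A += np.outer(x_t, x_t)
            b += y_t * x_t
            queries += 1
    return np.linalg.solve(np.eye(d) + A, b), queries

# A stream that repeats the same instance: queries quickly stop.
X = np.tile(np.array([[1.0, 0.0]]), (50, 1))
labels = np.ones(50)
w, n_queries = selective_rls(X, labels)
```

On this repeated-instance stream, r shrinks like 1/(k + 2) after k queries, so the algorithm asks for only a handful of the 50 labels while the prediction converges toward the true output.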
61 Regret bound of Parametric BBQ
Theorem. If Parametric BBQ is run with input ε, δ ∈ (0, 1), then with probability at least 1 − δ:
the prediction error is at most ε on all time steps t when no query is issued;
the number N_T of queries issued after any number T of steps is bounded as N_T = O( (d/ε²) ln(T/δ) ln(T/ε) ).
62 Synthetic experiment
[Plots: theoretical and measured maximum error, and fraction of queried labels, as functions of ε.]
We tested Parametric BBQ on 10,000 random examples on the unit circle in ℝ². The labels were generated according to our noise model, using a randomly selected hyperplane u with unit norm.
63 Real world experiments
[Plots: F-measure vs. fraction of queried labels for SOLE, Random, and Parametric BBQ on Adult9 (left, Gaussian kernel) and RCV1 (right, linear kernel).]
64 Summary
Many real world problems cannot be solved in the standard batch framework.
The online learning framework offers a useful tool in these cases.
A general algorithm that covers many of the previously known online learning algorithms has been presented.
The framework makes it easy to design algorithms for specific problems.
65 Thanks for your attention Code: My website: