ECS289: Scalable Machine Learning


1 ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 18, 2016

2 Outline: Kernel methods; solving the dual problem; kernel approximation.

3 Support Vector Machines (SVM) SVM is a widely used classifier. Given: training data points $x_1, \dots, x_n$, where each $x_i \in \mathbb{R}^d$ is a feature vector. Consider a simple case with two classes: $y_i \in \{+1, -1\}$. Goal: find a hyperplane to separate these two classes of data: if $y_i = 1$, $w^T x_i \geq 1 - \xi_i$; if $y_i = -1$, $w^T x_i \leq -1 + \xi_i$.

4 Support Vector Machines (SVM) Given training data $x_1, \dots, x_n \in \mathbb{R}^d$ with labels $y_i \in \{+1, -1\}$. SVM primal problem (find the optimal $w \in \mathbb{R}^d$): $\min_{w,\xi} \frac{1}{2} w^T w + C \sum_{i=1}^n \xi_i$ s.t. $y_i (w^T x_i) \geq 1 - \xi_i$, $\xi_i \geq 0$, $i = 1, \dots, n$. SVM dual problem (find the optimal $\alpha \in \mathbb{R}^n$): $\min_\alpha \frac{1}{2} \alpha^T Q \alpha - e^T \alpha$ s.t. $0 \leq \alpha_i \leq C$, $i = 1, \dots, n$, where $Q$ is the n-by-n kernel matrix with $Q_{ij} = y_i y_j x_i^T x_j$. Each $\alpha_i$ corresponds to one training data point. Primal-dual relationship: $w = \sum_i \alpha_i y_i x_i$.

5 Non-linearly separable problems What if the data is not linearly separable? Solution: map the data $x_i$ to a higher-dimensional (maybe infinite-dimensional) feature space $\phi(x_i)$, where the classes are linearly separable.

6 Support Vector Machines (SVM) SVM primal problem: $\min_{w,\xi} \frac{1}{2} w^T w + C \sum_{i=1}^n \xi_i$ s.t. $y_i (w^T \phi(x_i)) \geq 1 - \xi_i$, $\xi_i \geq 0$, $i = 1, \dots, n$. The dual problem for SVM: $\min_\alpha \frac{1}{2} \alpha^T Q \alpha - e^T \alpha$ s.t. $0 \leq \alpha_i \leq C$ for $i = 1, \dots, n$, where $Q_{ij} = y_i y_j \phi(x_i)^T \phi(x_j)$ and $e = [1, \dots, 1]^T$. Kernel trick: define $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$. At the optimum: $w = \sum_{i=1}^n \alpha_i y_i \phi(x_i)$.

7 Various types of kernels Gaussian kernel: $K(x_i, x_j) = e^{-\gamma \|x_i - x_j\|_2^2}$. Polynomial kernel: $K(x_i, x_j) = (\gamma x_i^T x_j + c)^d$. Other kernels for specific problems: graph kernels (Vishwanathan et al., Graph Kernels. JMLR, 2010); pyramid kernel for image matching (Grauman and Darrell, The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. ICCV, 2005); string kernel (Lodhi et al., Text Classification Using String Kernels. JMLR, 2002).
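As a concrete reference, the two generic kernels above can be computed with a few lines of NumPy (a minimal sketch; the function names are mine):

```python
import numpy as np

def gaussian_kernel(X, Y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||_2^2) for every pair of rows of X and Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def polynomial_kernel(X, Y, gamma=1.0, c=1.0, degree=3):
    """K(x, y) = (gamma * x^T y + c)^degree for every pair of rows of X and Y."""
    return (gamma * X @ Y.T + c) ** degree
```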

8 General Kernelized ERM L2-regularized empirical risk minimization: $\min_w \frac{1}{2}\|w\|^2 + C \sum_{i=1}^n \ell_i(w^T \phi(x_i))$, where $x_1, \dots, x_n$ are the training samples and $\phi(x_i)$ is a nonlinear mapping to a higher-dimensional space. Dual problem: $\min_\alpha \frac{1}{2}\alpha^T Q \alpha + \sum_{i=1}^n \ell_i^*(-\alpha_i)$, where $\ell_i^*$ is the conjugate of the loss, $Q \in \mathbb{R}^{n \times n}$, and $Q_{ij} = \phi(x_i)^T \phi(x_j) = K(x_i, x_j)$.

9 Kernel Ridge Regression Given training samples $(x_i, y_i)$, $i = 1, \dots, n$: $\min_w \frac{1}{2}\|w\|^2 + \frac{1}{2\lambda} \sum_{i=1}^n (w^T \phi(x_i) - y_i)^2$. Dual problem: $\min_\alpha \alpha^T Q \alpha + \lambda \|\alpha\|^2 - 2\alpha^T y$.
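The dual above is an unconstrained quadratic, so for moderate n it can be solved in closed form; a minimal sketch, assuming the kernel matrix Q is already computed:

```python
import numpy as np

def kernel_ridge_dual(Q, y, lam):
    """Minimize a^T Q a + lam * ||a||^2 - 2 a^T y over a.
    Setting the gradient 2 Q a + 2 lam a - 2 y to zero gives (Q + lam I) a = y."""
    return np.linalg.solve(Q + lam * np.eye(len(y)), y)
```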

10 Scalability Challenges for solving kernel SVMs (for a dataset with n samples): Space: $O(n^2)$ for storing the n-by-n kernel matrix (can be reduced in some cases). Time: $O(n^3)$ for computing the exact solution.

11 Greedy Coordinate Descent for Kernel SVM

12 Nonlinear SVM SVM primal problem: $\min_{w,\xi} \frac{1}{2} w^T w + C \sum_{i=1}^n \xi_i$ s.t. $y_i (w^T \phi(x_i)) \geq 1 - \xi_i$, $\xi_i \geq 0$, $i = 1, \dots, n$. Here $w$ can be an infinite-dimensional vector, so the primal is very hard to solve. The dual problem for SVM: $\min_\alpha \frac{1}{2} \alpha^T Q \alpha - e^T \alpha$ s.t. $0 \leq \alpha_i \leq C$ for $i = 1, \dots, n$, where $Q_{ij} = y_i y_j \phi(x_i)^T \phi(x_j)$ and $e = [1, \dots, 1]^T$. Kernel trick: define $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$. Example: Gaussian kernel $K(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2}$.

13 Nonlinear SVM Can we solve the problem by dual coordinate descent? The vector $w = \sum_i y_i \alpha_i \phi(x_i)$ may have infinite dimensionality, so we cannot maintain $w$. The closed-form solution for one coordinate needs $O(n)$ computational time: $\delta = \max\!\left(-\alpha_i, \min\!\left(C - \alpha_i, \frac{1 - (Q\alpha)_i}{Q_{ii}}\right)\right)$ (assuming $Q$ is stored in memory). Can we improve coordinate descent using the same $O(n)$ time complexity?

14 Greedy Coordinate Descent The Greedy Coordinate Descent (GCD) algorithm: For $t = 1, 2, \dots$: 1. Compute $\delta_i^* := \arg\min_\delta f(\alpha + \delta e_i)$ for all $i = 1, \dots, n$. 2. Find the best coordinate $i^* = \arg\max_i |\delta_i^*|$. 3. Update $\alpha_{i^*} \leftarrow \alpha_{i^*} + \delta_{i^*}^*$.

15 Greedy Coordinate Descent Other variable selection criteria: the coordinate with the maximum step size: $i^* = \arg\max_i |\delta_i^*|$; the coordinate with the maximum objective function reduction: $i^* = \arg\max_i \left( f(\alpha) - f(\alpha + \delta_i^* e_i) \right)$; the coordinate with the maximum projected gradient; ...

16 Greedy Coordinate Descent How to compute the optimal coordinate? Closed-form solution for the best $\delta$: $\delta_i^* = \max\!\left(-\alpha_i, \min\!\left(C - \alpha_i, \frac{1 - (Q\alpha)_i}{Q_{ii}}\right)\right)$. Observations: 1. Computing all $\delta_i^*$ from scratch needs $O(n^2)$ time. 2. If $Q\alpha$ is stored in memory, computing all $\delta_i^*$ only needs $O(n)$ time, and maintaining $Q\alpha$ also needs only $O(n)$ time after each update.

17 Greedy Coordinate Descent Initialize $\alpha$, $z = Q\alpha$. For $t = 1, 2, \dots$: for all $i = 1, \dots, n$, compute $\delta_i^* = \max\!\left(-\alpha_i, \min\!\left(C - \alpha_i, \frac{1 - z_i}{Q_{ii}}\right)\right)$; let $i^* = \arg\max_i |\delta_i^*|$; update $\alpha_{i^*} \leftarrow \alpha_{i^*} + \delta_{i^*}^*$ and $z \leftarrow z + q_{i^*} \delta_{i^*}^*$ ($q_{i^*}$ is the $i^*$-th column of $Q$). (This is a simplified version of the Sequential Minimal Optimization (SMO) algorithm proposed in Platt et al., 1998.) (A similar version is implemented in LIBSVM.)
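The algorithm above can be sketched in a few lines of NumPy (variable names are mine; a real implementation such as LIBSVM also handles shrinking, caching, and the bias term):

```python
import numpy as np

def greedy_cd_svm_dual(Q, C=1.0, max_iter=1000, tol=1e-8):
    """Greedy coordinate descent for min_a 0.5 a^T Q a - e^T a, 0 <= a_i <= C.
    Maintaining z = Q a makes each iteration O(n) instead of O(n^2)."""
    n = Q.shape[0]
    alpha = np.zeros(n)
    z = np.zeros(n)                      # z = Q @ alpha, kept up to date
    diag = np.diag(Q)                    # assumed nonzero (e.g. Gaussian kernel)
    for _ in range(max_iter):
        # closed-form optimal step for every coordinate, O(n) given z
        delta = np.maximum(-alpha, np.minimum(C - alpha, (1.0 - z) / diag))
        i = int(np.argmax(np.abs(delta)))
        if abs(delta[i]) < tol:          # no coordinate can improve: done
            break
        alpha[i] += delta[i]
        z += delta[i] * Q[:, i]          # O(n) update of z = Q @ alpha
    return alpha
```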

18 How to solve problems with millions of samples? $Q \in \mathbb{R}^{n \times n}$ cannot be fully stored. Keep a fixed-size memory buffer to cache the computed columns of $Q$. For each coordinate update: if $q_i$ is in memory, use it directly; if $q_i$ is not in memory: 1. Evict the least recently used column. 2. Recompute $q_i$ and store it in memory. 3. Update $\alpha_i$.
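The caching scheme can be sketched with an `OrderedDict` (a toy illustration; LIBSVM's actual cache is a hand-written C++ structure, and `kernel_col` here is a hypothetical helper that computes one column of Q):

```python
from collections import OrderedDict

class KernelColumnCache:
    """Fixed-capacity LRU cache for columns q_i of the kernel matrix Q.
    On a miss, the least recently used column is evicted and q_i recomputed."""

    def __init__(self, kernel_col, capacity):
        self.kernel_col = kernel_col      # i -> i-th column of Q (expensive)
        self.capacity = capacity
        self._cache = OrderedDict()
        self.misses = 0

    def column(self, i):
        if i in self._cache:
            self._cache.move_to_end(i)    # hit: mark as most recently used
            return self._cache[i]
        self.misses += 1
        if len(self._cache) >= self.capacity:
            self._cache.popitem(last=False)   # evict least recently used
        q_i = self.kernel_col(i)
        self._cache[i] = q_i
        return q_i
```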

19 LIBSVM These techniques are implemented in LIBSVM. Other functionalities: multi-class classification, support vector regression, cross-validation.

20 Kernel Approximation

21 Kernel Approximation Kernel methods are hard to scale up because the kernel matrix $G$ is an n-by-n matrix. Can we approximate the kernel using a low-rank representation? Two main algorithms: Nystrom approximation: approximate the kernel matrix. Random features: approximate the kernel function.

22 Kernel Matrix Approximation We want to form a low-rank approximation of $G \in \mathbb{R}^{n \times n}$, where $G_{ij} = K(x_i, x_j)$. Can we do SVD?

23 Kernel Matrix Approximation Can we do SVD? No: SVD needs to form the n-by-n matrix, which takes $O(n^2)$ space and $O(n^3)$ time.

24 Kernel Matrix Approximation Can we do matrix completion or matrix sketching?

25 Kernel Matrix Approximation Matrix completion and sketching avoid forming all of $G$, but the main problem remains: we need to generalize to new points.

26 Nystrom Approximation Nystrom approximation: randomly sample m columns (and the corresponding rows) of $G$: $G = \begin{bmatrix} W & G_{12} \\ G_{21} & G_{22} \end{bmatrix}$, where $W$ is an m-by-m square matrix (observed), $G_{21}$ is an (n-m)-by-m matrix (observed), $G_{12} = G_{21}^T$ (observed), and $G_{22}$ is an (n-m)-by-(n-m) matrix (not observed). Form a kernel approximation based on these mn elements: $\bar{G} = \begin{bmatrix} W \\ G_{21} \end{bmatrix} W^\dagger \begin{bmatrix} W & G_{12} \end{bmatrix}$, where $W^\dagger$ is the pseudo-inverse of $W$.

27 Nystrom Approximation Why $W^\dagger$? It exactly recovers the top-left m-by-m block. The kernel approximation: $\begin{bmatrix} W & G_{12} \\ G_{21} & G_{22} \end{bmatrix} \approx \begin{bmatrix} W \\ G_{21} \end{bmatrix} W^\dagger \begin{bmatrix} W & G_{12} \end{bmatrix} = \begin{bmatrix} W & G_{12} \\ G_{21} & G_{21} W^\dagger G_{12} \end{bmatrix}$.

28 Nystrom Approximation Algorithm: 1. Sample m landmark points $v_1, \dots, v_m$. 2. Compute the kernel values between all training data and the landmark points: $C = \begin{bmatrix} K(x_1, v_1) & \cdots & K(x_1, v_m) \\ \vdots & & \vdots \\ K(x_n, v_1) & \cdots & K(x_n, v_m) \end{bmatrix} = \begin{bmatrix} W \\ G_{21} \end{bmatrix}$. 3. Form the kernel approximation $\bar{G} = C W^\dagger C^T$ (no need to explicitly form the n-by-n matrix). Time complexity: mn kernel evaluations ($O(mnd)$ time for classical kernels). How to choose m? It is a trade-off between accuracy vs. computational time and memory space.
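The three steps above can be sketched as follows (names are mine; `kernel` is any function returning the pairwise kernel matrix of its two arguments):

```python
import numpy as np

def nystrom(X, m, kernel, seed=0):
    """Nystrom approximation G_hat = C W^+ C^T from m random landmarks.
    Only n*m kernel values are ever computed, never the full n-by-n matrix."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    V = X[idx]                    # landmark points v_1, ..., v_m
    C = kernel(X, V)              # n x m block, equal to [W; G_21]
    W = kernel(V, V)              # m x m block
    return C, np.linalg.pinv(W), idx
```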

29 Nystrom Approximation: Training Solve the dual problem using the low-rank representation. Example: kernel ridge regression $\min_\alpha \alpha^T \bar{G} \alpha + \lambda \|\alpha\|^2 - 2 y^T \alpha$, i.e., solve the linear system $(\bar{G} + \lambda I)\alpha = y$. Use iterative methods (such as CG) with fast matrix-vector multiplication: $(\bar{G} + \lambda I)p = C W^\dagger C^T p + \lambda p$, which gives $O(nm)$ time complexity per iteration.
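With SciPy this is a few lines: wrap the product $p \mapsto C W^\dagger C^T p + \lambda p$ in a `LinearOperator` and hand it to `cg` (a sketch, assuming C and $W^\dagger$ have been computed as on the previous slide):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def krr_nystrom_cg(C, W_pinv, y, lam):
    """Solve (G_hat + lam I) alpha = y with G_hat = C W^+ C^T by CG.
    Each matrix-vector product costs O(nm); the n x n matrix is never formed."""
    n = C.shape[0]
    def matvec(p):
        return C @ (W_pinv @ (C.T @ p)) + lam * p
    A = LinearOperator((n, n), matvec=matvec)
    alpha, info = cg(A, y)
    return alpha, info
```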

30 Nystrom Approximation: Training Another approach: reduce to a linear ERM problem! Rewrite $\bar{G} = C W^\dagger C^T = C (W^\dagger)^{1/2} (W^\dagger)^{1/2} C^T$, so the approximate kernel can be represented by a linear kernel with feature matrix $C (W^\dagger)^{1/2}$: $\bar{K}(x_i, x_j) = \bar{G}_{ij} = u_i^T u_j$, where $u_j = (W^\dagger)^{1/2} [K(x_j, v_1), \dots, K(x_j, v_m)]^T$. The problem is then equivalent to a linear model with features $u_1, \dots, u_n \in \mathbb{R}^m$: kernel SVM becomes linear SVM with m features; kernel ridge regression becomes linear ridge regression with m features.

31 Nystrom Approximation: Prediction Given the dual solution $\alpha$, which prediction should we use for a testing sample x: $\sum_{i=1}^n \alpha_i \bar{K}(x_i, x)$ or $\sum_{i=1}^n \alpha_i K(x_i, x)$?

32 Nystrom Approximation: Prediction Given the dual solution $\alpha$, predict with $\sum_{i=1}^n \alpha_i \bar{K}(x_i, x)$: we need to use the approximate kernel instead of the original kernel! The approximate kernel gives much better accuracy, because training and testing should use the same kernel. Generalization bounds: (Alaoui and Mahoney, Fast Randomized Kernel Methods With Statistical Guarantees); (Rudi et al., Less is More: Nystrom Computational Regularization. NIPS). Using a small number of landmark points can be viewed as a form of regularization.

33 Nystrom Approximation: Prediction How to define $\bar{K}(x, x_i)$ for a new point x? Extend the Nystrom approximation to the test point: compute the kernel values between x and the landmark points, $c(x) = [K(x, v_1), \dots, K(x, v_m)]^T$ (m kernel evaluations), and form $u = (W^\dagger)^{1/2} c(x)$. The approximate kernel is then defined by $\bar{K}(x, x_i) = u^T u_i$.

34 Nystrom Approximation: Prediction Prediction: $f(x) = \sum_{i=1}^n \alpha_i u^T u_i = u^T \left( \sum_{i=1}^n \alpha_i u_i \right)$. Since $\sum_{i=1}^n \alpha_i u_i$ can be pre-computed, the prediction time is m kernel evaluations. The original kernel method needs n kernel evaluations per prediction. Summary: original kernel: exact quality, $n^{1.x}$ kernel evaluations + kernel training, n kernel evaluations per prediction; Nystrom with m landmarks: rank-m quality, nm kernel evaluations + linear training, m kernel evaluations per prediction.

35 Nystrom Approximation How to select the landmark points $v_1, \dots, v_m$? Traditional approach: uniform random sampling from the training data. Importance sampling (leverage scores): Drineas and Mahoney, On the Nystrom Method for Approximating a Gram Matrix for Improved Kernel-Based Learning. JMLR; Gittens and Mahoney, Revisiting the Nystrom Method for Improved Large-Scale Machine Learning. ICML 2013. K-means sampling (performs extremely well): run k-means clustering and set $v_1, \dots, v_m$ to be the cluster centers; Zhang et al., Improved Nystrom Low-Rank Approximation and Error Analysis. ICML. Subspace distance: Lim et al., Double Nystrom Method: An Efficient and Accurate Nystrom Scheme for Large-Scale Data Sets. ICML 2015.

36 Nystrom Approximation (other related papers) Distributed setting: (Kumar et al., Ensemble Nystrom Method. NIPS 2009). Block-diagonal + low-rank approximation: (Si et al., Memory Efficient Kernel Approximation. ICML). Nystrom method for fast prediction: (Hsieh et al., Fast Prediction for Large-Scale Kernel Machines. NIPS). Structured landmark points: (Si et al., Computationally Efficient Nystrom Approximation using Fast Transforms. ICML).

37 Kernel Approximation (random features)

38 Random Features Directly approximate the kernel function (data-independent). Generate nonlinear features and reduce the problem to linear model training. Several examples: Random Fourier features: (Rahimi and Recht, Random Features for Large-Scale Kernel Machines. NIPS 2007). Random features for polynomial kernels: (Kar and Karnick, Random Feature Maps for Dot Product Kernels. AISTATS).

39 Random Fourier Features Random features for shift-invariant kernels: a continuous kernel is shift-invariant if $K(x, y) = k(x - y)$. Gaussian kernel: $K(x, y) = e^{-\|x - y\|^2}$. Laplacian kernel: $K(x, y) = e^{-\|x - y\|_1}$. A shift-invariant kernel (if positive definite) can be written as $k(x - y) = \int_{\mathbb{R}^d} P(w) e^{j w^T (x - y)} \, dw$ for some probability distribution $P(w)$ (Bochner's theorem).

40 Random Fourier Features Taking the real part, we have $k(x - y) = \int_{\mathbb{R}^d} P(w) \cos(w^T x - w^T y) \, dw = \int_{\mathbb{R}^d} P(w) \left( \cos(w^T x)\cos(w^T y) + \sin(w^T x)\sin(w^T y) \right) dw = E_{w \sim P(w)}[z_w(x)^T z_w(y)]$, where $z_w(x)^T = [\cos(w^T x), \sin(w^T x)]$. Thus $z_w(x)^T z_w(y)$ is an unbiased estimator of $K(x, y)$ if w is sampled from $P(\cdot)$.

41 Random Fourier Features Sample m vectors $w_1, \dots, w_m$ from the distribution $P(\cdot)$. Generate random features for each sample: $u_i = \frac{1}{\sqrt{m}} [\sin(w_1^T x_i), \cos(w_1^T x_i), \sin(w_2^T x_i), \cos(w_2^T x_i), \dots, \sin(w_m^T x_i), \cos(w_m^T x_i)]^T$, so that $K(x_i, x_j) \approx u_i^T u_j$ (larger m leads to a better approximation).
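A minimal sketch for the Gaussian kernel $e^{-\gamma\|x-y\|^2}$, whose spectral distribution $P(w)$ is $N(0, 2\gamma I)$ (the $1/\sqrt{m}$ scaling makes $u_i^T u_j$ an average of the m unbiased estimates):

```python
import numpy as np

def rff_features(X, m, gamma, seed=0):
    """Random Fourier features: exp(-gamma ||x-y||^2) is approximated by
    u(x)^T u(y), with w_1, ..., w_m drawn from P(w) = N(0, 2*gamma*I)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(m, X.shape[1]))
    Z = X @ W.T                                  # Z[i, k] = w_k^T x_i
    # 2m features per sample; 1/sqrt(m) averages the m unbiased estimates
    return np.hstack([np.sin(Z), np.cos(Z)]) / np.sqrt(m)
```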

42 Random Features $G \approx U U^T$, where $U = [u_1, \dots, u_n]^T$. This is a rank-2m approximation. The problem can be reduced to linear classification/regression on $u_1, \dots, u_n$. Time complexity: $O(nmd)$ time + linear training (Nystrom: $O(nmd + m^3)$ time + linear training). Prediction time: $O(md)$ per sample (Nystrom: $O(md)$ per sample).

43 Fastfood Random Fourier features require $O(md)$ time to generate m features. Main bottleneck: computing $w_i^T x$ for all $i = 1, \dots, m$. Can we compute this in $O(m \log d)$ time?

44 Fastfood Random Fourier features require $O(md)$ time to generate m features; the main bottleneck is computing $w_i^T x$ for all $i = 1, \dots, m$. Can we compute this in $O(m \log d)$ time? Fastfood, proposed in (Le et al., Fastfood: Approximating Kernel Expansions in Loglinear Time. ICML), reduces the time by using structured random features.

45 Fastfood Tool: fast matrix-vector multiplication for the Hadamard matrix $H_d \in \mathbb{R}^{d \times d}$: $H_d x$ requires only $O(d \log d)$ time. Hadamard matrix: $H_1 = [1]$, $H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$, $H_{2^k} = \begin{bmatrix} H_{2^{k-1}} & H_{2^{k-1}} \\ H_{2^{k-1}} & -H_{2^{k-1}} \end{bmatrix}$. $H_d x$ can be computed efficiently by the fast Hadamard transform (a recursive divide-and-conquer scheme, similar to the FFT).
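The recursion above directly yields an in-place $O(d \log d)$ algorithm (a standard sketch of the fast Walsh-Hadamard transform, not specific to Fastfood):

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform: returns H_d @ x in O(d log d) time,
    where d = len(x) must be a power of two (Sylvester ordering of H_d)."""
    x = np.asarray(x, dtype=float).copy()
    d = len(x)
    h = 1
    while h < d:
        for start in range(0, d, 2 * h):
            a = x[start:start + h].copy()
            b = x[start + h:start + 2 * h].copy()
            x[start:start + h] = a + b           # top half: sums
            x[start + h:start + 2 * h] = a - b   # bottom half: differences
        h *= 2
    return x
```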

46 Fastfood Sample m = d random features via $W = \mathrm{diag}(v_1, v_2, \dots, v_d) \, H_d$, so that $W x$ only needs $O(d \log d)$ time. Instead of sampling all of $W$ ($d^2$ elements), we just sample $v_1, \dots, v_d$ from $N(0, \sigma^2)$. Each row of $W$ still has $N(0, \sigma^2)$ marginals (but the rows are not independent). Faster computation, but the performance is slightly worse than original random features.
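The slide's simplified construction can be written directly (an illustrative sketch with an explicit $H_d$; a real Fastfood implementation replaces the matrix product with the fast Hadamard transform and uses additional permutation and scaling matrices from the paper):

```python
import numpy as np
from scipy.linalg import hadamard

def structured_projection(x, v):
    """Compute W x for W = diag(v) H_d, where only the d entries of v are
    sampled (e.g. v_i ~ N(0, sigma^2)) instead of the d^2 entries of W.
    With a fast Hadamard transform, H_d x costs O(d log d) rather than O(d^2)."""
    d = len(x)
    return v * (hadamard(d) @ x)
```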

47 Extensions Fastfood: structured random features to improve computational speed: (Le et al., Fastfood: Approximating Kernel Expansions in Loglinear Time. ICML). Structured landmark points for Nystrom approximation: (Si et al., Computationally Efficient Nystrom Approximation using Fast Transforms. ICML). Doubly stochastic gradient with random features: (Dai et al., Scalable Kernel Methods via Doubly Stochastic Gradients. NIPS). Generalization bound (?)

48 Coming up Choose the paper for presentation (due this Sunday, Oct 23). Questions?


More information

Fast Kernel Classifiers with Online and Active Learning

Fast Kernel Classifiers with Online and Active Learning Journal of Machine Learning Research 6 (2005) 1579 1619 Submitted 3/05; Published 9/05 Fast Kernel Classifiers with Online and Active Learning Antoine Bordes NEC Laboratories America 4 Independence Way

More information

Lecture 6: Logistic Regression

Lecture 6: Logistic Regression Lecture 6: CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 13, 2011 Outline Outline Classification task Data : X = [x 1,..., x m]: a n m matrix of data points in R n. y { 1,

More information

Big Data - Lecture 1 Optimization reminders

Big Data - Lecture 1 Optimization reminders Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Schedule Introduction Major issues Examples Mathematics

More information

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

Probabilistic Linear Classification: Logistic Regression. Piyush Rai IIT Kanpur

Probabilistic Linear Classification: Logistic Regression. Piyush Rai IIT Kanpur Probabilistic Linear Classification: Logistic Regression Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 18, 2016 Probabilistic Machine Learning (CS772A) Probabilistic Linear Classification:

More information

Evaluation of Machine Learning Techniques for Green Energy Prediction

Evaluation of Machine Learning Techniques for Green Energy Prediction arxiv:1406.3726v1 [cs.lg] 14 Jun 2014 Evaluation of Machine Learning Techniques for Green Energy Prediction 1 Objective Ankur Sahai University of Mainz, Germany We evaluate Machine Learning techniques

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

More information

Introduction: Overview of Kernel Methods

Introduction: Overview of Kernel Methods Introduction: Overview of Kernel Methods Statistical Data Analysis with Positive Definite Kernels Kenji Fukumizu Institute of Statistical Mathematics, ROIS Department of Statistical Science, Graduate University

More information

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian

More information

Classification using Intersection Kernel Support Vector Machines is Efficient

Classification using Intersection Kernel Support Vector Machines is Efficient To Appear in IEEE Computer Vision and Pattern Recognition 28, Anchorage Classification using Intersection Kernel Support Vector Machines is Efficient Subhransu Maji EECS Department U.C. Berkeley smaji@eecs.berkeley.edu

More information

Support Vector Machines

Support Vector Machines CS229 Lecture notes Andrew Ng Part V Support Vector Machines This set of notes presents the Support Vector Machine (SVM) learning algorithm. SVMs are among the best (and many believe are indeed the best)

More information

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

More information

Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28

Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28 Recognition Topics that we will try to cover: Indexing for fast retrieval (we still owe this one) History of recognition techniques Object classification Bag-of-words Spatial pyramids Neural Networks Object

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning Lecture 4: SMO-MKL S.V. N. (vishy) Vishwanathan Purdue University vishy@purdue.edu July 11, 2012 S.V. N. Vishwanathan (Purdue University) Optimization for Machine Learning

More information

On the Path to an Ideal ROC Curve: Considering Cost Asymmetry in Learning Classifiers

On the Path to an Ideal ROC Curve: Considering Cost Asymmetry in Learning Classifiers On the Path to an Ideal ROC Curve: Considering Cost Asymmetry in Learning Classifiers Francis R. Bach Computer Science Division University of California Berkeley, CA 9472 fbach@cs.berkeley.edu Abstract

More information

Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

More information

Virtual Landmarks for the Internet

Virtual Landmarks for the Internet Virtual Landmarks for the Internet Liying Tang Mark Crovella Boston University Computer Science Internet Distance Matters! Useful for configuring Content delivery networks Peer to peer applications Multiuser

More information

Tree based ensemble models regularization by convex optimization

Tree based ensemble models regularization by convex optimization Tree based ensemble models regularization by convex optimization Bertrand Cornélusse, Pierre Geurts and Louis Wehenkel Department of Electrical Engineering and Computer Science University of Liège B-4000

More information

Parallel Data Mining. Team 2 Flash Coders Team Research Investigation Presentation 2. Foundations of Parallel Computing Oct 2014

Parallel Data Mining. Team 2 Flash Coders Team Research Investigation Presentation 2. Foundations of Parallel Computing Oct 2014 Parallel Data Mining Team 2 Flash Coders Team Research Investigation Presentation 2 Foundations of Parallel Computing Oct 2014 Agenda Overview of topic Analysis of research papers Software design Overview

More information

Large Scale Learning to Rank

Large Scale Learning to Rank Large Scale Learning to Rank D. Sculley Google, Inc. dsculley@google.com Abstract Pairwise learning to rank methods such as RankSVM give good performance, but suffer from the computational burden of optimizing

More information

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step

More information

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support

More information

Machine Learning Logistic Regression

Machine Learning Logistic Regression Machine Learning Logistic Regression Jeff Howbert Introduction to Machine Learning Winter 2012 1 Logistic regression Name is somewhat misleading. Really a technique for classification, not regression.

More information

Multiclass Classification. 9.520 Class 06, 25 Feb 2008 Ryan Rifkin

Multiclass Classification. 9.520 Class 06, 25 Feb 2008 Ryan Rifkin Multiclass Classification 9.520 Class 06, 25 Feb 2008 Ryan Rifkin It is a tale Told by an idiot, full of sound and fury, Signifying nothing. Macbeth, Act V, Scene V What Is Multiclass Classification? Each

More information

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University

More information

large-scale machine learning revisited Léon Bottou Microsoft Research (NYC)

large-scale machine learning revisited Léon Bottou Microsoft Research (NYC) large-scale machine learning revisited Léon Bottou Microsoft Research (NYC) 1 three frequent ideas in machine learning. independent and identically distributed data This experimental paradigm has driven

More information

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014 LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph

More information

O(n 3 ) O(4 d nlog 2d n) O(n) O(4 d log 2d n)

O(n 3 ) O(4 d nlog 2d n) O(n) O(4 d log 2d n) O(n 3 )O(4 d nlog 2d n) O(n)O(4 d log 2d n) O(n 3 ) O(n) O(n 3 ) O(4 d nlog 2d n) O(n)O(4 d log 2d n) 1. Fast Gaussian Filtering Algorithm 2. Binary Classification Using The Fast Gaussian Filtering Algorithm

More information

Learning to Find Pre-Images

Learning to Find Pre-Images Learning to Find Pre-Images Gökhan H. Bakır, Jason Weston and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics Spemannstraße 38, 72076 Tübingen, Germany {gb,weston,bs}@tuebingen.mpg.de

More information

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014 Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about

More information

Big Data Analytics. Lucas Rego Drumond

Big Data Analytics. Lucas Rego Drumond Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Going For Large Scale Going For Large Scale 1

More information

5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1

5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1 5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1 General Integer Linear Program: (ILP) min c T x Ax b x 0 integer Assumption: A, b integer The integrality condition

More information

Working Set Selection Using Second Order Information for Training Support Vector Machines

Working Set Selection Using Second Order Information for Training Support Vector Machines Journal of Machine Learning Research 6 (25) 889 98 Submitted 4/5; Revised /5; Published /5 Working Set Selection Using Second Order Information for Training Support Vector Machines Rong-En Fan Pai-Hsuen

More information

Scalable Developments for Big Data Analytics in Remote Sensing

Scalable Developments for Big Data Analytics in Remote Sensing Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,

More information

Early defect identification of semiconductor processes using machine learning

Early defect identification of semiconductor processes using machine learning STANFORD UNIVERISTY MACHINE LEARNING CS229 Early defect identification of semiconductor processes using machine learning Friday, December 16, 2011 Authors: Saul ROSA Anton VLADIMIROV Professor: Dr. Andrew

More information

Hadoop SNS. renren.com. Saturday, December 3, 11

Hadoop SNS. renren.com. Saturday, December 3, 11 Hadoop SNS renren.com Saturday, December 3, 11 2.2 190 40 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December

More information

A Study on SMO-type Decomposition Methods for Support Vector Machines

A Study on SMO-type Decomposition Methods for Support Vector Machines 1 A Study on SMO-type Decomposition Methods for Support Vector Machines Pai-Hsuen Chen, Rong-En Fan, and Chih-Jen Lin Department of Computer Science, National Taiwan University, Taipei 106, Taiwan cjlin@csie.ntu.edu.tw

More information

Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

More information

Big Data Analytics. Lucas Rego Drumond

Big Data Analytics. Lucas Rego Drumond Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Going For Large Scale Application Scenario: Recommender

More information

E-commerce Transaction Anomaly Classification

E-commerce Transaction Anomaly Classification E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce

More information

Mathematical Models of Supervised Learning and their Application to Medical Diagnosis

Mathematical Models of Supervised Learning and their Application to Medical Diagnosis Genomic, Proteomic and Transcriptomic Lab High Performance Computing and Networking Institute National Research Council, Italy Mathematical Models of Supervised Learning and their Application to Medical

More information

The Many Facets of Big Data

The Many Facets of Big Data Department of Computer Science and Engineering Hong Kong University of Science and Technology Hong Kong ACPR 2013 Big Data 1 volume sample size is big feature dimensionality is big 2 variety multiple formats:

More information

Large-scale Machine Learning in Distributed Environments

Large-scale Machine Learning in Distributed Environments Large-scale Machine Learning in Distributed Environments Chih-Jen Lin National Taiwan University ebay Research Labs Tutorial at ACM ICMR, June 5, 2012 Chih-Jen Lin (National Taiwan Univ.) 1 / 105 Outline

More information