Big Data Analytics. Lucas Rego Drumond
1 Big Data Analytics: Going For Large Scale. Lucas Rego Drumond, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of Hildesheim, Germany
2 Outline: 0. Review · 1. Learning from Huge Scale Data · 2. Does more data help?
3 Outline (current section: 0. Review)
4 Prediction Problem: Formally. Let X be any set (the predictor space) and Y be any set (the target space).
Task: given some training data D^train ⊆ X × Y and a loss function l : Y × Y → ℝ that measures how bad a prediction ŷ is when the true value is y, compute a prediction function ŷ : X → Y such that the risk on unseen test data is minimal:
risk(ŷ, D^test) := (1 / |D^test|) ∑_{(x,y) ∈ D^test} l(y, ŷ(x))
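To make the definition concrete, here is a minimal Python sketch (not from the slides) of the empirical risk computation; the squared-error loss and the toy data are illustrative assumptions:

    import numpy as np

    def empirical_risk(y_true, y_pred, loss):
        # risk(ŷ, D) := (1/|D|) * sum of l(y, ŷ(x)) over the dataset
        return np.mean([loss(y, yh) for y, yh in zip(y_true, y_pred)])

    squared_loss = lambda y, yh: (y - yh) ** 2

    y = np.array([1.0, 2.0, 3.0])
    y_hat = np.array([1.5, 2.0, 2.5])              # predictions of some model
    print(empirical_risk(y, y_hat, squared_loss))  # (0.25 + 0 + 0.25) / 3 ≈ 0.167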
5 Solving Prediction Tasks. Estimating the parameters β is done by solving the following optimization task:
arg min_β ∑_{(x,y) ∈ D^train} l(y, ŷ(x; β)) + λ R(β)
When solving a prediction task we need to define the following components:
- A prediction function ŷ(x; β)
- A loss function l(y, ŷ(x; β))
- A regularization function R(β)
- A learning algorithm to solve the optimization task above.
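As a concrete instantiation, consistent with the logistic regression experiments later in the deck: the prediction function ŷ(x; β) = 1 / (1 + e^{−βᵀx}), the loss l(y, ŷ) = −y log ŷ − (1 − y) log(1 − ŷ) for y ∈ {0, 1}, the regularizer R(β) = ‖β‖₂², and gradient descent (or SGD) as the learning algorithm.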
6 Descent Methods. Choose an initial point β^(0) ∈ ℝⁿ; each next point is generated using a step size µ and a direction Δβ^(t) such that f(β^(t) + µ Δβ^(t)) < f(β^(t)).

    procedure DescentMethod(f)
        get initial point β^(0)
        t ← 0
        repeat
            get update direction Δβ^(t)
            get step size µ
            β^(t+1) ← β^(t) + µ Δβ^(t)
            t ← t + 1
        until convergence
        return β^(t), f(β^(t))
    end procedure
7 Descent Methods.
Gradient Descent: Δβ^GD = −∇f(β)
Stochastic Gradient Descent: for f(β) = ∑_{i=1}^m f_i(β) with f_i(β) = l(y_i, ŷ(x_i; β)) + (λ/m) R(β), the update rule uses a single randomly drawn instance i: Δβ^SGD = −∇f_i(β)
Newton's Method: Δβ^Newton = −(∇²f(β))⁻¹ ∇f(β)
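To illustrate how the GD and SGD update rules differ in practice, here is a small numpy sketch (an illustration, not code from the lecture) minimizing an unregularized least-squares objective; the data, step size, and iteration counts are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    beta_true = np.array([1.0, -2.0, 0.5])
    y = X @ beta_true + 0.01 * rng.normal(size=100)

    def grad_full(beta):            # ∇f(β) for f(β) = (1/2m) ||Xβ − y||²
        return X.T @ (X @ beta - y) / len(y)

    def grad_single(beta, i):       # gradient on one training instance
        return (X[i] @ beta - y[i]) * X[i]

    mu = 0.1
    beta_gd, beta_sgd = np.zeros(3), np.zeros(3)
    for t in range(500):
        beta_gd = beta_gd - mu * grad_full(beta_gd)          # GD: full pass per step
        i = rng.integers(len(y))
        beta_sgd = beta_sgd - mu * grad_single(beta_sgd, i)  # SGD: one instance per step

    print(np.round(beta_gd, 2), np.round(beta_sgd, 2))       # both close to beta_true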
8 Outline (current section: 1. Learning from Huge Scale Data)
9 Huge Scale Data. The computational complexity of learning algorithms depends (among other things) on the amount of data:
- The larger D^train, the longer it takes to compute the gradients
- Parallelization is not trivial: each update depends on the results of the previous one
- The whole dataset may not reside on the same machine
10 Huge Scale Datasets (figure: the predictor data X and targets Y)
11 Instance distribution (figure: X and Y partitioned by rows, i.e. by instances, across machines)
12 Feature distribution (figure: X partitioned by columns, i.e. by features, across machines)
13 Learning from Huge Scale Datasets. What to do?
- Use only a portion of the data (subsampling)
- Learn different models on different portions of the data and combine them (ensembling)
- Use algorithms that are by nature parallelizable (distributed optimization)
- Exploit distributed computational models
14 Outline (current section: 2. Does more data help?)
15 Subsampling. Do we really need so much data? Let us do an experiment:
1. Uniformly sample a subset of the training set D^samp ⊆ D^train such that |D^samp| = S
2. Train a model on D^samp and evaluate it on D^test
3. Increase S and repeat the experiment
16 Set up. Model: logistic regression with L2 regularization. Dataset: A9A (from the LIBSVM binary collection, binary.html#a9a)
1. Binary classification task: determine whether a person makes over 50K a year
2. Total number of training instances: 32,561
3. Number of features: 123
4. Number of test instances: 16,281
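A sketch of this experiment in Python, assuming scikit-learn and the A9A files downloaded from the LIBSVM page (the file names, sample sizes, and regularization strength are placeholder choices):

    import numpy as np
    from sklearn.datasets import load_svmlight_file
    from sklearn.linear_model import LogisticRegression

    X_train, y_train = load_svmlight_file("a9a")   # training split
    X_test, y_test = load_svmlight_file("a9a.t", n_features=X_train.shape[1])

    rng = np.random.default_rng(0)
    for S in [100, 500, 1000, 5000, 10000, 30000]:
        idx = rng.choice(X_train.shape[0], size=S, replace=False)  # uniform D_samp
        model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
        model.fit(X_train[idx], y_train[idx])
        print(S, model.score(X_test, y_test))      # accuracy vs. |D_samp|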
17 Experiment on the A9A dataset (figure: accuracy on A9A as a function of the number of training instances)
18 Set up. Model: logistic regression with L2 regularization. Dataset: KDD Cup 2010, student performance prediction (from the LIBSVM binary collection, binary.html#kdd2010(algebra))
1. Binary classification task: predict whether a student will answer an exercise correctly
2. Total number of training instances: 8,407,752
3. Number of features: 20,216,830
4. Number of test instances: 510,302
19 Experiment on the KDD Challenge 2010 dataset (figure: accuracy on the KDD Challenge dataset as a function of the number of training instances, for sample sizes up to 1e+05)
20 Considerations
- Using a sample of the data can lead to good results with simple models
- The more model parameters, the more data we need to learn them
- More complex models will require more data
- For more complex models we need more efficient algorithms
21 Outline (section divider)
22 Distributed Learning. Assume our data is partitioned across N machines. Let us denote the data on machine i as D_i^train, such that ⋃_{i=1}^N D_i^train = D^train.
One solution is to learn one set of local parameters β^(i) on each machine and make sure they all agree with a global variable α:
minimize ∑_{i=1}^N ∑_{(x,y) ∈ D_i^train} l(y, ŷ(x; β^(i))) + λ R(α)
subject to β^(i) = α, for i = 1, …, N
23 Solving Problems in the Dual. Given an equality-constrained problem:
minimize f(β)
subject to Aβ = b
the Lagrangian is given by L(β, ν) = f(β) + νᵀ(Aβ − b), and the dual by g(ν) = inf_β L(β, ν). We solve it in two steps:
1. ν* := arg max_ν g(ν)
2. β* := arg min_β L(β, ν*)
24 Dual Ascent. We can apply the gradient method for maximizing the dual:
ν^{t+1} = ν^t + µ^t ∇g(ν^t)
where, given that β* = arg min_β L(β, ν^t), the gradient is ∇g(ν^t) = Aβ* − b. This gives us the following method for solving the dual:
β^{t+1} := arg min_β L(β, ν^t)
ν^{t+1} := ν^t + µ^t (Aβ^{t+1} − b)
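As a tiny worked example (constructed for this text, not from the slides), dual ascent on min (1/2)‖β‖₂² subject to Aβ = b, where the β-step has the closed form β = −Aᵀν:

    import numpy as np

    A = np.array([[1.0, 1.0]])      # one constraint: β₁ + β₂ = 1
    b = np.array([1.0])
    nu = np.zeros(1)
    mu = 0.5                        # step size for the dual ascent

    for t in range(50):
        beta = -A.T @ nu                 # β^{t+1} = arg min_β (1/2)||β||² + νᵀ(Aβ − b)
        nu = nu + mu * (A @ beta - b)    # ν^{t+1} = ν^t + µ (Aβ^{t+1} − b)

    print(beta)                     # → [0.5 0.5], the minimum-norm feasible point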
25 Dual Decomposition. Now suppose f is separable:
f(β) = f_1(β_1) + f_2(β_2) + … + f_N(β_N),   β = (β_1, β_2, …, β_N)
Partitioning A = [A_1 … A_N] so that ∑_{i=1}^N A_i β_i = Aβ, we can write the Lagrangian as:
L(β, ν) = f(β) + νᵀ(Aβ − b)
        = f_1(β_1) + … + f_N(β_N) + νᵀ(A_1 β_1 + … + A_N β_N − b)
        = (f_1(β_1) + νᵀ A_1 β_1) + … + (f_N(β_N) + νᵀ A_N β_N) − νᵀ b
        = L_1(β_1, ν) + … + L_N(β_N, ν) − νᵀ b
where L_i(β_i, ν) = f_i(β_i) + νᵀ A_i β_i
26 Dual Decomposition. The problem
β^{t+1} := arg min_β L(β, ν^t)
with the Lagrange function
L(β, ν) = ∑_{i=1}^N L_i(β_i, ν) − νᵀ b,   where L_i(β_i, ν) = f_i(β_i) + νᵀ A_i β_i,
can be solved by N minimization steps carried out in parallel:
β_i^{t+1} := arg min_{β_i} L_i(β_i, ν^t)
27 Dual Decomposition. The dual decomposition method:
β_i^{t+1} := arg min_{β_i} L_i(β_i, ν^t)
ν^{t+1} := ν^t + µ^t (∑_{i=1}^N A_i β_i^{t+1} − b)
At each step:
- ν^t has to be broadcast
- the β_i^{t+1} are updated in parallel
- the A_i β_i^{t+1} are gathered to compute the sum ∑_{i=1}^N A_i β_i^{t+1}
Works when the assumptions hold, but is often slow!
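A matching sketch of dual decomposition (again a constructed toy, with f_i(β_i) = (1/2)(β_i − c_i)², so the local steps have closed forms); in a real system the list comprehension over i would run on separate machines:

    import numpy as np

    # minimize (1/2)(β₁ − 1)² + (1/2)(β₂ − 2)²  subject to β₁ + β₂ = 2
    c = [np.array([1.0]), np.array([2.0])]
    A = [np.array([[1.0]]), np.array([[1.0]])]
    b = np.array([2.0])
    nu = np.zeros(1)
    mu = 0.4

    for t in range(100):
        # local, parallelizable steps: β_i = arg min f_i(β_i) + νᵀ A_i β_i = c_i − A_iᵀ ν
        betas = [c[i] - A[i].T @ nu for i in range(2)]
        residual = sum(A[i] @ betas[i] for i in range(2)) - b   # gather
        nu = nu + mu * residual                                 # broadcast new ν

    print(betas)   # → [0.5], [1.5]: feasible and optimal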
28 Method of Multipliers. The method of multipliers uses the augmented Lagrangian, with s > 0:
L_s(β, ν) = f(β) + νᵀ(Aβ − b) + (s/2) ‖Aβ − b‖₂²
and solves the dual problem through the following steps:
β^{t+1} := arg min_β L_s(β, ν^t)
ν^{t+1} := ν^t + s (Aβ^{t+1} − b)
- Converges under more relaxed assumptions
- The augmented Lagrangian is not separable because of the additional quadratic term (no parallelization)
29 Alternating Direction Method of Multipliers (ADMM). ADMM assumes the problem can take the form:
minimize f(β) + h(α)
subject to Aβ + Bα = c
which has the following augmented Lagrangian:
L_s(β, α, ν) = f(β) + h(α) + νᵀ(Aβ + Bα − c) + (s/2) ‖Aβ + Bα − c‖₂²
and solves the dual problem through the following steps:
β^{t+1} := arg min_β L_s(β, α^t, ν^t)
α^{t+1} := arg min_α L_s(β^{t+1}, α, ν^t)
ν^{t+1} := ν^t + s (Aβ^{t+1} + Bα^{t+1} − c)
30 Alternating Direction Method of Multipliers (ADMM).
β^{t+1} := arg min_β L_s(β, α^t, ν^t)
α^{t+1} := arg min_α L_s(β^{t+1}, α, ν^t)
ν^{t+1} := ν^t + s (Aβ^{t+1} + Bα^{t+1} − c)
At first ADMM seems very similar to the method of multipliers:
- It reduces to the method of multipliers if β and α are optimized jointly
- But if f or h is separable, we can now perform the updates of β (or α) in parallel
31 ADMM: Scaled Form. We can rewrite the ADMM algorithm in a more convenient form. From the augmented Lagrangian:
L_s(β, α, ν) = f(β) + h(α) + νᵀ(Aβ + Bα − c) + (s/2) ‖Aβ + Bα − c‖₂²
we can define the residual r = Aβ + Bα − c, so that:
νᵀr + (s/2) ‖r‖₂² = (s/2) ‖r + u‖₂² − (s/2) ‖u‖₂²
where u = (1/s) ν
32 ADMM: Scaled Form. From the augmented Lagrangian
L_s(β, α, ν) = f(β) + h(α) + νᵀ(Aβ + Bα − c) + (s/2) ‖Aβ + Bα − c‖₂²
and r = Aβ + Bα − c with
νᵀr + (s/2) ‖r‖₂² = (s/2) ‖r + u‖₂² − (s/2) ‖u‖₂²,   u = (1/s) ν,
we obtain the scaled form:
β^{t+1} := arg min_β f(β) + (s/2) ‖Aβ + Bα^t − c + u^t‖₂²
α^{t+1} := arg min_α h(α) + (s/2) ‖Aβ^{t+1} + Bα − c + u^t‖₂²
u^{t+1} := u^t + Aβ^{t+1} + Bα^{t+1} − c
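To show the scaled form in action, here is a sketch for a problem not covered on these slides, the lasso (min (1/2)‖Xβ − y‖₂² + λ‖α‖₁ subject to β − α = 0, i.e. A = I, B = −I, c = 0), chosen because both subproblems have closed forms; all constants are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))
    y = X @ np.array([3.0, -2.0] + [0.0] * 8) + 0.1 * rng.normal(size=50)
    lam, s = 0.5, 1.0

    beta, alpha, u = np.zeros(10), np.zeros(10), np.zeros(10)
    M = np.linalg.inv(X.T @ X + s * np.eye(10))        # cached for the β-step
    Xty = X.T @ y
    soft = lambda v, k: np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

    for t in range(100):
        beta = M @ (Xty + s * (alpha - u))   # β-step: a ridge-like linear solve
        alpha = soft(beta + u, lam / s)      # α-step: soft-thresholding
        u = u + beta - alpha                 # scaled dual update, r = β − α

    print(np.round(alpha, 2))                # sparse: ≈ [3, -2, 0, ..., 0]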
33 Applying ADMM to Predictive Problems. The data is represented as a matrix X with m rows x_i = (1, x_{i,1}, x_{i,2}, …, x_{i,n}) and a target vector y = (y_1, y_2, …, y_m)ᵀ; the leading column of ones carries the intercept. We want to learn a model to predict y from X through parameters β:
ŷ_i = βᵀx_i = β_0 + β_1 x_{i,1} + β_2 x_{i,2} + … + β_n x_{i,n}
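A sketch of building this representation in numpy (the raw data is hypothetical; the prepended column of ones carries the intercept β₀):

    import numpy as np

    raw = np.array([[2.0, 3.0],
                    [4.0, 5.0],
                    [6.0, 7.0]])                       # m = 3 instances, n = 2 features
    X = np.hstack([np.ones((raw.shape[0], 1)), raw])   # prepend the constant 1 column
    beta = np.array([0.5, 1.0, -1.0])
    y_hat = X @ beta                                   # ŷ_i = β₀ + β₁ x_{i,1} + β₂ x_{i,2}
    print(y_hat)                                       # [-0.5 -0.5 -0.5]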
34 Applying ADMM to Predictive Problems. Our loss function
L(D^train, β) = ∑_{(x,y) ∈ D^train} l(y, ŷ(x; β))
can easily be decomposed into losses on each subset of the data:
minimize ∑_{i=1}^N L(D_i^train, β) + λ R(β)
which can also be written as:
minimize ∑_{i=1}^N ∑_{(x,y) ∈ D_i^train} l(y, ŷ(x; β)) + λ R(β)
35 Example: Ridge Regression. General form:
minimize ∑_{i=1}^N ∑_{(x,y) ∈ D_i^train} l(y, ŷ(x; β)) + λ R(β)
Example: ridge (linear) regression:
minimize ‖Xβ − y‖₂² + λ ‖β‖₂² = ∑_{i=1}^N ∑_{(x,y) ∈ D_i^train} (ŷ(x; β) − y)² + λ ∑_{j=1}^n β_j²
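For reference, a standard fact not stated on the slide: ignoring the usual special handling of the intercept β₀, the unsplit ridge problem has the closed-form minimizer

    β* = (XᵀX + λI)⁻¹ Xᵀy

which is why, in the ADMM algorithm on the following slides, the per-machine β-steps for ridge regression reduce to small linear solves.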
36 Applying ADMM to Predictive Problems. Now we can rewrite our problem
minimize L(D^train, β) + R(β)
as
minimize L(D^train, β) + R(α)
subject to β − α = 0
and solve it through ADMM:
β^{t+1} := arg min_β L(D^train, β) + ν^{t,T}(β − α^t) + (s/2) ‖β − α^t‖₂²
α^{t+1} := arg min_α R(α) + ν^{t,T}(β^{t+1} − α) + (s/2) ‖β^{t+1} − α‖₂²
ν^{t+1} := ν^t + s (β^{t+1} − α^{t+1})
37 Applying ADMM to Predictive Problems. Given that the loss function is separable:
minimize ∑_{i=1}^N ∑_{(x,y) ∈ D_i^train} l(y, ŷ(x; β^(i))) + R(α)
subject to β^(i) − α = 0, for i = 1, …, N
we can rewrite the algorithm as:
β^(i),t+1 := arg min_{β^(i)} ∑_{(x,y) ∈ D_i^train} l(y, ŷ(x; β^(i))) + ν^(i),t,T (β^(i) − α^t) + (s/2) ‖β^(i) − α^t‖₂²
α^{t+1} := arg min_α R(α) + ∑_{i=1}^N [ ν^(i),t,T (β^(i),t+1 − α) + (s/2) ‖β^(i),t+1 − α‖₂² ]
ν^(i),t+1 := ν^(i),t + s (β^(i),t+1 − α^{t+1})
and solve for the different β^(i) in parallel.
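Putting the pieces together, a minimal single-process simulation of this consensus algorithm for ridge regression (squared loss, R(α) = λ‖α‖₂²); the synthetic data, the number of machines N, and the penalty s are illustrative assumptions, and in a real deployment each chunk's β-step and dual update would run on its own machine:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, N, lam = 5, 200, 4, 1.0
    s = 100.0                     # penalty; roughly matching local curvature speeds convergence
    X = rng.normal(size=(m, n))
    beta_true = rng.normal(size=n)
    y = X @ beta_true + 0.1 * rng.normal(size=m)

    parts = np.array_split(np.arange(m), N)        # D_i^train: one chunk per "machine"
    betas = [np.zeros(n) for _ in range(N)]
    nus = [np.zeros(n) for _ in range(N)]
    alpha = np.zeros(n)

    for t in range(100):
        for i, idx in enumerate(parts):            # β(i)-steps: parallel in a real system
            Xi, yi = X[idx], y[idx]
            # arg min Σ (xᵀβ − y)² + ν(i)ᵀ(β − α) + (s/2)||β − α||² is a linear solve:
            betas[i] = np.linalg.solve(2 * Xi.T @ Xi + s * np.eye(n),
                                       2 * Xi.T @ yi - nus[i] + s * alpha)
        # α-step: arg min λ||α||² + Σ_i [ν(i)ᵀ(β(i) − α) + (s/2)||β(i) − α||²]
        alpha = (sum(nus) + s * sum(betas)) / (2 * lam + N * s)
        for i in range(N):
            nus[i] = nus[i] + s * (betas[i] - alpha)   # dual updates

    print(np.round(alpha - beta_true, 2))          # consensus α close to β_true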