Steven C.H. Hoi. School of Computer Engineering Nanyang Technological University Singapore


Steven C.H. Hoi, School of Computer Engineering, Nanyang Technological University, Singapore
Acknowledgments: Peilin Zhao, Jialei Wang, Hao Xia, Jing Lu, Rong Jin, Pengcheng Wu, Dayong Wang, etc.


Agenda
- Introduction
  - Big Data: Opportunities & Challenges
  - Online Learning: What & Why
- Online Learning for Living Data Analytics
  - Research Challenges
  - Online Feature Selection
  - Online Collaborative Filtering
  - Online Multiple Kernel Learning
- Conclusions

Big Data: Popularity (Google Trends). Big Hope or Big Hype?

What is Big Data? Volume, Velocity, Variety.
Source: http://www.ibmbigdatahub.com/sites/default/files/infographic_file/4-vs-of-big-data.jpg

Big Data: Big Value (source: McKinsey)

Big Data: Opportunities

Big Data: Challenges
[Diagram labels: Volume, Velocity, Variety; Big Data Analytics; Online Learning]
- Efficiency: handle vast volumes of data (millions or even billions) with limited computing capacity (CPU/RAM/disk)
- Scalability: scale up to handle exploding data (e.g., real-time data streams)
- Adaptability: adapt to complex and changing environments to deal with diverse data

What is Online Learning?
Batch/offline learning vs. online learning
[Diagram labels: Feedback, Learner, Update, Predictor]

Example: Perceptron algorithm (Rosenblatt 1958)
[Figure: weight vectors w1, w2, w3 with positive (+) and negative (-) examples]
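The Perceptron example can be sketched in a few lines of Python. This is a minimal illustration of the predict, feedback, update cycle, not code from the talk: the function name `perceptron_online`, the NumPy dependency, and the synthetic two-dimensional data are all assumptions made for the example.

```python
import numpy as np

def perceptron_online(stream, dim):
    """Run the Perceptron over a stream of (x, y) pairs with y in {-1, +1}.

    Returns the final weight vector and the number of online mistakes.
    """
    w = np.zeros(dim)
    mistakes = 0
    for x, y in stream:
        y_hat = 1 if w @ x >= 0 else -1  # predict with the current model
        if y_hat != y:                   # feedback: the true label is revealed
            w = w + y * x                # update only when a mistake is made
            mistakes += 1
    return w, mistakes

# Synthetic, linearly separable data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] >= 0, 1, -1)
w, mistakes = perceptron_online(zip(X, y), dim=2)
print("weights:", w, "online mistakes:", mistakes)
```

Each example is seen once and then discarded, which is why no re-training is needed when new data arrive.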

Why Online Learning?
- Avoids re-training when new data are added
- High efficiency
- Excellent scalability
- Strong adaptability to changing environments
- Simple to understand
- Trivial to implement
- Easy to parallelize
- Theoretical guarantees


Challenges of Living Data Analytics
- High dimensionality
- High sparsity
- High variety

Online Learning Methods
- Online Feature Selection (BigMine 13, TKDE 14): select a subset of informative features in machine learning tasks for analyzing high-dimensional data
- Online Collaborative Filtering (ACML 13, RecSys 13): learn from a sequence of rating data (a sparse rating matrix) for building recommender systems
- Online Multiple Kernel Learning (ML 13, TPAMI 14): fuse multiple types of diverse data sources through multiple-kernel learning, where each kernel represents one type of data or representation

Online Feature Selection
- Feature selection
  - Select a subset of informative features and remove irrelevant/redundant features for model construction
  - Alleviate the curse of dimensionality, speed up the learning task, and improve interpretability
- From offline/batch to online learning
  - The online learner is allowed to maintain a classifier that involves only a small, fixed number of features
  - The challenge is how to make accurate predictions on an instance using a small number of active features

Online Feature Selection
- Family of online feature selection algorithms
- Key idea: exploring sparse online learning
- Online Feature Selection (Hoi et al., BigMine 13, TKDE 14)
  - Online gradient descent or Perceptron
  - Sparse projection and truncation
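One way to combine online gradient descent with sparse projection and truncation is sketched below. It is an illustration under stated assumptions rather than the authors' implementation: the helper names `truncate` and `ofs_step`, the hinge-loss margin test, and the default values of the step size `eta` and regularization `lam` are all chosen for the example. `B` is the budget of non-zero features.

```python
import numpy as np

def truncate(w, B):
    """Keep the B largest-magnitude entries of w and zero out the rest."""
    if np.count_nonzero(w) <= B:
        return w
    out = np.zeros_like(w)
    keep = np.argpartition(np.abs(w), -B)[-B:]  # indices of the B largest |w_i|
    out[keep] = w[keep]
    return out

def ofs_step(w, x, y, B, eta=0.2, lam=0.01):
    """One online update with label y in {-1, +1} (hypothetical defaults)."""
    if y * (w @ x) <= 1:                         # margin violated (hinge loss)
        w = (1 - lam * eta) * w + eta * y * x    # gradient descent step
        radius = 1.0 / np.sqrt(lam)
        norm = np.linalg.norm(w)
        if norm > radius:                        # project onto an L2 ball
            w = w * (radius / norm)
        w = truncate(w, B)                       # enforce at most B features
    return w

# Usage on a synthetic stream where only the first 3 of 50 features matter.
rng = np.random.default_rng(0)
w = np.zeros(50)
for _ in range(500):
    x = rng.normal(size=50)
    y = 1 if x[:3].sum() >= 0 else -1
    w = ofs_step(w, x, y, B=5)
print("selected features:", np.flatnonzero(w))
```

Projecting before truncating keeps the weights bounded, so that discarding the smallest entries removes only a limited amount of the classifier's mass.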

Empirical Evaluation
- Batch algorithms
  - LIBLINEAR (Fan et al., JMLR 08): state-of-the-art fast linear classification tool
  - FGM (Tan & Tsang, ICML 10): state-of-the-art fast feature selection algorithm
- Online algorithms
  - Perceptron with truncation
  - OFS: online feature selection via sparse projection
- Evaluation metrics
  - Classification accuracy
  - Training time cost

Empirical Evaluation

Empirical Evaluation
mRMR: mutual-information criteria of max-dependency, max-relevance, and min-redundancy (Peng et al., TPAMI 05; cited more than 2000 times)

Scalability on Ultra-high Dimensional Data
D = number of dimensions, N = number of instances, F = number of non-zero features, S = binary file size

D           N           F            S
10K         100K        30 million   800 MB
1 billion   1 million   1 billion    13 GB (estimated)

Online Collaborative Filtering
- Collaborative filtering (CF): uses the known preferences of some users to predict the unknown preferences of other users
- Challenges of living analytics
  - Extremely sparse data
  - Data arrive sequentially
  - Batch CF algorithms have some critical limitations (e.g., high re-training cost)

Online Collaborative Filtering
- CF: from batch to online learning
  - The learning process works sequentially, handling new rating instances on the fly
  - The recommender system evolves over time
- Existing approaches to online CF (OCF)
  - Matrix factorization for dealing with sparse data
  - First-order algorithms, e.g., online gradient descent
  - But they suffer from slow convergence
- We propose second-order online CF methods
  - Confidence-weighted online collaborative filtering (CWOCF) algorithms (Jing, Wang, Hoi, ACML 13)
  - Online multi-task collaborative filtering algorithms (Wang et al., RecSys 13)
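The first-order baseline mentioned above, matrix factorization trained by online gradient descent, can be sketched as follows. This is not the second-order CWOCF method. The function name `ocf_ogd`, the squared-error loss, the latent dimension `k`, and the step size and regularization values are assumptions made for the illustration.

```python
import numpy as np

def ocf_ogd(ratings, n_users, n_items, k=5, eta=0.05, lam=0.01, seed=0):
    """Online matrix factorization over a stream of (user, item, rating).

    Each rating is first predicted as U[a] @ V[b], then used for one
    gradient step on the regularized squared error.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.normal(size=(n_users, k))  # user factors
    V = 0.1 * rng.normal(size=(n_items, k))  # item factors
    sq_err, count = 0.0, 0
    for a, b, r in ratings:
        err = r - U[a] @ V[b]                # error before the update
        sq_err += err ** 2
        count += 1
        u_old = U[a].copy()                  # use the old U[a] for V's step
        U[a] += eta * (err * V[b] - lam * U[a])
        V[b] += eta * (err * u_old - lam * V[b])
    rmse = np.sqrt(sq_err / max(count, 1))   # online (progressive) RMSE
    return U, V, rmse

# Tiny synthetic rating stream, repeated to mimic sequential arrival.
stream = [(0, 0, 4.0), (0, 1, 3.0), (1, 0, 5.0)] * 200
U, V, rmse = ocf_ogd(stream, n_users=2, n_items=2)
print("online RMSE:", rmse, "prediction for (0, 0):", U[0] @ V[0])
```

Only the factors of the user and item involved in the current rating are touched, so the cost per rating does not depend on the size of the rating matrix.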

CWOCF: formulation
- Main objective function: learn U and V from partly observed ratings R [equation not preserved in the transcript]
- Loss functions: [equations not preserved in the transcript]

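CWOCF: online update
- Assuming: [equation not preserved in the transcript]
- Online learning w.r.t. each received rating: [equation not preserved in the transcript]
- Online update (w.r.t. RMSE): [equation not preserved in the transcript]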

Empirical Evaluation
- Compared algorithms:
- Datasets:

Empirical Evaluation

Online Multiple Kernel Learning
- Motivation
  - Variety is a key challenge for living/big data analytics
  - Traditional methods assume data in a vector space
  - Real objects often have diverse representations
- Multiple kernel representation: each kernel represents one similarity function
  - Pyramid matching kernels (vision, multimedia)
  - Graph kernels (bio, web/social, etc.)
  - Sequence kernels (speech, video, bio, etc.)
  - Tree kernels (NLP, etc.)

Multiple Kernel Learning (MKL)
- What is MKL? (Lanckriet et al., JMLR 04) A kernel method that uses an optimal combination of multiple kernels
- Batch MKL formulation: [equation not preserved in the transcript]
- The convex-concave optimization is hard to solve for big data!
- Can we avoid solving the batch optimization directly?

Online MKL (Hoi et al., ML 13)
- Objective
  - Learn a kernel-based predictor with multiple kernels from a sequence of (multi-modal) data examples
  - Avoid the need to solve complicated optimization problems
- Main idea: two-step online learning. At each iteration, if there is a mistake:
  - Step 1: online learning with each single kernel, using the kernel Perceptron (Rosenblatt, 1958; Freund, 1999)
  - Step 2: online update of the combination weights, using the Hedge algorithm (Freund and Schapire, COLT 95)
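The two-step idea can be sketched as below: one kernel Perceptron per kernel, combined by Hedge-style multiplicative weights. This is a simplified illustration, not the published algorithm. The function names, the discount factor `beta`, the choice to update every kernel classifier that errs on the current example, and the two example kernels are all assumptions.

```python
import numpy as np

def linear_kernel(a, b):
    return float(a @ b)

def rbf_kernel(gamma):
    """Return an RBF kernel function with the given (hypothetical) gamma."""
    return lambda a, b: float(np.exp(-gamma * np.sum((a - b) ** 2)))

def online_mkl(stream, kernels, beta=0.8):
    """Kernel Perceptrons combined by Hedge; labels y are in {-1, +1}."""
    m = len(kernels)
    weights = np.ones(m) / m             # combination weights (Hedge)
    support = [[] for _ in range(m)]     # per-kernel support set of (x, y)
    mistakes = 0
    for x, y in stream:
        # Score of each single-kernel Perceptron on the new example.
        scores = np.array([
            sum(ys * k(xs, x) for xs, ys in support[i])
            for i, k in enumerate(kernels)
        ], dtype=float)
        votes = np.where(scores >= 0, 1, -1)
        y_hat = 1 if weights @ votes >= 0 else -1  # weighted vote
        mistakes += int(y_hat != y)
        for i in range(m):
            if votes[i] != y:
                support[i].append((x, y))  # Step 1: kernel Perceptron update
                weights[i] *= beta         # Step 2: Hedge discount
        weights /= weights.sum()           # keep the weights normalized
    return weights, support, mistakes

# Usage: labels depend on the distance from the origin, which a linear
# kernel cannot capture.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = np.where(np.sum(X ** 2, axis=1) < 1.5, 1, -1)
weights, support, mistakes = online_mkl(zip(X, y), [linear_kernel, rbf_kernel(1.0)])
print("kernel weights:", weights, "online mistakes:", mistakes)
```

No batch optimization is solved at any point: each step costs one pass over the current support sets, and kernels that keep making mistakes lose weight geometrically.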

Online MKL for Classification
Comparisons: Perceptron(u), Perceptron(* best), OM-2 (MKL variant), OMKC

Online MKL for Multimedia Retrieval
- Online Multi-Kernel Similarity Learning (OMKS) (Xia et al., TPAMI 14)
- Aims to learn a multi-kernel similarity for multimedia retrieval
[Diagram labels: Color, Texture, Local pattern (BoW); Side Info Stream; OMKS; Content-based Multimedia Retrieval]

Multi-modal Image Retrieval
[Figure: retrieval results for example queries, comparing OASIS(*), OKS(*), OMKS-U, and OMKS]

Conclusion
- Introduced the emerging opportunities and challenges that arise when machine learning meets big data
- Introduced online learning, a promising family of machine learning techniques for living analytics with big data
- Presented three online learning techniques that address different real-world challenges of living data analytics tasks

Take-Home Message
- Online learning is promising for living/big data analytics
- More challenges and opportunities ahead:
  - More effective online learning algorithms
  - Handling more real-world challenges, e.g., sparsity, high dimensionality, concept drift, noise, etc.
  - Scaling up to mine billions of instances using distributed computing (e.g., Hadoop) and parallel programming (e.g., GPU)
- LIBOL: an open-source Library of Online Learning Algorithms, http://libol.stevenhoi.org (also available at JMLR MLOSS)