Steven C.H. Hoi, School of Computer Engineering, Nanyang Technological University, Singapore. Acknowledgments: Peilin Zhao, Jialei Wang, Hao Xia, Jing Lu, Rong Jin, Pengcheng Wu, Dayong Wang, and others.
Agenda
- Introduction
  - Big Data: Opportunities & Challenges
  - Online Learning: What & Why
- Online Learning for Living Data Analytics
  - Research Challenges
  - Online Feature Selection
  - Online Collaborative Filtering
  - Online Multiple Kernel Learning
- Conclusions
Big Data: Popularity
Google Trends: big hope or big hype?
What is Big Data?
Volume, Velocity, Variety.
Source: http://www.ibmbigdatahub.com/sites/default/files/infographic_file/4-vs-of-big-data.jpg
Big Data: Big Value (source: McKinsey)
Big Data: Opportunities
Big Data: Challenges
- Volume: Efficiency: handle vast volumes of data (millions or even billions of instances) with limited computing capacity (CPU/RAM/disk).
- Velocity: Scalability: be able to scale up to handle exploding data (e.g., real-time data streams).
- Variety: Adaptability: be able to adapt to complex and changing environments in order to deal with diverse data.
What is Online Learning?
Batch/offline learning trains a predictor once on a fixed dataset. Online learning processes one instance at a time: the learner makes a prediction, receives feedback, and updates the predictor.
Example: the Perceptron algorithm (Rosenblatt 1958). On each mistake, the weight vector is updated toward the misclassified example (illustrated as w1 → w2 → w3).
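The mistake-driven update above can be sketched in a few lines of Python (a minimal illustration; the learning rate `lr` and the {-1, +1} label encoding are assumptions, not slide content):

```python
import numpy as np

def perceptron(stream, dim, lr=1.0):
    """Online Perceptron: predict, then update the weights only on a mistake."""
    w = np.zeros(dim)
    mistakes = 0
    for x, y in stream:              # y in {-1, +1}
        y_hat = 1 if w @ x >= 0 else -1
        if y_hat != y:               # mistake-driven update
            w += lr * y * x
            mistakes += 1
    return w, mistakes
```

On linearly separable data, the classic Perceptron mistake bound guarantees the total number of updates is finite regardless of stream length.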
Why Online Learning?
- Avoids re-training when new data arrive
- High efficiency and excellent scalability
- Strong adaptability to changing environments
- Simple to understand, trivial to implement, easy to parallelize
- Theoretical guarantees
Challenges of Living Data Analytics
- High dimensionality
- High sparsity
- High variety
Online Learning Methods
- Online Feature Selection (BigMine 13, TKDE 14): select a subset of informative features in machine learning tasks for analyzing high-dimensional data
- Online Collaborative Filtering (ACML 13, RecSys 13): learn from a sequence of rating data (a sparse rating matrix) to build recommender systems
- Online Multiple Kernel Learning (ML 13, TPAMI 14): fuse multiple types of diverse data sources via multiple-kernel-based machine learning, where each kernel represents one type of data/representation
Online Feature Selection
- Feature selection: select a subset of informative features and remove irrelevant/redundant ones before model construction. This alleviates the curse of dimensionality, speeds up the learning task, and improves interpretability.
- From offline/batch to online learning: the online learner is allowed to maintain a classifier involving only a small, fixed number of features. The challenge is how to make accurate predictions on an instance using only a small number of active features.
Online Feature Selection
A family of online feature selection algorithms (Hoi et al., BigMine 13, TKDE 14). Key idea: exploit sparse online learning via Online Gradient Descent or Perceptron updates, followed by sparse projection and truncation.
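A minimal sketch of the truncation idea described above (not the papers' exact algorithm: OFS also includes an L2-ball projection step, and `eta`, `lam`, and the feature budget `B` here are assumed hyperparameters):

```python
import numpy as np

def ofs_truncate(w, B):
    """Keep only the B largest-magnitude weights (hard truncation)."""
    if np.count_nonzero(w) > B:
        idx = np.argsort(np.abs(w))[:-B]   # indices of all but the B largest
        w = w.copy()
        w[idx] = 0.0
    return w

def ofs(stream, dim, B, eta=0.2, lam=0.01):
    """Online feature selection sketch: online gradient descent on the hinge
    loss with L2 shrinkage, then truncation to at most B active features."""
    w = np.zeros(dim)
    for x, y in stream:                     # y in {-1, +1}
        if y * (w @ x) < 1:                 # margin violation: take a step
            w = (1 - lam * eta) * w + eta * y * x
            w = ofs_truncate(w, B)
    return w
```

The invariant is that the classifier never uses more than B features, which is exactly the constraint the online feature selection setting imposes.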
Empirical Evaluation
- Batch algorithms: LIBLINEAR (Fan et al., JMLR 08), a state-of-the-art fast linear classification tool; FGM (Tan & Tsang, ICML 10), a state-of-the-art fast feature selection algorithm
- Online algorithms: Perceptron with truncation; OFS, online feature selection via sparse projection
- Evaluation metrics: classification accuracy and training time cost
Empirical Evaluation
mRMR: the max-dependency, max-relevance, min-redundancy mutual information criterion (Peng et al., TPAMI 05; cited over 2,000 times)
Scalability on Ultra-high Dimensional Data
Notation: D = #dimensions, N = #instances, F = #non-zero features, S = binary file size.
- Dataset 1: D = 10K, N = 100K, F = 30 million, S = 800 MB
- Dataset 2: D = 1 billion, N = 1 million, F = 1 billion, S = 13 GB (estimated)
Online Collaborative Filtering
- Collaborative filtering (CF) uses the known preferences of some users to predict the unknown preferences of other users.
- Challenges of living analytics: extremely sparse data; data arrive sequentially.
- Batch CF algorithms have critical limitations (e.g., high retraining cost).
Online Collaborative Filtering
- CF from batch to online learning: the learning process works sequentially, handling new rating data on the fly, so the recommender system evolves over time.
- Existing approaches to online CF: matrix factorization for dealing with sparse data, trained with first-order algorithms such as online gradient descent, which suffer from slow convergence.
- We propose second-order online CF methods: Confidence-Weighted Online Collaborative Filtering (CWOCF) (Lu, Wang, Hoi, ACML 13) and Online Multi-Task Collaborative Filtering (Wang et al., RecSys 13).
CWOCF: Formulation
Main objective: learn the latent factor matrices U and V from the partially observed rating matrix R, under a chosen loss function.
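The slide's equations did not survive extraction; the standard matrix-factorization objective for CF has this shape (a sketch, which may differ from the paper's exact notation):

```latex
\min_{U, V} \; \sum_{(i,j) \in \Omega} \ell\!\left(U_i^{\top} V_j,\; r_{ij}\right)
  \;+\; \frac{\lambda}{2} \left( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \right)
```

where \(\Omega\) is the set of observed (user, item) pairs, \(U_i\) and \(V_j\) are the user and item latent factors, and \(\ell\) can be the squared loss (matching RMSE) or the absolute loss (matching MAE).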
CWOCF: Online Update
On each received rating, perform an online learning step with respect to that single entry; the slide states the update with respect to RMSE (i.e., squared loss).
Empirical Evaluation
- Compared algorithms
- Datasets
Online Multiple Kernel Learning
- Motivation: variety is a key challenge for living/big data analytics. Traditional methods assume data lie in a vector space, but real objects often have diverse representations.
- Multiple kernel representation: each kernel encodes one similarity function, e.g., pyramid matching kernels (vision, multimedia), graph kernels (bio, web/social, etc.), sequence kernels (speech, video, bio, etc.), tree kernels (NLP, etc.).
Multiple Kernel Learning (MKL)
- What is MKL (Lanckriet et al., JMLR 04)? A kernel method that learns an optimal combination of multiple kernels.
- The batch MKL formulation is a convex-concave optimization that is hard to solve for big data. Can we avoid solving the batch optimization directly?
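In the style of Lanckriet et al., the batch MKL problem can be written as a saddle-point problem over kernel-combination weights \(\theta\) and SVM dual variables \(\alpha\) (a sketch; the slide's exact notation was lost):

```latex
\min_{\theta \in \Delta} \; \max_{0 \le \alpha \le C}
  \; \sum_{t} \alpha_t
  \;-\; \frac{1}{2} \sum_{s,t} \alpha_s \alpha_t \, y_s y_t
    \sum_{k=1}^{m} \theta_k \, K_k(x_s, x_t)
```

where \(\Delta = \{\theta : \theta_k \ge 0,\ \sum_k \theta_k = 1\}\) is the simplex over the \(m\) base kernels (with the additional constraint \(\sum_t \alpha_t y_t = 0\) when a bias term is used). The coupling between \(\theta\) and \(\alpha\) is what makes this convex-concave problem expensive at big-data scale.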
Online MKL (Hoi et al., ML 13)
- Objective: learn a kernel-based predictor with multiple kernels from a sequence of (multi-modal) data examples, avoiding the need to solve complicated optimizations.
- Main idea: two-step online learning. At each iteration, if there is a mistake:
  - Step 1: online learning with each single kernel via the kernel Perceptron (Rosenblatt, 1958; Freund & Schapire, 1999)
  - Step 2: online update of the combination weights via the Hedge algorithm (Freund & Schapire, COLT 95)
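The two-step scheme can be sketched as follows (a simplified illustration, not the paper's exact algorithm; the discount factor `beta` and the sign-vote combination are assumptions):

```python
import numpy as np

def omkl(stream, kernels, beta=0.8):
    """Two-step online MKL sketch: one kernel Perceptron per base kernel,
    with Hedge-style multiplicative weight updates on each kernel's mistakes.
    `kernels` is a list of functions k(x1, x2); beta in (0, 1) discounts
    the weight of a kernel whenever its own predictor errs."""
    m = len(kernels)
    weights = np.ones(m)                 # Hedge weights over kernels
    sv = [[] for _ in range(m)]          # support vectors (x, y) per kernel
    mistakes = 0
    for x, y in stream:                  # y in {-1, +1}
        # Per-kernel Perceptron scores from each kernel's support set
        f = np.array([sum(a * k(xs, x) for xs, a in s) if s else 0.0
                      for k, s in zip(kernels, sv)])
        y_hat = 1 if weights @ np.sign(f) >= 0 else -1   # weighted vote
        if y_hat != y:
            mistakes += 1
        for i in range(m):
            if np.sign(f[i]) != y:       # kernel i's own predictor erred
                sv[i].append((x, y))     # Step 1: Perceptron update
                weights[i] *= beta       # Step 2: Hedge discount
    return weights, sv, mistakes
```

Kernels whose single-kernel predictors make fewer mistakes keep higher Hedge weights, so the combined vote gradually concentrates on the informative kernels.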
Online MKL for Classification
Comparisons among Perceptron(u), Perceptron(*, best single kernel), OM-2 (an MKL variant), and OMKC.
Online MKL for Multimedia Retrieval
Online Multi-Kernel Similarity Learning (OMKS) (Xia et al., TPAMI 14) aims to learn a multi-kernel similarity function for content-based multimedia retrieval, combining feature channels such as color, texture, and local patterns (BoW) with a stream of side information.
Multi-modal Image Retrieval
Example query results comparing OASIS(*), OKS(*), OMKS-U, and OMKS.
Conclusion
- Introduced the emerging opportunities and challenges when machine learning meets big data.
- Introduced online learning, a promising family of machine learning techniques for living analytics with big data.
- Presented three online learning techniques that address different real-world challenges in living data analytics tasks.
Take-Home Message
- Online learning is promising for living/big data analytics.
- More challenges and opportunities ahead: more effective online learning algorithms; handling more real-world challenges, e.g., sparsity, high dimensionality, concept drift, and noise; scaling up to mine billions of instances using distributed computing (e.g., Hadoop) and parallel programming (e.g., GPUs).
- LIBOL: an open-source Library of Online Learning algorithms, http://libol.stevenhoi.org (also available via JMLR MLOSS).