Online Learning Methods for Big Data Analytics


1 Online Learning Methods for Big Data Analytics. Steven C.H. Hoi*, Peilin Zhao+. *Singapore Management University; +Institute for Infocomm Research, A*STAR. 17 Dec 2014

2 Agenda. PART I: Introduction: Big Data: Opportunities & Challenges; Online Learning: What and Why; Online Learning Applications; Overview of OL Methods. PART II: Online Learning Methods: Traditional Linear OL Algorithms; Non-traditional OL Algorithms; Kernel-based OL Algorithms. Discussions and Open Issues. Summary and Take-Home Messages

3 Data Science: a thousand years ago, Experiment; the last few hundred years, Theory; the last few decades, Computation; and now, the Big Data Era, Data-driven science (Jim Gray's fourth paradigm)

4 Big Data: Popularity (Google Trends): Big Hype or Big Hope?

5 What is Big Data? "Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications." --- Wikipedia. "Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze." --- McKinsey Global Institute, 2011. "Big data refers to data that is too large, complex and dynamic for any conventional data tools to capture, store, manage and analyze." --- WIPRO

6 Characteristics of Big Data. Gartner analyst Doug Laney published a research paper (2001) titled "3D Data Management: Controlling Data Volume, Velocity, and Variety." Even today, the 3Vs are generally accepted dimensions of big data: Volume, Velocity, Variety.

7 Big Data: Volume: 1.28 billion users (Mar 2014)

8 Big Data: Velocity: 9B photos/month; 3.5M images per day; 150M images/month; 250 years of video/day


10 Big Data: Variety. Data types: structured, semi-structured, unstructured. Data sources: machine-machine, human-machine, human-human.

11 Big Data: Big Value (source: McKinsey)

12 Big Data Analytics: Opportunities

13 [Figure: the big data stack: DATA at the base, with Infrastructure, Analytics, and Applications layered on top]

14 Big Data Analytics: Challenges. Adaptability: be able to adapt to complex and fast-changing environments, dealing with diverse data and evolving concepts. Scalability: be able to scale up to handle explosively growing data (e.g., real-time stream data). Efficiency: handle vast volumes of data (millions or even billions of instances) with limited computing capacity (CPU/RAM/disk). These challenges motivate Online Learning.

15 What is Online Learning? Batch/offline learning vs. online learning. [Figure: in online learning, the learner makes a prediction, receives feedback, and updates the predictor in a loop]

16 Online Prediction Task. For t = 1, 2, ..., T: receive an instance x_t; predict its class label ŷ_t; receive the true class label y_t; suffer loss ℓ_t(w_t); update the prediction model. Goal: to minimize the total loss suffered, Σ_{t=1}^T ℓ_t(w_t).
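
To make the round structure concrete, here is a minimal Python sketch of the protocol; the `stream` of (x, y) pairs and the `model` interface (predict/loss/update) are illustrative assumptions, not part of the slide's abstract setting.

```python
def run_online_protocol(stream, model):
    """Generic online learning protocol: predict, observe, suffer loss, update.

    `stream` yields (x, y) pairs; `model` is any object exposing
    predict(x), loss(y_hat, y), and update(x, y) -- an assumed minimal interface.
    """
    total_loss = 0.0
    for x, y in stream:
        y_hat = model.predict(x)            # predict the class label
        total_loss += model.loss(y_hat, y)  # receive true label, suffer loss
        model.update(x, y)                  # update the prediction model
    return total_loss
```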

17 Regret Analysis. Denote by w* the optimal hypothesis from H, the class of linear classifiers, chosen in hindsight. The regret of an online learning algorithm is its cumulative loss minus that of w*. We want the regret to be small and bounded (sublinear in T), which guarantees the learner performs nearly as well as one who observes the entire sequence and chooses the best prediction strategy in hindsight.
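
The slide's regret definition, restated in standard notation (assuming ℓ_t denotes the loss suffered on round t):

```latex
R_T \;=\; \sum_{t=1}^{T}\ell_t(\mathbf{w}_t)\;-\;\min_{\mathbf{w}\in\mathcal{H}}\sum_{t=1}^{T}\ell_t(\mathbf{w}),
\qquad \text{and we want } R_T = o(T), \text{ so that } R_T/T \to 0.
```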

18 Online-to-Batch Conversions: online learning algorithms for batch learning. An online algorithm that attains low regret can be converted into a batch learning algorithm that attains low risk. Theoretical guarantee: from regret bounds to risk bounds (data drawn i.i.d. from an unknown distribution Q). Various conversion techniques: last hypothesis, averaging, validation, etc.

19 Why Online Learning? Avoids re-training when adding new data; high efficiency; excellent scalability; strong adaptability to changing environments; simple to understand; trivial to implement; easy to parallelize; theoretical guarantees via online-to-batch conversions.

20 Online Learning: Applications: Finance, Social Media, Computer Vision, Multimedia Search, Cyber Security, Recommender Systems

21 Online Learning: Applications: Multimedia Search

22 Online Learning for Multimedia Search: web-scale content-based multimedia retrieval. Example queries: "I want to buy a pink Gucci purse" (pattern, color) for image retrieval; "I want to watch Obama's Gangnam Style" (visual, audio) for video search.

23 Online Learning for Multimedia Search: web-scale content-based multimedia retrieval; interactive search with online relevance feedback. Challenges: learn an effective hypothesis for the user's search need, and identify informative examples for soliciting user feedback. Solution: online active learning algorithms.

24 Online Learning for Multimedia Search: collaborative multimedia retrieval from big data by mining massive-scale side information: relevance feedback logs, user click-through data in search engines, multimodal data. Solutions: online distance metric learning, online kernel similarity learning, online multimodal similarity learning. Applications: improving similarity search quality and indexing efficacy in CBIR.

25 Online Learning: Applications: Cyber Security

26 Online Learning for Cyber Security: online anomaly detection (outlier/intrusion/fraud). Examples: fraudulent credit card transactions; malicious web/email spam filtering; network intrusion detection systems.

27 Online Learning for Cyber Security. Challenges: handle real-time data and respond instantly; high class imbalance (#anomalies << #normal); different misclassification costs; labeling cost could be expensive; anomaly concepts/patterns often evolve over time. Solutions: Cost-Sensitive Online Learning, Online Active Learning, Cost-Sensitive Online Active Learning.

28 Online Learning: Applications: Recommender Systems

29 Online Learning for Recommendation: Online Recommender Systems

30 Online Learning for Recommendation. Challenges: data (user ratings) arrives sequentially and rapidly; the user-rating matrix is extremely sparse; user preferences may evolve over time. Traditional approaches: collaborative filtering or content-based techniques, built as batch learning approaches, which suffer from expensive re-training with new data and fail to adapt to fast-changing environments. Solutions: Online Collaborative Filtering / Matrix Factorization; Sparse Online Learning for high-dimensional data streams.

31 Online Learning: Applications: Computer Vision

32 Online Learning for Computer Vision: video surveillance using online learning: visual object tracking from video streams; detecting anomalous objects/events from video streams. Challenges: velocity (real-time processing of video streams); lack of feedback (have to assume weak labels); concept drifting (Basharat et al. 2008).

33 Online Learning for Computer Vision: large-scale image classification / search. The bag-of-visual-words (BoW) representation is often not optimal; online learning can be used to optimize the BoW representation. Challenges: high dimensionality; massive training data.

34 Online Learning: Applications: Social Media

35 Online Learning for Social Media: online learning for mining social media streams for business intelligence applications: sentiment classification, public emotion analytics, product sentiment detection, tracking brand sentiments.

36 Online Learning for Social Media: microblogging emotion prediction. Limited training data for each person; combining all data may not fit each individual. Solution: Collaborative Online Learning (Li et al. 2010).

37 Online Learning for Social Media: mining social images for auto photo tagging. Online learning is used to optimize the distance metric for search-based annotation by mining vast collections of social images. [Figure: example photos auto-tagged with labels such as Hawk, Bird, Sky, Eagle, Sun, Blue, Fly, White, Cloud]

38 Online Learning: Applications: Finance

39 Online Learning for Finance: on-line portfolio selection. Goal: to make sequential trading decisions for investing wealth over a collection of assets. Challenge: real-time data arrives sequentially while the decision has to be made immediately (e.g., high-frequency trading).

40 Online Learning for Finance: on-line portfolio selection. Solution: online learning algorithms to optimize trading strategies, exploiting the mean reversion principle. Empirical results: NYSE dataset, 36 stocks, 22 years of daily data, investing $1 on the first day; baseline: Market (Buy-And-Hold), roughly a 15x return. Recent OLPS studies: (Li et al., ICML'12, ML'13, CSUR'14, etc.)

41 Agenda. PART I: Introduction: Big Data: Opportunities & Challenges; Online Learning: What and Why; Online Learning Applications; Overview of Online Learning Methods. PART II: Online Learning Methods: Traditional Linear OL Algorithms; Non-traditional OL Algorithms; Kernel-based OL Algorithms. Discussions and Open Issues. Summary and Take-Home Messages

42 Online Learning: Overview. Online learning with full feedback; online learning with partial feedback (bandit problems, reinforcement learning); online learning without feedback (unsupervised learning from stream data, e.g., online clustering). The latter two settings are not covered in this tutorial; see the ACML'12 and ICML tutorials on bandits, the book Prediction, Learning, and Games (Nicolò Cesa-Bianchi & Gábor Lugosi), and reinforcement learning surveys.

43 Online Learning: Overview. Linear methods: traditional (first-order OL, second-order OL, sparse OL, OL with expert advice) and non-traditional (online AUC maximization, cost-sensitive OL, online transfer learning, online distance metric learning, online collaborative filtering). Non-linear methods: single-kernel OL (classification, regression, ranking; DUOL; budget OL) and multiple-kernel OL (online MKL: online multiple kernel classification (MKC) and online multiple kernel similarity (MKS)).

44 Agenda. PART I: Introduction: Big Data: Opportunities & Challenges; Online Learning: What and Why; Online Learning Applications; Overview of Online Learning Methods. PART II: Online Learning Methods: Traditional Linear OL Algorithms; Non-traditional OL Algorithms; Kernel-based OL Algorithms. Discussions and Open Issues. Summary and Take-Home Messages

45 Notation [notation table not preserved in this transcription]

46 Online Learning: Overview: linear methods (traditional, non-traditional) and non-linear methods (single kernel, multiple kernels)

47 Online Learning: Classification Setting. For t = 1, 2, ..., T: receive an instance x_t; predict its class label ŷ_t; receive the true class label y_t; suffer loss ℓ_t(w_t); update the classification model.

48 Objective: minimize the total loss Σ_{t=1}^T ℓ(w_t; (x_t, y_t)). Loss functions: zero-one loss, ℓ(w; (x, y)) = I[y ≠ sign(w·x)]; hinge loss, ℓ(w; (x, y)) = max(0, 1 − y w·x).

49 Loss Functions: [Figure: hinge loss and zero-one loss plotted as functions of the margin y w·x; the hinge loss upper-bounds the zero-one loss]

50 Linear Classifiers: we restrict our discussion to linear classifiers. Prediction: ŷ = sign(w·x); confidence: |w·x|.

51 Update Rules: online algorithms are based on an update rule which defines w_{t+1} from w_t (and possibly other information). Linear classifiers: find w_{t+1} from w_t based on the input (x_t, y_t).

52 Algorithms for Update Rules. First-order algorithms: Perceptron (Rosenblatt, 1958); Online Gradient Descent (Zinkevich, 2003); Passive-Aggressive learning (Crammer et al., 2006); MIRA: Margin Infused Relaxed Algorithm (Crammer and Singer, 2003); NORMA: Naive Online R-reg Minimization Algorithm (Kivinen et al., 2002); ROMMA: Relaxed Online Maximum Margin Algorithm (Li and Long, 2002); ALMA: A New Approximate Maximal Margin Classification Algorithm (Gentile, 2001). Second-order algorithms: SOP: Second-Order Perceptron (Cesa-Bianchi et al., 2005); CW: Confidence-Weighted learning (Dredze et al., 2008); AROW: Adaptive Regularization of Weights (Crammer et al., 2009); SCW: Soft Confidence-Weighted learning (Wang et al., 2012). Plus sparse online learning algorithms.

53 Perceptron Algorithm (Rosenblatt, 1958): [Figure: the weight vector is rotated toward misclassified positive examples and away from misclassified negative examples, moving from w_1 to w_2 to w_3]

54 Aggressive Perceptron. Initialize w_1 = 0. For t = 1, 2, ..., T: receive an instance x_t; predict its class label ŷ_t = sign(w_t·x_t); receive the true class label y_t; if ℓ(w_t; (x_t, y_t)) > 0 then w_{t+1} = w_t + η y_t x_t. Aggressive: updates the classifier whenever the loss is non-zero (even if it classifies correctly); η is the learning rate.
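
A minimal NumPy sketch of the aggressive Perceptron as just described; the hinge-loss trigger matches the slide, while the streaming interface and default learning rate are illustrative assumptions.

```python
import numpy as np

def aggressive_perceptron(stream, dim, eta=1.0):
    """Aggressive Perceptron: update whenever the hinge loss is non-zero,
    i.e., on mistakes and on correct predictions with margin below 1."""
    w = np.zeros(dim)
    for x, y in stream:                 # x: np.ndarray, y in {-1, +1}
        if 1.0 - y * (w @ x) > 0:       # non-zero hinge loss
            w = w + eta * y * x         # Perceptron-style additive update
    return w
```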

55 Online Gradient Descent: online convex optimization (Zinkevich, 2003). Consider a convex objective function ℓ_t(w), where w lies in a bounded convex set S. The update by Online Gradient Descent (OGD) or Stochastic Gradient Descent (SGD): w_{t+1} = Π_S(w_t − η_t ∇ℓ_t(w_t)), where η_t is called the learning rate and Π_S denotes projection onto S.

56 Online Gradient Descent (OGD) algorithm. Repeat for t = 1, 2, ...: an unlabeled example x_t arrives; make a prediction based on the existing weights, ŷ_t = sign(w_t·x_t); observe the true class label y_t; update the weights by the OGD rule, w_{t+1} = Π_S(w_t − η_t ∇ℓ_t(w_t)), where η_t is a learning rate.
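
One OGD round for the hinge loss, as a sketch; the step-size schedule η_t = η_0/√t and the omission of the projection step (valid when S is all of R^d) are assumptions for illustration.

```python
import numpy as np

def ogd_hinge_round(w, x, y, t, eta0=1.0):
    """One Online Gradient Descent round with hinge loss.
    A subgradient of max(0, 1 - y*w.x) is -y*x when the margin is below 1."""
    eta = eta0 / np.sqrt(t)
    if y * (w @ x) < 1.0:
        w = w + eta * y * x   # gradient step; project onto S here if needed
    return w
```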

57 Passive-Aggressive Online Learning (Crammer et al., 2006). PA: w_{t+1} = argmin_w (1/2)||w − w_t||² subject to ℓ(w; (x_t, y_t)) = 0. PA-I: argmin_w (1/2)||w − w_t||² + C ξ subject to ℓ(w; (x_t, y_t)) ≤ ξ, ξ ≥ 0. PA-II: argmin_w (1/2)||w − w_t||² + C ξ² subject to ℓ(w; (x_t, y_t)) ≤ ξ.

58 Passive-Aggressive Online Learning: closed-form solutions can be derived: w_{t+1} = w_t + τ_t y_t x_t, with τ_t = ℓ_t/||x_t||² (PA), τ_t = min(C, ℓ_t/||x_t||²) (PA-I), or τ_t = ℓ_t/(||x_t||² + 1/(2C)) (PA-II), where ℓ_t is the hinge loss at round t.
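
A sketch of the PA-I update in NumPy, directly from the closed form above; variable names are mine.

```python
import numpy as np

def pa1_update(w, x, y, C=1.0):
    """Passive-Aggressive I update (Crammer et al., 2006)."""
    loss = max(0.0, 1.0 - y * (w @ x))
    if loss > 0 and (x @ x) > 0:
        tau = min(C, loss / (x @ x))   # PA-I step size
        w = w + tau * y * x
    return w
```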

59 Traditional Linear Online Learning (cont'd). First-order methods learn a linear weight vector (first-order information) of the model. Pros and cons: simple and easy to implement; efficient and scalable for high-dimensional data; but a relatively slow convergence rate.

60 Second-Order Online Learning Methods. Key idea: update the weight vector w by maintaining and exploring second-order information in addition to the first-order information. Some representative methods: SOP: Second-Order Perceptron (Cesa-Bianchi et al., 2005); CW: Confidence-Weighted learning (Dredze et al., 2008); AROW: Adaptive Regularization of Weights (Crammer et al., 2009); SCW: Soft Confidence-Weighted learning (Wang et al., 2012). Others (but not limited to): IELLIP: online learning by the ellipsoid method (Yang et al., 2009); NHERD: Gaussian Herding (Crammer & Lee, 2010); NAROW: a new variant of the AROW algorithm (Orabona & Crammer, 2010).

61 SOP: Second-Order Perceptron (Cesa-Bianchi et al., 2005). Whitened Perceptron (not incremental!): compute the correlation matrix of the inputs, whiten the data, and simply run a standard Perceptron on the whitened inputs. The online algorithm is an incremental variant of the Whitened Perceptron, maintaining an augmented matrix of stored instances and the corresponding correlation matrix as examples arrive.

62 SOP: Second-Order Perceptron (Cesa-Bianchi et al., 2005): [algorithm listing omitted in this transcription]

63 CW: Confidence-Weighted learning (Dredze et al., 2008). Maintain a Gaussian distribution over classifiers and draw a parameter vector w ~ N(μ, Σ). The margin is viewed as a random variable, y_t(w·x_t), and the probability of a correct prediction is Pr[y_t(w·x_t) ≥ 0]. Optimization of CW: minimize the KL divergence to the previous distribution subject to this probability being at least η.

64 CW: Confidence-Weighted learning. The probability constraint can be written as y_t(μ·x_t) ≥ φ √(x_tᵀ Σ x_t), where φ = Φ⁻¹(η) and Φ is the cumulative distribution function of the normal distribution. Lemma 1: the optimal value of the Lagrange multiplier is given in closed form, yielding closed-form updates for (μ, Σ).

65 AROW: Adaptive Regularization of Weights (Crammer et al., 2009). An extension of CW learning. Key properties: large-margin training, confidence weighting, and the capacity to handle non-separable data. Formulation: minimize the KL divergence to the previous Gaussian plus a squared-hinge loss term and a confidence (variance) term, C(μ, Σ) = D_KL(N(μ, Σ) || N(μ_{t−1}, Σ_{t−1})) + λ₁ ℓ_h²(y_t, μ·x_t) + λ₂ x_tᵀ Σ x_t.

66 AROW algorithm (Crammer et al., 2009): [algorithm listing omitted in this transcription]

67 SCW: Soft Confidence-Weighted learning (Wang et al., 2012). Four salient properties: large margin, non-separable, confidence-weighted (second-order), and adaptive margin. Formulation: SCW-I and SCW-II soften the CW constraint with a slack term, penalized linearly (SCW-I) or quadratically (SCW-II), analogous to PA-I and PA-II.

68 SCW Algorithms: [algorithm listing omitted in this transcription]

69 Traditional Linear Online Learning (cont'd). Second-order methods learn both first-order and second-order information. Pros and cons: faster convergence rate; but expensive for high-dimensional data and relatively sensitive to noise.

70 Traditional Linear Online Learning (cont'd): empirical results (Wang et al., ICML'12): [Figure: online mistake rates and online time costs of first-order vs. second-order methods]

71 Sparse Online Learning. Motivation: how to induce sparsity in the weights of online learning algorithms for high-dimensional data, under space constraints (memory overflow) and test-time constraints (test computational cost). Some existing work: truncated gradient (Langford et al., 2009); FOBOS: Forward-Looking Subgradients (Duchi and Singer, 2009); dual averaging (Xiao, 2010); etc.

72 Truncated Gradient (Langford et al., 2009). Main idea: impose sparsity by modifying the stochastic gradient descent update, truncating small coefficients toward zero. The naive alternative, simple coefficient rounding, sets coefficients below a threshold to zero outright.

73 Truncated Gradient (Langford et al., 2009): simple coefficient rounding vs. less aggressive truncation. [Figure: illustration of the two truncation functions, T0 (rounding) and T1 (gradual shrinkage)]

74 Truncated Gradient (Langford et al., 2009). The amount of shrinkage is measured by a gravity parameter g; the truncation can be performed every K online steps; when g = 0 the update rule is identical to standard SGD. Loss functions: logistic, SVM (hinge), least squares.
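
A sketch of the T1 truncation operator from the paper; applying it every K steps with shrinkage amount alpha = K·η·g (learning rate times accumulated gravity) recovers the method, though the surrounding loop and the parameter names here are illustrative.

```python
import numpy as np

def truncate_t1(w, alpha, theta):
    """T1 truncation (Langford et al., 2009): shrink coefficients whose
    magnitude is at most theta toward zero by alpha; leave the rest intact."""
    w = w.copy()
    small = np.abs(w) <= theta
    w[small] = np.sign(w[small]) * np.maximum(np.abs(w[small]) - alpha, 0.0)
    return w

# Usage inside an SGD loop: every K steps, w = truncate_t1(w, K * eta * g, theta)
```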

75 FOBOS (Duchi and Singer, 2009): Forward-Backward Splitting

76 The FOBOS Algorithm. Repeat: (I) take an unconstrained (stochastic sub)gradient step on the loss; (II) incorporate the regularization. Similar in spirit to forward-backward splitting (Lions and Mercier, '79), composite gradient methods (Wright et al., '09; Nesterov, '07), and dual averaging with regularization (Xiao, '09).

77 FOBOS, Step I: unconstrained (stochastic sub)gradient step on the loss: v = w_t − η_t g_t, where g_t is a subgradient of the loss at w_t.

78 FOBOS, Step II: incorporate regularization: w_{t+1} = argmin_w { (1/2)||w − v||² + η_t λ r(w) }.

79 FOBOS with ℓ1 regularization: Step II is separable, and the coordinate-wise update yields sparsity: w_{t+1,j} = sign(v_j) max(|v_j| − η_t λ, 0). Similar to truncated gradient (Langford et al., '08) and iterative shrinkage and thresholding (Donoho, '95; Daubechies et al., '04).
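
For the ℓ1 regularizer the two FOBOS steps collapse into a gradient step followed by soft-thresholding; a minimal sketch:

```python
import numpy as np

def fobos_l1_round(w, grad, eta, lam):
    """One FOBOS round with l1 regularization (Duchi and Singer, 2009)."""
    v = w - eta * grad                                    # Step I: gradient step
    return np.sign(v) * np.maximum(np.abs(v) - eta * lam, 0.0)  # Step II: shrink
```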

80 Forward-Looking Property: the regularization step solves for w_{t+1} using the subgradient of the regularizer at w_{t+1} itself, i.e., the update "looks forward" to the next weight vector rather than using the regularizer's subgradient at the current iterate.

81 Dual Averaging (Xiao, 2010). Goal: regularized stochastic learning; plain SGD lacks the capability to exploit problem structure and often suffers from large variation. The Regularized Dual Averaging (RDA) method extends Nesterov's dual averaging method, optionally with a strongly convex auxiliary function.

82 The RDA Algorithm. Step 1: compute a subgradient g_t of the loss. Step 2: update the average subgradient, ḡ_t = ((t−1)/t) ḡ_{t−1} + (1/t) g_t. Step 3: compute the next weight vector, w_{t+1} = argmin_w { ḡ_t·w + Ψ(w) + (β_t/t) h(w) }; for common regularizers Ψ (e.g., ℓ1), closed-form solutions exist.

83 Comparison of Sparse OL Algorithms. FOBOS: uses the current subgradient and a local Bregman divergence; its ℓ1 truncation coefficient is λη_t, equivalent to TG in a special case. RDA: uses the average subgradient and a global proximal function; its truncation coefficient is λ, a much more aggressive truncation threshold.

84 Comparisons: comparison of TG with other baselines: [Figure omitted in this transcription]

85 Comparisons: [Figure omitted in this transcription]

86 Variants of Sparse Online Learning: Online Feature Selection (OFS), a variant of sparse online learning. The key difference is that OFS focuses on selecting a fixed subset of features in the online learning process; it can be used as an alternative tool for batch feature selection when dealing with big data. Existing work: Online Feature Selection (Hoi et al., 2012) proposed an OFS scheme that explores sparse projection to choose a fixed set of active features in online learning.

87 Online Learning with Expert Advice: learning to combine the predictions from multiple experts (classifiers). An ensemble of d experts f_1, ..., f_d; combination weights w_1, ..., w_d; combined classifier: a weighted combination, e.g., sign(Σ_i w_i f_i(x)).

88 Hedge Algorithm (Freund & Schapire, '97). Assume there exists some best expert; what learning strategy can perform as well as the best expert? Consider it as an on-line allocation task.

89 Hedge Algorithm. Initialize w_1 = (1/d, ..., 1/d). For t = 1, 2, ..., T: receive a training example; predict by the weighted majority of the experts; for each expert i = 1, 2, ..., d that makes a mistake, multiply its weight by β ∈ (0, 1); renormalize the weights.
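
A sketch of the multiplicative-weights update at the heart of Hedge; treating each round's expert losses as a vector in [0, 1]^d is the standard allocation setting, and the default β here is an arbitrary choice.

```python
import numpy as np

def hedge(per_round_losses, beta=0.9):
    """Hedge (Freund & Schapire, 1997): multiplicative weights over d experts.
    per_round_losses: iterable of length-d arrays with losses in [0, 1]."""
    w = None
    for losses in per_round_losses:
        losses = np.asarray(losses, dtype=float)
        if w is None:
            w = np.ones(len(losses)) / len(losses)  # uniform initial weights
        w *= beta ** losses                          # penalize lossy experts
        w /= w.sum()                                 # renormalize
    return w
```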

90 Hedge Algorithm: Regret Bounds. Denote by L_i = Σ_t ℓ_t^i the cumulative loss of the i-th expert. By choosing β appropriately, Hedge's cumulative loss is at most min_i L_i + O(√(T ln d)): a no-regret algorithm that performs (nearly) as well as the best expert!

91 Summary of Traditional Linear OL. Pros: efficient in computation and memory; extremely scalable; theoretical bounds on the mistake rate. Cons: learns only linear prediction models; optimizes the mistake rate only.

92 Online Learning: Overview: linear methods (traditional, non-traditional) and non-linear methods (single kernel, multiple kernels)

93 Non-Traditional Linear OL: Online AUC Maximization; Cost-Sensitive Online Learning; Online Transfer Learning; Online Distance Metric Learning; Online Collaborative Filtering

94 Online AUC Maximization. Motivation: the mistake rate (or classification accuracy) can be misleading in many real-world applications. Example: consider a set of 10,000 instances with only 10 positive and 9,990 negative; a naive classifier that simply declares every instance negative has 99.9% accuracy. Many applications (e.g., anomaly detection) therefore adopt other metrics, e.g., AUC (area under the ROC curve). Can online learning directly optimize AUC?

95 Online AUC Maximization: what is AUC? The ROC (Receiver Operating Characteristic) curve details the rate of true positives (TP) against false positives (FP) over the range of possible thresholds; AUC (Area Under the ROC Curve) measures the probability that a randomly drawn positive instance has a higher decision value than a randomly sampled negative instance. ROC analysis was first used in World War II for the analysis of radar signals.

96 Online AUC Maximization (Zhao et al., ICML'11). Motivation: develop an online learning algorithm that trains a classifier to maximize the AUC metric instead of the mistake rate/accuracy. Key challenge: mathematically, AUC is expressed as a sum of pairwise losses between instances from different classes, which is quadratic in the number of received training examples; it is hard to solve the AUC optimization directly and efficiently.

97 Formulation. A dataset D with positive instances {x_i⁺}, i = 1..T⁺, and negative instances {x_j⁻}, j = 1..T⁻. Given a classifier w, its AUC on the dataset D is AUC(w) = (1/(T⁺T⁻)) Σ_i Σ_j I[w·x_i⁺ > w·x_j⁻].

98 Formulation (cont'd). Replace the indicator function I with its convex surrogate, i.e., the hinge loss, and find the optimal classifier w by minimizing the resulting sum of pairwise hinge losses max(0, 1 − w·(x_i⁺ − x_j⁻)); it is not difficult to show that this upper-bounds the AUC objective. (1)

99 Formulation (cont'd). Rewriting objective (1) for the online learning task: given (x_t, y_t), we may do an online update, but the loss function involves all previously received examples of the opposite class, so we would have to store all received training examples!

100 Main Idea of OAM: cache a small number of received examples. Two buffers of fixed size, B⁺_t and B⁻_t, cache the positive and negative instances. On receiving (x_t, y_t): update the corresponding buffer by reservoir sampling, then update the classifier (by sequential or gradient-based updates) against the instances in the opposite buffer. [Figure: flow of the proposed online AUC maximization process]

101 OAM Framework: [algorithm listing omitted in this transcription]

102 Update Buffer: Reservoir Sampling (J. S. Vitter, 1985): a family of classical sampling algorithms for randomly choosing k samples from a dataset with n items, where n is either very large or unknown. In general, it draws a uniform random sample of the desired size in only one pass over the underlying dataset. The UpdateBuffer algorithm is simple and very efficient.
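
A sketch of reservoir sampling as used for the buffer update; `capacity` plays the role of the buffer size and `n_seen` counts stream items so far including the current one (both names are mine, not the paper's).

```python
import random

def update_buffer(buffer, x, n_seen, capacity):
    """Reservoir sampling (Vitter, 1985): after n_seen items, `buffer`
    holds a uniform random sample of min(n_seen, capacity) of them."""
    if len(buffer) < capacity:
        buffer.append(x)              # still filling the reservoir
    else:
        j = random.randrange(n_seen)  # accept with probability capacity/n_seen
        if j < capacity:
            buffer[j] = x             # replace a uniformly chosen slot
    return buffer
```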

103 Update Classifier. Algorithm 1, sequential update by PA: following the idea of passive-aggressive learning (Crammer et al., '06), for each x in the buffer B, update the classifier by a PA step on the pair. Algorithm 2, gradient-based update: following the idea of online gradient descent, for each x in the buffer B, update the classifier by a gradient step on the pairwise loss.

104 Empirical Results of OAM. Comparisons: traditional algorithms (Perceptron, PA, cost-sensitive PA (CPA), CW) versus the proposed OAM algorithms: (i) OAM-seq, OAM-gra; (ii) OAM-inf (infinite buffer size). [Figure: evaluation of AUC for classification tasks]

105 Other Related Work. Online AUC maximization is a special case of online learning with pairwise loss functions: "On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions" (ICML'13); Wang, Yuyang, et al., "Online Learning with Pairwise Loss Functions," arXiv preprint (2013).

106 Cost-Sensitive Online Learning. Motivation: go beyond optimizing the mistake rate or accuracy and attempt to optimize cost-sensitive measures, e.g., the weighted sum of sensitivity and specificity ("sum"), or the total misclassification cost ("cost"). Existing work: Cost-Sensitive Online Gradient Descent (Wang et al., 2012); Cost-Sensitive Double Updating Online Learning (Zhao et al., 2013).

107 CSOGD: Cost-Sensitive Online Gradient Descent. Formulate cost-sensitive objective functions based on a weighted hinge loss, with class weights chosen for optimizing either the sum or the cost measure, and update by online gradient descent.

108 Cost-Sensitive Online Gradient Descent: [algorithm listing omitted in this transcription]

109 Cost-Sensitive Online Active Learning. Motivation: feedback is not always available, and labeling cost could be expensive. Example: learning to detect malicious URLs (Zhao et al., KDD'13).

110 Cost-Sensitive Online Active Learning. Main idea: combine cost-sensitive online learning and active learning, acquiring a label only when necessary, according to a query probability. Empirical results: CSOAL saves almost 99% of the labeling cost while achieving similar performance.

111 Online Transfer Learning. Transfer learning (TL): extract knowledge from one or more source tasks and apply it to solve target tasks; there are several ways in which transfer might improve learning. Two types of TL tasks: homogeneous vs. heterogeneous TL.

112 Online Transfer Learning (Zhao and Hoi, 2011). Online transfer learning (OTL): assume training data for the target domain arrives sequentially, and a classifier was learnt from a source domain; design online algorithms for transferring knowledge from the source domain to the target domain. Settings: an old/source data space, a new/target domain, and a sequence of examples from the new/target domain; OTL on homogeneous domains and OTL on heterogeneous domains.

113 Online Transfer Learning (Zhao and Hoi, 2011): OTL on homogeneous domains. Key idea: explore ensemble learning by combining both the source and target classifiers, with update rules using any existing OL algorithm (e.g., PA).

114 Online Transfer Learning (Zhao and Hoi, 2011): OTL on heterogeneous domains. Assumption: the domains are not completely different; each instance in the target domain can be split into two views. The key idea is to use a co-regularization principle to optimize two classifiers online; prediction is made by combining the two views' classifiers.

115 Online Transfer Learning (Zhao and Hoi, 2011): the heterogeneous OTL algorithm. Applications: online learning with concept drifting.

116 Online Distance Metric Learning. Distance metric learning (DML) has many applications in multimedia, especially content-based image retrieval and indexing, data clustering, etc. Objective: instead of learning a classification model, the goal of DML is to learn a Mahalanobis distance function, d_A(x₁, x₂) = √((x₁ − x₂)ᵀ A (x₁ − x₂)), where A is a d×d positive definite matrix defining the distance metric.

117 Online Distance Metric Learning: two types of training data (a.k.a. "side information"): pairwise instances (a.k.a. pairwise constraints) and triple instances (a.k.a. triplet constraints). Data sources: relevance feedback in CBIR, query logs of search engines, social media, etc.

118 Online DML: Problem Setting. For t = 1, 2, ..., T: receive a pairwise instance (x_t¹, x_t²); predict its similarity label; receive the true label y_t; suffer loss; update the distance metric A_t.

119 Online DML: Update Rules: a regularized online learning framework, minimizing a regularizer on the metric plus a loss on the current constraint. Loss functions: hinge loss, square loss.

120 Online DML: Regularizers: Frobenius divergence, with projection of the matrix A onto the positive semidefinite (PSD) cone. Examples: Pseudo-metric Online Learning Algorithm (POLA) (Shalev-Shwartz et al., 2004); (Online) Regularized Metric Learning (Jin et al., 2010).

121 Online DML: Regularizers: LogDet divergence, an information-theoretic approach. Examples: Information-Theoretic Metric Learning (ITML) (Davis et al., 2007); LogDet Exact Gradient Online (LEGO) (Jain et al., 2009).

122 Online Similarity Learning. Consider a parametric similarity function in bi-linear form, S_W(x₁, x₂) = x₁ᵀ W x₂. The goal is to find W such that S_W(x, x⁺) > S_W(x, x⁻) for all triplets (x, x⁺, x⁻). For each triplet, define the loss function ℓ(W) = max(0, 1 − S_W(x, x⁺) + S_W(x, x⁻)).

123 Online Similarity Learning: Online Algorithm for Scalable Image Similarity learning (OASIS) (Chechik et al., 2010), following the principle of passive-aggressive online learning.
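
A sketch of one OASIS-style passive-aggressive round on a triplet, using the hinge loss defined above; the PA-I style cap C and all variable names are illustrative assumptions.

```python
import numpy as np

def oasis_round(W, q, p_pos, p_neg, C=0.1):
    """One OASIS-style round (Chechik et al., 2010): PA update of the
    bilinear similarity S(a, b) = a^T W b on the triplet (q, p_pos, p_neg)."""
    loss = max(0.0, 1.0 - q @ W @ p_pos + q @ W @ p_neg)
    if loss > 0:
        V = np.outer(q, p_pos - p_neg)      # gradient direction of the loss
        norm2 = (V * V).sum()               # squared Frobenius norm
        if norm2 > 0:
            tau = min(C, loss / norm2)      # PA-I style step size
            W = W + tau * V
    return W
```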

124 Online Collaborative Filtering. Collaborative filtering: learn the user-item matrix to predict ratings/rankings; simple in data collection; model-based vs. memory-based methods. Example rating matrix R over items I1-I7 ("?" unknown): u1 = (5, ?, 2, ?, 1, ?, 4); u2 = (?, 4, ?, 1, ?, 3, ?); u3 = (5, ?, 1, ?, ?, 3, ?). Model-based collaborative filtering: matrix factorization is the most widely used approach.

125 Online Collaborative Filtering: problem setting: a total of m users and n items, and a sequence of observed ratings. Matrix factorization: the goal is to find user factors U and item factors V by minimizing the (regularized) squared error between observed ratings and the predictions U_a·V_b.

126 Online Collaborative Filtering: Online Gradient Descent (OGD) for online CF. For t = 1, ..., T: receive a rating observation with respect to the a-th user and the b-th item; update the corresponding rows of U and V by a gradient step. OGD may converge slowly.
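
A sketch of one such OGD round, with squared loss and ℓ2 regularization as illustrative choices; only the a-th user row and b-th item row are touched.

```python
import numpy as np

def ocf_ogd_round(U, V, a, b, r, eta=0.05, lam=0.01):
    """One OGD round for online matrix-factorization CF: on observing
    rating r of user a for item b, update only rows U[a] and V[b]."""
    err = U[a] @ V[b] - r                 # prediction error
    grad_u = err * V[b] + lam * U[a]      # gradient w.r.t. user factors
    grad_v = err * U[a] + lam * V[b]      # gradient w.r.t. item factors
    U[a] -= eta * grad_u
    V[b] -= eta * grad_v
    return U, V
```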

127 Online Collaborative Filtering: Online Multi-Task Collaborative Filtering (Wang et al., 2013): builds on an equivalence between MF-based CF and multi-task learning. Instead of only updating one user (row) vector and one item (column) vector, OMTCF attempts to update multiple users (tasks) for each observation, using a task-interaction matrix to model the relationships between tasks and to simultaneously update multiple models.

128 Online Collaborative Filtering: Online Multi-Task Collaborative Filtering (Wang et al., 2013). For t = 1, ..., T: receive a rating observation with respect to the a-th user and the b-th item; update multiple rows of U (via the task-interaction matrix) and the corresponding row of V.

129 Online Collaborative Filtering: Second-Order Online Collaborative Filtering (Lu et al., 2013): attempts to exploit second-order information, assuming Gaussian distributions over the user and item factor vectors, following the idea of confidence-weighted (CW) learning; CWOCF derives online update rules (w.r.t. the RMSE loss).

130 Online Collaborative Filtering: open challenges: handling novel samples (e.g., a new user or item added during the learning process); parallelization (OMTCF is easier than CWOCF); high dimensionality for second-order methods; handling concept drifting or preference evolution; handling cold start (e.g., by combining content-based methods).

131 Online Learning: Overview: linear methods (traditional, non-traditional) and non-linear methods (single kernel, multiple kernels)

132 Kernel-based Online Learning. Motivation: a linear classifier is limited in certain situations. Objective: learn a non-linear model for online classification tasks using the kernel trick.

133 Kernel-based Online Learning: Kernel Perceptron. Related work: Double Updating Online Learning (Zhao et al., 2011); others.

134 Double Updating Online Learning (DUOL). Motivation: when a new support vector (SV) is added, the weights of existing SVs remain unchanged (i.e., the update is applied only to the single new SV); how can the weights of existing SVs be updated in an efficient and effective way? Main idea: update the weight of one more existing SV in addition to the update of the new SV. Challenge: which existing SV should be updated, and how?

135 Double Updating Online Learning (DUOL). Denote the new support vector by (x_t, y_t). Choose an auxiliary example from the existing SVs that is misclassified and conflicts most with the new SV, and update the current hypothesis on both. To optimize the weights of the two SVs, DUOL formulates the problem as a simple QP task of large-margin optimization and gives closed-form solutions.

136 Double Updating Online Learning (DUOL): [algorithm listing omitted in this transcription]

137 Kernel-based Online Learning. Challenge: the number of support vectors in the kernel-based classification model is often unbounded, which is non-scalable and inefficient in practice! Question: can we bound the number of support vectors? Solution: budget online learning.

138 Budget Online Learning. Problem: kernel-based online learning with the number of support vectors bounded by a given budget B. Related work: Randomized Budget Perceptron (Cavallanti et al., 2007); Forgetron (Dekel et al., 2005); Projectron (Orabona et al., 2008); Bounded Online Gradient Descent (Zhao et al., 2012); others.

139 RBP: Randomized Budget Perceptron (Cavallanti et al., 2007). Idea: maintain the budget by means of randomization. Repeat whenever there is a mistake at round t: if the number of SVs is at most B, apply the kernel Perceptron update; otherwise, randomly discard one existing support vector first.
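A sketch of the budget maintenance step on a mistake round; representing the model as a list of (label, instance) support vectors is an illustrative choice.

```python
import random

def rbp_mistake_round(sv, x, y, B):
    """Randomized Budget Perceptron (Cavallanti et al., 2007), mistake round:
    if the budget B is full, uniformly discard one existing support vector,
    then add (y, x) as a new SV (the kernel Perceptron update)."""
    if len(sv) >= B:
        sv.pop(random.randrange(len(sv)))  # random eviction keeps the budget
    sv.append((y, x))
    return sv
```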

140 Forgetron (Dekel et al., 2005): on each mistake, (1) perform a Perceptron update; (2) shrink the weights of the existing support vectors; (3) remove the oldest support vector if the budget is exceeded.

141 Projectron (Orabona et al., 2008): the new hypothesis is projected onto the space spanned by the existing support vectors whenever the projection error is small, instead of growing the SV set; the projection is solved in closed form from the kernel matrix of the SVs.

142 Bounded Online Gradient Descent (Zhao et al., 2012). Limitations of previous work: Perceptron-based, with heuristic or expensive updates. Motivation of BOGD: learn the kernel-based model by online gradient descent while constraining the SV size below a predefined budget B. Challenges: how to maintain the budget efficiently, and how to minimize the impact of budget maintenance.

143 Bounded Online Gradient Descent (Zhao et al., 2012). Main idea of the BOGD algorithms: a stochastic budget maintenance strategy: one existing SV is discarded by multinomial sampling (the sampling probabilities indicate which SV is selected for removal), and the remaining SVs are reweighted so that the new hypothesis is an unbiased estimator of the current one with only B SVs.

144 Empirical Results of BOGD. Comparison: baselines (Forgetron, RBP, Projectron, Projectron++) vs. the proposed BOGD (uniform sampling) and BOGD++ (non-uniform sampling). [Figure: results for varied budget sizes on the cod-rna dataset (n = 271,617)]

145 Budget Online Kernel Learning: kernel approximation approaches. Motivation: most existing budget online kernel learning methods adopt budget maintenance strategies; an alternative idea is to approximate the kernel itself.

146 Kernel Approximation: two approaches (Wang et al., 2013): kernel function approximation and kernel matrix approximation. Kernel function approximation: the Fourier Online Gradient Descent (FOGD) algorithm approximates the kernel function by random Fourier features and then applies online gradient descent in the resulting feature space.
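
A sketch of random Fourier features for an RBF kernel, the standard construction FOGD builds on; the bandwidth gamma, the feature count D, and the fixed seed are assumed parameters.

```python
import numpy as np

def random_fourier_features(X, D, gamma, seed=0):
    """Map X (n x d) to features z(X) (n x D) such that
    z(x).z(x') approximates the RBF kernel exp(-gamma * ||x - x'||^2);
    a linear model trained online on z(x) then mimics kernel OL (FOGD)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, D))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```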

147 Kernel Approximation: kernel matrix approximation, applicable to any type of kernel. Nyström Online Gradient Descent (NOGD): approximate the kernel matrix using the Nyström method, constructing a small B×B kernel matrix from B sampled instances, and then apply OGD in the induced feature space.

148 Empirical Evaluation: comparison to batch binary classification: [Figure omitted in this transcription]

149 Summary: a family of budget online kernel learning methods. Pros: very efficient due to the stochastic strategy; rather scalable; state-of-the-art performance with theoretical guarantees. Cons: a predefined budget size (how to choose the optimal budget size?); learns with only a single kernel.

150 Online Learning: Overview: linear methods (traditional, non-traditional) and non-linear methods (single kernel, multiple kernels)

151 Online Multiple Kernel Learning. Motivation: variety is a key challenge for multimedia data analytics; traditional methods assume data in a vector space, but real objects often have diverse representations. Multiple kernel representation: each kernel represents one similarity function: pyramid match kernels (vision, multimedia), graph kernels (bio, web/social, etc.), sequence kernels (speech, video, bio, etc.), tree kernels (NLP, etc.).

152 Multiple Kernel Learning (MKL) (Lanckriet et al., JMLR 2004): a kernel method using an optimal combination of multiple kernels. The batch MKL formulation is a convex-concave optimization that is hard to solve for big data; can we avoid solving the batch optimization directly?

153 Online MKL (Hoi et al., ML'13). Objective: learn a kernel-based predictor with multiple kernels from a sequence of (multi-modal) data examples, avoiding the need to solve complicated optimizations. Main idea: two-step online learning. At each iteration, if there is a mistake: Step 1: online learning with each single kernel, via the kernel Perceptron (Rosenblatt, 1958; Freund & Schapire, 1999); Step 2: online update of the combination weights, via the Hedge algorithm (Freund and Schapire, COLT'95).

154 Online Multiple Kernel Classification: the deterministic algorithm for OMKC: [algorithm listing omitted in this transcription]

155 OMKC by Stochastic Combination: improves the efficiency of Algorithm 1 by selecting a subset of kernels for prediction.

156 OMKC by Stochastic Updating: improves the learning efficiency of Algorithm 1 by sampling a subset of kernel classifiers for updating, based on the weights assigned to the kernel classifiers.

157 OMKC by Stochastic Updating & Stochastic Combination: combines both strategies.

158 Summary of OMKC Variants. OMKC(D,D) is the most computationally intensive algorithm, updating and combining all the kernel classifiers at each iteration; OMKC(S,S) is the most efficient, selectively updating and combining a subset of kernel classifiers at each iteration; finally, OMKC(D,S) and OMKC(S,D) are the two variants in between these extremes.

159 Empirical Evaluation of OMKC. We compare the four variants of OMKC for classification with the following baselines: Perceptron: the well-known Perceptron with a linear kernel (Rosenblatt, 1958; Freund and Schapire, 1999); Perceptron(u): another Perceptron baseline with an unbiased/uniform combination of all the kernels; Perceptron(*): online validation to search for the best kernel in the pool (using the first 10% of the training data), then the Perceptron with that best kernel; OM-2: a state-of-the-art online MKL algorithm (Jie et al., 2010; Orabona et al., 2010).

160 Evaluation Result of OMKC: [Figure omitted in this transcription]

161 Evaluation Result of OMKC: [Figure omitted in this transcription]

162 Online MKL for Multimedia Retrieval: Online Multi-Kernel Similarity learning (OMKS) (Xia et al., TPAMI'14): aims to learn a multi-kernel similarity for multimedia retrieval, combining diverse features (e.g., color, texture, local pattern (BoW)) from a stream of side information for content-based multimedia retrieval.

163 Kernel Similarity Learning. Define the similarity function S in a kernel-induced feature space, and formulate kernel similarity learning as a regularized optimization over triplet constraints.

164 Online Kernel Similarity Learning. In the online setting, at each trial t, given a triplet, we solve a passive-aggressive style optimization whose optimal solution is available in closed form.

165 Online Multiple Kernel Similarity: define the multiple-kernel similarity function as a weighted combination of per-kernel similarities, and jointly optimize the per-kernel similarity functions and their combination weights.

166 OMKS Algorithm. Time complexity: OKS: O(T · |SV|); OMKS: O(T · |SV| · m), where m is the number of kernels.

167 Multi-modal Image Retrieval: [Figure: example queries with top-ranked results returned by OASIS(*), OKS(*), OMKS-U, and OMKS]

168 Summary of OMKL. Pros: nonlinear models for tough applications; avoids solving complicated optimizations directly; handles multi-modal data; theoretical guarantees. Cons: scalability has to be further improved.

169 Agenda. PART I: Introduction: Big Data: Opportunities & Challenges; Online Learning: What and Why; Online Learning Applications; Overview of Online Learning Methods. PART II: Online Learning Methods: Traditional Linear OL Algorithms; Non-traditional OL Algorithms; Kernel-based OL Algorithms. Discussions and Open Issues. Summary and Take-Home Messages

170 Discussions: Notion Comparison. Online Learning (OL) vs. Incremental Learning (IL). Similarity: both learn in a sequential fashion. Differences: OL does not assume knowledge of the input (which could be adversarial), while IL has complete input knowledge; OL makes a single pass, while IL can make multiple passes; OL may not solve the identical batch learning problem, while IL solves the same problem and is often associated with decremental solutions.

171 Discussions: Notion Comparison. Online Learning (OL) vs. Reinforcement Learning (RL). Similarities: both (bandit OL and RL) work in a sequential fashion with only partial feedback given to the learner, and both trade off between exploitation and exploration. Difference: in machine learning, RL is focused more on Markov decision processes (MDPs) for learning policies, while OL can address general supervised learning tasks.

172 Discussions: Notion Comparison. Online Learning (OL) vs. Active Learning (AL). Similarity: both learn repeatedly in a sequential manner. Differences: OL assumes feedback (either full or partial) can always be received passively, while AL has to actively solicit feedback from the environment; OL typically processes a single example at a time, while AL may request labels for multiple examples (a.k.a. batch-mode active learning).

173 Open Issues: challenges of big data. Volume: explosively growing data, from million to billion scales; from a single machine to multiple machines in parallel. Velocity: data arrives extremely fast; from a normal scheme to a real-time solution. Variety: heterogeneous data and diverse sources; from centralized approaches to distributed solutions.

174 Open Issues: parallel & distributed online learning. Motivation: we are not making significant gains in serial computation speed, and data no longer fits on a single machine. Major issues: synchronization (we can't wait for the slowest machine) and communication (we can't transfer all information). Parallelism in online learning: split the task across M machines, solve independently, and combine; this allows near-linear speedups. Asynchronous computation: updating asynchronously saves a lot of time. Reduced communication: parallel with a single coordinator versus distributed decentralized schemes.

175 Open Issues: other issues: high dimensionality; data sparsity; structured/semi-structured data; noisy and incomplete data; concept drifting; domain adaptation; incorporation of background knowledge; parallel & distributed computing; user interaction (interactive OL vs. passive OL; human computation, crowdsourcing).

176 Open Issues: applications of big data analytics: web rich-media search and mining; social networks and social media; speech recognition and mining (e.g., SIRI); multimedia information retrieval; computer vision and multimedia understanding; medical and healthcare informatics; financial engineering (with multimodal data); etc.

177 Conclusion: introduced the emerging opportunities and challenges of big data mining; introduced online learning, widely applied to various real-world applications in big data mining; surveyed classical and state-of-the-art online learning techniques: traditional and non-traditional linear methods, and single-kernel and multiple-kernel non-linear methods.

178 Take-Home Message. Online learning is promising for big data mining. More challenges and opportunities ahead: smarter online learning algorithms; handling more real-world challenges, e.g., concept drifting, noise, sparse data, high-dimensional issues, etc.; scaling up to mine billions of instances using distributed computing facilities & parallel programming (e.g., Hadoop). LIBOL: an open-source Library of Online Learning Algorithms.

179 References. Steven C.H. Hoi, Rong Jin, Tianbao Yang, Peilin Zhao, "Online Multiple Kernel Classification", Machine Learning (ML), 2013. Hao Xia, Pengcheng Wu, Steven C.H. Hoi, "Online Multi-modal Distance Learning for Scalable Multimedia Retrieval", ACM Intl. Conf. on Web Search and Data Mining (WSDM), 2013. Peilin Zhao, Jialei Wang, Pengcheng Wu, Rong Jin, Steven C.H. Hoi, "Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-Based Online Learning", ICML, 2012. Jialei Wang, Steven C.H. Hoi, "Exact Soft Confidence-Weighted Learning", ICML, 2012. Bin Li, Steven C.H. Hoi, "On-line Portfolio Selection with Moving Average Reversion", ICML, 2012. Jialei Wang, Peilin Zhao, Steven C.H. Hoi, "Cost-Sensitive Online Classification", IEEE International Conference on Data Mining (ICDM), 2012. Bin Li, Peilin Zhao, Steven C.H. Hoi, V. Gopalkrishnan, "PAMR: Passive-Aggressive Mean Reversion Strategy for Portfolio Selection", Machine Learning, vol. 87, no. 2, 2012. Steven C.H. Hoi, Jialei Wang, Peilin Zhao, Rong Jin, "Online Feature Selection for Big Data Mining", ACM SIGKDD Workshop on Big Data Mining (BigMine), Beijing, China, 2012. Peilin Zhao, Steven C.H. Hoi, Rong Jin, "Double Updating Online Learning", Journal of Machine Learning Research (JMLR), 2011. Peilin Zhao, Steven C.H. Hoi, Rong Jin, Tianbao Yang, "Online AUC Maximization", ICML, 2011.

180 References. John C. Duchi, Elad Hazan, Yoram Singer, "Adaptive subgradient methods for online learning and stochastic optimization", JMLR, vol. 12, 2011. Rong Jin, Steven C.H. Hoi, Tianbao Yang, "Online multiple kernel learning: algorithms and mistake bounds", ALT, 2010. Peilin Zhao, Steven C.H. Hoi, "OTL: A Framework of Online Transfer Learning", ICML, Haifa, Israel, June 2010. Koby Crammer, Daniel D. Lee, "Learning via Gaussian herding", NIPS, 2010. Koby Crammer, Alex Kulesza, Mark Dredze, "Adaptive regularization of weight vectors", NIPS, 2009. Mark Dredze, Koby Crammer, Fernando Pereira, "Confidence-weighted linear classification", ICML, 2008. Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer, "The Forgetron: a kernel-based perceptron on a budget", SIAM J. Comput., 37(5), 2008. Francesco Orabona, Joseph Keshet, Barbara Caputo, "The Projectron: a bounded kernel-based perceptron", ICML, 2008. Koby Crammer, Mark Dredze, Fernando Pereira, "Exact convex confidence-weighted learning", NIPS, 2008. Giovanni Cavallanti, Nicolò Cesa-Bianchi, Claudio Gentile, "Tracking the best hyperplane with a simple budget perceptron", Machine Learning, 69(2-3), 2007. Nicolò Cesa-Bianchi, Gábor Lugosi, "Prediction, Learning, and Games", Cambridge University Press, 2006.


Scalable Machine Learning - or what to do with all that Big Data infrastructure - or what to do with all that Big Data infrastructure TU Berlin blog.mikiobraun.de Strata+Hadoop World London, 2015 1 Complex Data Analysis at Scale Click-through prediction Personalized Spam Detection

More information

Cost-Sensitive Online Active Learning with Application to Malicious URL Detection

Cost-Sensitive Online Active Learning with Application to Malicious URL Detection Cost-Sensitive Online Active Learning with Application to Malicious URL Detection ABSTRACT Peilin Zhao School of Computer Engineering Nanyang Technological University 50 Nanyang Avenue, Singapore 639798

More information

Data Mining. Nonlinear Classification

Data Mining. Nonlinear Classification Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

More information

Support Vector Machines with Clustering for Training with Very Large Datasets

Support Vector Machines with Clustering for Training with Very Large Datasets Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

Predict Influencers in the Social Network

Predict Influencers in the Social Network Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center

Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center 1 Outline Part I - Applications Motivation and Introduction Patient similarity application Part II

More information

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support

More information

Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012

Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Outline Big Data How to extract information? Data clustering

More information

Online Semi-Supervised Learning

Online Semi-Supervised Learning Online Semi-Supervised Learning Andrew B. Goldberg, Ming Li, Xiaojin Zhu jerryzhu@cs.wisc.edu Computer Sciences University of Wisconsin Madison Xiaojin Zhu (Univ. Wisconsin-Madison) Online Semi-Supervised

More information

The Artificial Prediction Market

The Artificial Prediction Market The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory

More information

Karthik Sridharan. 424 Gates Hall Ithaca, E-mail: sridharan@cs.cornell.edu http://www.cs.cornell.edu/ sridharan/ Contact Information

Karthik Sridharan. 424 Gates Hall Ithaca, E-mail: sridharan@cs.cornell.edu http://www.cs.cornell.edu/ sridharan/ Contact Information Karthik Sridharan Contact Information 424 Gates Hall Ithaca, NY 14853-7501 USA E-mail: sridharan@cs.cornell.edu http://www.cs.cornell.edu/ sridharan/ Research Interests Machine Learning, Statistical Learning

More information

Online Kernel Selection: Algorithms and Evaluations

Online Kernel Selection: Algorithms and Evaluations Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence Online Kernel Selection: Algorithms and Evaluations Tianbao Yang 1, Mehrdad Mahdavi 1, Rong Jin 1, Jinfeng Yi 1, Steven C. H.

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

A Simple Introduction to Support Vector Machines

A Simple Introduction to Support Vector Machines A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University Outline A brief history of SVM Large-margin linear

More information

Online Passive-Aggressive Algorithms on a Budget

Online Passive-Aggressive Algorithms on a Budget Zhuang Wang Dept. of Computer and Information Sciences Temple University, USA zhuang@temple.edu Slobodan Vucetic Dept. of Computer and Information Sciences Temple University, USA vucetic@temple.edu Abstract

More information

Scalable Developments for Big Data Analytics in Remote Sensing

Scalable Developments for Big Data Analytics in Remote Sensing Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions

A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions Jing Gao Wei Fan Jiawei Han Philip S. Yu University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center

More information

Online Classification on a Budget

Online Classification on a Budget Online Classification on a Budget Koby Crammer Computer Sci. & Eng. Hebrew University Jerusalem 91904, Israel kobics@cs.huji.ac.il Jaz Kandola Royal Holloway, University of London Egham, UK jaz@cs.rhul.ac.uk

More information

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University

More information

Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning. Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

Semi-Supervised Learning for Blog Classification

Semi-Supervised Learning for Blog Classification Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008) Semi-Supervised Learning for Blog Classification Daisuke Ikeda Department of Computational Intelligence and Systems Science,

More information

Mining Signatures in Healthcare Data Based on Event Sequences and its Applications

Mining Signatures in Healthcare Data Based on Event Sequences and its Applications Mining Signatures in Healthcare Data Based on Event Sequences and its Applications Siddhanth Gokarapu 1, J. Laxmi Narayana 2 1 Student, Computer Science & Engineering-Department, JNTU Hyderabad India 1

More information

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics

More information

Online Lazy Updates for Portfolio Selection with Transaction Costs

Online Lazy Updates for Portfolio Selection with Transaction Costs Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence Online Lazy Updates for Portfolio Selection with Transaction Costs Puja Das, Nicholas Johnson, and Arindam Banerjee Department

More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang, Qihang Lin, Rong Jin Tutorial@SIGKDD 2015 Sydney, Australia Department of Computer Science, The University of Iowa, IA, USA Department of

More information

Prediction of Stock Performance Using Analytical Techniques

Prediction of Stock Performance Using Analytical Techniques 136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University

More information

Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data

Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data Jun Wang Department of Mechanical and Automation Engineering The Chinese University of Hong Kong Shatin, New Territories,

More information

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore. CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Machine Learning Big Data using Map Reduce

Machine Learning Big Data using Map Reduce Machine Learning Big Data using Map Reduce By Michael Bowles, PhD Where Does Big Data Come From? -Web data (web logs, click histories) -e-commerce applications (purchase histories) -Retail purchase histories

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

A semi-supervised Spam mail detector

A semi-supervised Spam mail detector A semi-supervised Spam mail detector Bernhard Pfahringer Department of Computer Science, University of Waikato, Hamilton, New Zealand Abstract. This document describes a novel semi-supervised approach

More information

Support Vector Machine (SVM)

Support Vector Machine (SVM) Support Vector Machine (SVM) CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Acknowledgments. Data Mining with Regression. Data Mining Context. Overview. Colleagues

Acknowledgments. Data Mining with Regression. Data Mining Context. Overview. Colleagues Data Mining with Regression Teaching an old dog some new tricks Acknowledgments Colleagues Dean Foster in Statistics Lyle Ungar in Computer Science Bob Stine Department of Statistics The School of the

More information

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental

More information

Intrusion Detection via Machine Learning for SCADA System Protection

Intrusion Detection via Machine Learning for SCADA System Protection Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. s.l.yasakethu@surrey.ac.uk J. Jiang Department

More information

Online Algorithms: Learning & Optimization with No Regret.

Online Algorithms: Learning & Optimization with No Regret. Online Algorithms: Learning & Optimization with No Regret. Daniel Golovin 1 The Setup Optimization: Model the problem (objective, constraints) Pick best decision from a feasible set. Learning: Model the

More information

Trading regret rate for computational efficiency in online learning with limited feedback

Trading regret rate for computational efficiency in online learning with limited feedback Trading regret rate for computational efficiency in online learning with limited feedback Shai Shalev-Shwartz TTI-C Hebrew University On-line Learning with Limited Feedback Workshop, 2009 June 2009 Shai

More information

Machine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu

Machine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu Machine Learning CUNY Graduate Center, Spring 2013 Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning Logistics Lectures M 9:30-11:30 am Room 4419 Personnel

More information

Lecture 2: The SVM classifier

Lecture 2: The SVM classifier Lecture 2: The SVM classifier C19 Machine Learning Hilary 2015 A. Zisserman Review of linear classifiers Linear separability Perceptron Support Vector Machine (SVM) classifier Wide margin Cost function

More information

A Review of Data Mining Techniques

A Review of Data Mining Techniques Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

MapReduce/Bigtable for Distributed Optimization

MapReduce/Bigtable for Distributed Optimization MapReduce/Bigtable for Distributed Optimization Keith B. Hall Google Inc. kbhall@google.com Scott Gilpin Google Inc. sgilpin@google.com Gideon Mann Google Inc. gmann@google.com Abstract With large data

More information

Stochastic Optimization for Big Data Analytics: Algorithms and Libraries

Stochastic Optimization for Big Data Analytics: Algorithms and Libraries Stochastic Optimization for Big Data Analytics: Algorithms and Libraries Tianbao Yang SDM 2014, Philadelphia, Pennsylvania collaborators: Rong Jin, Shenghuo Zhu NEC Laboratories America, Michigan State

More information

Statistical machine learning, high dimension and big data

Statistical machine learning, high dimension and big data Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,

More information

Support Vector Machines Explained

Support Vector Machines Explained March 1, 2009 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),

More information

Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm

Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm Martin Hlosta, Rostislav Stríž, Jan Kupčík, Jaroslav Zendulka, and Tomáš Hruška A. Imbalanced Data Classification

More information

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014 LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph

More information

Interactive Machine Learning. Maria-Florina Balcan

Interactive Machine Learning. Maria-Florina Balcan Interactive Machine Learning Maria-Florina Balcan Machine Learning Image Classification Document Categorization Speech Recognition Protein Classification Branch Prediction Fraud Detection Spam Detection

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

Parallel & Distributed Optimization. Based on Mark Schmidt s slides

Parallel & Distributed Optimization. Based on Mark Schmidt s slides Parallel & Distributed Optimization Based on Mark Schmidt s slides Motivation behind using parallel & Distributed optimization Performance Computational throughput have increased exponentially in linear

More information

Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S

Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard

More information

Federated Optimization: Distributed Optimization Beyond the Datacenter

Federated Optimization: Distributed Optimization Beyond the Datacenter Federated Optimization: Distributed Optimization Beyond the Datacenter Jakub Konečný School of Mathematics University of Edinburgh J.Konecny@sms.ed.ac.uk H. Brendan McMahan Google, Inc. Seattle, WA 98103

More information

Machine Learning over Big Data

Machine Learning over Big Data Machine Learning over Big Presented by Fuhao Zou fuhao@hust.edu.cn Jue 16, 2014 Huazhong University of Science and Technology Contents 1 2 3 4 Role of Machine learning Challenge of Big Analysis Distributed

More information

Semi-Supervised Support Vector Machines and Application to Spam Filtering

Semi-Supervised Support Vector Machines and Application to Spam Filtering Semi-Supervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Introduction to Machine Learning Using Python. Vikram Kamath

Introduction to Machine Learning Using Python. Vikram Kamath Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression

More information

Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

More information

Online Convex Optimization

Online Convex Optimization E0 370 Statistical Learning heory Lecture 19 Oct 22, 2013 Online Convex Optimization Lecturer: Shivani Agarwal Scribe: Aadirupa 1 Introduction In this lecture we shall look at a fairly general setting

More information

Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

More information

CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA

CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA Professor Yang Xiang Network Security and Computing Laboratory (NSCLab) School of Information Technology Deakin University, Melbourne, Australia http://anss.org.au/nsclab

More information

Learning is a very general term denoting the way in which agents:

Learning is a very general term denoting the way in which agents: What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information