Multi-Label Learning with Millions of Labels for Query Recommendation
|
|
- Magdalene Haynes
- 8 years ago
- Views:
Transcription
1 Multi-Label Learning with Millions of Labels for Query Recommendation Rahul Agrawal Microsoft AdCenter Yashoteja Prabhu Microsoft Research India Archit Gupta IIT Delhi Manik Varma Microsoft Research India
2 Recommending Advertiser Bid Phrases geico auto insurance geico car insurance geico insurance www geico com care geicos geico com need cheap auto insurance wisconsin cheap car insurance quotes cheap auto insurance florida all state car insurance coupon code
3 Query Rewriting geico auto insurance geico car insurance Absolutely cheapest car insurance geico insurance www geico com care geicos geico com need cheap auto insurance wisconsin cheap car insurance quotes cheap auto insurance florida all state car insurance coupon code
4 Ranking & Relevance Meta Stream geico auto insurance geico car insurance geico insurance www geico com care geicos geico com need cheap auto insurance wisconsin cheap car insurance quotes cheap auto insurance florida geico twitter
5 Recommending Advertiser Bid Phrases geico auto insurance geico car insurance geico insurance www geico com care geicos geico com need cheap auto insurance wisconsin cheap car insurance quotes cheap auto insurance florida all state car insurance coupon code
6 Learning to Predict a Set of Queries italian restaurant f : X 2 Y need cheap auto insurance geico online quote car insurance iphone X: Ads Y: Queries
7 Learning to Predict a Set of Queries f ( ) need cheap auto insurance geico car insurance
8 Multi-Label Learning Challenges f ( ) need cheap auto insurance geico car insurance Infinite number of labels (queries) Training data acquisition Efficient training Cost of prediction
9 Binary Classification & Ranking h : (X, Y) {, } h(, geico) h(, iphone) Infinite number of labels (queries) Training data acquisition Efficient training Cost of prediction
10 Binary Classification h : (X, Y) {, } italian restaurant need cheap auto insurance geico online quote car insurance Infinite number of labels (queries) Training data acquisition Efficient training Cost of prediction iphone
11 Binary Classification KEX h : (X, Y) {, } switching to geico geico online quote car insurance Infinite number of labels (queries) Training data acquisition Efficient training Cost of prediction
12 Query Recommendations by KEX
13 Query Recommendations by KEX h(, car insurance)? h(, iphone)?
14 Query Recommendations by KEX plastic ponies simone plastics clothing and accessories sylvia pony clothing couture playground plastic recycling children's clothing
15 Multi-Label Learning Formulation italian restaurant f : X 2 Y need cheap auto insurance geico online quote car insurance iphone X: Ads Y: Queries
16 Learning with Millions of Labels italian restaurant f : X 2 Y need cheap auto insurance geico online quote car insurance iphone X: Ads Y: 10 Million Queries
17 Multi-Label Random Forests We develop Multi-Label Random Forests with logarithmic prediction costs that make predictions in a few milliseconds. We train on 200 M points, 100 M categories and 10 M features in 28 hours on a grid with 1000 compute nodes. We develop a tree growing criterion which learns from positive data alone. We generate training data automatically from click logs. We develop a sparse SSL formulation to infer beliefs about the state of missing and noisy labels.
18 Training Data Missing Labels No annotator can mark all the relevant labels for a data point. We have missing labels during Training Validation Testing. Even fundamental ML techniques such as validation can go awry. One can t design error metrics invariant to missing labels.
19 Training Data and Features iphone color material TF-IDF Bag of Words Features
20 Training Labels case for iphone best iphone case apple iphone 3g metallic slim fit case best iphone nn4 cases iphone cases best iphone cases apple iphone 4g cases best iphone nn4 case iphone 3gs cases iphone 4s case case iphone otterbox universal defender case iphone nn4 black silicone black plastic sena iphone cases apple iphone 4g premium soft silicone rubber black phone protector skin cover case apple iphone nn4 cases belkin grip vue tint case iphone nn4 clear black white premium bumper case apple iphone nn4 att bunny rabbit silicone case skin iphone nn4 stand tail holder iphone color material iphone case iphone 4g cases iphone case speck iphone case best case iphone 4s iphone 4gs cases iphone nn4 case switcheasy neo case iphone 3g black best case iphone nn4 iphone 4s defender series case 3g iphone cases waterproof iphone case best iphone 3g cases iphone case design TF-IDF Bag of Words Features iphone cases 4g apple iphone cases waterproof iphone cases best iphone 4s case iphone cases 3g best iphone 3g case amazonbasics protective tpu case screen protector att verizon iphone nn4 iphone 4s clear best iphone 4s cases
21 Training Labels
22 Missing and Noisy Labels best italian restaurants philadelphia italian restaurants italian restaurant italian restaurants arkansas italian restaurants connecticut italian restaurants idaho italian restaurants phoenix italian restaurant chains italian restaurant connecticut italian restaurant district columbia thai restaurant thai restaurants restaurants mexican restaurants
23 Missing and Noisy Labels
24 Frequency Biased Training Data Most labels will have very few positive training examples Zipf's Law
25 Multi-Label Prediction Costs Linear prediction costs are infeasible geico car insurance pizza iphone cases 1-vs-All Classification
26 Label and Feature Space Compression 10M Dimensional Label Space car motor vehicle auto iphone cases iphone case cases iphone 6M Dimensional Feature Space Car Ads iphone Case Ads 1K Dimensional Embedding Space
27 Hierarchical Prediction Prediction in logarithmic time
28 Gating Tree Based Prediction Prediction in logarithmic time Is the word insurance present in the ad? Yes No Is the word geico present in the ad? Yes No
29 Ensemble of Randomized Gating Trees
30 Efficient Training We seek classifiers and optimization algorithms that Are massively parallelizable Don t need to load the feature vectors (1 Tb) into RAM Don t need to load the label matrix (100 Gb) into RAM Number of training points Number of labels Dimensionality of feature vector 200 Million 100 Million 10 Million Number of cores RAM per core Training time 2 Gb 28 hours
31 Multi-Label Random Forests The splitting cost needs to be calculated in a 2 10M space Is the word insurance present?
32 Learning from Positively Labeled Data Split condition : x f > t f, t = argmin f,t n l k p l l k (1 p l l k ) + n r k p r l k (1 p r l k ) p l k = i p l k ad i p(ad i ) x f > t l1 l2 l3 0 l1 l2 l3
33 Multi-Label Random Forests x 1, y 1 = {l 2, l 3 } 1 0 l1 l2 l3 (x 1, y 1 ) (x 2, y 2 ) (x 3, y 3 ) x 2, y 2 = {l 1, l 3 } x 3, y 3 = {l 1, l 2, l 3 } l1 l2 l3 l1 l2 l3 p(y) l1 l2 l3
34 Query Recommendation Data Sets Data set statistics Data Set # of Training Points (M) # of Test Points (M) # of Dimensions (M) # of Labels (M) Wikipedia Ads Web Ads
35 Performance Evaluation We use loss functions where the penalty incurred for predicting the real (but unknown) ground truth is never more than that of predicting any other labelling L y, y Observed L y, y Observed y Y Hamming Loss Precision at k We found Precision at 10 to be robust for our application.
36 Query Recommendation Results MLRF KEX Wikipedia Ads1 Web Ads2 Percentage of top 10 predictions that were clicked queries
37 Query Recommendation Results MLRF KEX Wikipedia Ads1 Web Ads2 Percentage of top 10 predictions that were relevant
38
39 Geico Car Insurance KEX MLRF geico auto insurance geico car insurance geico insurance www geico com care geicos geico com need cheap auto insurance wisconsin cheap car insurance quotes cheap auto insurance florida all state car insurance coupon code
40
41 Domino s Pizza KEX MLRF dominos dominos pizza domino pizza domino pasta bowls domino pizza coupons domino pizza deals domino pizza locations domino pizza menu domino pizza online
42
43 Simone & Sylvia Kid s Clothing KEX plastic ponies simone plastics clothing and accessories sylvia pony clothing couture playground Plastic recycling children's clothing MLRF toddlers clothes toddlers clothing toddler costumes children clothes sale children clothes designer children clothes cute children clothes retro clothing retro baby clothes baby clothing
44
45 KCS Flowers KEX funeral flowers sympathy funeral flowers web home bleitz funeral home funeral flowers discount yarington's funeral home harvey funeral home green lake funeral home howden kennedy funeral home arranging flowers MLRF flowers delivery funeral arrangements birthday flowers funeral flowers funeral planning flowers valentines free delivery flowers cheap flowers florists cheap flowers funeral
46
47 Vistaprint Designer T-Shirts KEX embroidered apparel custom apparel readymade apparel customizable apparel customizable apparel leading print online business cards apparel and accessories own text MLRF custom t shirts funny t shirts hanes beefy t shirts hanes t shirts long sleeve t shirts personalized t shirts printed t shirts retro gamer t shirts t shirts buy custom t shirts
48
49 Metlife Auto Insurance KEX metlife auto home insurance auto home insurance auto insurance massachusetts metlife agent driver discount additional cost saving benefits car discount auto quote MLRF metlife auto insurance auto Insurance car Insurance automobile Insurance geico insurance cheap car insurance metlife auto insurance broker insurance home insurance
50
51 Wanta Thai Restaurant KEX authentic thai restaurant delicious thai food thai cuisine thai restaurant thai food wanta best thai restaurant thai eateries thai contemporary thai MLRF thai restaurant thai restaurants mexican restaurants cheap hotels hotels fast food restaurants restaurants coupons best web hosting restaurants vegetarian foods new york restaurants
52
53 best italian restaurants philadelphia italian restaurants italian restaurant italian restaurants arkansas italian restaurants connecticut italian restaurants idaho italian restaurants phoenix italian restaurant chains italian restaurant connecticut italian restaurant district columbia thai restaurant thai restaurants restaurants mexican restaurants
54
55 Compensating for Missing Labels 0.5 Case-mate phone cases 0.7 Auto insurance quotes Esurance 0.8 American family insurance 0.9 Progressive insurance Allstate auto insurance Maggiano s restaurant
56 Training on Belief Vectors 1 x 1, y 1 = l 2, l 3, f 1 0 l1 l2 l3 (x 1, f 1 ) (x 2, f 2 ) (x 3, f 3 ) x 2, y 2 = l 1, l 3, f 2 x 3, y 3 = l 1, l 2, l 3, f l1 l2 l3 l1 l2 l3 1 p(f) l1 l2 l3
57 Sparse Semi-Supervised Learning Graph-based SSL optimizes label belief smoothness and fidelity to original labels 1 F* = Min Tr F 2 Ft I D 1 2 W D 1 2 F + λ 2 s. t. F 0 K F Y 2 W MXM D MXM Y MXL F MXL λ M L K Document-document similarity matrix Diagonal matrix representing the row sums of W 0/1 label matrix Real valued label belief matrix Trade-off Hyperparameter Number of documents Number of labels Sparsity constant
58 Sparse Semi-Supervised Learning Graph-based SSL optimizes label belief smoothness and fidelity to original labels F* = Min F 1 Σ 2 i=1..lσ j=1..m l=1..m s. t. F 0 K w jl ( F ij D jj F il D ll ) 2 + λ 2 Σ i=1..m j=1..l (F ij Y ij ) 2 W MXM D MXM Y MXL F MXL λ M L K Document-document similarity matrix Diagonal matrix representing the row sums of W 0/1 label matrix Real valued label belief matrix Trade-off Hyperparameter Number of documents Number of labels Sparsity constant
59 Iterative Hard Thresholding Sparse SSL formulation F* = Min F J F = 1 2 Tr Ft I D 1 2 W D 1 2 F + λ 2 s. t. F 0 K F Y 2 The iterative hard thresholding algorithm converges to a global/local optimum F 0 F t+ 1 2 = Y = 1 λ+1 D 1 2 W D 1 2F t + F t+1 = Top K (F 1 t+ ) 2 λ λ+1 Y
60 Iterative Hard Thresholding If Y ij {0, 1} and W is positive definite then The sequence F 0, F 1, converges to a stationary point F. J(F 0 ) J(F 1 ) J(F ) If F 0 < K then F is a globally optimal solution If F 0 = K then F is a locally optimal solution J F J F + Min( λ 2 K + Y λ + 1 0, 2 ML K α K (F ) Y 0 )
61 Semi-Supervised Learning Results as judged by automatically generated click labels as well as by human experts. Data Set MLRF Click Labels (%) Human Verification (%) MLRF+ SSL KEX MLRF MLRF+ SSL Wikipedia KEX Ads Bing Ads
62 Query Expansion Results Query expansion techniques can help both KEX and MLRF Data Set Click Labels (%) Human Verification (%) MLRF+ SSL+KSP KEX+KSP MLRF+ SSL+KSP KEX+KSP Wikipedia Ads Web Ads
63 Query Recommendation Results Edit distance [Ravi et al. WSDM 2010] Data Set Click Labels (%) KEX KEX+KSP MLRF MLRF+SSL MLRF+SSL+ KSP Wikipedia Ads Web Ads
64 Conclusions Query recommendation can be posed as multi-label learning. Learning with millions of labels can be tractable and accurate. Other applications Query expansion. Document and ad relevance and ranking. Fine-grained query intent classification.
65 Deepak Bapna Prateek Jain A. Kumaran Mehul Parsana Krishna Leela Poola Adarsh Prasad Varun Singla Acknowledgements
66 Advantages of an ML Approach Can generalize to other domains such as images on Flickr or videos on YouTube.
67 System Architecture We leverage the Map/Reduce framework. Trees are grown in parallel breadth-wise. Number of compute nodes Evaluators 500 Combiners 100 Maximizers 25 Evaluator 1 Maximizer 1 Maximizer 2 Combiner 1 Evaluator 2 F*, T* Combiner 2 Evaluator 3 Combiner 3 Evaluator 4 Our objective is to balance the compute load across machines while minimizing data flow X 1,Y 1 to X N, Y N X N+1,Y N+1 to X 2N, Y 2N X 2N+1,Y 2N+1 to X 3N, Y 3N X 3N+1,Y 3N+1 to X 4N, Y 4N
68 Evaluators Input N training instances Set of keys Tree ID, Node ID, Feature ID and threshold Output Partial label distributions for the keys Evaluator 1 Maximizer 1 Maximizer 2 Combiner 1 Evaluator 2 F*, T* Combiner 2 Evaluator 3 Combiner 3 Evaluator 4 Computation N * # of keys X 1,Y 1 to X N, Y N X N+1,Y N+1 to X 2N, Y 2N X 2N+1,Y 2N+1 to X 3N, Y 3N X 3N+1,Y 3N+1 to X 4N, Y 4N
69 Combiners Input Partial label distributions for assigned keys F*, T* Maximizer 1 Maximizer 2 Output Objective function values for the keys. Combiner 1 Combiner 2 Combiner 3 Computation # of keys * Avg # of Evaluators / key * # of labels in the distribution for the key. X N+1,Y N+1 Evaluator 1 Evaluator 2 Evaluator 3 Evaluator 4 X 1,Y 1 to X N, Y N to X 2N, Y 2N X 2N+1,Y 2N+1 to X 3N, Y 3N X 3N+1,Y 3N+1 to X 4N, Y 4N
70 Maximizers Input Objective function values for assigned keys Output Optimal feature and threshold for assigned nodes in trees. Computation # of keys * Avg # of features per key * Avg # of Evaluator 1 Maximizer 1 Maximizer 2 Combiner 1 Evaluator 2 F*, T* Combiner 2 Evaluator 3 Combiner 3 Evaluator 4 thresholds per feature X 1,Y 1 X N+1,Y N+1 to X N, Y N to X 2N, Y 2N X 2N+1,Y 2N+1 to X 3N, Y 3N X 3N+1,Y 3N+1 to X 4N, Y 4N
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step
More informationLABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014
LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph
More informationData Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
More informationAttribution. Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley)
Machine Learning 1 Attribution Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley) 2 Outline Inductive learning Decision
More informationHadoop Usage At Yahoo! Milind Bhandarkar (milindb@yahoo-inc.com)
Hadoop Usage At Yahoo! Milind Bhandarkar (milindb@yahoo-inc.com) About Me Parallel Programming since 1989 High-Performance Scientific Computing 1989-2005, Data-Intensive Computing 2005 -... Hadoop Solutions
More informationSearch Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc
Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,
More informationJournée Thématique Big Data 13/03/2015
Journée Thématique Big Data 13/03/2015 1 Agenda About Flaminem What Do We Want To Predict? What Is The Machine Learning Theory Behind It? How Does It Work In Practice? What Is Happening When Data Gets
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationSimilarity Search in a Very Large Scale Using Hadoop and HBase
Similarity Search in a Very Large Scale Using Hadoop and HBase Stanislav Barton, Vlastislav Dohnal, Philippe Rigaux LAMSADE - Universite Paris Dauphine, France Internet Memory Foundation, Paris, France
More informationMachine Learning Final Project Spam Email Filtering
Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE
More informationHadoop SNS. renren.com. Saturday, December 3, 11
Hadoop SNS renren.com Saturday, December 3, 11 2.2 190 40 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December
More informationFast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
More informationCollective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University
Collective Behavior Prediction in Social Media Lei Tang Data Mining & Machine Learning Group Arizona State University Social Media Landscape Social Network Content Sharing Social Media Blogs Wiki Forum
More informationThe Impact of Big Data on Classic Machine Learning Algorithms. Thomas Jensen, Senior Business Analyst @ Expedia
The Impact of Big Data on Classic Machine Learning Algorithms Thomas Jensen, Senior Business Analyst @ Expedia Who am I? Senior Business Analyst @ Expedia Working within the competitive intelligence unit
More informationEnsemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
More informationBig Data Text Mining and Visualization. Anton Heijs
Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark
More informationActive Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
More informationMaximize Revenues on your Customer Loyalty Program using Predictive Analytics
Maximize Revenues on your Customer Loyalty Program using Predictive Analytics 27 th Feb 14 Free Webinar by Before we begin... www Q & A? Your Speakers @parikh_shachi Technical Analyst @tatvic Loves js
More informationDistributed forests for MapReduce-based machine learning
Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication
More informationRANDOM PROJECTIONS FOR SEARCH AND MACHINE LEARNING
= + RANDOM PROJECTIONS FOR SEARCH AND MACHINE LEARNING Stefan Savev Berlin Buzzwords June 2015 KEYWORD-BASED SEARCH Document Data 300 unique words per document 300 000 words in vocabulary Data sparsity:
More informationDATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
More informationTree based ensemble models regularization by convex optimization
Tree based ensemble models regularization by convex optimization Bertrand Cornélusse, Pierre Geurts and Louis Wehenkel Department of Electrical Engineering and Computer Science University of Liège B-4000
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationInvited Applications Paper
Invited Applications Paper - - Thore Graepel Joaquin Quiñonero Candela Thomas Borchert Ralf Herbrich Microsoft Research Ltd., 7 J J Thomson Avenue, Cambridge CB3 0FB, UK THOREG@MICROSOFT.COM JOAQUINC@MICROSOFT.COM
More informationBilinear Prediction Using Low-Rank Models
Bilinear Prediction Using Low-Rank Models Inderjit S. Dhillon Dept of Computer Science UT Austin 26th International Conference on Algorithmic Learning Theory Banff, Canada Oct 6, 2015 Joint work with C-J.
More informationBIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
More informationCan we Analyze all the Big Data we Collect?
DBKDA/WEB Panel 2015, Rome 28.05.2015 DBKDA Panel 2015, Rome, 27.05.2015 Reutlingen University Can we Analyze all the Big Data we Collect? Moderation: Fritz Laux, Reutlingen University, Germany Panelists:
More informationDecision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
More informationPredict the Popularity of YouTube Videos Using Early View Data
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationStreamdrill: Analyzing Big Data Streams in Realtime
Streamdrill: Analyzing Big Data Streams in Realtime Mikio L. Braun mikio@streamdrill.com @mikiobraun th 6 Realtime Big Data: Sources Finance Gaming Monitoring Advertisment Sensor Networks Social Media
More informationCSE 473: Artificial Intelligence Autumn 2010
CSE 473: Artificial Intelligence Autumn 2010 Machine Learning: Naive Bayes and Perceptron Luke Zettlemoyer Many slides over the course adapted from Dan Klein. 1 Outline Learning: Naive Bayes and Perceptron
More informationLecture 10: Regression Trees
Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,
More informationLarge-Scale Data Sets Clustering Based on MapReduce and Hadoop
Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE
More informationData Mining Techniques
15.564 Information Technology I Business Intelligence Outline Operational vs. Decision Support Systems What is Data Mining? Overview of Data Mining Techniques Overview of Data Mining Process Data Warehouses
More informationSibyl: a system for large scale machine learning
Sibyl: a system for large scale machine learning Tushar Chandra, Eugene Ie, Kenneth Goldman, Tomas Lloret Llinares, Jim McFadden, Fernando Pereira, Joshua Redstone, Tal Shaked, Yoram Singer Machine Learning
More informationThe Operational Value of Social Media Information. Social Media and Customer Interaction
The Operational Value of Social Media Information Dennis J. Zhang (Kellogg School of Management) Ruomeng Cui (Kelley School of Business) Santiago Gallino (Tuck School of Business) Antonio Moreno-Garcia
More informationMeasuring the online experience of auto insurance companies
AUTO INSURANCE BENCHMARK STUDY Measuring the online experience of auto insurance companies Author: Ann Rochanayon Sr. Director of UX/CX Research at UserZoom Contents Introduction Study Methodology Summary
More informationMachine Learning over Big Data
Machine Learning over Big Presented by Fuhao Zou fuhao@hust.edu.cn Jue 16, 2014 Huazhong University of Science and Technology Contents 1 2 3 4 Role of Machine learning Challenge of Big Analysis Distributed
More informationIJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS
IJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS V.Sudhakar 1 and G. Draksha 2 Abstract:- Collective behavior refers to the behaviors of individuals
More informationPredicting Flight Delays
Predicting Flight Delays Dieterich Lawson jdlawson@stanford.edu William Castillo will.castillo@stanford.edu Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing
More informationAutomated Model Based Testing for an Web Applications
Automated Model Based Testing for an Web Applications Agasarpa Mounica, Lokanadham Naidu Vadlamudi Abstract- As the development of web applications plays a major role in our day-to-day life. Modeling the
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationApplied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
More informationChapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
More informationBayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com
Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian
More informationIntroduction to Clustering
1/57 Introduction to Clustering Mark Johnson Department of Computing Macquarie University 2/57 Outline Supervised versus unsupervised learning Applications of clustering in text processing Evaluating clustering
More informationFUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
More informationUsing multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
More informationComparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
More informationW6.B.1. FAQs CS535 BIG DATA W6.B.3. 4. If the distance of the point is additionally less than the tight distance T 2, remove it from the original set
http://wwwcscolostateedu/~cs535 W6B W6B2 CS535 BIG DAA FAQs Please prepare for the last minute rush Store your output files safely Partial score will be given for the output from less than 50GB input Computer
More informationMachine Learning. Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos)
Machine Learning Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos) What Is Machine Learning? A computer program is said to learn from experience E with respect to some class of
More informationProbabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
More informationIntroduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu
Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics
More informationAsking Hard Graph Questions. Paul Burkhardt. February 3, 2014
Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)
More informationPart III: Machine Learning. CS 188: Artificial Intelligence. Machine Learning This Set of Slides. Parameter Estimation. Estimation: Smoothing
CS 188: Artificial Intelligence Lecture 20: Dynamic Bayes Nets, Naïve Bayes Pieter Abbeel UC Berkeley Slides adapted from Dan Klein. Part III: Machine Learning Up until now: how to reason in a model and
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationExtreme Computing. Big Data. Stratis Viglas. School of Informatics University of Edinburgh sviglas@inf.ed.ac.uk. Stratis Viglas Extreme Computing 1
Extreme Computing Big Data Stratis Viglas School of Informatics University of Edinburgh sviglas@inf.ed.ac.uk Stratis Viglas Extreme Computing 1 Petabyte Age Big Data Challenges Stratis Viglas Extreme Computing
More informationE6895 Advanced Big Data Analytics Lecture 3:! Spark and Data Analytics
E6895 Advanced Big Data Analytics Lecture 3:! Spark and Data Analytics Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and Big
More informationClustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
More informationAn Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
More informationSearch Taxonomy. Web Search. Search Engine Optimization. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
More informationClient Based Power Iteration Clustering Algorithm to Reduce Dimensionality in Big Data
Client Based Power Iteration Clustering Algorithm to Reduce Dimensionalit in Big Data Jaalatchum. D 1, Thambidurai. P 1, Department of CSE, PKIET, Karaikal, India Abstract - Clustering is a group of objects
More informationClassification of Bad Accounts in Credit Card Industry
Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition
More informationBringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks david.willingham@mathworks.com.au 2015 The MathWorks, Inc. 1 Data is the sword of the
More informationSemi-Supervised Support Vector Machines and Application to Spam Filtering
Semi-Supervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationBootstrapping Big Data
Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu
More informationFinding Advertising Keywords on Web Pages. Contextual Ads 101
Finding Advertising Keywords on Web Pages Scott Wen-tau Yih Joshua Goodman Microsoft Research Vitor R. Carvalho Carnegie Mellon University Contextual Ads 101 Publisher s website Digital Camera Review The
More informationArtificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence
Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support
More informationQuestion 2 Naïve Bayes (16 points)
Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the
More informationCI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.
CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes
More informationResponse prediction using collaborative filtering with hierarchies and side-information
Response prediction using collaborative filtering with hierarchies and side-information Aditya Krishna Menon 1 Krishna-Prasad Chitrapura 2 Sachin Garg 2 Deepak Agarwal 3 Nagaraj Kota 2 1 UC San Diego 2
More informationNeural Network Add-in
Neural Network Add-in Version 1.5 Software User s Guide Contents Overview... 2 Getting Started... 2 Working with Datasets... 2 Open a Dataset... 3 Save a Dataset... 3 Data Pre-processing... 3 Lagging...
More informationHT2015: SC4 Statistical Data Mining and Machine Learning
HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric
More informationIntroduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.
More informationSteven C.H. Hoi. School of Computer Engineering Nanyang Technological University Singapore
Steven C.H. Hoi School of Computer Engineering Nanyang Technological University Singapore Acknowledgments: Peilin Zhao, Jialei Wang, Hao Xia, Jing Lu, Rong Jin, Pengcheng Wu, Dayong Wang, etc. 2 Agenda
More informationMachine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu
Machine Learning CUNY Graduate Center, Spring 2013 Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning Logistics Lectures M 9:30-11:30 am Room 4419 Personnel
More informationSupervised Learning Evaluation (via Sentiment Analysis)!
Supervised Learning Evaluation (via Sentiment Analysis)! Why Analyze Sentiment? Sentiment Analysis (Opinion Mining) Automatically label documents with their sentiment Toward a topic Aggregated over documents
More informationDynamics of Genre and Domain Intents
Dynamics of Genre and Domain Intents Shanu Sushmita, Benjamin Piwowarski, and Mounia Lalmas University of Glasgow {shanu,bpiwowar,mounia}@dcs.gla.ac.uk Abstract. As the type of content available on the
More informationCrowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach
Outline Crowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach Jinfeng Yi, Rong Jin, Anil K. Jain, Shaili Jain 2012 Presented By : KHALID ALKOBAYER Crowdsourcing and Crowdclustering
More informationBITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand?
BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand? The Big Data Buzz big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database
More informationCS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing
CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate
More informationMachine Learning for Medical Image Analysis. A. Criminisi & the InnerEye team @ MSRC
Machine Learning for Medical Image Analysis A. Criminisi & the InnerEye team @ MSRC Medical image analysis the goal Automatic, semantic analysis and quantification of what observed in medical scans Brain
More informationHow To Perform An Ensemble Analysis
Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier
More informationData Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
More informationE-commerce Transaction Anomaly Classification
E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce
More informationMulti-Class and Structured Classification
Multi-Class and Structured Classification [slides prises du cours cs294-10 UC Berkeley (2006 / 2009)] [ p y( )] http://www.cs.berkeley.edu/~jordan/courses/294-fall09 Basic Classification in ML Input Output
More informationLeveraging Ensemble Models in SAS Enterprise Miner
ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to
More informationClass #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines
More informationHomework 2. Page 154: Exercise 8.10. Page 145: Exercise 8.3 Page 150: Exercise 8.9
Homework 2 Page 110: Exercise 6.10; Exercise 6.12 Page 116: Exercise 6.15; Exercise 6.17 Page 121: Exercise 6.19 Page 122: Exercise 6.20; Exercise 6.23; Exercise 6.24 Page 131: Exercise 7.3; Exercise 7.5;
More informationA Simple Feature Extraction Technique of a Pattern By Hopfield Network
A Simple Feature Extraction Technique of a Pattern By Hopfield Network A.Nag!, S. Biswas *, D. Sarkar *, P.P. Sarkar *, B. Gupta **! Academy of Technology, Hoogly - 722 *USIC, University of Kalyani, Kalyani
More informationSimple and efficient online algorithms for real world applications
Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,
More informationData Mining and Predictive Analytics - Assignment 1 Image Popularity Prediction on Social Networks
Data Mining and Predictive Analytics - Assignment 1 Image Popularity Prediction on Social Networks Wei-Tang Liao and Jong-Chyi Su Department of Computer Science and Engineering University of California,
More informationContent-Based Recommendation
Content-Based Recommendation Content-based? Item descriptions to identify items that are of particular interest to the user Example Example Comparing with Noncontent based Items User-based CF Searches
More informationChapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
More informationThe Need for Training in Big Data: Experiences and Case Studies
The Need for Training in Big Data: Experiences and Case Studies Guy Lebanon Amazon Background and Disclaimer All opinions are mine; other perspectives are legitimate. Based on my experience as a professor
More informationHow To Write A Data Processing Pipeline In R
New features and old concepts for handling large and streaming data in practice Simon Urbanek R Foundation Overview Motivation Custom connections Data processing pipelines Parallel processing Back-end
More informationSocial Search. Communities of users actively participating in the search process
Chapter 1 Social Search Social Search Social search Communities of users actively participating in the search process Goes beyond classical search tasks Key differences Users interact with the system Users
More informationDistance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center
Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center 1 Outline Part I - Applications Motivation and Introduction Patient similarity application Part II
More information