Active Learning with Boosting for Spam Detection
1 Active Learning with Boosting for Spam Detection
Nikhila Arkalgud
Last update: March 22, 2008
2 Outline
1. Spam Filters
2. Active Learning and Boosting
3. Algorithm
4. Sampling Methods
5. Weak Learner
6. Performance Analysis
7. Future Work
8. Conclusions
9. References
4 Spam Filters: Spam Filtering
6 Active Learning and Boosting: What is Active Learning?
Given data $X_1, \dots, X_n$, where $n$ is the number of examples,
and labels $Y_1, \dots, Y_t$, where $t$ is the number of labels,
with $t \ll n$:
how do we build a good classifier?
7 Active Learning and Boosting: Boosting
Given data $\langle X_1, Y_1 \rangle, \dots, \langle X_n, Y_n \rangle$:
A weak learner that does slightly better than a random classifier, that is, with error $\varepsilon < 0.5$,
builds a set of hypotheses $h_1, \dots, h_T$ over $T$ trials
and assigns a confidence $\alpha_t$ to each hypothesis $h_t$.
After $T$ trials, a final strong classifier is constructed using a weighted majority vote of the $T$ hypotheses.
9 Algorithm: Active Learning using Confidence-Based Data Sampling
Given data $S$, with labeled data set $S_t$ and unlabeled data set $S_u$.
Repeat:
  Train a classifier using the current training data $S_t$.
  Predict on $S_u$ using this classifier.
  Compute confidence scores on $S_u$.
  Sort the scores.
  Label the $k$ lowest-scoring examples.
  Call the new labeled set $S_i$.
  Set $S_t = S_t \cup S_i$; $S_u = S_u \setminus S_i$.
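The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenter's implementation: `active_learning`, `train_threshold`, `distance_to_threshold`, and `oracle` are hypothetical names, and a toy one-dimensional threshold classifier stands in for the boosted classifier so that the example stays self-contained.

```python
def active_learning(labeled, unlabeled, train, confidence, oracle, k, rounds):
    """Confidence-based active learning loop.

    labeled   : list of (x, y) pairs, the set S_t
    unlabeled : list of x values, the set S_u
    train     : function mapping a labeled list to a model
    confidence: function (model, x) -> score; low means uncertain
    oracle    : function x -> true label (stands in for the human labeler)
    """
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        model = train(labeled)                        # train on S_t
        ranked = sorted(unlabeled, key=lambda x: confidence(model, x))
        picked, unlabeled = ranked[:k], ranked[k:]    # S_i and S_u \ S_i
        labeled.extend((x, oracle(x)) for x in picked)  # S_t = S_t U S_i
    return train(labeled), labeled, unlabeled


# Toy stand-ins (assumptions for illustration only).
def train_threshold(labeled):
    """Midpoint between the largest class-0 point and the smallest class-1 point."""
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    return (max(zeros) + min(ones)) / 2.0


def distance_to_threshold(threshold, x):
    """Points near the decision boundary get a low confidence score."""
    return abs(x - threshold)


def oracle(x):
    return int(x > 0.62)


initial = [(0.0, 0), (1.0, 1)]
pool = [i / 100 for i in range(1, 100)]
model, labeled, unlabeled = active_learning(
    initial, pool, train_threshold, distance_to_threshold, oracle, k=2, rounds=5
)
```

With only 10 queried labels, the learned threshold lands close to the true boundary at 0.62, because every query is spent on the points the current model is least sure about.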
10 Algorithm: AdaBoost algorithm
Given $(x_1, y_1), \dots, (x_n, y_n) \in S_t$, where $y_i \in \{0, 1\}$.
Initialize weights $W_{1,1}, \dots, W_{1,n} = 1/n$, where $n$ is the number of labeled examples.
11 Algorithm
for $t = 1$ to $T$ do
  Normalize the weights: $W_{t,i} = W_{t,i} / \sum_i W_{t,i}$
  For each feature $j$, train a classifier $h_j$
  Compute the error $\varepsilon_j = \sum_i W_{t,i} \, |h_j(x_i) - y_i|$
  Choose the classifier $h_t$ with the lowest error $\varepsilon_t$
  Update the weights: $W_{t+1,i} = W_{t,i} \, \beta_t^{1 - e_i}$, where $e_i = 0$ if $x_i$ is classified correctly and $e_i = 1$ otherwise, and $\beta_t = \varepsilon_t / (1 - \varepsilon_t)$
  Compute $\alpha_t = \log(1 / \beta_t)$
12 Algorithm
Final output, the strong classifier:
$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases}$$
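The boosting rounds and the final weighted vote can be sketched as follows. This is a hedged illustration rather than the code behind the slides: the function names `adaboost` and `strong_classify` are hypothetical, the per-feature training step is replaced by a fixed pool of candidate classifiers (an assumption made to keep the sketch short), and the toy data is a three-feature majority function that no single feature predicts perfectly.

```python
import math


def adaboost(examples, labels, candidates, rounds):
    """Return a list of (alpha_t, h_t) pairs chosen over the boosting rounds."""
    n = len(examples)
    weights = [1.0 / n] * n                      # uniform initial weights
    ensemble = []
    for _ in range(rounds):
        total = sum(weights)
        weights = [w / total for w in weights]   # normalize
        # Pick the candidate with the lowest weighted error.
        best_h, best_err = None, float("inf")
        for h in candidates:
            err = sum(w * abs(h(x) - y)
                      for w, x, y in zip(weights, examples, labels))
            if err < best_err:
                best_h, best_err = h, err
        if best_err >= 0.5:                      # no better than chance: stop
            break
        eps = max(best_err, 1e-10)               # guard against division by zero
        beta = eps / (1.0 - eps)
        alpha = math.log(1.0 / beta)
        # e_i = 0 for correct examples, so their weight is multiplied by beta;
        # misclassified examples (e_i = 1) keep their weight.
        weights = [w * (beta if best_h(x) == y else 1.0)
                   for w, x, y in zip(weights, examples, labels)]
        ensemble.append((alpha, best_h))
    return ensemble


def strong_classify(ensemble, x):
    """Weighted majority vote with threshold half the total confidence."""
    vote = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if vote >= 0.5 * sum(alpha for alpha, _ in ensemble) else 0


# Toy data (assumption): label is the majority of three binary features.
examples = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [1 if sum(x) >= 2 else 0 for x in examples]
candidates = [lambda x, j=j: x[j] for j in range(3)]

ensemble = adaboost(examples, labels, candidates, rounds=3)
predictions = [strong_classify(ensemble, x) for x in examples]
```

Each single-feature candidate misclassifies two of the eight toy examples, while the three-round weighted vote classifies all eight correctly.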
14 Sampling Methods: Confidence-Based Sampling
Compute confidence scores on $S_u$.
Sort the scores.
Label the $k$ lowest-scoring examples.
These $k$ examples are the ones closest to the classifier hyperplane.
15 Sampling Methods: Committee-Based Sampling
Boosting is inherently a committee-based decision maker.
Final output, the strong classifier: $h(x) = 1$ if $\sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t$, and $0$ otherwise.
Note that not all the hypotheses are equally weighted.
The final confidence scores are low for examples on which multiple hypotheses disagree.
16 Sampling Methods: Scoring Function
Confidence score:
$$\mathrm{score}(x_i) = \frac{\left| \sum_{t=1}^{T} \alpha_t h'_t(x_i) \right|}{\sum_{t=1}^{T} \alpha_t}$$
where
$$h'_t(x_i) = \begin{cases} -1 & \text{if } h_t(x_i) = 0 \\ 1 & \text{if } h_t(x_i) = 1 \end{cases}$$
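A small sketch of the scoring function may help. It assumes the score is the magnitude of the normalized signed vote, so that examples on which the weighted hypotheses cancel out score near zero; the function name `confidence_score` and the hand-picked ensemble of `(alpha, hypothesis)` pairs are hypothetical.

```python
def confidence_score(ensemble, x):
    """|sum_t alpha_t * h'_t(x)| / sum_t alpha_t, with h' mapping {0, 1} to {-1, +1}."""
    total = sum(alpha for alpha, _ in ensemble)
    signed_vote = sum(alpha * (1 if h(x) == 1 else -1) for alpha, h in ensemble)
    return abs(signed_vote) / total


# Hypothetical ensemble: three single-feature hypotheses with confidences 1, 2, 3.
ensemble = [
    (1.0, lambda x: x[0]),
    (2.0, lambda x: x[1]),
    (3.0, lambda x: x[2]),
]

pool = [(1, 1, 1), (1, 1, 0), (1, 0, 0), (0, 0, 1)]
scores = [confidence_score(ensemble, x) for x in pool]

# The k lowest-scoring examples are the ones sent for labeling.
k = 2
to_label = sorted(pool, key=lambda x: confidence_score(ensemble, x))[:k]
```

Here `(1, 1, 1)` scores 1.0 because every hypothesis agrees, while `(1, 1, 0)` and `(0, 0, 1)` score 0.0 because the weighted votes cancel exactly, so those two are selected for labeling.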
18 Weak Learner: Visualization of the data
19 Weak Learner: Single Feature Weak Learner
$$h_j(x) = \begin{cases} 1 & \text{if } p_j f_j(x) < p_j \theta_j \\ 0 & \text{otherwise} \end{cases}$$
where $p_j = +1, -1$ and $\theta_j = 0.5, 0.5$.
Error: $\varepsilon_j = \sum_i W_i \, |h_j(x_i) - y_i|$
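The single-feature weak learner can be illustrated with a short sketch. It assumes binary features, so that $f_j(x)$ is simply the $j$-th component of $x$, and a fixed threshold of 0.5 for both polarities; the names `stump`, `weighted_error`, and `best_stump` are hypothetical.

```python
def stump(x, j, polarity, theta=0.5):
    """h_j(x) = 1 if p_j * f_j(x) < p_j * theta_j, else 0."""
    return 1 if polarity * x[j] < polarity * theta else 0


def weighted_error(weights, examples, labels, j, polarity, theta=0.5):
    """epsilon_j = sum_i W_i * |h_j(x_i) - y_i|."""
    return sum(w * abs(stump(x, j, polarity, theta) - y)
               for w, x, y in zip(weights, examples, labels))


def best_stump(weights, examples, labels, theta=0.5):
    """Try every feature with both polarities; return (error, feature, polarity)."""
    n_features = len(examples[0])
    return min(
        (weighted_error(weights, examples, labels, j, p, theta), j, p)
        for j in range(n_features)
        for p in (+1, -1)
    )


# Toy data (assumption): the label equals feature 0.
examples = [(1, 0), (1, 1), (0, 1), (0, 0)]
labels = [1, 1, 0, 0]
weights = [0.25] * 4

best = best_stump(weights, examples, labels)
```

With polarity $-1$ the inequality becomes $f_j(x) > \theta_j$, so the stump fires when the feature is present; on the toy data, feature 0 with polarity $-1$ has zero weighted error.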
21 Performance Analysis: Testing and Analysis
I used the SPAM data set provided in class. It has 2000 examples with 2000 features per example.
The total number of labeled examples used in training was restricted to 250 of the 2000 examples.
Start with $|S_t| = 50$ labeled examples.
Label $k = 20$ hard examples in each iteration.
Run 10 active learning iterations in total.
22 Performance Analysis
Does active learning using confidence-based label sampling work?
Do we see an improvement in the true prediction rate?
Do we see a decrease in the false prediction rate?
23 Performance Analysis: TPR and FPR of the training set and test set
24 Performance Analysis: Confidence-Based Sampling vs. Random Sampling
Does it do better than random sampling?
What are we measuring:
  True positive rate
  True prediction rate
  Misclassification rate
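The measures listed above can be computed from predictions and true labels as in the sketch below. The slides do not define them, so the definitions here are assumptions: the true prediction rate is taken to be the fraction of all examples predicted correctly, and the misclassification rate its complement. The false positive rate is included because it appears on the TPR/FPR slides. The function name `rates` and the toy data are hypothetical.

```python
def rates(predictions, labels):
    """Compute TPR, FPR, true prediction rate and misclassification rate.

    Labels and predictions are 0/1, with 1 meaning spam (assumed convention).
    """
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    n = len(pairs)
    return {
        "true_positive_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "true_prediction_rate": (tp + tn) / n if n else 0.0,
        "misclassification_rate": (fp + fn) / n if n else 0.0,
    }


# Toy predictions and labels (assumption, for illustration only).
predictions = [1, 1, 0, 0, 1, 0]
labels = [1, 0, 0, 0, 1, 1]
result = rates(predictions, labels)
```

On this toy input there are 2 true positives, 1 false positive, 2 true negatives, and 1 false negative, so the true positive rate is 2/3 and the misclassification rate is 1/3.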
25 Performance Analysis: True positive rate
26 Performance Analysis: True prediction rate
27 Performance Analysis: Misclassification rate
28 Performance Analysis: Effect of boosting on active learning
29 Performance Analysis: AdaBoost performance on training data
30 Performance Analysis: True positive rate
31 Performance Analysis: False positive rate
32 Performance Analysis: AdaBoost training margin
33 Performance Analysis: Comparison of the AdaBoost algorithm with AdaBoost$_\rho$
35 Future Work
1. Implement other, more sophisticated boosting algorithms.
2. Compare active learning with boosting against active learning using SVM.
3. Implement other types of weak learners.
4. Try to come up with an adaptive sampling technique for labeling.
37 Conclusions
An accuracy of 86% was achieved while restricting the labeled training data to 10%.
Active learning with confidence-based sampling performed much better than random sampling.
Building a classifier using a weighted average of single-feature hypotheses performed much better than training on the best single feature.
AdaBoost on this SPAM data set needs around 35 boosting iterations to build the perfect classifier.
The margin of the training data also converges after 35 iterations.
Constraining the margin using AdaBoost$_\rho$ did not improve the test error.
More tests need to be performed to analyze the performance of soft-margin-based boosting for active learning.
Boosting should be compared as a classifier with other classifiers, such as SVM, that are commonly used for active learning.
