Active Learning with Boosting for Spam Detection
Nikhila Arkalgud
Last update: March 22, 2008
Outline
1. Spam Filters
2. Active Learning and Boosting
3. Algorithm
4. Sampling Methods
5. Weak Learner
6. Performance Analysis
7. Future Work
8. Conclusions
9. References
Spam Filters: Spam Filtering
Active Learning and Boosting: What is Active Learning
- Given data $X_1, \dots, X_n$, where $n$ is the number of examples
- And labels $Y_1, \dots, Y_t$, where $t$ is the number of labels
- And $t \ll n$
- How do we build a good classifier?
Active Learning and Boosting: Boosting
- Given data $\langle X_1, Y_1 \rangle, \dots, \langle X_n, Y_n \rangle$
- A weak learner does slightly better than a random classifier, that is, its error satisfies $\varepsilon < 0.5$
- It builds a set of hypotheses $h_1, \dots, h_T$ over $T$ trials and assigns a confidence $\alpha_t$ to each hypothesis
- After $T$ trials, a final strong classifier is constructed using a weighted majority vote of the $T$ hypotheses obtained
Algorithm: Active learning using confidence-based data sampling
- Given data $S$, with labeled data set $S_t$ and unlabeled data $S_u$.
- Repeat:
  - Train a classifier using the current training data $S_t$.
  - Predict on $S_u$ using this classifier.
  - Compute confidence scores on $S_u$.
  - Sort the scores.
  - Label the $k$ lowest-scored examples and call the newly labeled set $S_i$.
  - Set $S_t = S_t \cup S_i$ and $S_u = S_u \setminus S_i$.
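The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind the slides: the function names (`active_learning_loop`, `train_threshold`, `distance_to_threshold`), the one-dimensional toy data, and the `oracle` callback that stands in for a human labeler are all assumptions made for the example.

```python
from typing import Any, Callable, List, Tuple


def active_learning_loop(
    labeled: List[Tuple[float, int]],
    unlabeled: List[float],
    oracle: Callable[[float], int],
    train: Callable[[List[Tuple[float, int]]], Any],
    confidence: Callable[[Any, float], float],
    k: int,
    rounds: int,
):
    """Confidence-based sampling: label the k least confident examples per round."""
    labeled = list(labeled)      # S_t (copied so the caller's list is untouched)
    unlabeled = list(unlabeled)  # S_u
    for _ in range(rounds):
        if not unlabeled:
            break
        model = train(labeled)
        # Sort the pool by confidence score, lowest first.
        ranked = sorted(unlabeled, key=lambda x: confidence(model, x))
        queried = ranked[:k]                             # S_i
        labeled.extend((x, oracle(x)) for x in queried)  # S_t = S_t union S_i
        unlabeled = ranked[k:]                           # S_u = S_u minus S_i
    return train(labeled), labeled, unlabeled


def train_threshold(labeled):
    """Hypothetical toy classifier: midpoint between the two class means."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def distance_to_threshold(threshold, x):
    """Hypothetical confidence: distance from the decision threshold."""
    return abs(x - threshold)


model, S_t, S_u = active_learning_loop(
    labeled=[(0.0, 0), (10.0, 1)],
    unlabeled=[1.0, 4.0, 4.5, 5.5, 6.0, 9.0],
    oracle=lambda x: int(x >= 5.0),  # stands in for the human labeler
    train=train_threshold,
    confidence=distance_to_threshold,
    k=2,
    rounds=2,
)
print(len(S_t), sorted(S_u))
```

In this toy run the four points nearest the threshold are queried, and the two points far from it are never labeled.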
Algorithm: AdaBoost algorithm
- Given $(x_1, y_1), \dots, (x_n, y_n) \in S_t$, where $y_i \in \{0, 1\}$.
- Initialize weights $W_{1,i} = 1/n$ for $i = 1, \dots, n$, where $n$ is the number of labeled examples.
- For $t = 1$ to $T$:
  - Normalize the weights: $W_{t,i} \leftarrow W_{t,i} / \sum_i W_{t,i}$.
  - For each feature $j$, train a classifier $h_j$ and compute its error $\varepsilon_j = \sum_i W_{t,i} \, |h_j(x_i) - y_i|$.
  - Choose the classifier $h_t$ with the lowest error $\varepsilon_t$.
  - Update the weights: $W_{t+1,i} = W_{t,i} \, \beta_t^{1 - e_i}$, where $e_i = 0$ if $x_i$ is classified correctly and $e_i = 1$ otherwise, and $\beta_t = \varepsilon_t / (1 - \varepsilon_t)$.
  - Compute $\alpha_t = \log(1/\beta_t)$.
- Final output, the strong classifier:
$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases}$$
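A compact sketch of these boosting steps follows, assuming per-example weights and a single-feature threshold stump as the weak learner. The function names, the candidate threshold list, the clamping of the error away from 0 and 1, and the three-bit majority toy data are assumptions for illustration and are not taken from the slides.

```python
import math


def stump_predict(x, j, p, theta):
    """Single-feature stump: 1 if p * x[j] < p * theta, else 0."""
    return 1 if p * x[j] < p * theta else 0


def adaboost(X, y, T, thresholds=(0.5,)):
    """Return a list of (alpha, feature, polarity, threshold) tuples."""
    n = len(X)
    w = [1.0 / n] * n  # one weight per labeled example
    ensemble = []
    for _ in range(T):
        total = sum(w)
        w = [wi / total for wi in w]  # normalize the weights
        best = None
        for j in range(len(X[0])):
            for p in (+1, -1):
                for theta in thresholds:
                    err = sum(
                        wi * abs(stump_predict(xi, j, p, theta) - yi)
                        for wi, xi, yi in zip(w, X, y)
                    )
                    if best is None or err < best[0]:
                        best = (err, j, p, theta)
        err, j, p, theta = best
        # Assumption: clamp the error so beta and alpha stay finite.
        err = min(max(err, 1e-10), 1.0 - 1e-10)
        beta = err / (1.0 - err)
        alpha = math.log(1.0 / beta)
        # Correctly classified examples (e_i = 0) are scaled by beta.
        w = [
            wi * (beta if stump_predict(xi, j, p, theta) == yi else 1.0)
            for wi, xi, yi in zip(w, X, y)
        ]
        ensemble.append((alpha, j, p, theta))
    return ensemble


def strong_classify(ensemble, x):
    """Weighted majority vote of the selected stumps."""
    total_alpha = sum(alpha for alpha, _, _, _ in ensemble)
    vote = sum(
        alpha * stump_predict(x, j, p, theta) for alpha, j, p, theta in ensemble
    )
    return 1 if vote >= 0.5 * total_alpha else 0


# Toy data: label is the majority of three binary features.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [1 if sum(x) >= 2 else 0 for x in X]
ens = adaboost(X, y, T=3)
print([strong_classify(ens, x) for x in X] == y)
```

On this toy problem any single stump misclassifies two of the eight examples, while the three-stump weighted vote classifies all of them correctly.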
Sampling Methods: Confidence-based sampling
- Compute confidence scores on $S_u$.
- Sort the scores.
- Label the $k$ lowest-scored examples.
- These $k$ examples are the ones closest to the classifier hyperplane.
Sampling Methods: Committee-based sampling
- Boosting is inherently a committee-based decision maker.
- Final output, the strong classifier: $h(x) = 1$ if $\sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t$, and $0$ otherwise.
- Note that not all the hypotheses are equally weighted.
- The final confidence scores are low for examples on which multiple hypotheses disagree.
Sampling Methods: Scoring Function
- Confidence score:
$$\text{score}(x_i) = \frac{\left| \sum_{t=1}^{T} \alpha_t h'_t(x_i) \right|}{\sum_{t=1}^{T} \alpha_t}$$
- where $h'_t(x_i) = -1$ if $h_t(x_i) = 0$, and $h'_t(x_i) = +1$ if $h_t(x_i) = 1$.
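To make the scoring function concrete, here is a minimal sketch. It assumes the score is the absolute value of the signed, alpha-weighted vote divided by the total alpha, so that a score near 0 means the committee is split and a score of 1 means it is unanimous. The function name and the example alphas are hypothetical.

```python
def confidence_score(alphas, votes):
    """Normalized committee agreement in [0, 1].

    alphas: hypothesis weights alpha_t (assumed positive)
    votes:  hypothesis outputs h_t(x) in {0, 1}
    """
    # Map votes from {0, 1} to {-1, +1} before weighting.
    signed_sum = sum(a * (1 if v == 1 else -1) for a, v in zip(alphas, votes))
    return abs(signed_sum) / sum(alphas)


alphas = [1.0, 2.0, 3.0]
unanimous = confidence_score(alphas, [1, 1, 1])  # all hypotheses agree
split = confidence_score(alphas, [1, 1, 0])      # weighted votes cancel out
print(unanimous, split)
```

Under this reading, the evenly split example would be queried for a label before the unanimous one.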
Weak Learner: Visualization of the data (figure)
Weak Learner: Single Feature Weak Learner
$$h_j(x) = \begin{cases} 1 & \text{if } p_j f_j(x) < p_j \theta_j \\ 0 & \text{otherwise} \end{cases}$$
- where $p_j \in \{+1, -1\}$ and $\theta_j \in \{0.5, -0.5\}$
- Error: $\varepsilon_j = \sum_i W_i \, |h_j(x_i) - y_i|$
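A small worked example may help here. The three examples, their feature values, and their weights below are invented for illustration and do not come from the SPAM data set. Take $\theta_j = 0.5$ and $p_j = -1$, so that $h_j(x) = 1$ exactly when $f_j(x) > 0.5$:

$$
\begin{aligned}
f_j(x_i) &= (0,\ 1,\ 1), \quad y_i = (0,\ 1,\ 0), \quad W_i = (0.5,\ 0.25,\ 0.25) \\
h_j(x_i) &= (0,\ 1,\ 1) \\
\varepsilon_j &= 0.5\,|0 - 0| + 0.25\,|1 - 1| + 0.25\,|1 - 0| = 0.25
\end{aligned}
$$

Flipping the polarity to $p_j = +1$ inverts every prediction, giving $h_j(x_i) = (1,\ 0,\ 0)$ and $\varepsilon_j = 0.5 + 0.25 + 0 = 0.75 = 1 - 0.25$. This is why trying both polarities always yields a stump with weighted error of at most $0.5$.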
Performance Analysis: Testing and Analysis
- I used the SPAM data set provided in the class. It has 2000 examples with 2000 features per example.
- The total number of labeled examples used in training was restricted to 250 out of the 2000 examples.
- Start with $|S_t| = 50$ labeled examples.
- Label $k = 20$ hard examples in each iteration.
- Run a total of 10 active learning iterations.
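The labeling budget follows directly from these settings. A short sketch of the arithmetic, with variable names chosen for this example:

```python
initial_labeled = 50  # starting size of S_t
k = 20                # examples labeled per active learning iteration
iterations = 10

# Size of the labeled set before the first iteration and after each one.
sizes = [initial_labeled + k * i for i in range(iterations + 1)]
print(sizes[0], sizes[-1])
```

The final entry is 50 + 10 x 20 = 250, which matches the stated cap on labeled training examples.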
Performance Analysis: Does active learning using confidence-based label sampling work?
- Do we see an improvement in the true prediction rate?
- Do we see a decrease in the false prediction rate?
Performance Analysis: TPR and FPR of the training set and test set (figure)
Performance Analysis: Confidence-based sampling vs. random sampling
- Does it do better than random sampling?
- What are we measuring:
  - True positive rate
  - True prediction rate
  - Misclassification rate
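For reference, these rates can be computed from predicted and true labels as sketched below. The slides do not define "true prediction rate"; this sketch assumes it means the fraction of all predictions that are correct (accuracy), with the misclassification rate as its complement. The function name and the sample labels are made up for the example.

```python
def rates(y_true, y_pred):
    """Return TPR, FPR, true prediction rate, and misclassification rate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    def ratio(num, den):
        # Assumption: report 0.0 when a class is absent instead of dividing by zero.
        return num / den if den else 0.0

    total = tp + fn + fp + tn
    return {
        "tpr": ratio(tp, tp + fn),              # spam correctly flagged
        "fpr": ratio(fp, fp + tn),              # non-spam wrongly flagged
        "true_prediction": ratio(tp + tn, total),
        "misclassification": ratio(fp + fn, total),
    }


result = rates(
    y_true=[1, 1, 1, 0, 0, 0, 0, 1],
    y_pred=[1, 0, 1, 0, 1, 0, 0, 1],
)
print(result)
```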
Performance Analysis: True positive rate (figure)
Performance Analysis: True prediction rate (figure)
Performance Analysis: Misclassification rate (figure)
Performance Analysis: Effect of boosting on active learning
Performance Analysis: AdaBoost performance on training data (figure)
Performance Analysis: True positive rate (figure)
Performance Analysis: False positive rate (figure)
Performance Analysis: AdaBoost training margin (figure)
Performance Analysis: Comparison of the AdaBoost algorithm with AdaBoost$_\rho$
Future Work
1. Implement other, more sophisticated boosting algorithms.
2. Compare active learning with boosting against active learning using SVM.
3. Implement other types of weak learners.
4. Try to come up with an adaptive sampling technique for labeling.
Conclusions
- An accuracy of 86% was achieved while restricting the labeled training data to 10%.
- Active learning with confidence-based sampling performed much better than random sampling.
- Building a classifier using a weighted average of single-feature hypotheses performed much better than training on the best single feature.
- AdaBoost on this SPAM data set needs around 35 boosting iterations to build the perfect classifier.
- The margin of the training data also converges after 35 iterations.
- Constraining the margin using AdaBoost$_\rho$ did not improve the test error.
- More tests need to be performed to analyze the performance of soft-margin-based boosting for active learning.
- Boosting as a classifier should be compared with other classifiers, such as SVM, which are commonly used for active learning.
More information1 Maximum likelihood estimation
COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N
More informationApplication of Event Based Decision Tree and Ensemble of Data Driven Methods for Maintenance Action Recommendation
Application of Event Based Decision Tree and Ensemble of Data Driven Methods for Maintenance Action Recommendation James K. Kimotho, Christoph Sondermann-Woelke, Tobias Meyer, and Walter Sextro Department
More informationIncremental SampleBoost for Efficient Learning from Multi-Class Data Sets
Incremental SampleBoost for Efficient Learning from Multi-Class Data Sets Mohamed Abouelenien Xiaohui Yuan Abstract Ensemble methods have been used for incremental learning. Yet, there are several issues
More informationK-Means Clustering Tutorial
K-Means Clustering Tutorial By Kardi Teknomo,PhD Preferable reference for this tutorial is Teknomo, Kardi. K-Means Clustering Tutorials. http:\\people.revoledu.com\kardi\ tutorial\kmean\ Last Update: July
More informationAuthor Gender Identification of English Novels
Author Gender Identification of English Novels Joseph Baena and Catherine Chen December 13, 2013 1 Introduction Machine learning algorithms have long been used in studies of authorship, particularly in
More informationSupervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
More informationIntelligent Data Entry Assistant for XML Using Ensemble Learning
Intelligent Data Entry Assistant for XML Using Ensemble Learning Danico Lee Information and Telecommunication Technology Center University of Kansas 2335 Irving Hill Rd, Lawrence, KS 66045, USA lee@ittc.ku.edu
More informationEnsemble Approach for the Classification of Imbalanced Data
Ensemble Approach for the Classification of Imbalanced Data Vladimir Nikulin 1, Geoffrey J. McLachlan 1, and Shu Kay Ng 2 1 Department of Mathematics, University of Queensland v.nikulin@uq.edu.au, gjm@maths.uq.edu.au
More informationClass #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines
More informationPractical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
More informationII. RELATED WORK. Sentiment Mining
Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract
More informationDUOL: A Double Updating Approach for Online Learning
: A Double Updating Approach for Online Learning Peilin Zhao School of Comp. Eng. Nanyang Tech. University Singapore 69798 zhao6@ntu.edu.sg Steven C.H. Hoi School of Comp. Eng. Nanyang Tech. University
More informationCar Insurance. Havránek, Pokorný, Tomášek
Car Insurance Havránek, Pokorný, Tomášek Outline Data overview Horizontal approach + Decision tree/forests Vertical (column) approach + Neural networks SVM Data overview Customers Viewed policies Bought
More informationNetwork Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016
Network Machine Learning Research Group S. Jiang Internet-Draft Huawei Technologies Co., Ltd Intended status: Informational October 19, 2015 Expires: April 21, 2016 Abstract Network Machine Learning draft-jiang-nmlrg-network-machine-learning-00
More informationUsing One-Versus-All classification ensembles to support modeling decisions in data stream mining
Using One-Versus-All classification ensembles to support modeling decisions in data stream mining Patricia E.N. Lutu Department of Computer Science, University of Pretoria, South Africa Patricia.Lutu@up.ac.za
More informationAzure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
More information