Less naive Bayes spam detection
Hongming Yang, Eindhoven University of Technology, Dept. EE, Rm PT 3.27, P.O. Box 513, 5600 MB Eindhoven, The Netherlands; also CoSiNe Connectivity Systems and Networks, Philips Research, Eindhoven.
Maurice Stassen, NXP Semiconductors, HTC 3, WDC AE Eindhoven, The Netherlands.
Tjalling Tjalkens, Eindhoven University of Technology, Dept. EE, Rm PT 3.6, P.O. Box 513, 5600 MB Eindhoven, The Netherlands.

Abstract: We consider a binary classification problem with a feature vector of high dimensionality. Spam mail filters are a popular example hereof. A naive Bayes filter assumes conditional independence of the feature vector components. We use the context tree weighting (CTW) method as an application of the minimum description length (MDL) principle to allow for dependencies between the feature vector components. It turns out that, due to the limited amount of training data, we must assume conditional independence between groups of vector components. We consider several ad-hoc algorithms to find good groupings and good conditional models.

I. INTRODUCTION AND PROBLEM STATEMENT

We consider a supervised binary classification problem in which the class is conditioned on a binary feature vector of high dimensionality. Also, the amount of training data is relatively small, so we are limited in the complexity of the models we may consider. Traditionally one can tackle this problem with a naive Bayes filter. The simple model underlying this method is the assumption that the features are independent conditioned on the class. This results in a model with a small number of parameters that can easily be estimated from a small amount of training data. However, this assumption is too restrictive to be realistic. We shall consider models that allow some dependencies between features and attempt to find a good trade-off between model complexity and the amount of training data.

A good example of this problem, and the motivation for this work, is the classification of e-mail messages into spam or non-spam (ham). Naive Bayes filters have been very successful in separating spam and ham. However, the simple model allows spammers to fool the filter easily. We expect that more complex models are less easy to fool.

An e-mail is characterized by its class, denoted by c, and the vector of features, f = f_1, f_2, ..., f_k. The class is binary, where c = 0 denotes ham and c = 1 denotes spam. A feature f_i = 1 if it is present and f_i = 0 if it is absent. We consider the likelihood ratio of a feature vector,

    ξ(f) = Pr{c = 0 | f} / Pr{c = 1 | f}.    (1)

An e-mail with a feature vector f is classified as spam if ξ(f) < 1.

In the following section we briefly describe the naive Bayes filter and thus introduce the problem setting. In Section III we discuss models with dependent features and describe a mechanism to select a good model. This approach suffers from a too high complexity and thus cannot use all relevant features. We consider the selection of the most relevant features in Section IV. But we still cannot make use of all necessary features. In Section V we discuss models that use several trees. Section VI gives some ad-hoc approaches to find good trees and gives some experimental results.

II. THE NAIVE BAYES FILTER

We have a set of n feature vectors and classes. We denote the feature vector of the i-th e-mail by f^i and the corresponding class by c^i. So f^i_j is the j-th feature of the i-th training message. We wish to determine the likelihood of a feature vector f using the assumption that the features are conditionally independent.
We use Bayes' formula to write

    Pr{c | f} = Pr{c} Pr{f | c} / Pr{f},    (2)

    Pr{f | c} = ∏_{i=1}^{k} Pr{f_i | c},    (3)

so that the likelihood ratio becomes

    ξ(f) = (Pr{c = 0} / Pr{c = 1}) ∏_{i=1}^{k} Pr{f_i | c = 0} / Pr{f_i | c = 1}.    (4)

It is in (3) where we use the independence assumption. We must make estimates of Pr{c} and Pr{f_i | c} for each i. For each of these parameters we have n instances in the training set.
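As a concrete illustration of (2)-(4), the sketch below evaluates the naive Bayes likelihood ratio and applies the decision rule ξ(f) < 1. It is a minimal example, not the authors' implementation; the variable names and the assumption that the class prior and the per-feature probabilities have already been estimated are ours.

```python
from typing import List

def naive_bayes_ratio(f: List[int],
                      p_c0: float,
                      p_f_given_c0: List[float],
                      p_f_given_c1: List[float]) -> float:
    """Return xi(f) = Pr{c=0|f} / Pr{c=1|f} under the independence assumption (3)."""
    ratio = p_c0 / (1.0 - p_c0)                        # Pr{c=0} / Pr{c=1}
    for i, fi in enumerate(f):
        # Each Pr{f_i | c} is Bernoulli: use p when the feature is present, 1-p otherwise.
        num = p_f_given_c0[i] if fi else 1.0 - p_f_given_c0[i]
        den = p_f_given_c1[i] if fi else 1.0 - p_f_given_c1[i]
        ratio *= num / den
    return ratio

def classify(f: List[int], p_c0: float,
             p_f_given_c0: List[float], p_f_given_c1: List[float]) -> int:
    # Classify as spam (c = 1) when the likelihood ratio drops below 1.
    return 1 if naive_bayes_ratio(f, p_c0, p_f_given_c0, p_f_given_c1) < 1.0 else 0
```

In practice one would accumulate the logarithms of the factors to avoid numerical underflow for large k.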
III. A TREE MODEL AND TREE MODEL SELECTION

First consider the full dependence of class and features. We must estimate Pr{c | f}. This implies however that we have to estimate 2^k parameters. Moreover, every e-mail contributes to precisely one parameter. So, with large k we quickly run out of training data. This well-known problem is called the curse of dimensionality, see e.g. [1].

A feature is often selected to be an indication in favor of spam or ham. So, a properly chosen subsequence of features might suffice for the purpose of classification. Formally, we can define a prefix-free and complete set of varying-length sequences of features that describe the relevant subsequences of features. Consider for example a problem with four features, f_1, f_2, f_3, and f_4. In principle we have sixteen parameters, but let the features and the generating process be such that if f_1 = 1 the other features do not contribute to the classification, so Pr{c | f = 1xxx} = Pr{c | f_1 = 1}. See Fig. 1 for a graphical representation of this dependency, where we have a total of six parameters for the tree set S = {1, 011, 0011, 0010, 010, 000}.

Fig. 1. A tree of relevant features.

Given a tree S we define π_S(f) as the prefix of f that is in S. Note that because S is prefix-free and complete such a prefix always exists and is unique:

    π_S(f) = x, where x ∈ S and x is a prefix of f.    (5)

So we can write the a-posteriori class probability as

    Pr{c | f} = Pr{c | π_S(f)}.    (6)

The number of possible tree models grows doubly exponentially with the length of the feature vector. We must find the best possible tree model in this set. Following the MDL principle, we can use the CTW method to find a good tree structure. With the CTW method we compute a probability of the sequence of classes in the training set conditioned on the corresponding feature vectors, Pr{c^1, c^2, ..., c^n | f^1, f^2, ..., f^n}. The CTW algorithm defines a prior, P(S), over all possible tree sources of height k or less. The height of a tree is the length of its longest path from root to leaf. The prior is given as

    P(S) = 2^{-Γ_k(S)},    (7)

where Γ_k(S), see [2], equals

    Γ_k(S) = 2|S| - 1 - |{s ∈ S : |s| = k}|.    (8)

This results in the class sequence probability

    Pr{c^1, c^2, ..., c^n | f^1, f^2, ..., f^n} = Σ_{S ∈ S_k} P(S) ∏_{i=1}^{n} Pr{c^i | π_S(f^i)}.    (9)

Here S_k is the set of all permissible tree models. For details on the CTW algorithm we refer to [3]. From (9) we see that CTW realizes an MDL solution, see e.g. [4]:

    Pr{c^1, c^2, ..., c^n | f^1, f^2, ..., f^n} ≥ max_{S ∈ S_k} P(S) ∏_{i=1}^{n} Pr{c^i | π_S(f^i)}.    (10)

We can use this probability and method for classification in two different ways.

C1: We find the MAP model Ŝ. This is described in [2] and also in [5]. This model defines the likelihood ratio of the e-mail with feature vector f as

    ξ(f) = Pr{c = 0 | π_Ŝ(f)} / Pr{c = 1 | π_Ŝ(f)}.

The conditional class probability is calculated with the aid of (9) as

    Pr{c | π_Ŝ(f)} = Pr{c^1, ..., c^n, c | π_Ŝ(f^1), ..., π_Ŝ(f^n), π_Ŝ(f)} / Pr{c^1, ..., c^n | π_Ŝ(f^1), ..., π_Ŝ(f^n)}.    (11)

Again, see [2], this can be computed more efficiently than (11) implies.

C2: We can also continue the weighting procedure one step more and calculate the likelihood ratio as

    ξ(f) = [Σ_{S ∈ S_k} P(S) ∏_{i=1}^{n} Pr{c^i | π_S(f^i)} Pr{c = 0 | π_S(f)}] / [Σ_{S ∈ S_k} P(S) ∏_{i=1}^{n} Pr{c^i | π_S(f^i)} Pr{c = 1 | π_S(f)}].    (12)
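The sketch below shows the two ingredients of method C1 in code: the prefix map π_S of (5) for a prefix-free and complete tree set S, and a likelihood ratio formed from training counts stored at the selected leaf. The leaf estimate uses an add-1/2 (Krichevsky-Trofimov style) rule; the data layout and names are illustrative assumptions, and the more efficient sequential computation of (11) mentioned in [2] is not shown.

```python
from typing import Dict, Tuple

Leaf = Tuple[int, ...]

def pi_S(f: Tuple[int, ...], S: set) -> Leaf:
    """Return the unique prefix of f that is a leaf of the prefix-free, complete tree S (eq. 5)."""
    for j in range(1, len(f) + 1):
        if f[:j] in S:
            return f[:j]
    raise ValueError("S is not complete for this feature vector")

def leaf_ratio(leaf: Leaf, counts: Dict[Leaf, Tuple[int, int]]) -> float:
    """xi(f) = Pr{c=0 | pi_S(f)} / Pr{c=1 | pi_S(f)} from (ham, spam) counts at the leaf."""
    n0, n1 = counts.get(leaf, (0, 0))
    p0 = (n0 + 0.5) / (n0 + n1 + 1.0)      # add-1/2 estimate of Pr{c = 0 | leaf}
    return p0 / (1.0 - p0)

# The six-leaf tree of Fig. 1, written as tuples of feature values:
S = {(1,), (0, 1, 1), (0, 1, 0), (0, 0, 1, 1), (0, 0, 1, 0), (0, 0, 0)}
# pi_S((1, 0, 1, 0), S) -> (1,)   and   pi_S((0, 0, 1, 1), S) -> (0, 0, 1, 1)
```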
Still this approach suffers from another form of the curse of dimensionality, and too many relevant features are ignored by this method. This can be explained with the MDL principle. We briefly repeat the results as presented in [3]. As in that paper we shall consider -log_2 P, or the code word length, instead of the probability P.

- -log_2 P(S): this is the model cost, see (7). A more complex model, with more parameters, results in a longer description in terms of bits, or equivalently in a smaller probability.
- -Σ_{i=1}^{n} log_2 Pr{c^i | π_S(f^i)}: this splits into two parts, namely a parameter cost -log_2 P(θ | S), which equals roughly (1/2) log_2 n per parameter, and the cost of describing the sequence given the model and the parameters (this is an entropy term).
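To make the cost trade-off concrete, here is a sketch that evaluates the two-part description length of a candidate tree model on the training data: the model cost Γ_k(S) of (8), roughly (1/2) log_2 n bits per leaf parameter, and an empirical-entropy term for the class sequence. The counting scheme and the use of exactly half a log_2 n per parameter are simplifying assumptions for illustration, not the paper's exact bookkeeping.

```python
import math
from typing import Dict, Tuple

def description_length(S: set, leaf_counts: Dict[tuple, Tuple[int, int]],
                       k: int, n: int) -> float:
    """Approximate code length (in bits) of tree model S plus the class data given S."""
    model_cost = 2 * len(S) - 1 - sum(1 for s in S if len(s) == k)   # Gamma_k(S), eq. (8)
    param_cost = 0.5 * math.log2(n) * len(S)                         # ~1/2 log2 n per leaf parameter
    data_cost = 0.0
    for n0, n1 in leaf_counts.values():                              # empirical entropy term
        total = n0 + n1
        for m in (n0, n1):
            if m:
                data_cost -= m * math.log2(m / total)
    return model_cost + param_cost + data_cost
```

A more complex S lowers the entropy term but raises the first two terms, which is why small training sets favor small trees.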
A more complex model usually decreases the entropy term, which is linear in n, but increases the parameter and model cost. For small n the increase in parameter and model cost is not overcome by the decrease of the entropy term, so, initially, models will be small. Thus for small n only a few features can play a role in the classification.

Another way of understanding this lies in the actual procedure of the CTW algorithm. In this algorithm the probability of a sequence c = c^i, c^j, ..., c^k, where each of the e-mails i, j, ..., k has the same feature vector part π(f), is calculated along the path from the node given by π(f) to the root by averaging. The contribution of P(c | π(f)) is thus reduced by several factors 1/2, see Fig. 2. Therefore long feature vectors in the tree model will only contribute significantly when the difference in probabilities is large enough, and this only happens after many symbols (or e-mails).

Fig. 2. Averaging in the CTW tree.

IV. TREE MODEL WITH FLEXIBLE FEATURE ORDERING

We can rearrange the feature vector and move the relevant features to the front. A feature is most relevant if it gives the largest root probability in the CTW algorithm. In Section VI we give more detail on how we determine the relevant ordering of features. Another approach is to use the class III CTW method as described in [6]. In this method we consider tree sources where each split can be labeled by another feature, see Fig. 3. Even these approaches do not result in good classifications. We must conclude that attempting to model the full feature dependencies is not feasible.

Fig. 3. A tree with arbitrary ordered features.

V. LESS NAIVE BAYES FILTER

The success of the naive Bayes approach lies in the fact that its few parameters can be estimated quite well with the little training data that is available. Let us for now assume that the features can be split into sets where the sets are conditionally independent but dependencies exist within a set. So we assume the existence of an integer m that describes the number of sets and a sequence of integers a_0 = 1 < a_1 < a_2 < ... < a_m = k. The conditional dependence is now given as

    Pr{f | c} = ∏_{i=1}^{m} Pr{f_{[a_{i-1}, a_i]} | c}.    (13)

Here we introduced the notation for a sub-vector f_{[i,j]} = f_i, f_{i+1}, ..., f_j. Consider the likelihood ratio taking (13) into account:

    ξ(f) = Pr{c = 0 | f} / Pr{c = 1 | f}
         = (Pr{c = 0} / Pr{c = 1}) ∏_{i=1}^{m} Pr{f_{[a_{i-1}, a_i]} | c = 0} / Pr{f_{[a_{i-1}, a_i]} | c = 1}
         = {Pr{c = 0} / Pr{c = 1}}^{1-m} ∏_{i=1}^{m} Pr{c = 0 | f_{[a_{i-1}, a_i]}} / Pr{c = 1 | f_{[a_{i-1}, a_i]}}
         = {Pr{c = 0} / Pr{c = 1}}^{1-m} ∏_{i=1}^{m} ξ(f_{[a_{i-1}, a_i]}).    (14)

Every tree that corresponds to a sub-vector f_{[a_{i-1}, a_i]} can train on the complete training set of n e-mails. By keeping the sub-vectors small, in terms of dimension, the corresponding model parameters can be estimated quite well. We can use any approach that we discussed before for one tree. We prefer the use of CTW or maybe the class III CTW. The remaining problem is to find a good subdivision of the feature vector.
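A sketch of the combination rule (14): the feature vector is split into groups, each group contributes its own likelihood ratio (obtained from whatever per-group model is used, e.g. a CTW tree), and the product is corrected by the prior ratio raised to the power 1 - m. The group representation and the group_ratio callback are assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

def combined_ratio(f: Sequence[int],
                   groups: List[Sequence[int]],
                   group_ratio: Callable[[Tuple[int, ...], int], float],
                   p_c0: float) -> float:
    """Evaluate xi(f) of eq. (14) from per-group ratios xi(f_group)."""
    prior_ratio = p_c0 / (1.0 - p_c0)                  # Pr{c=0} / Pr{c=1}
    xi = prior_ratio ** (1 - len(groups))              # {Pr{c=0}/Pr{c=1}}^(1-m)
    for g_idx, idx in enumerate(groups):
        sub = tuple(f[i] for i in idx)                 # sub-vector for this group
        xi *= group_ratio(sub, g_idx)                  # per-group ratio from that group's tree
    return xi                                          # classify as spam when xi < 1
```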
VI. EXPERIMENTAL RESULTS ON GOOD TREE MODELS

We obtained a set of classes and features from Sam Roweis' course on Machine Learning [7]. This set contains e-mails, each one classified as spam or ham and each one with a feature vector of k = 85 features. 1000 e-mails are chosen as training data and 4000 are used to test the classifier. In the naive Bayes filter we need estimates of Pr{c} and Pr{f | c}. As in [8] we use the Bernoulli distribution with the beta prior and hyper-parameters α = β = 1/2. Let

    n = #(e-mails) in the training set;
    N(c = 1) = #(spam e-mails) in the training set;
    N(f_i = 1, c = 0) = #(ham e-mails) with f_i = 1;
    N(f_i = 1, c = 1) = #(spam e-mails) with f_i = 1.

Then we find the following parameter estimates:

    Pr{c = 1} estimate = (N(c = 1) + 1/2) / (n + 1),
    Pr{f_i = 1 | c = 0} estimate = (N(f_i = 1, c = 0) + 1/2) / (n - N(c = 1) + 1),
    Pr{f_i = 1 | c = 1} estimate = (N(f_i = 1, c = 1) + 1/2) / (N(c = 1) + 1).
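These are the familiar add-1/2 (Krichevsky-Trofimov) estimates. A small sketch, with illustrative names, that turns the counts above into the three estimates:

```python
from typing import Tuple

def kt_estimates(n: int, n_spam: int, n_fi_ham: int, n_fi_spam: int) -> Tuple[float, float, float]:
    """n: #e-mails, n_spam: #spam e-mails, n_fi_ham / n_fi_spam: #ham / #spam e-mails with f_i = 1."""
    p_spam = (n_spam + 0.5) / (n + 1.0)                      # Pr{c = 1}
    p_fi_ham = (n_fi_ham + 0.5) / (n - n_spam + 1.0)         # Pr{f_i = 1 | c = 0}
    p_fi_spam = (n_fi_spam + 0.5) / (n_spam + 1.0)           # Pr{f_i = 1 | c = 1}
    return p_spam, p_fi_ham, p_fi_spam
```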
We applied this in the naive Bayes filter, trained on 1000 e-mails and tested on 4000 other e-mails, and determined the number of misclassifications. The result is listed in Table I. This serves as the benchmark for the other methods.

TABLE I. NUMBER OF ERRORS ON THE ROWEIS DATA

    Algorithm                                 nr. misclassifications
    Naive Bayes filter                        235
    CTW (one tree)                            348
    Greedy multi-tree CTW                     205
    Multi-tree with flexible feature order    22

For the CTW method with one tree we had to determine a good order of the features. First we ran the algorithm on the training data with a feature vector of dimension 1 and tried all features. We selected the feature that gave the largest probability as given in (9). Then we appended successively every other feature and again selected the feature that maximized (9). This is repeated until all features are used. After training, this algorithm gave a result, see Table I, that is far worse than the naive Bayes filter. The reason is that several relevant features were left out because of the small size of the tree models used, as we explained before in Section III. In Fig. 4 we show the classification error rate as a function of the number of features used. We observe an error floor starting at about 25 features. We list the result in Table I under "CTW (one tree)".

Fig. 4. One tree with feature ordering: error rate versus the depth of the tree.

Then we build a model with several trees. Our approach was to start with all features and build a tree with the best features as described above, except that we stop adding another feature when the resulting increase in the root probability drops below a fixed ratio threshold. We selected a threshold ratio of 2, in line with the MDL principle: a factor of 2 implies a one-bit shortening of the resulting description length. With the remaining features we build a second tree, and so on until all features are used. This is a greedy approach. In Fig. 5 we show the error rate as a function of the number of trees used and also the number of features per tree. We list the final result in Table I under "Greedy multi-tree CTW". The performance is significantly better than with the naive Bayes approach.

Fig. 5. Greedy multi-tree CTW: total error rate versus the number of trees used.

Finally we attempted building trees with flexible feature assignments, such as shown in Fig. 3. The class III CTW from [6] is too complex to implement for a feature vector of length 85, so we used another greedy approach. We build the tree by first finding the single best feature, and then extending each leaf of the tree independently with another feature that maximizes that node's probability. We stop with a leaf when the probability does not increase anymore. The result of this approach is listed under "Multi-tree with flexible feature order" in Table I. Although the model is more flexible and the performance is better than that of the naive approach, it is not the best.
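The greedy multi-tree construction described above can be sketched as follows. The function ctw_root_prob is an assumed black box that trains a CTW model on the training set using the given ordered list of features and returns the root (class-sequence) probability; in practice it would work with log-probabilities. This is our reading of the procedure, not the authors' code.

```python
from typing import Callable, List, Sequence

def greedy_grouping(features: Sequence[int],
                    ctw_root_prob: Callable[[List[int]], float],
                    ratio_threshold: float = 2.0) -> List[List[int]]:
    """Split the features into groups, one CTW tree per group."""
    remaining = list(features)
    groups: List[List[int]] = []
    while remaining:
        group: List[int] = []
        best_p = None
        while remaining:
            # Candidate feature that raises the root probability the most.
            cand, cand_p = max(((f, ctw_root_prob(group + [f])) for f in remaining),
                               key=lambda t: t[1])
            if best_p is not None and cand_p / best_p < ratio_threshold:
                break                      # gain below one bit: stop growing this tree
            group.append(cand)
            remaining.remove(cand)
            best_p = cand_p
        groups.append(group)
    return groups
```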
VII. CONCLUSION

The small experiment we conducted showed that e-mail feature vectors are not conditionally independent. Modeling this dependency with a set of context trees is possible and can give a good trade-off between classification power and the amount of training data needed. We employed a compression-based approach to find sets of (almost) independent feature vectors and used as decision criterion the (non-)decrease of the conditional entropy. Using a class III method could reduce the number of parameters needed in a tree. However, our experiment does not show the expected gain. The experiment is too small to draw a clear conclusion: either rearranging the features in a flexible way is not useful for e-mail feature vectors, or the greedy approach does not result in a good arrangement.
Of course it is important to put the features in the correct sub-set. How to do this in an optimal way is still an open question.

REFERENCES

[1] R. Bellman, Adaptive Control Processes: A Guided Tour. Princeton University Press, 1961.
[2] F. M. J. Willems, T. J. Tjalkens, and T. Ignatenko, "Context-tree weighting and maximizing: processing betas," in Proceedings of the Inaugural Workshop of the ITA, UCSD, La Jolla, USA, Feb. 2006, pp. 1-5.
[3] F. M. J. Willems, Y. M. Shtarkov, and T. J. Tjalkens, "The context-tree weighting method: Basic properties," IEEE Trans. Inform. Theory, vol. 41, pp. 653-664, May 1995.
[4] A. Barron, J. Rissanen, and B. Yu, "The minimum description length principle in coding and modeling," IEEE Trans. Inform. Theory, vol. 44, pp. 2743-2760, Oct. 1998.
[5] T. J. Tjalkens and F. M. J. Willems, "MAP model selection for context trees," in Proceedings of the 2006 IEEE International Workshop on Machine Learning for Signal Processing, MLSP (CDROM), Sept. 2006, pp. 1-6.
[6] F. M. J. Willems, Y. M. Shtarkov, and T. J. Tjalkens, "Context weighting for general finite-context sources," IEEE Trans. Inform. Theory, vol. 42, pp. 1514-1520, Sept. 1996.
[7] S. Roweis. [Online]. Available: roweis/csc255/
[8] R. E. Krichevsky and V. K. Trofimov, "The performance of universal encoding," IEEE Trans. Inform. Theory, vol. IT-27, pp. 199-207, Mar. 1981.