Image Spam Filtering by Content Obscuring Detection
|
|
|
- Mariah Berry
- 9 years ago
- Views:
Transcription
1 Image Spam Filtering by Content Obscuring Detection Battista Biggio, Giorgio Fumera, Ignazio Pillai, Fabio Roli Dept. of Electrical and Electronic Eng., University of Cagliari Piazza d Armi, Cagliari, Italy {bat,fumera,pillai,roli}@diee.unica.it ABSTRACT We address the problem of filtering image spam, a rapidly spreading kind of spam in which the text message is embedded into attached images to defeat spam filtering techniques based on the analysis of s body text. We propose an approach based on low-level image processing techniques to detect one of the main characterstics of most image spam, namely the use of content obscuring techniques to defeat OCR tools. A preliminary experimental evaluation of our approach is reported on a personal data set of spam images publicly available. 1. INTRODUCTION Current spam filters are made up by several modules devoted to analyze different components of s (sender s address, header, body, attachments), to detect specific characteristics of spam s. The outputs of the different modules are then combined to get the final decision about the spamminess of an . A typical example of this architecture is the open-source filter SpamAssassin. 1 In the past ten years the machine learning research community has been interested in the spam filtering task, and in particular in the analysis of s textual content, formerly carried out by simple hand-made filtering rules like keyword detection, which were easily circumvented by spammers. Text categorization techniques with potentially higher generalization capability have been developed (see for instance [10, 5, 7, 1]) and are now used in many spam filters. To defeat such techniques, recently spammers started to embed the spam message into attached images (often bogus text is also inserted into s body), a trick known as image spam (see the examples in Figure 2). The rapid spread of image spam leads us to argue that in the near future computer vision and pattern recognition techniques will gain a prominent role against image spam. Using OCR tools to extract text embedded into images, and processing it using text categorization techniques was thoroughly investigated by the authors in [6]: it was found that this approach can be effective for clean images. A much simpler approach based on keyword search on text extracted by OCR was recently implemented in a plug-in of 1 CEAS Fourth Conference on and Anti-Spam, August 2-3, 2007, Mountain View, California USA the SpamAssassin filter. 2 However very recently spammers started to obscure image text to defeat OCR tools (as can be seen from the examples in Figure 2). To this aim, it is worth noting that spammers could exploit to their advantage techniques used to create CAPTCHAs (see Figure 2, bottom), which were introduced just to defend against robot spamming. Although we believe that content obscuring techniques are not likely to be used in every kind of spam (for instance, phishing s should look as if they come from reputable senders, and thus should be as clean as possible), so that spam filtering modules based on OCR techniques can still be useful, we also argue that different techniques should be devised for the cases in which noise introduced by spammers il likely to make standard OCR tools ineffective. To our knowledge, few works in the scientific literature dealt with image spam so far [2, 11], and no one addressed the specific issue of noisy text images, although some vendors have already included in their filters image processing modules. In this work we propose a specific approach aimed at detecting the presence of noisy text due to the use of content obscuring techniques against OCR tools. This can be viewed as complementary to the approach based on OCR tools investigated in [6], since it aims at detecting the noise (the adversarial clutter contained in the image) instead of the signal (the text embedded into the image). The presence of noisy text can be used by a spam filter as an indication of s spamminess, to be properly combined with the outputs provided by other filtering modules to reach a final decision. An overview of our approach is given in Section 2. In Section 3 we describe a possible implementation we are currently investigating, based on detecting the similar effects of many content obscuring techniques on binarized images. In Section 4 a preliminary experimental analysis of our approach is reported, on a personal data set of spam images publicly available. 2. PROPOSED APPROACH In this paper we propose an approach against the kind of image spam in which obscuring techniques are used by spammers to make the embedded text unreadable by OCR tools. Our approach consists in developing techniques to detect the presence of noisy text into an image. Such techniques are intended to be used by a specific module of a spam filter, whose output could be either a crisp label indicating the presence or absence of noisy text, or a real num- 2 BayesInSpamAssassin
2 ber indicating the amount of noise in a proper scale (for instance in the range [0, 1], where 0 means that the image is clean). The rationale is that the presence of noisy text can be considered as an indication that an is spam. However this can not give the certainty that the is spam, since noisy text could also be present in legitimate images. This is the reason why we do not intend the output of such a module to be directly a spam/legitimate label, or the likelyhood (or probability estimate) that the is spam. Instead, we intend such output as providing a piece of information which will subsequently have to be combined with the outputs provided by other modules to reach a final and hopefully more reliable decision, as happens in current filters. As a trivial example, if an has an attached image with noisy text but the sender address is in the white list, it can be labelled as legitimate by the spam filter. The techniques proposed in this work to detect the presence of noisy text into an image are based on the following idea. Different kinds of content obscuring techniques can be used against OCR (adding background noise interfering with text, distorting text lines or single characters, etc.), including methods developed for building CAPTCHAs, and many of them have already been observed (see Figure 2). Clearly, methods tailored to detect specific obscuring techniques are likely to exhibit a poor generalization capability and to be easily circumvented by spammers. However we observed that several obscuring techniques like the ones in Figure 2 (except for the one at the bottom-right) turn out to compromise OCR effectiveness by producing similar effects on the binarized image (note that OCR tools work by first binarizing images to divide characters from background). In particular, these effects consists in broken characters, characters interfering with background noise components, and large background components overlapping characters. Accordingly, we argue that many obscuring techniques (although not all of them, for instance the ones based on character distortion as at the bottom-right of Figure 2) can be detected by looking for the kind of image defects mentioned above in the binarized image. In the rest of the paper we will discuss a method we are developing to detect such kind of noise and to measure its amount. 3. NOISY TEXT DETECTION The problem we are addressing exhibit some analogies with the problem of measuring image quality addressed in the OCR literature (see for instance [4]). However these methods are based on specific assumptions which do not hold in our task (like equally sized characters). An interesting suggestion came instead to us from the BaffleText CAPTCHA [3], which uses random masking to degrade text images resulting in image defects similar to the ones we are interested in. In [3] the complexity of an image for a human reader (not for an OCR) was evaluated through the perimetric complexity measure used in the psychophysics of reading literature [9], defined as the squared length of the boundary between black and white pixels (the perimeter ) in the whole image, divided by the black area, P 2 /A. In [3] P 2 /A was computed on the whole image, but this does not fit our goals since, for instance, the P 2 /A value depends on the number of characters into an image. However, taking into account that P 2 /A is scale-invariant, we found that measuring the P 2 /A value of each individual component of a binarized image can allow to characterize the presence of charachters broken or interfering with noise components, as described in the following. We found that clean characters are characterized approximately by values of P 2 /A in the range (16, 150] (where P was computed as the number of background pixels 4-connected to at least one foreground pixel, and A is the number of foreground pixels) and of the aspect ratio (the ratio between width and height) in the range [0.25, 2.5]. Instead, the components corresponding to broken characters and to small background noise can exhibit a lower P 2 /A value than 16, while components corresponding to two or more characters connected through noise components, can exhibit a larger P 2 /A values than 150 or an aspect ratio outside the above interval. The above observations suggested us two measures of the degree of image noise, one related to the presence of broken characters or small noise components (denoted as ), the other to characters connected through large noise components (f 2). In both cases the starting point is a binarized image, on which all connected components are first identified (in this work, 8-connectivity is used to this aim), and the P 2 /A value and aspect ratio are computed for each of them. The and f 2 measures are based on the following rationale: if a text area contains the considered kind of noisy text, then small regions of the text area will contain on average both character-like and noise-like components. Accordingly, we first subdvided an image into a grid of p q equally sized cells C ij, i = 1,..., p; j = 1,..., q (p = q = 10 was used in this work), and then computed the average fraction of noiselike components over all cells. More precisely, was computed by considering for each cell C ij the components whose center of mass belongs to C ij. The number c ij of characterlike components (the ones with P 2 /A in the range (16, 150] and aspect ratio in the range [0.25, 2.5]), and the number n ij of noise components (the ones with either P 2 /A or aspect ratio, or both, outside the above intervals) was then computed. Then, considering only the cells with c ij > 0, i.e. containing at least one character-like component (we denote the number of such cells with c 1 0), we computed for each of them the fraction of noise components f ij = n ij/(n ij +c ij) [0, 1), where a vaue of 0 denotes the presence of clean text. We finally computed the average fraction of noise components over the c 1 0 cells, = 1 c 1 0 P C ij :c ij >0 n ij n ij +c ij [0, 1]. If c ij = 0 for all cells (namely, c 1 0 = 0), this means that the text area contains only noise-like components: is defined accordingly as 1. The computation of f 2 is similar. However, since the considered noise components can spread over several cells, we count the number of pixels of character-like components (c p ij ) and of noise-like components whose P 2 /A value is higher than 150 (n p ij ) lying into each cell Cij. Then, analogously to, we consider only cells containing at least one characterlike component (let c 2 0 be their number), and compute the p n PC ij :c pij ij >0 n p ij +cp ij average fraction of noise components, f 2 = 1 c 2 0 [0, 1], where a value of 0 denotes clean text. Accordingly, if c p ij = 0 for all cells (namely, c2 0 = 0), f 2 is defined as 1. We finally devised a third measure (f 3) aimed at detecting the presence of large background noise components overlapping with characters, which can occur when text is placed over non-uniform background as in Figure 2 (center-bottom). To this aim, we extract the edges from the original image using the Canny algorithm, and compute the average num-
3 ber of edge pixels which lie inside each component of the binarized image. More precisely, the fraction of edge pixels which lie inside a component C is computed as the number of edge pixels which correspond to any pixel of C, excluding the pixels of its inner perimeter (defined as the set of pixels of the component 4-connected with at least one background pixel in the binarized image), divided by the total number of edge pixels that are not part of the inner perimeter of any component. The f 3 measure lies as well in the range [0, 1], where 0 denotes the absence of the considered kind of noise. In the next section we present some preliminary experiments to assess to what extent the measures defined above are able to detect the presence and measure the amount of the considered kind of image noise. 4. EXPERIMENTAL RESULTS We give first a demonstration of the capability of the three proposed measures to detect and measure the extent of the considered kinds of noisy text. To this aim we constructed an image which contains the 26 characters of the English alphabet, both uppercase and lowercase (see Figure 1). We considered four different obscuring techniques shown in Figure 1 (two examples with different amounts of noise are shown for each technique): from top-left to bottomright: small background noise components around text; a grid made up o-pixel wide lines of the same colour as the background overlapped with text; a non-uniform background; a grid made up o-pixel wide lines of the same colour as the text, overlapped with it. Obscuring techniques similar to the first three above were observed by the authors in real spam images (see Figure 2), while the last one was proposed in the EZ-Gimpy visual CAPTCHA used by Yahoo [8]. The values of the, f 2 and f 3 measures turned out to be 0 for the clean image, correctly denoting the presence of clean text. The values of the three measures for the noisy images reported in Figure 1 are shown below each image. The two obscuring techniques in the four top images of Figure 1 resulted in broken characters or in small background noise components around characters, which are the focus of the measure. As expected, it can be seen that the values increase as the degradation level increases, while the values of f 2 and f 3 (which are aimed to detect different effects of obscuring techniques) remain zero. The same happens to the f 3 measure for the first and second image (bottom), when large background components turn out to hide some characters, and to the f 2 measure when characters in the binarized image turn out to be connected through noise components (the two right-bottom images). We now report some preliminary experimental results on a data set o86 real spam images collected at the personal authors mailboxes (since no publicly available data sets of spam images is available yet, to our knowledge), available at Image binarization was carried out using the demo version of the commercial software ABBYY FineReader 7.0 Professional, 3 with default parameter settings. Content obscuring techniques aimed at defeating OCR tools were applied by spammers on 96 images (see Figure 2). Note that some of the obscuring techniques did not result in the kinds of noise considered in Section 3 (as in Figure 2, bottom-right image). We found that on 29 out of the 96 noisy images no 3 noise remained after binarization. The remaining 90 images were clean or contained a limited amount of random noise probably aimed at defeating detection techniques based on digital signatures instead of OCR tools. Figure 3 shows the values of the three measures for each of the 186 spam images. Most of the 119 clean binarized images turned out to exhibit low values of, f 2 and f 3. Instead, the 67 noisy binarized images are spread across a larger range of values for the three measures. In particular, we observed that each measure took on high values for images characterized by the presence of the corresponding kind of noise. As an example, consider the third image from top in Figure 2: in the binarized image some characters are broken or connected by background line segments, but no one is hidden by background components. This is revealed by high values of and f 2, respectivey 0.58 and 0.89, while the value of f 3 is low (0.08) as expected. Consider finally the case of spam images in which the effect of noise is different than the one addressed by these three measures, as in the bottom-right image of Figure 2: in the binarized image there is no significant amount of character breaking or merging, and there are no background noise components. The values of three measures turned out to be low as expected ( = 0.08, f 2 = f 3 = 0). As explained in Section 3, to detect such kind of obscuring techniques other measures should be used, for instance related to character alignment or deformation. To sum up, the above preliminary experiments provided some evidence that our approach can be exploited to design modules of spam filters aimed at detecting image spam as described in section 2, and in particular the use of content obscuring techniques applied to text embedded into images, which is one of the main charateristics of current image spam and it is likely to be widely used in the near feature. We believe that works in the OCR literature, in particular related to image quality measures, and recent works on CAPTCHAs could give further useful suggestions for the development of this research direction. Ongoing work is aimed at evaluating the proposed approach on legitimate images. 5. REFERENCES [1] A. Androutsopoulos, J. Koutsias, K. V. Cbandrinos, and C. D. Spyropoulos. An experimental comparison of naive bayesian and keyword-based anti-spam filtering with personal messages. In Proc. ACM Int. Conf. on Research and Developments in Information Retrieval, pages , [2] H. Aradhye, G. Myers, and J. A. Herson. Image analysis for efficient categorization of image-based spam . In Proc. Int. Conf. Document Analysis and Recognition, pages , [3] H. S. Baird and M. Chew. Baffletext: a human interactive proof. In Proc. IS&T/SPIE Document Recognition & Retrieval Conf., [4] L. R. Blando, J. Kanai, and T. A. Nartker. Prediction of OCR accuracy using simple image features. In Proc. Int. Conf. on Document Analysis and Recognition, pages , [5] H. Drucker, D. Wu, and V. N. Vapnik. Support vector machines for spam categorization. IEEE Transaction on Neural Networks, 10(5): , [6] G. Fumera, I. Pillai, and F. Roli. Spam filtering based on the analysis of text information embedded into
4 Figure 1: Four obscuring techniques applied to an artificial image with embedded text (two different amounts of noise are shown for each technique). = 0.63 f 2 = 0.19 f 3 = 0.01 = 0.58 f 2 = 0.89 f 3 = 0.08 = 0.08 f 2 = 0.00 f 3 = 0.00 Figure 2: Examples of real spam images (details) from the data set used for the experiments. Figure 3: Plots of the, f 2 and f 3 measures for the 186 spam images (ordered along the x-axis). : clean images; : obscured images (clean binarized images); : obscured images (noisy binarized images).
5 images. Journal of Machine Learning Research (special issue on Machine Learning in Computer Security), 7: , [7] P. Graham. A plan for spam, [8] G. Mori and J. Malik. Recognizing objects in adversarial clutter: breaking a visual captcha. In Proc. Int. Conf. Computer Vision and Pattern Recognition, volume I, pages , [9] D. G. Pelli, C. W. Burns, B. Farell, and D. C. Moore-Page. Feature detection and letter identification. Vision Research, 46: , [10] M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz. A bayesian approach to filtering junk . AAAI Technical Report WS-98-05, Madison, Wisconsin, [11] C.-T. Wu, K.-T. Cheng, Q. Zhu, and Y.-L. Wu. Using visual features for anti-spam filtering. In Proc. IEEE Int. Conf. on Image Processing, volume III, pages , 2005.
Image Spam Filtering Using Visual Information
Image Spam Filtering Using Visual Information Battista Biggio, Giorgio Fumera, Ignazio Pillai, Fabio Roli, Dept. of Electrical and Electronic Eng., Univ. of Cagliari Piazza d Armi, 09123 Cagliari, Italy
Image spam filtering using textual and visual information
Image spam filtering using textual and visual information Giorgio Fumera Ignazio Pillai Fabio Roli Battista Biggio Dept. of Electrical and Electronic Eng., Univ. of Cagliari Piazza d Armi, 09123 Cagliari,
Spam Filtering Based On The Analysis Of Text Information Embedded Into Images
Journal of Machine Learning Research 7 (2006) 2699-2720 Submitted 3/06; Revised 9/06; Published 12/06 Spam Filtering Based On The Analysis Of Text Information Embedded Into Images Giorgio Fumera Ignazio
Combining Optical Character Recognition (OCR) and Edge Detection Techniques to Filter Image-Based Spam
Combining Optical Character Recognition (OCR) and Edge Detection Techniques to Filter Image-Based Spam B. Fadiora Department of Computer Science The Polytechnic Ibadan Ibadan, Nigeria [email protected]
Improved Spam Filter via Handling of Text Embedded Image E-mail
J Electr Eng Technol Vol. 9, No.?: 742-?, 2014 http://dx.doi.org/10.5370/jeet.2014.9.7.742 ISSN(Print) 1975-0102 ISSN(Online) 2093-7423 Improved Spam Filter via Handling of Text Embedded Image E-mail Seongwook
Fusion of Text and Image Features: A New Approach to Image Spam Filtering
Fusion of Text and Image Features: A New Approach to Image Spam Filtering Congfu Xu 1, Kevin Chiew 2, Yafang Chen 1,andJuxinLiu 1 1 Institute of Artificial Intelligence, Zhejiang University, Hangzhou,
A survey and experimental evaluation of image spam filtering techniques
A survey and experimental evaluation of image spam filtering techniques Battista Biggio, Giorgio Fumera, Ignazio Pillai and Fabio Roli Department of Electrical and Electronic Engineering, University of
Learning Fast Classifiers for Image Spam
Learning Fast Classifiers for Image Spam Mark Dredze Computer and Information Sciences Dept. University of Pennsylvania Philadelphia, PA 19104 [email protected] Reuven Gevaryahu Philadelphia, PA 19104
How To Filter Spam Image From A Picture By Color Or Color
Image Content-Based Email Spam Image Filtering Jianyi Wang and Kazuki Katagishi Abstract With the population of Internet around the world, email has become one of the main methods of communication among
1 Introductory Comments. 2 Bayesian Probability
Introductory Comments First, I would like to point out that I got this material from two sources: The first was a page from Paul Graham s website at www.paulgraham.com/ffb.html, and the second was a paper
A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering
A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering Khurum Nazir Junejo, Mirza Muhammad Yousaf, and Asim Karim Dept. of Computer Science, Lahore University of Management Sciences
Bayesian Spam Filtering
Bayesian Spam Filtering Ahmed Obied Department of Computer Science University of Calgary [email protected] http://www.cpsc.ucalgary.ca/~amaobied Abstract. With the enormous amount of spam messages propagating
Image Based Spam: White Paper
The Rise of Image-Based Spam No matter how you slice it - the spam problem is getting worse. In 2004, it was sufficient to use simple scoring mechanisms to determine whether email was spam or not because
IMPROVING SPAM EMAIL FILTERING EFFICIENCY USING BAYESIAN BACKWARD APPROACH PROJECT
IMPROVING SPAM EMAIL FILTERING EFFICIENCY USING BAYESIAN BACKWARD APPROACH PROJECT M.SHESHIKALA Assistant Professor, SREC Engineering College,Warangal Email: [email protected], Abstract- Unethical
A Personalized Spam Filtering Approach Utilizing Two Separately Trained Filters
2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology A Personalized Spam Filtering Approach Utilizing Two Separately Trained Filters Wei-Lun Teng, Wei-Chung Teng
An Efficient Spam Filtering Techniques for Email Account
American Journal of Engineering Research (AJER) e-issn : 2320-0847 p-issn : 2320-0936 Volume-02, Issue-10, pp-63-73 www.ajer.org Research Paper Open Access An Efficient Spam Filtering Techniques for Email
On Attacking Statistical Spam Filters
On Attacking Statistical Spam Filters Gregory L. Wittel and S. Felix Wu Department of Computer Science University of California, Davis One Shields Avenue, Davis, CA 95616 USA Abstract. The efforts of anti-spammers
A Novel Approach towards Image Spam Classification
A Novel Approach towards Image Spam Classification M.Soranamageswari, Dr.C.Meena Abstract The volume of unsolicited commercial mails has grown extremely in the past few years because of increased internet
A MACHINE LEARNING APPROACH TO SERVER-SIDE ANTI-SPAM E-MAIL FILTERING 1 2
UDC 004.75 A MACHINE LEARNING APPROACH TO SERVER-SIDE ANTI-SPAM E-MAIL FILTERING 1 2 I. Mashechkin, M. Petrovskiy, A. Rozinkin, S. Gerasimov Computer Science Department, Lomonosov Moscow State University,
CAS-ICT at TREC 2005 SPAM Track: Using Non-Textual Information to Improve Spam Filtering Performance
CAS-ICT at TREC 2005 SPAM Track: Using Non-Textual Information to Improve Spam Filtering Performance Shen Wang, Bin Wang and Hao Lang, Xueqi Cheng Institute of Computing Technology, Chinese Academy of
A Three-Way Decision Approach to Email Spam Filtering
A Three-Way Decision Approach to Email Spam Filtering Bing Zhou, Yiyu Yao, and Jigang Luo Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 {zhou200b,yyao,luo226}@cs.uregina.ca
6367(Print), ISSN 0976 6375(Online) & TECHNOLOGY Volume 4, Issue 1, (IJCET) January- February (2013), IAEME
INTERNATIONAL International Journal of Computer JOURNAL Engineering OF COMPUTER and Technology ENGINEERING (IJCET), ISSN 0976-6367(Print), ISSN 0976 6375(Online) & TECHNOLOGY Volume 4, Issue 1, (IJCET)
Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach
Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Alex Hai Wang College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, PA 18512, USA
Filtering Junk Mail with A Maximum Entropy Model
Filtering Junk Mail with A Maximum Entropy Model ZHANG Le and YAO Tian-shun Institute of Computer Software & Theory. School of Information Science & Engineering, Northeastern University Shenyang, 110004
A Proposed Algorithm for Spam Filtering Emails by Hash Table Approach
International Research Journal of Applied and Basic Sciences 2013 Available online at www.irjabs.com ISSN 2251-838X / Vol, 4 (9): 2436-2441 Science Explorer Publications A Proposed Algorithm for Spam Filtering
Filtering Noisy Contents in Online Social Network by using Rule Based Filtering System
Filtering Noisy Contents in Online Social Network by using Rule Based Filtering System Bala Kumari P 1, Bercelin Rose Mary W 2 and Devi Mareeswari M 3 1, 2, 3 M.TECH / IT, Dr.Sivanthi Aditanar College
Groundbreaking Technology Redefines Spam Prevention. Analysis of a New High-Accuracy Method for Catching Spam
Groundbreaking Technology Redefines Spam Prevention Analysis of a New High-Accuracy Method for Catching Spam October 2007 Introduction Today, numerous companies offer anti-spam solutions. Most techniques
Email Classification Using Data Reduction Method
Email Classification Using Data Reduction Method Rafiqul Islam and Yang Xiang, member IEEE School of Information Technology Deakin University, Burwood 3125, Victoria, Australia Abstract Classifying user
An Efficient Three-phase Email Spam Filtering Technique
An Efficient Three-phase Email Filtering Technique Tarek M. Mahmoud 1 *, Alaa Ismail El-Nashar 2 *, Tarek Abd-El-Hafeez 3 *, Marwa Khairy 4 * 1, 2, 3 Faculty of science, Computer Sci. Dept., Minia University,
PSSF: A Novel Statistical Approach for Personalized Service-side Spam Filtering
2007 IEEE/WIC/ACM International Conference on Web Intelligence PSSF: A Novel Statistical Approach for Personalized Service-side Spam Filtering Khurum Nazir Juneo Dept. of Computer Science Lahore University
Naive Bayes Spam Filtering Using Word-Position-Based Attributes
Naive Bayes Spam Filtering Using Word-Position-Based Attributes Johan Hovold Department of Computer Science Lund University Box 118, 221 00 Lund, Sweden [email protected] Abstract This paper
Filtering Spam Using Search Engines
Filtering Spam Using Search Engines Oleg Kolesnikov, Wenke Lee, and Richard Lipton ok,wenke,rjl @cc.gatech.edu College of Computing Georgia Institute of Technology Atlanta, GA 30332 Abstract Spam filtering
Filtering Image Spam with Near-Duplicate Detection
Filtering Image Spam with Near-Duplicate Detection Zhe Wang, William Josephson, Qin Lv, Moses Charikar, Kai Li Computer Science Department, Princeton University 35 Olden Street, Princeton, NJ 08540 USA
ECUE: A Spam Filter that Uses Machine Learning to Track Concept Drift
ECUE: A Spam Filter that Uses Machine Learning to Track Concept Drift Sarah Jane Delany 1 and Pádraig Cunningham 2 and Barry Smyth 3 Abstract. While text classification has been identified for some time
Fuzzy Logic for E-Mail Spam Deduction
Fuzzy Logic for E-Mail Spam Deduction P.SUDHAKAR 1, G.POONKUZHALI 2, K.THIAGARAJAN 3,R.KRIPA KESHAV 4, K.SARUKESI 5 1 Vernalis systems Pvt Ltd, Chennai- 600116 2,4 Department of Computer Science and Engineering,
Simple Language Models for Spam Detection
Simple Language Models for Spam Detection Egidio Terra Faculty of Informatics PUC/RS - Brazil Abstract For this year s Spam track we used classifiers based on language models. These models are used to
Detecting E-mail Spam Using Spam Word Associations
Detecting E-mail Spam Using Spam Word Associations N.S. Kumar 1, D.P. Rana 2, R.G.Mehta 3 Sardar Vallabhbhai National Institute of Technology, Surat, India 1 [email protected] 2 [email protected]
AN E-MAIL SERVER-BASED SPAM FILTERING APPROACH
AN E-MAIL SERVER-BASED SPAM FILTERING APPROACH MUMTAZ MOHAMMED ALI AL-MUKHTAR College of Information Engineering, AL-Nahrain University IRAQ ABSTRACT The spam has now become a significant security issue
ECUE: A Spam Filter that Uses Machine Learning to Track Concept Drift
ECUE: A Spam Filter that Uses Machine Learning to Track Concept Drift Sarah Jane Delany 1 and Pádraig Cunningham 2 Abstract. While text classification has been identified for some time as a promising application
The Role of Size Normalization on the Recognition Rate of Handwritten Numerals
The Role of Size Normalization on the Recognition Rate of Handwritten Numerals Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui Centre for Pattern Recognition and Machine Intelligence,
Analecta Vol. 8, No. 2 ISSN 2064-7964
EXPERIMENTAL APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN ENGINEERING PROCESSING SYSTEM S. Dadvandipour Institute of Information Engineering, University of Miskolc, Egyetemváros, 3515, Miskolc, Hungary,
Differential Voting in Case Based Spam Filtering
Differential Voting in Case Based Spam Filtering Deepak P, Delip Rao, Deepak Khemani Department of Computer Science and Engineering Indian Institute of Technology Madras, India [email protected],
Immunity from spam: an analysis of an artificial immune system for junk email detection
Immunity from spam: an analysis of an artificial immune system for junk email detection Terri Oda and Tony White Carleton University, Ottawa ON, Canada [email protected], [email protected] Abstract.
A Case-Based Approach to Spam Filtering that Can Track Concept Drift
A Case-Based Approach to Spam Filtering that Can Track Concept Drift Pádraig Cunningham 1, Niamh Nowlan 1, Sarah Jane Delany 2, Mads Haahr 1 1 Department of Computer Science, Trinity College Dublin 2 School
Lan, Mingjun and Zhou, Wanlei 2005, Spam filtering based on preference ranking, in Fifth International Conference on Computer and Information
Lan, Mingjun and Zhou, Wanlei 2005, Spam filtering based on preference ranking, in Fifth International Conference on Computer and Information Technology : CIT 2005 : proceedings : 21-23 September, 2005,
A Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
Three-Way Decisions Solution to Filter Spam Email: An Empirical Study
Three-Way Decisions Solution to Filter Spam Email: An Empirical Study Xiuyi Jia 1,4, Kan Zheng 2,WeiweiLi 3, Tingting Liu 2, and Lin Shang 4 1 School of Computer Science and Technology, Nanjing University
Abstract. Find out if your mortgage rate is too high, NOW. Free Search
Statistics and The War on Spam David Madigan Rutgers University Abstract Text categorization algorithms assign texts to predefined categories. The study of such algorithms has a rich history dating back
Spam detection with data mining method:
Spam detection with data mining method: Ensemble learning with multiple SVM based classifiers to optimize generalization ability of email spam classification Keywords: ensemble learning, SVM classifier,
PARTIAL IMAGE SPAM E-MAIL DETECTION USING OCR
PARTIAL IMAGE SPAM E-MAIL DETECTION USING OCR V. Sathiya *1 M.Divakar #2 T.S. Sumi *3 1 Faculty, Department of M.C.A, Panimalar Engineering College, Anna University, Chennai, India 2 PG Scholar, Department
Impact of Feature Selection Technique on Email Classification
Impact of Feature Selection Technique on Email Classification Aakanksha Sharaff, Naresh Kumar Nagwani, and Kunal Swami Abstract Being one of the most powerful and fastest way of communication, the popularity
Spam Filtering with Naive Bayesian Classification
Spam Filtering with Naive Bayesian Classification Khuong An Nguyen Queens College University of Cambridge L101: Machine Learning for Language Processing MPhil in Advanced Computer Science 09-April-2011
An Algorithm for Classification of Five Types of Defects on Bare Printed Circuit Board
IJCSES International Journal of Computer Sciences and Engineering Systems, Vol. 5, No. 3, July 2011 CSES International 2011 ISSN 0973-4406 An Algorithm for Classification of Five Types of Defects on Bare
Email Spam Detection Using Customized SimHash Function
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email
Bayesian Spam Detection
Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal Volume 2 Issue 1 Article 2 2015 Bayesian Spam Detection Jeremy J. Eberhardt University or Minnesota, Morris Follow this and additional
Increasing the Accuracy of a Spam-Detecting Artificial Immune System
Increasing the Accuracy of a Spam-Detecting Artificial Immune System Terri Oda Carleton University [email protected] Tony White Carleton University [email protected] Abstract- Spam, the electronic
Three types of messages: A, B, C. Assume A is the oldest type, and C is the most recent type.
Chronological Sampling for Email Filtering Ching-Lung Fu 2, Daniel Silver 1, and James Blustein 2 1 Acadia University, Wolfville, Nova Scotia, Canada 2 Dalhousie University, Halifax, Nova Scotia, Canada
WE DEFINE spam as an e-mail message that is unwanted basically
1048 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 10, NO. 5, SEPTEMBER 1999 Support Vector Machines for Spam Categorization Harris Drucker, Senior Member, IEEE, Donghui Wu, Student Member, IEEE, and Vladimir
Signature Region of Interest using Auto cropping
ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 1 Signature Region of Interest using Auto cropping Bassam Al-Mahadeen 1, Mokhled S. AlTarawneh 2 and Islam H. AlTarawneh 2 1 Math. And Computer Department,
Accelerating Techniques for Rapid Mitigation of Phishing and Spam Emails
Accelerating Techniques for Rapid Mitigation of Phishing and Spam Emails Pranil Gupta, Ajay Nagrale and Shambhu Upadhyaya Computer Science and Engineering University at Buffalo Buffalo, NY 14260 {pagupta,
Spam Filter Optimality Based on Signal Detection Theory
Spam Filter Optimality Based on Signal Detection Theory ABSTRACT Singh Kuldeep NTNU, Norway HUT, Finland [email protected] Md. Sadek Ferdous NTNU, Norway University of Tartu, Estonia [email protected] Unsolicited
An Overview of Spam Blocking Techniques
An Overview of Spam Blocking Techniques Recent analyst estimates indicate that over 60 percent of the world s email is unsolicited email, or spam. Spam is no longer just a simple annoyance. Spam has now
Cosdes: A Collaborative Spam Detection System with a Novel E- Mail Abstraction Scheme
IOSR Journal of Engineering (IOSRJEN) e-issn: 2250-3021, p-issn: 2278-8719, Volume 2, Issue 9 (September 2012), PP 55-60 Cosdes: A Collaborative Spam Detection System with a Novel E- Mail Abstraction Scheme
Machine Learning. CS 188: Artificial Intelligence Naïve Bayes. Example: Digit Recognition. Other Classification Tasks
CS 188: Artificial Intelligence Naïve Bayes Machine Learning Up until now: how use a model to make optimal decisions Machine learning: how to acquire a model from data / experience Learning parameters
Determining optimal window size for texture feature extraction methods
IX Spanish Symposium on Pattern Recognition and Image Analysis, Castellon, Spain, May 2001, vol.2, 237-242, ISBN: 84-8021-351-5. Determining optimal window size for texture feature extraction methods Domènec
An Efficient Two-phase Spam Filtering Method Based on E-mails Categorization
International Journal of Network Security, Vol.9, No., PP.34 43, July 29 34 An Efficient Two-phase Spam Filtering Method Based on E-mails Categorization Jyh-Jian Sheu Department of Information Management,
Email Filters that use Spammy Words Only
Email Filters that use Spammy Words Only Vasanth Elavarasan Department of Computer Science University of Texas at Austin Advisors: Mohamed Gouda Department of Computer Science University of Texas at Austin
Colorado School of Mines Computer Vision Professor William Hoff
Professor William Hoff Dept of Electrical Engineering &Computer Science http://inside.mines.edu/~whoff/ 1 Introduction to 2 What is? A process that produces from images of the external world a description
Spam Filter: VSM based Intelligent Fuzzy Decision Maker
IJCST Vo l. 1, Is s u e 1, Se p te m b e r 2010 ISSN : 0976-8491(Online Spam Filter: VSM based Intelligent Fuzzy Decision Maker Dr. Sonia YMCA University of Science and Technology, Faridabad, India E-mail
Email Marketing Glossary of Terms
Email Marketing Glossary of Terms A/B Testing: A method of testing in which a small, random sample of an email list is split in two. One email is sent to the list A and another modified email is sent to
An Approach to Detect Spam Emails by Using Majority Voting
An Approach to Detect Spam Emails by Using Majority Voting Roohi Hussain Department of Computer Engineering, National University of Science and Technology, H-12 Islamabad, Pakistan Usman Qamar Faculty,
Document Image Retrieval using Signatures as Queries
Document Image Retrieval using Signatures as Queries Sargur N. Srihari, Shravya Shetty, Siyuan Chen, Harish Srinivasan, Chen Huang CEDAR, University at Buffalo(SUNY) Amherst, New York 14228 Gady Agam and
How To Fix Out Of Focus And Blur Images With A Dynamic Template Matching Algorithm
IJSTE - International Journal of Science Technology & Engineering Volume 1 Issue 10 April 2015 ISSN (online): 2349-784X Image Estimation Algorithm for Out of Focus and Blur Images to Retrieve the Barcode
Neural Network based Vehicle Classification for Intelligent Traffic Control
Neural Network based Vehicle Classification for Intelligent Traffic Control Saeid Fazli 1, Shahram Mohammadi 2, Morteza Rahmani 3 1,2,3 Electrical Engineering Department, Zanjan University, Zanjan, IRAN
Detecting Image Spam Using Image Texture Features
Detecting Image Spam Using Image Texture Features Basheer Al-Duwairi*, Ismail Khater and Omar Al-Jarrah *Department of Network Engineering & Security Department of Computer Engineering Jordan University
Why Content Filters Can t Eradicate spam
WHITEPAPER Why Content Filters Can t Eradicate spam About Mimecast Mimecast () delivers cloud-based email management for Microsoft Exchange, including archiving, continuity and security. By unifying disparate
Detection and mitigation of Web Services Attacks using Markov Model
Detection and mitigation of Web Services Attacks using Markov Model Vivek Relan [email protected] Bhushan Sonawane [email protected] Department of Computer Science and Engineering, University of Maryland,
Detecting Spam in VoIP Networks
Detecting Spam in VoIP Networks Ram Dantu, Prakash Kolan Dept. of Computer Science and Engineering University of North Texas, Denton {rdantu, prk2}@cs.unt.edu Abstract Voice over IP (VoIP) is a key enabling
Mining E-mail Authorship
Mining E-mail Authorship Olivier de Vel Information Technology Division Defence Science and Technology Organisation P.O. Box 1500 Salisbury 5108, Australia [email protected] ABSTRACT In
Spam or Not Spam That is the question
Spam or Not Spam That is the question Ravi Kiran S S and Indriyati Atmosukarto {kiran,indria}@cs.washington.edu Abstract Unsolicited commercial email, commonly known as spam has been known to pollute the
Cosdes: A Collaborative Spam Detection System with a Novel E-Mail Abstraction Scheme
IJCSET October 2012 Vol 2, Issue 10, 1447-1451 www.ijcset.net ISSN:2231-0711 Cosdes: A Collaborative Spam Detection System with a Novel E-Mail Abstraction Scheme I.Kalpana, B.Venkateswarlu Avanthi Institute
Adaption of Statistical Email Filtering Techniques
Adaption of Statistical Email Filtering Techniques David Kohlbrenner IT.com Thomas Jefferson High School for Science and Technology January 25, 2007 Abstract With the rise of the levels of spam, new techniques
Automatic Extraction of Signatures from Bank Cheques and other Documents
Automatic Extraction of Signatures from Bank Cheques and other Documents Vamsi Krishna Madasu *, Mohd. Hafizuddin Mohd. Yusof, M. Hanmandlu ß, Kurt Kubik * *Intelligent Real-Time Imaging and Sensing group,
