Fusion of Text and Image Features: A New Approach to Image Spam Filtering

Congfu Xu 1, Kevin Chiew 2, Yafang Chen 1, and Juxin Liu 1
1 Institute of Artificial Intelligence, Zhejiang University, Hangzhou, China
2 School of Engineering, Tan Tao University, Long An, Vietnam

Abstract. While enjoying the convenience of e-mail communication, many users have also experienced annoying spam. Although current spam detection approaches have gained a competitive edge against text-based spam, they are still challenged by image-based spam (image spam for short). Image spam normally embeds images that carry the spam message in binary rather than text format, and it costs more storage and bandwidth. In this paper, we propose a hybrid image spam filtering framework that detects spam images based on both extracted text features and image features. Our experimental results show that our approach achieves a significant improvement in detection accuracy compared with methods that use only text or only image features, and that it works robustly with complex backgrounds and compression artifacts.

1 Introduction

Nowadays one of the most pervasive applications of the Internet is the e-mail service, which has brought great convenience to our communications. While enjoying this service, users also face a large amount of annoying spam. Spam e-mail, whose volume has reportedly grown tremendously in the past few years, has also decreased the quality of e-mail service, partly because spam consumes storage and communication bandwidth. Moreover, a recent news report cites research showing that spam produces millions of tons of CO2 globally every year.

Many solutions have been proposed for detecting and filtering spam e-mails to prevent them from being received, forwarded, and spread. The basic technique behind these solutions is to train classifiers to distinguish spam images from ham (i.e., legitimate) images.
These classifiers normally use two types of rules: (a) rules based on the connection and relay properties of e-mails, and (b) rules using features extracted from the contents of e-mails. Rules of the second type, which carry out content filtering with machine learning mechanisms such as Naive Bayes classification or support vector machines (SVM), have been a cornerstone of anti-spam systems [16] and have shown the advantage of high accuracy. However, there is now a new attack that could be devastating to content filters. Instead of obscuring the message's text, spammers are now able to defeat text analysis techniques by replacing text with images. A whitepaper released in November 2006 [17] shows the rise of image spam from 10% in April to 27% of all spam in October 2006, totaling up to 48 billion e-mails every day.

Y. Wang and T. Li (Eds.): Practical Applications of Intelligent Systems, AISC 124. springerlink.com © Springer-Verlag Berlin Heidelberg 2011

Fig. 1. Examples of spam images (note the large amount of text and the use of text obfuscation techniques against OCR)

A possible way to detect image spam is a pipeline consisting of an optical character recognition (OCR) system, which extracts and recognizes the embedded text, followed by a text classifier that separates spam from legitimate content. This approach was found to be effective for clean images [8]. However, image spam allows spammers to design spam as CAPTCHAs (see the right part of Figure 1) or to obscure the image text to defeat OCR tools. Thus, an image spam filter that relies on an OCR-based module as its only countermeasure is vulnerable to image spam with obfuscated text.

In this paper, we propose a solution for image spam filtering. Since most spam images contain a large proportion of text, as shown in Figure 1, our solution first extracts the text information embedded in the images, together with image information that reflects the distinctive properties [14] of spam images compared with natural scene images or generic computer-generated graphics. We then use a combinational filter with a two-layer structure for training and classification: the bottom-layer classifiers each compute an image spam confidence score from one of the two types of features, and a top-layer classifier makes the final decision from the outputs of the bottom-layer classifiers.

The remaining sections of the paper are organized as follows. In Section 2 we review related work on filtering techniques for content-based image spam, and in Section 3 we introduce our image spam filtering framework in detail.
In Section 4, we report experimental results on real data sets of ham and spam images, and we conclude the paper in Section 5.

2 Related Work

The detection of image spam is a special case of image categorization, a task that has been extensively studied in the context of many important applications. In [1, 6, 8] it is addressed as a two-class classification between ham and spam images.

In [1], Aradhye et al. used a support vector classifier to extract the text regions in an image and then identified five visual features of spam. The first feature is the relative area of the image occupied by text, the underlying idea being that spam images usually contain more text than legitimate images. The other features, such as color heterogeneity and saturation, are computed over text and non-text regions, based on the assumption that images whose main part is synthetic are more likely to be spam.

Building on the method in [1], Dredze et al. [6] proposed different kinds of features. Although some visual features are used (such as average RGB colors, the relative area occupied by the most common color, and color saturation features as in [1]), the most important role is played by metadata extracted from the images. They also introduced a feature selection algorithm (JIT) that selects the most discriminative features based on their speed as well as their predictive power.

Fumera et al. [8] proposed an anti-spam filtering approach that exploits the text information embedded in images sent as attachments. It rests on the consideration that text embedded in images plays the same role as text in the body of e-mails without images (i.e., it conveys the spam message). After extracting text with OCR tools from images attached to e-mails, they carried out a semantic analysis of the text using text categorization techniques like those applied to the body of e-mails without images.

The method presented in [4] recognizes image spam by detecting the presence of content obscuring techniques that aim to compromise OCR effectiveness. The implementation is based on two low-level image features: one measures the extent of character breaking or the presence of small noise components, and the other measures the presence of merged characters or large noise components.

Nhung and Phuong [16] used simple edge-based features to compute a vector of similarity scores between an image and a set of templates. This similarity vector is then used with an SVM to separate spam images from other common categories of images.
In [11], specific features are selected for inspection by a components-based method, and the spam filter then uses these features to identify image spam by feature matching.

3 Hybrid Framework for Image Spam Filtering

Since content obscuring techniques can defeat attempts to use OCR tools [8] to detect text embedded in images, we propose an image categorization approach that uses both text and image features to filter such image spam. Figure 2 shows the proposed hybrid framework. The framework works in three phases. First, we calculate the text features of an input e-mail image, which involves keyword detection and the extraction of text-related features, and then use an SVM to obtain an image spam confidence score. Second, we define a small number of reliable spam-indicative features from the image metadata and image color properties, and use another SVM to classify the image. Last, we use a fusion classifier to make a decision based on the outputs of both the text and image classifiers.

Fig. 2. Architecture of our hybrid framework for image spam filtering

An example of a spam image is shown in Figure 3. The image classifier labels this image as ham, whereas the text classifier labels it as spam, each with its own confidence score. After fusion of both confidence scores, the image is finally identified as a spam image. The functions of the major components are introduced as follows.

Fig. 3. An example of spam image

3.1 Keyword Detection

Semantic analysis of text embedded in images first requires text extraction by techniques such as OCR, which raises two issues: (a) high computational complexity and (b) susceptibility to content obscuring techniques.

For the first issue, the computational complexity can be reduced by using a hierarchical architecture for the spam filter, in which text extraction and analysis are carried out only if the previous, less complex modules are unable to reliably decide whether an e-mail is legitimate. To further reduce computational complexity, techniques based on image signatures could be employed.

For the second issue, since embedded text extraction is often inaccurate, we use keyword detection to improve classification accuracy. We first define a keyword set composed of thirty words and five phrases. Then, for every image, we calculate a feature indicating whether at least one element of the keyword set is detected in the text extracted by an OCR system. OCR on the images attached to e-mails is performed with the demo version of the commercial software ABBYY FineReader 8.0 Professional with default parameter settings.
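The binary keyword feature of Section 3.1 can be illustrated with a minimal Python sketch. The keyword set below is purely hypothetical (the paper does not list its thirty words and five phrases), and the OCR step is assumed to have already produced a plain string. Matching is done on whole words, so partially recognized or obfuscated text simply yields 0.

```python
import re

# Hypothetical keyword set for illustration only; the paper's actual
# thirty words and five phrases are not given in the text.
KEYWORDS = {"viagra", "stock", "buy now", "strong buy"}


def normalize(text: str) -> str:
    """Lower-case the text and collapse every non-alphanumeric run to one space."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_feature(ocr_text: str, keywords=KEYWORDS) -> int:
    """Return 1 if at least one keyword or phrase occurs in the OCR text, else 0."""
    padded = f" {normalize(ocr_text)} "
    # Padding with spaces enforces whole-word (or whole-phrase) matches.
    return int(any(f" {normalize(k)} " in padded for k in keywords))
```

For example, `keyword_feature("STRONG  BUY!! target price")` evaluates to 1, while OCR output containing none of the listed terms evaluates to 0.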

3.2 Text-Related Features Extraction

The text-related features describe the properties of text in an image. The text regions in the image are extracted first, and a subsequent step defines features based on the extracted regions. Our method of text region extraction comprises the following three main steps.

Step 1: Edge detection. A convolution with a compass operator [12] is used to generate intensity images of four oriented edges, at 0°, 45°, 90°, and 135° respectively. Color images are first converted into gray images.

Step 2: Feature generation. We first subdivide an image into a grid of w × h equally sized cells C_ij, where i = 1, ..., w and j = 1, ..., h (all cells have the same pixel size in this work), and then compute six features over all cells. These six features, namely mean μ, standard deviation σ, energy E_g, entropy E_t, inertial-quadrature I, and local homogeneity H, are defined by Equations (1) to (6) [5, 9]:

$$\mu = \frac{1}{w h} \sum_{i=1}^{w} \sum_{j=1}^{h} E(i,j) \tag{1}$$

$$\sigma = \sqrt{\frac{1}{w h} \sum_{i=1}^{w} \sum_{j=1}^{h} \left[ E(i,j) - \mu \right]^{2}} \tag{2}$$

$$E_g = \sum_{i,j} E^{2}(i,j) \tag{3}$$

$$E_t = -\sum_{i,j} E(i,j) \log E(i,j) \tag{4}$$

$$I = \sum_{i,j} (i-j)^{2} E(i,j) \tag{5}$$

$$H = \sum_{i,j} \frac{1}{1+(i-j)^{2}} E(i,j) \tag{6}$$

in which E(i, j) is the normalized symmetrical gray level co-occurrence matrix (GLCM) of cell C_ij [10].

Step 3: Text region detection. We first apply K-means clustering to the above features to separate text areas from background areas, and then refine the text regions by morphological dilation and erosion. Figure 4 illustrates the process of text region detection.

Based on the extracted text regions, we calculate the following simple features, which are the most indicative of spam images:
(1) Extent of text regions. The extent of text in the image is defined as the ratio of the area of the extracted text regions to the total area of the image;
(2) Amount of text regions; and
(3) Amount of text letters.
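The feature generation of Step 2 can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' implementation. The assumptions are:

- the gray levels of a cell are already quantized to a small number of integer levels;
- the co-occurrence offset is one pixel to the right;
- the sums for the mean and standard deviation run over the GLCM entries;
- the logarithm is natural.

The function names `glcm` and `glcm_features` are hypothetical.

```python
import math


def glcm(cell, levels, dx=1, dy=0):
    """Normalized symmetric gray-level co-occurrence matrix of one cell.

    `cell` is a list of rows of integer gray levels in [0, levels).
    The assumed offset (dx, dy) = (1, 0) pairs each pixel with its right neighbour.
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(cell), len(cell[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                a, b = cell[y][x], cell[ny][nx]
                counts[a][b] += 1  # count the pair in both directions
                counts[b][a] += 1  # so that the matrix is symmetric
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts] if total else counts


def glcm_features(E):
    """Mean, standard deviation, energy, entropy, inertia and homogeneity of a GLCM."""
    n = len(E)
    entries = [(i, j, E[i][j]) for i in range(n) for j in range(n)]
    mean = sum(e for _, _, e in entries) / (n * n)
    std = math.sqrt(sum((e - mean) ** 2 for _, _, e in entries) / (n * n))
    return {
        "mean": mean,
        "std": std,
        "energy": sum(e * e for _, _, e in entries),
        # Zero entries contribute nothing to the entropy (0 * log 0 is taken as 0).
        "entropy": -sum(e * math.log(e) for _, _, e in entries if e > 0),
        "inertia": sum((i - j) ** 2 * e for i, j, e in entries),
        "homogeneity": sum(e / (1 + (i - j) ** 2) for i, j, e in entries),
    }
```

The six values per cell would then form the input vectors for the K-means clustering of Step 3, which is not shown here.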

Fig. 4. Illustration of the process of text region detection: (a) initial picture; (b) candidate of text region; (c) after erosion operation; (d) after dilation operation; (e) final result; (f) labeled by pane

Text may be inherently present in natural scene images in the form of road signs, building names, company names, and so on, and synthetic images may also include text. Nevertheless, the text features defined above are intuitively expected to discriminate between spam images and non-spam images. Figure 5 shows the distributions of features 1 and 3, from which we can see that spam images and non-spam images occupy different ranges. For feature 1, more than 40% of ham images fall in the range of 0 to 0.1, and more than 80% of spam images fall in the range of 0.2 to 0.6. For feature 3, ham images are concentrated in the range of 0 to 6, whereas spam images are concentrated in the range of 6 to 60.

Following [3], we also use three features to detect the presence of content obscuring. The idea is to measure the perimetric complexity, which is used in the literature on the psychophysics of reading, and the aspect ratio (the ratio between width and height). The perimetric complexity is defined as the squared length of the boundary between black and white pixels in the whole image, divided by the black area.
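The definition of perimetric complexity translates directly into a short Python sketch. It assumes a binarized image given as rows of 0 (white) and 1 (black), measures the boundary length as the number of unit pixel edges between a black pixel and a white pixel, and treats the area outside the image as white. These discretization choices are assumptions and are not specified in the text.

```python
def perimetric_complexity(binary):
    """Squared black/white boundary length divided by the black area.

    `binary` is a list of rows with 1 for black pixels and 0 for white pixels.
    Pixels outside the image are assumed to be white.
    """
    rows, cols = len(binary), len(binary[0])
    area = sum(map(sum, binary))
    if area == 0:
        return 0.0  # no black pixels, so the ratio is undefined; report 0
    perimeter = 0
    for y in range(rows):
        for x in range(cols):
            if binary[y][x] != 1:
                continue
            # Each 4-neighbour that is white (or outside) adds one boundary edge.
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                outside = not (0 <= ny < rows and 0 <= nx < cols)
                if outside or binary[ny][nx] == 0:
                    perimeter += 1
    return perimeter ** 2 / area
```

Under this measure, a character broken into many small fragments has a long boundary relative to its ink area and therefore a high value, which is why the measure is useful for spotting obscured text.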

Fig. 5. Feature distributions in all images: (a) distribution of extent of text regions (feature 1); (b) distribution of amount of text letters (feature 3)

3.3 Image Features Extraction

Our first group of image features relies on the following metadata:
(1) File format. The file format features of an image include its extension, the actual file format (as identified by metadata), and whether the two match; and
(2) Image metadata. We extract 10 features that are contained in the image metadata, including whether the image has comments, bits per pixel, number of bands, progressive flag, sample precision, transparent color, approx high, index value, logical height and width.

The rest of our image features are based on the following color properties:
(1) Color saturation. As defined by Frankel et al. [7], color saturation is quantified as the fraction of the total number of pixels in the image for which the difference max(R, G, B) - min(R, G, B) is greater than a predefined threshold;
(2) Color histogram. The color histogram is a compact summary of the image, and legitimate images typically contain a much larger number of colors than spam images. We chose a 6-bit color space, leading to 64 features; and
(3) Color moments. The use of color moments is based on the assumption that the distribution of color in an image can be interpreted as a probability distribution. The color distribution of spam images is typically not continuous because they are synthetic. In our study, we use three moments of an image's color distribution, namely mean, standard deviation, and skewness. Using the RGB channels and three moments for each channel, we obtain nine features.
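The three groups of color features can be sketched in plain Python on a list of 8-bit (R, G, B) tuples. The details below are assumptions made for illustration: the saturation threshold of 50 is a placeholder (the paper only says "predefined"), the 6-bit color space is taken to be the top two bits of each channel, and skewness is computed as the signed cube root of the third central moment, a common convention for color moments.

```python
import math


def color_saturation(pixels, threshold=50):
    """Fraction of pixels with max(R, G, B) - min(R, G, B) above `threshold`.

    The default threshold is a placeholder, not a value from the paper.
    """
    return sum(max(p) - min(p) > threshold for p in pixels) / len(pixels)


def color_histogram(pixels):
    """Normalized 64-bin histogram over a 6-bit color space (2 bits per channel)."""
    hist = [0.0] * 64
    for r, g, b in pixels:
        # Keep the two most significant bits of each 8-bit channel.
        hist[(r >> 6) << 4 | (g >> 6) << 2 | (b >> 6)] += 1
    return [h / len(pixels) for h in hist]


def color_moments(pixels):
    """Mean, standard deviation and skewness for each RGB channel (nine values)."""
    moments = []
    for channel in zip(*pixels):
        n = len(channel)
        mean = sum(channel) / n
        var = sum((v - mean) ** 2 for v in channel) / n
        third = sum((v - mean) ** 3 for v in channel) / n
        # Signed cube root keeps the skewness in the same units as the pixel values.
        skew = math.copysign(abs(third) ** (1 / 3), third)
        moments += [mean, math.sqrt(var), skew]
    return moments
```

Concatenating the saturation value, the 64 histogram bins, and the nine moments gives 74 of the color-based entries of the image feature vector.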
Figure 6 shows several ham and spam images and Figure 7 shows their color saturation, from which we can see that spam images are generally more saturated than images of natural scenes.

Fig. 6. Three ham images and two spam images: (a) ham image 1; (b) ham image 2; (c) ham image 3; (d) spam image 1; (e) spam image 2

Fig. 7. Color saturation of images in Figure 6

3.4 Bottom-Layer Classifiers

An SVM offers significant advantages, such as excellent generalization ability through the maximum margin approach, the absence of local minima, and a sparse representation of the solution, which are the major reasons for using it as a powerful model in classification tasks. Both the text classifier and the image classifier first apply an SVM to their respective text and image features, and the resulting spam confidence scores serve as the inputs to classifier fusion for the final decision. The kernel trick is another important factor in the success of SVMs. The polynomial kernel, the radial basis function (RBF) kernel, and the sigmoid kernel are three typical kernels. In our study, LIBSVM is adopted with the default parameters, and RBF is used as the kernel function since the corresponding Hilbert space is of infinite dimension.

In the previous sections, we extracted features and obtained the vector space model (VSM) that represents each image. The text-based vector space has seven features (one keyword feature, three text-region features, and three content-obscuring features), and the image-based vector space has 87 features (3 file-format, 10 metadata, 1 color-saturation, 64 color-histogram, and 9 color-moment features). The text classifier and the image classifier use their respective vectors as inputs to the SVM for training and classification.

3.5 Classifier Fusion

Combining the outputs of multiple tools has been reported to be effective in improving information retrieval [13, 15] and classification performance [2, 18]. Our experiments also show that accuracy can be improved by combining the results of several classifiers. Furthermore, by including the inputs of several types of classifiers, we reduce the risk that arises when any single classifier is compromised.

We use an SVM again to fuse the confidence scores of the text and image classifiers. The outputs of the bottom-layer classifiers constitute a vector for SVM training and classification. The vector is defined as (S_t, S_i), in which S_t is the confidence score of the text classifier and S_i is the confidence score of the image classifier. As with the bottom-layer classifiers, LIBSVM with an RBF kernel is adopted for classifier fusion.

4 Experiment

4.1 Experimental Setup

The experiments are carried out on a corpus of images taken from real e-mails. The corpus is a collection of personal e-mails used in [6], containing 2006 ham images and 3297 spam images. To the best of our knowledge, this is the only corpus of real ham images publicly available to the research community 3.

For the experiments, the images are first split into two subsets: about 60% are randomly chosen for training the bottom-layer classifiers, and the other 40% are used for testing. Then, for the fusion stage, about 50% of the images are randomly chosen for training and the other 50% for testing. We repeat this random selection 10 times and average all of the results.
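The repeated random splitting described above can be sketched as a small helper. The function name, the `evaluate` callback, and the fixed seed are hypothetical. The sketch covers only the generic "split, evaluate, average over 10 runs" loop and does not decide which images feed the fusion-stage split, since the text leaves that open.

```python
import random


def repeated_holdout(samples, evaluate, train_fraction=0.6, repeats=10, seed=0):
    """Average `evaluate(train, test)` over several random train/test splits.

    `evaluate` is a caller-supplied function that trains on `train`,
    tests on `test`, and returns a single number such as accuracy.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    scores = []
    for _ in range(repeats):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = round(train_fraction * len(shuffled))
        scores.append(evaluate(shuffled[:cut], shuffled[cut:]))
    return sum(scores) / len(scores)
```

With `train_fraction=0.6` this mirrors the bottom-layer split, and with `train_fraction=0.5` it mirrors the fusion-stage split.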
We first reduce the images by scaling so that the width and height are no more than 200 pixels. This simple mechanism makes our method robust to random pixels and simple scaling. It also keeps the computational cost manageable, since image analysis has high computational complexity. We then extract features from all the images in the positive and negative test sets.

3 Available at spam/

In our evaluation, accuracy, precision, image spam recall (recall for short), and image non-spam recall (non-spam recall for short) are defined as follows:

$$\text{accuracy} = \frac{\text{\# of all images correctly classified}}{\text{\# of all images}}$$

$$\text{precision} = \frac{\text{\# of spam images correctly classified}}{\text{\# of images classified as spam}}$$

$$\text{recall} = \frac{\text{\# of spam images correctly classified}}{\text{\# of all spam images}}$$

$$\text{non-spam recall} = \frac{\text{\# of non-spam images correctly classified}}{\text{\# of all non-spam images}}$$

All the experiments are conducted on a typical PC with a Core 2 Quad Q6600 CPU and 4 GB of memory, running Windows XP.

Fig. 8. Experimental results: (a) performance comparison for different approaches (image classifier, text classifier, fusion classifier with averaging, and fusion classifier with SVM; measures: accuracy, precision, recall, and non-spam recall); (b) performance comparison with Huang's approach (SA with Bayes-OCR, Huang's approach, and our approach; measures: precision and recall)

4.2 Experimental Results

Figure 8(a) shows the detailed experimental results. Compared with the text classifier, the image classifier obtains higher accuracy when classifying common categories of images, whereas the text classifier has better discriminative capability for spam images. The fusion classifier with averaging achieves better overall accuracy, but we cannot see any improvement in the other indicators. The discriminative capability is greatly improved when we fuse the confidence scores of the text classifier and the image classifier with an SVM. We can therefore conclude that the fusion classifier with an SVM combines the classification performance of the text and image classifiers in a complementary fashion that unites the strengths of both.

To evaluate the performance of our approach, we compare it with the open-source spam filter SpamAssassin (SA for short) in its standard configuration, equipped with the Bayes-OCR plug-in for filtering image spam, and with the existing approach presented in a recent paper [11]. The comparative results are shown in Figure 8(b).
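The four measures defined in Section 4.1 can be computed from true and predicted labels as in the following sketch. The label encoding (1 for spam, 0 for non-spam) and the convention of returning 0.0 for an empty denominator are assumptions made for illustration.

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision, spam recall and non-spam recall for 1 = spam, 0 = non-spam."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # spam classified as spam
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # non-spam classified as non-spam
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # non-spam classified as spam
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # spam classified as non-spam

    def ratio(num, den):
        return num / den if den else 0.0  # assumed convention for empty classes

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "non_spam_recall": ratio(tn, tn + fp),
    }
```

A filter that labels very few images as spam can reach a precision close to 100% while its recall stays low, which is the pattern reported for the baseline in Figure 8(b).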
The results of SA with Bayes-OCR are our baseline: its precision is very good (almost 100%), while its recall is low (below 40%). Although our experiment and the approach in [11] do not use the same corpora, the figure shows that our approach obtains better results: its precision is high enough to compete with that of SA with Bayes-OCR, while its recall is much improved.

We also compare our approach with the existing approach in [6], which uses the same corpus. The average accuracy of our approach is better than that of the approach in [6].

For text-based anti-spam filtering experiments, a number of benchmark datasets are publicly available. For our experiments, however, no other shared ham images are available, and the only other public corpus is SpamArchive 5, which consists of 16,021 spam images. We hope that a larger corpus with real spam and non-spam images will become available in the future, so that a fairer comparison of the above-mentioned approaches can be conducted.

5 SpamArchive was downloadable from SpamArchive.org, which has been shut down. It is now available at image spam/

5 Conclusion

In this paper, we have presented a novel hybrid framework that detects spam whose content is embedded in images by fusing classifiers. Given a spam image, our method extracts both the text and image features, feeds the corresponding vectors into the respective bottom-layer classifiers, and obtains the final decision by fusing the outputs of these classifiers. Our experimental results have shown that our approach achieves a significant improvement in the accuracy of image spam detection compared with other approaches. In the next stage of our study, we will further formalize our framework and approach, develop an online version of the fusion method that takes the spam filter's handling capacity into account, and test the image model's ability in spam detection.

Acknowledgments. This paper is supported by the 863 Plan project of China (No. 2007AA01Z197) and the Natural Science Foundations of China, and partially supported by the National Basic Research Program of China (No. 2010CB327903). We would like to thank Dr. Mark Dredze, who is now in the Department of Computer Science at University of Pennsylvania, for making his data set publicly available and sending us his code for performing the feature extraction.

References

1. Aradhye, H.B., Myers, G.K., Herson, J.A.: Image analysis for efficient categorization of image-based spam e-mail. In: Proceedings of International Conference on Document Analysis and Recognition (August 2005)

2. Bennett, P.N., Dumais, S.T., Horvitz, E.: The combination of text classifiers using reliability indicators. Information Retrieval 8(1) (2005)
3. Biggio, B., Fumera, G., Pillai, I., Roli, F.: Image spam filtering by content obscuring detection. In: Proceedings of the Fourth Conference on Email and Anti-Spam (CEAS 2007), pp. 2-3 (August 2007)
4. Biggio, B., Fumera, G., Pillai, I., Roli, F.: Image spam filtering using visual information. In: Proceedings of the 14th International Conference on Image Analysis and Processing (ICIAP 2007) (September 2007)
5. Cheng, H.D., Sun, Y.: A hierarchical approach to color image segmentation using homogeneity. 9(12) (2000)
6. Dredze, M., Gevaryahu, R., Elias-Bachrach, A.: Learning fast classifiers for image spam. In: Proceedings of the Fourth Conference on Email and Anti-Spam (CEAS 2007) (August 2007)
7. Frankel, C., Swain, M., Athitsos, V.: Webseer: an image search engine for the world wide web. Technical report, University of Chicago (1996)
8. Fumera, G., Pillai, I., Roli, F.: Spam filtering based on the analysis of text information embedded into images. Journal of Machine Learning Research (special issue on Machine Learning in Computer Security) 7, 2699-2720 (2006)
9. Gopalan, C., Manjula, D.: Statistical modeling for the detection, localization and extraction of text from heterogeneous textual images using combined feature scheme (2010)
10. Haralick, R., Shanmugam, K., Dinstein, I.: Textural features for image classification. 3(6) (1973)
11. Huang, H., Guo, W., Zhang, Y.: A novel method for image spam filtering. In: Proceedings of the 9th International Conference for Young Computer Scientists (ICYCS 2008) (November 2008)
12. Jain, A.K.: Fundamentals of Digital Image Processing. Prentice-Hall, Inc., Upper Saddle River (1989)
13. Lynam, T.R., Buckley, C., Clarke, C.L.A., Cormack, G.V.: A multi-system analysis of document and term selection for blind feedback. In: Proceedings of the 13th ACM Conference on Information and Knowledge Management (CIKM 2004) (November 2004)
14. Mehta, B., Nangia, S., Gupta, M., Nejdl, W.: Detecting image spam using visual features and near duplicate detection. In: Proceedings of the 17th International Conference on World Wide Web (WWW 2008) (April 2008)
15. Montague, M., Aslam, J.A.: Condorcet fusion for improved retrieval. In: Proceedings of the 11th ACM Conference on Information and Knowledge Management (CIKM 2002) (November 2002)
16. Nhung, N.P., Phuong, T.M.: An efficient method for filtering image-based spam. In: Proceedings of 2007 IEEE International Conference on Research, Innovation and Vision for the Future (March 2007)
17. Secure Computing Whitepaper. Image spam: The latest attack on the enterprise inbox. Technical report (November 2006)
18. Zhang, Y.: Using Bayesian priors to combine classifiers for adaptive filtering. In: Proceedings of the 27th Conference on Research and Development in Information Retrieval (SIGIR 2004) (July 2004)


A MACHINE LEARNING APPROACH TO SERVER-SIDE ANTI-SPAM E-MAIL FILTERING 1 2 UDC 004.75 A MACHINE LEARNING APPROACH TO SERVER-SIDE ANTI-SPAM E-MAIL FILTERING 1 2 I. Mashechkin, M. Petrovskiy, A. Rozinkin, S. Gerasimov Computer Science Department, Lomonosov Moscow State University,

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Not So Naïve Online Bayesian Spam Filter

Not So Naïve Online Bayesian Spam Filter Not So Naïve Online Bayesian Spam Filter Baojun Su Institute of Artificial Intelligence College of Computer Science Zhejiang University Hangzhou 310027, China freizsu@gmail.com Congfu Xu Institute of Artificial

More information

AN ENHANCED APPROACH FOR CONTENT FILTERING IN SPAM DETECTION

AN ENHANCED APPROACH FOR CONTENT FILTERING IN SPAM DETECTION AN ENHANCED APPROACH FOR CONTENT FILTERING IN SPAM DETECTION Shashi Kant Rathore Department of Computer Science & Engineering, Lovely Professional University, Jalandhar, Punjab shashi.mnit@gmail.com Jyoti

More information

An Approach to Image Spam Filtering Based on Base64 Encoding and N-Gram Feature Extraction

An Approach to Image Spam Filtering Based on Base64 Encoding and N-Gram Feature Extraction An Approach to Image Spam Filtering Based on Base64 Encoding and N-Gram Feature Extraction Congfu Xu Institute of Artificial Intelligence College of Computer Science Zhejiang University Hangzhou 327, China

More information

Neural Network based Vehicle Classification for Intelligent Traffic Control

Neural Network based Vehicle Classification for Intelligent Traffic Control Neural Network based Vehicle Classification for Intelligent Traffic Control Saeid Fazli 1, Shahram Mohammadi 2, Morteza Rahmani 3 1,2,3 Electrical Engineering Department, Zanjan University, Zanjan, IRAN

More information

A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering

A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering Khurum Nazir Junejo, Mirza Muhammad Yousaf, and Asim Karim Dept. of Computer Science, Lahore University of Management Sciences

More information

A Novel Approach towards Image Spam Classification

A Novel Approach towards Image Spam Classification A Novel Approach towards Image Spam Classification M.Soranamageswari, Dr.C.Meena Abstract The volume of unsolicited commercial mails has grown extremely in the past few years because of increased internet

More information

Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

More information

Feature Subset Selection in E-mail Spam Detection

Feature Subset Selection in E-mail Spam Detection Feature Subset Selection in E-mail Spam Detection Amir Rajabi Behjat, Universiti Technology MARA, Malaysia IT Security for the Next Generation Asia Pacific & MEA Cup, Hong Kong 14-16 March, 2012 Feature

More information

Identifying Image Spam based on Header and File Properties using C4.5 Decision Trees and Support Vector Machine Learning

Identifying Image Spam based on Header and File Properties using C4.5 Decision Trees and Support Vector Machine Learning Identifying Image Spam based on Header and File Properties using C4.5 Decision Trees and Support Vector Machine Learning Sven Krasser, Yuchun Tang, Jeremy Gould, Dmitri Alperovitch, Paul Judge Abstract

More information

Bayesian Spam Filtering

Bayesian Spam Filtering Bayesian Spam Filtering Ahmed Obied Department of Computer Science University of Calgary amaobied@ucalgary.ca http://www.cpsc.ucalgary.ca/~amaobied Abstract. With the enormous amount of spam messages propagating

More information

PSSF: A Novel Statistical Approach for Personalized Service-side Spam Filtering

PSSF: A Novel Statistical Approach for Personalized Service-side Spam Filtering 2007 IEEE/WIC/ACM International Conference on Web Intelligence PSSF: A Novel Statistical Approach for Personalized Service-side Spam Filtering Khurum Nazir Juneo Dept. of Computer Science Lahore University

More information

Spam detection with data mining method:

Spam detection with data mining method: Spam detection with data mining method: Ensemble learning with multiple SVM based classifiers to optimize generalization ability of email spam classification Keywords: ensemble learning, SVM classifier,

More information

Combining Global and Personal Anti-Spam Filtering

Combining Global and Personal Anti-Spam Filtering Combining Global and Personal Anti-Spam Filtering Richard Segal IBM Research Hawthorne, NY 10532 Abstract Many of the first successful applications of statistical learning to anti-spam filtering were personalized

More information

siftservice.com - Turning a Computer Vision algorithm into a World Wide Web Service

siftservice.com - Turning a Computer Vision algorithm into a World Wide Web Service siftservice.com - Turning a Computer Vision algorithm into a World Wide Web Service Ahmad Pahlavan Tafti 1, Hamid Hassannia 2, and Zeyun Yu 1 1 Department of Computer Science, University of Wisconsin -Milwaukee,

More information

Email Spam Detection A Machine Learning Approach

Email Spam Detection A Machine Learning Approach Email Spam Detection A Machine Learning Approach Ge Song, Lauren Steimle ABSTRACT Machine learning is a branch of artificial intelligence concerned with the creation and study of systems that can learn

More information

Term extraction for user profiling: evaluation by the user

Term extraction for user profiling: evaluation by the user Term extraction for user profiling: evaluation by the user Suzan Verberne 1, Maya Sappelli 1,2, Wessel Kraaij 1,2 1 Institute for Computing and Information Sciences, Radboud University Nijmegen 2 TNO,

More information

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

More information

How To Create A Text Classification System For Spam Filtering

How To Create A Text Classification System For Spam Filtering Term Discrimination Based Robust Text Classification with Application to Email Spam Filtering PhD Thesis Khurum Nazir Junejo 2004-03-0018 Advisor: Dr. Asim Karim Department of Computer Science Syed Babar

More information

Spam Filtering Based on Latent Semantic Indexing

Spam Filtering Based on Latent Semantic Indexing Spam Filtering Based on Latent Semantic Indexing Wilfried N. Gansterer Andreas G. K. Janecek Robert Neumayer Abstract In this paper, a study on the classification performance of a vector space model (VSM)

More information

Cosdes: A Collaborative Spam Detection System with a Novel E- Mail Abstraction Scheme

Cosdes: A Collaborative Spam Detection System with a Novel E- Mail Abstraction Scheme IOSR Journal of Engineering (IOSRJEN) e-issn: 2250-3021, p-issn: 2278-8719, Volume 2, Issue 9 (September 2012), PP 55-60 Cosdes: A Collaborative Spam Detection System with a Novel E- Mail Abstraction Scheme

More information

A Content based Spam Filtering Using Optical Back Propagation Technique

A Content based Spam Filtering Using Optical Back Propagation Technique A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT

More information

Recognition Method for Handwritten Digits Based on Improved Chain Code Histogram Feature

Recognition Method for Handwritten Digits Based on Improved Chain Code Histogram Feature 3rd International Conference on Multimedia Technology ICMT 2013) Recognition Method for Handwritten Digits Based on Improved Chain Code Histogram Feature Qian You, Xichang Wang, Huaying Zhang, Zhen Sun

More information

Semantic Video Annotation by Mining Association Patterns from Visual and Speech Features

Semantic Video Annotation by Mining Association Patterns from Visual and Speech Features Semantic Video Annotation by Mining Association Patterns from and Speech Features Vincent. S. Tseng, Ja-Hwung Su, Jhih-Hong Huang and Chih-Jen Chen Department of Computer Science and Information Engineering

More information

A Dynamic Approach to Extract Texts and Captions from Videos

A Dynamic Approach to Extract Texts and Captions from Videos Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS

A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS Charanma.P 1, P. Ganesh Kumar 2, 1 PG Scholar, 2 Assistant Professor,Department of Information Technology, Anna University

More information

A Method of Caption Detection in News Video

A Method of Caption Detection in News Video 3rd International Conference on Multimedia Technology(ICMT 3) A Method of Caption Detection in News Video He HUANG, Ping SHI Abstract. News video is one of the most important media for people to get information.

More information

Mining the Software Change Repository of a Legacy Telephony System

Mining the Software Change Repository of a Legacy Telephony System Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,

More information

Multiscale Object-Based Classification of Satellite Images Merging Multispectral Information with Panchromatic Textural Features

Multiscale Object-Based Classification of Satellite Images Merging Multispectral Information with Panchromatic Textural Features Remote Sensing and Geoinformation Lena Halounová, Editor not only for Scientific Cooperation EARSeL, 2011 Multiscale Object-Based Classification of Satellite Images Merging Multispectral Information with

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

A Personalized Spam Filtering Approach Utilizing Two Separately Trained Filters

A Personalized Spam Filtering Approach Utilizing Two Separately Trained Filters 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology A Personalized Spam Filtering Approach Utilizing Two Separately Trained Filters Wei-Lun Teng, Wei-Chung Teng

More information

Machine Learning Final Project Spam Email Filtering

Machine Learning Final Project Spam Email Filtering Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE

More information

Representation of Electronic Mail Filtering Profiles: A User Study

Representation of Electronic Mail Filtering Profiles: A User Study Representation of Electronic Mail Filtering Profiles: A User Study Michael J. Pazzani Department of Information and Computer Science University of California, Irvine Irvine, CA 92697 +1 949 824 5888 pazzani@ics.uci.edu

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015 RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering

More information

Detecting Image Spam using Visual Features and Near Duplicate Detection

Detecting Image Spam using Visual Features and Near Duplicate Detection Detecting Image Spam using Visual Features and Near Duplicate Detection Bhaskar Mehta Google Inc. Brandschenkestr 110 Zurich, Switzerland bmehta@google.com Saurabh Nangia* IIT Guwahati Guwahati 781039

More information

Image Classification for Dogs and Cats

Image Classification for Dogs and Cats Image Classification for Dogs and Cats Bang Liu, Yan Liu Department of Electrical and Computer Engineering {bang3,yan10}@ualberta.ca Kai Zhou Department of Computing Science kzhou3@ualberta.ca Abstract

More information

An Algorithm for Classification of Five Types of Defects on Bare Printed Circuit Board

An Algorithm for Classification of Five Types of Defects on Bare Printed Circuit Board IJCSES International Journal of Computer Sciences and Engineering Systems, Vol. 5, No. 3, July 2011 CSES International 2011 ISSN 0973-4406 An Algorithm for Classification of Five Types of Defects on Bare

More information

Tracking and Recognition in Sports Videos

Tracking and Recognition in Sports Videos Tracking and Recognition in Sports Videos Mustafa Teke a, Masoud Sattari b a Graduate School of Informatics, Middle East Technical University, Ankara, Turkey mustafa.teke@gmail.com b Department of Computer

More information

Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach

Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Alex Hai Wang College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, PA 18512, USA

More information

Keywords Phishing Attack, phishing Email, Fraud, Identity Theft

Keywords Phishing Attack, phishing Email, Fraud, Identity Theft Volume 3, Issue 7, July 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Detection Phishing

More information

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

More information

Speed Performance Improvement of Vehicle Blob Tracking System

Speed Performance Improvement of Vehicle Blob Tracking System Speed Performance Improvement of Vehicle Blob Tracking System Sung Chun Lee and Ram Nevatia University of Southern California, Los Angeles, CA 90089, USA sungchun@usc.edu, nevatia@usc.edu Abstract. A speed

More information

Image Spam: The Email Epidemic of 2006

Image Spam: The Email Epidemic of 2006 S e c u r i t y T r e n d s Overview Image Spam: The Email Epidemic of 2006 S E C U R I T Y T R E N D S O v e r v i e w End-users around the world are reporting an increase in spam. Much of this increase

More information

E-commerce Transaction Anomaly Classification

E-commerce Transaction Anomaly Classification E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

Spam Filtering using Naïve Bayesian Classification

Spam Filtering using Naïve Bayesian Classification Spam Filtering using Naïve Bayesian Classification Presented by: Samer Younes Outline What is spam anyway? Some statistics Why is Spam a Problem Major Techniques for Classifying Spam Transport Level Filtering

More information

Adaption of Statistical Email Filtering Techniques

Adaption of Statistical Email Filtering Techniques Adaption of Statistical Email Filtering Techniques David Kohlbrenner IT.com Thomas Jefferson High School for Science and Technology January 25, 2007 Abstract With the rise of the levels of spam, new techniques

More information

Hoodwinking Spam Email Filters

Hoodwinking Spam Email Filters Proceedings of the 2007 WSEAS International Conference on Computer Engineering and Applications, Gold Coast, Australia, January 17-19, 2007 533 Hoodwinking Spam Email Filters WANLI MA, DAT TRAN, DHARMENDRA

More information

Research of Postal Data mining system based on big data

Research of Postal Data mining system based on big data 3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication

More information

Domain Classification of Technical Terms Using the Web

Domain Classification of Technical Terms Using the Web Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using

More information

Multimodal Biometric Recognition Security System

Multimodal Biometric Recognition Security System Multimodal Biometric Recognition Security System Anju.M.I, G.Sheeba, G.Sivakami, Monica.J, Savithri.M Department of ECE, New Prince Shri Bhavani College of Engg. & Tech., Chennai, India ABSTRACT: Security

More information

Florida International University - University of Miami TRECVID 2014

Florida International University - University of Miami TRECVID 2014 Florida International University - University of Miami TRECVID 2014 Miguel Gavidia 3, Tarek Sayed 1, Yilin Yan 1, Quisha Zhu 1, Mei-Ling Shyu 1, Shu-Ching Chen 2, Hsin-Yu Ha 2, Ming Ma 1, Winnie Chen 4,

More information

How To Train A Classifier With Active Learning In Spam Filtering

How To Train A Classifier With Active Learning In Spam Filtering Online Active Learning Methods for Fast Label-Efficient Spam Filtering D. Sculley Department of Computer Science Tufts University, Medford, MA USA dsculley@cs.tufts.edu ABSTRACT Active learning methods

More information

Personalized Spam Filtering for Gray Mail

Personalized Spam Filtering for Gray Mail Personalized Spam Filtering for Gray Mail Ming-wei Chang Computer Science Dept. University of Illinois Urbana, IL, USA mchang21@uiuc.edu Wen-tau Yih Microsoft Research One Microsoft Way Redmond, WA, USA

More information

Investigation of Support Vector Machines for Email Classification

Investigation of Support Vector Machines for Email Classification Investigation of Support Vector Machines for Email Classification by Andrew Farrugia Thesis Submitted by Andrew Farrugia in partial fulfillment of the Requirements for the Degree of Bachelor of Software

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Anti-Spam Filter Based on Naïve Bayes, SVM, and KNN model

Anti-Spam Filter Based on Naïve Bayes, SVM, and KNN model AI TERM PROJECT GROUP 14 1 Anti-Spam Filter Based on,, and model Yun-Nung Chen, Che-An Lu, Chao-Yu Huang Abstract spam email filters are a well-known and powerful type of filters. We construct different

More information

ScienceDirect. Brain Image Classification using Learning Machine Approach and Brain Structure Analysis

ScienceDirect. Brain Image Classification using Learning Machine Approach and Brain Structure Analysis Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 50 (2015 ) 388 394 2nd International Symposium on Big Data and Cloud Computing (ISBCC 15) Brain Image Classification using

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

Medical Image Segmentation of PACS System Image Post-processing *

Medical Image Segmentation of PACS System Image Post-processing * Medical Image Segmentation of PACS System Image Post-processing * Lv Jie, Xiong Chun-rong, and Xie Miao Department of Professional Technical Institute, Yulin Normal University, Yulin Guangxi 537000, China

More information

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Yue Dai, Ernest Arendarenko, Tuomo Kakkonen, Ding Liao School of Computing University of Eastern Finland {yvedai,

More information

Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

Data Pre-Processing in Spam Detection

Data Pre-Processing in Spam Detection IJSTE - International Journal of Science Technology & Engineering Volume 1 Issue 11 May 2015 ISSN (online): 2349-784X Data Pre-Processing in Spam Detection Anjali Sharma Dr. Manisha Manisha Dr. Rekha Jain

More information

Face Recognition For Remote Database Backup System

Face Recognition For Remote Database Backup System Face Recognition For Remote Database Backup System Aniza Mohamed Din, Faudziah Ahmad, Mohamad Farhan Mohamad Mohsin, Ku Ruhana Ku-Mahamud, Mustafa Mufawak Theab 2 Graduate Department of Computer Science,UUM

More information

Circle Object Recognition Based on Monocular Vision for Home Security Robot

Circle Object Recognition Based on Monocular Vision for Home Security Robot Journal of Applied Science and Engineering, Vol. 16, No. 3, pp. 261 268 (2013) DOI: 10.6180/jase.2013.16.3.05 Circle Object Recognition Based on Monocular Vision for Home Security Robot Shih-An Li, Ching-Chang

More information

Laser Gesture Recognition for Human Machine Interaction

Laser Gesture Recognition for Human Machine Interaction International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-04, Issue-04 E-ISSN: 2347-2693 Laser Gesture Recognition for Human Machine Interaction Umang Keniya 1*, Sarthak

More information

Predict Influencers in the Social Network

Predict Influencers in the Social Network Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

Colour Image Segmentation Technique for Screen Printing

Colour Image Segmentation Technique for Screen Printing 60 R.U. Hewage and D.U.J. Sonnadara Department of Physics, University of Colombo, Sri Lanka ABSTRACT Screen-printing is an industry with a large number of applications ranging from printing mobile phone

More information

6367(Print), ISSN 0976 6375(Online) & TECHNOLOGY Volume 4, Issue 1, (IJCET) January- February (2013), IAEME

6367(Print), ISSN 0976 6375(Online) & TECHNOLOGY Volume 4, Issue 1, (IJCET) January- February (2013), IAEME INTERNATIONAL International Journal of Computer JOURNAL Engineering OF COMPUTER and Technology ENGINEERING (IJCET), ISSN 0976-6367(Print), ISSN 0976 6375(Online) & TECHNOLOGY Volume 4, Issue 1, (IJCET)

More information

Document Image Retrieval using Signatures as Queries

Document Image Retrieval using Signatures as Queries Document Image Retrieval using Signatures as Queries Sargur N. Srihari, Shravya Shetty, Siyuan Chen, Harish Srinivasan, Chen Huang CEDAR, University at Buffalo(SUNY) Amherst, New York 14228 Gady Agam and

More information

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577 T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or

More information

A Genetic Algorithm-Evolved 3D Point Cloud Descriptor

A Genetic Algorithm-Evolved 3D Point Cloud Descriptor A Genetic Algorithm-Evolved 3D Point Cloud Descriptor Dominik Wȩgrzyn and Luís A. Alexandre IT - Instituto de Telecomunicações Dept. of Computer Science, Univ. Beira Interior, 6200-001 Covilhã, Portugal

More information

Bayesian Spam Detection

Bayesian Spam Detection Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal Volume 2 Issue 1 Article 2 2015 Bayesian Spam Detection Jeremy J. Eberhardt University or Minnesota, Morris Follow this and additional

More information

Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

More information

Department of Mechanical Engineering, King s College London, University of London, Strand, London, WC2R 2LS, UK; e-mail: david.hann@kcl.ac.

Department of Mechanical Engineering, King s College London, University of London, Strand, London, WC2R 2LS, UK; e-mail: david.hann@kcl.ac. INT. J. REMOTE SENSING, 2003, VOL. 24, NO. 9, 1949 1956 Technical note Classification of off-diagonal points in a co-occurrence matrix D. B. HANN, Department of Mechanical Engineering, King s College London,

More information