Save this PDF as:

Size: px
Start display at page:

## Transcription

5 Fraction of tweets with URLs (a) Fraction of tweets containing URLs Fraction of tweets with spam word (b) Fraction of tweets with spam words Hashtags per tweet (average) (c) Average number of hashtags per tweet Figure 3: Cumulative distribution functions of three content attributes Number of followers per number of followees (a) Fraction of followers per followees Account age (days) (b) Age of the user account Number of tweets received (c) Number of tweets received Figure 4: Cumulative distribution functions of three user behavior attributes is the ratio of the number of users classified correctly to the total predicted as users of class X. In order to explain these metrics, we will make use of a confusion matrix [2], illustrated in Table 2. Each position in this matrix represents the number of elements in each original class, and how they were predicted by the classification. In Table 2, the precision (p spam) and the recall (r spam) indices of class spammer are computed as p spam = a/(a + c) and r spam = a/(a + b). Predicted Spammer Non-spammer True Spammer a b Non-spammer c d Table 2: Example of confusion matrix The F metric is the harmonic mean between both precision and recall, and is defined as F = 2pr/(p+r). Two variations of F, namely, micro and macro, are usually reported to evaluate classification effectiveness. Micro-F is calculated by first computing global precision and recall values for all classes, and then calculating F. Micro-F considers equally important the classification of each user, independently of its class, and basically measures the capability of the classifier to predict the correct class on a per-user basis. In contrast, Macro-F values are computed by first calculating F values for each class in isolation, as exemplified above for spammers, and then averaging over all classes. Macro-F considers equally important the effectiveness in each class, independently of the relative size of the class. Thus, the two metrics provide complementary assessments of the classification effectiveness. Macro-F is especially important when the class distribution is very skewed, as in our case, to verify the capability of the classifier to perform well in the smaller classes. 5.2 The classifier and the experimental setup We use a Support Vector Machine (SVM) classifier [9], which is a state-of-the-art method in classification and obtainned the best results among a set of classifiers tested. The goal of a SVM is to find the hyperplane that optimally separates with a maximum margin the training data into two portions of an N-dimensional space. A SVM performs classification by mapping input vectors into an N-dimensional space, and checking in which side of the defined hyperplane the point lies. We use a non-linear SVM with the Radial Basis Function (RBF) kernel to allow SVM models to perform separations with very complex boundaries. The implementation of SVM used in our experiments is provided with libsvm [3], an open source SVM package that allows searching for the best classifier parameters using the training data, a mandatory step in the classifier setup. In particular, we use the easy tool from libsvm, which provides a series of optimizations, including normalization of all numerical attributes. For experiments involving the SVM J parameter (discussed in Section 5.3), we used a different implementation, called SVM light, since libsvm does not provide this parameter. Classification results are equal for both implementations when we use the same classifier parameters. The classification experiments are performed using a 5- fold cross-validation. In each test, the original sample is partitioned into 5 sub-samples, out of which four are used as training data, and the remaining one is used for testing

6 the classifier. The process is then repeated 5 times, with each of the 5 sub-samples used exactly once as the test data, thus producing 5 results. The entire 5-fold cross validation was repeated 5 times with different seeds used to shuffle the original data set, thus producing 25 different results for each test. The results reported are averages of the 25 runs. With 95% of confidence, results do not differ from the average in more than 5%. 5.3 Basic classification results Table 3 shows the confusion matrix obtained as the result of our experiments with SVM. The numbers presented are percentages relative to the total number of users in each class. The diagonal in boldface indicates the recall in each class. Approximately, 7% of spammers and 96% of nonspammers were correctly classified. Thus, only a small fraction of non-spammers were erroneously classified as spammers. A significant fraction (almost 3%) of spammers was misclassified as non-spammers. We noted that, in general, these spammers exhibit a dual behavior, sharing a reasonable number of non-spam tweets, thus presenting themselves as nonspammers most of the time, but occasionally some tweet that was considered as spam. This dual behavior masks some important aspects used by the classifier to differentiate spammers from non-spammers. This is further aggravated by the fact that a significant number of non-spammers post their tweets to trending topics, a typical behavior of spammers. Although the spammers our approach was not able to detect are occasional spammers, an approach that allow one to choose to detect even occasional spammers could be of interest. In Section 5.4, we discuss an approach that allows one to trade a higher recall of spammers at a cost of misclassifying a larger number of non-spammers. Predicted Spammer Non-spammers True Spammer 7.% 29.9% Non-spammer 3.6% 96.4% Table 3: Basic classification results As a summary of the classification results, Micro-F value is 87.6, whereas per-class F values are 79. and 9.2, for spammers and non-spammers, respectively, resulting in an average Macro-F equal to 85.. The Micro-F result indicates that we are predicting the correct class in 87.6% of the cases. Complementarily, the Macro-F result shows that there is a certain degree of imbalance for F across classes, with more difficulty for classifying spammers. Comparing with a trivial baseline classifier that chooses to classify every single user as non-spammer, we obtain gains of about 3.4% in terms of Micro-F, and of 2.8% in terms of Macro-F. 5.4 Spammer detection tradeoff Our basic classification results show we can effectively identify spammers, misclassifying only a small fraction of non-spammers. However, even the small fraction of misclassified non-spammers could not be suitable for a detection mechanism that apply some sort of automatic punishment to users. Additionally, one could prefer identifying more spammers at the cost of misclassifying more non-spammers. This tradeoff can be explored using a cost mechanism, available in the SVM classifier. In this mechanism, one can Percentage (%) as spammers Non as spammers SVM parameter (j) Figure 5: Impact of varying the J parameter give priority to one class (e.g., spammers) over the other (e.g., non-spammers) by varying its J parameter 2 [24]. Figure 5 shows classification results when we vary the parameter J. We can note that increasing J leads to a higher percentage of correctly classified spammers (with diminishing returns for J >.3), but at the cost of a larger fraction of misclassified legitimate users. For instance, one can choose to correctly classify around 43.7% of spammers, misclassifying only.3% non-spammers (J =.). On the other hand, one can correctly classify as much as 8.3% of spammers (J = 5), paying the cost of misclassifying 7.9% of legitimate users. The best solution to this tradeoff depends on the system s objectives. For example, a system might be interested in sending an automatic warning message to all users classified as spammers, in which case they might prefer to act conservatively, avoiding sending the message to legitimate users, at the cost of reducing the number of correctly predicted spammers. In another situation, a system may prefer to filter any spam content and then detect a higher fraction of spammers, misclassifying a few more legitimate users. It should be stressed that we are evaluating the potential benefits of varying J. In a practical situation, the optimal value should be discovered in the training data with cross-validation, and selected according to the system s goals. 5.5 Importance of the attributes In order to verify the ranking of importance of these attributes we use two feature selection methods available on Weka [27]. We assessed the relative power of the 6 selected attributes in discriminating one user class from the others by independently applying two well known feature selection methods, namely, information gain and χ 2 (Chi Squared) [3]. Since results for information gain and χ 2 are very similar and both methods ranked attributes in common among the top, we omitted results for information gain. Table 4 presents the most important attributes for the χ 2 method. We can note that two of the most important attributes are the fraction of tweets with URLs and the average number 2 The J parameter is the cost factor by which training errors in one class outweigh errors in the other. It is useful, when there is a large imbalance between the two classes, to counterbalance the bias towards the larger one.

### Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach

Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Alex Hai Wang College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, PA 18512, USA

### Practical Detection of Spammers and Content Promoters in Online Video Sharing Systems

1 Practical Detection of Spammers and Content Promoters in Online Video Sharing Systems Fabrício Benevenuto, Tiago Rodrigues, Adriano Veloso, Jussara Almeida, Marcos Gonçalves and Virgílio Almeida Computer

DON T FOLLOW ME: SPAM DETECTION IN TWITTER Alex Hai Wang College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, PA 18512, USA hwang@psu.edu Keywords: Abstract: social

### Spam detection with data mining method:

Spam detection with data mining method: Ensemble learning with multiple SVM based classifiers to optimize generalization ability of email spam classification Keywords: ensemble learning, SVM classifier,

### Security and Trust in social media networks

!1 Security and Trust in social media networks Prof. Touradj Ebrahimi touradj.ebrahimi@epfl.ch! DMP meeting San Jose, CA, USA 11 January 2014 Social media landscape!2 http://www.fredcavazza.net/2012/02/22/social-media-landscape-2012/

Spam Filtering in Twitter using Sender-Receiver Relationship Jonghyuk Song 1, Sangho Lee 1 and Jong Kim 2 1 Dept. of CSE, POSTECH, Republic of Korea {freestar,sangho2}@postech.ac.kr 2 Div. of ITCE, POSTECH,

### Detecting Spam Web Pages through Content Analysis

Detecting Spam Web Pages through Content Analysis Alex Ntoulas 1, Marc Najork 2, Mark Manasse 2, Dennis Fetterly 2 1 UCLA Computer Science Department 2 Microsoft Research, Silicon Valley Web Spam: raison

### Observing Common Spam in Tweets and Email

Observing Common Spam in Tweets and Email Cristian Lumezanu NEC Laboratories America Princeton, NJ lume@nec-labs.com Nick Feamster University of Maryland College Park, MD feamster@cs.umd.edu ABSTRACT Spam

### Exploring Big Data in Social Networks

Exploring Big Data in Social Networks virgilio@dcc.ufmg.br (meira@dcc.ufmg.br) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about

### Spam Detection on Twitter Using Traditional Classifiers M. McCord CSE Dept Lehigh University 19 Memorial Drive West Bethlehem, PA 18015, USA

Spam Detection on Twitter Using Traditional Classifiers M. McCord CSE Dept Lehigh University 19 Memorial Drive West Bethlehem, PA 18015, USA mpm308@lehigh.edu M. Chuah CSE Dept Lehigh University 19 Memorial

### Detection of Malicious URLs by Correlating the Chains of Redirection in an Online Social Network (Twitter)

International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 3, July 2014, PP 33-38 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Detection

### Object Popularity Distributions in Online Social Networks

Object Popularity Distributions in Online Social Networks Theo Lins Computer Science Dept. Federal University of Ouro Preto (UFOP) Ouro Preto, Brazil theosl@gmail.com Wellington Dores Computer Science

### Tweets Miner for Stock Market Analysis

Tweets Miner for Stock Market Analysis Bohdan Pavlyshenko Electronics department, Ivan Franko Lviv National University,Ukraine, Drahomanov Str. 50, Lviv, 79005, Ukraine, e-mail: b.pavlyshenko@gmail.com

### Network-based spam filter on Twitter

Network-based spam filter on Twitter Ziyan Zhou Stanford University ziyanjoe@stanford.edu Lei Sun Stanford University sunlei@stanford.edu ABSTRACT Rapidly growing micro-blogging social networks, such as

### Blog Post Extraction Using Title Finding

Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School

### Removing Web Spam Links from Search Engine Results

Removing Web Spam Links from Search Engine Results Manuel EGELE pizzaman@iseclab.org, 1 Overview Search Engine Optimization and definition of web spam Motivation Approach Inferring importance of features

### Data Mining in Web Search Engine Optimization and User Assisted Rank Results

Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management

### Sentiment analysis on tweets in a financial domain

Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International

### Domain Name Abuse Detection. Liming Wang

Domain Name Abuse Detection Liming Wang Outline 1 Domain Name Abuse Work Overview 2 Anti-phishing Research Work 3 Chinese Domain Similarity Detection 4 Other Abuse detection ti 5 System Information 2 Why?

### End-to-End Sentiment Analysis of Twitter Data

End-to-End Sentiment Analysis of Twitter Data Apoor v Agarwal 1 Jasneet Singh Sabharwal 2 (1) Columbia University, NY, U.S.A. (2) Guru Gobind Singh Indraprastha University, New Delhi, India apoorv@cs.columbia.edu,

### Content-Based Discovery of Twitter Influencers

Content-Based Discovery of Twitter Influencers Chiara Francalanci, Irma Metra Department of Electronics, Information and Bioengineering Polytechnic of Milan, Italy irma.metra@mail.polimi.it chiara.francalanci@polimi.it

### An Exploratory Study on Software Microblogger Behaviors

2014 4th IEEE Workshop on Mining Unstructured Data An Exploratory Study on Software Microblogger Behaviors Abstract Microblogging services are growing rapidly in the recent years. Twitter, one of the most

### Gianluca Stringhini, Christopher Kruegel, Giovanni Vigna University of California, Santa Barbara 26 th ACSAC(December, 2010)

Gianluca Stringhini, Christopher Kruegel, Giovanni Vigna University of California, Santa Barbara 26 th ACSAC(December, 2010) Presented by Alankrit Chona 2009CS10176 Nikita Gupta 2009CS50248 Motivation

### Web Mining Seminar CSE 450. Spring 2008 MWF 11:10 12:00pm Maginnes 113

CSE 450 Web Mining Seminar Spring 2008 MWF 11:10 12:00pm Maginnes 113 Instructor: Dr. Brian D. Davison Dept. of Computer Science & Engineering Lehigh University davison@cse.lehigh.edu http://www.cse.lehigh.edu/~brian/course/webmining/

### Image Content-Based Email Spam Image Filtering

Image Content-Based Email Spam Image Filtering Jianyi Wang and Kazuki Katagishi Abstract With the population of Internet around the world, email has become one of the main methods of communication among

### INTERNATIONAL JOURNAL OF ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY An International online open access peer reviewed journal

INTERNATIONAL JOURNAL OF ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY An International online open access peer reviewed journal Research Article ISSN 2277 9140 ABSTRACT Web page categorization based

Volume 4, Issue 6, June 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Improving Web Image

### Comparison of Ant Colony and Bee Colony Optimization for Spam Host Detection

International Journal of Engineering Research and Development eissn : 2278-067X, pissn : 2278-800X, www.ijerd.com Volume 4, Issue 8 (November 2012), PP. 26-32 Comparison of Ant Colony and Bee Colony Optimization

### Detecting Linkedin Spammers and its Spam Nets

Detecting Linkedin Spammers and its Spam Nets Víctor M. Prieto, Manuel Álvarez and Fidel Cacheda Department of Information and Communication Technologies, University of A Coruña, Coruña, Spain 15071 Email:

### Fraud Detection in Online Reviews using Machine Learning Techniques

ISSN (e): 2250 3005 Volume, 05 Issue, 05 May 2015 International Journal of Computational Engineering Research (IJCER) Fraud Detection in Online Reviews using Machine Learning Techniques Kolli Shivagangadhar,

### The Social Media Guide For Small Businesses

The Social Media Guide For Small Businesses Authored By: Justin Rissmiller, Owner & Operator A Publication Of: T&R Solutions: Define. Design. Progress. YOUR LOGO Contents An Introduction To Social Media

### Can Twitter Predict Royal Baby's Name?

Summary Can Twitter Predict Royal Baby's Name? Bohdan Pavlyshenko Ivan Franko Lviv National University,Ukraine, b.pavlyshenko@gmail.com In this paper, we analyze the existence of possible correlation between

### Spam Host Detection Using Ant Colony Optimization

Spam Host Detection Using Ant Colony Optimization Arnon Rungsawang, Apichat Taweesiriwate and Bundit Manaskasemsak Abstract Inappropriate effort of web manipulation or spamming in order to boost up a web

### A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering

A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering Khurum Nazir Junejo, Mirza Muhammad Yousaf, and Asim Karim Dept. of Computer Science, Lahore University of Management Sciences

### SIP Service Providers and The Spam Problem

SIP Service Providers and The Spam Problem Y. Rebahi, D. Sisalem Fraunhofer Institut Fokus Kaiserin-Augusta-Allee 1 10589 Berlin, Germany {rebahi, sisalem}@fokus.fraunhofer.de Abstract The Session Initiation

### Beyond Friendship Graphs: A Study of User Interactions in Flickr

Beyond Friendship Graphs: A Study of User Interactions in Flickr ABSTRACT Masoud Valafar, Reza Rejaie Department of Computer & Information Science University of Oregon Eugene, OR 9743 {masoud,reza}@cs.uoregon.edu

### Emerging Trends in Fighting Spam

An Osterman Research White Paper sponsored by Published June 2007 SPONSORED BY sponsored by Osterman Research, Inc. P.O. Box 1058 Black Diamond, Washington 98010-1058 Phone: +1 253 630 5839 Fax: +1 866

### Reconstruction and Analysis of Twitter Conversation Graphs

Reconstruction and Analysis of Twitter Conversation Graphs Peter Cogan peter.cogan@alcatellucent.com Gabriel Tucci gabriel.tucci@alcatellucent.com Matthew Andrews andrews@research.belllabs.com W. Sean

### Exploiting Network Structure for Proactive Spam Mitigation

Exploiting Network Structure for Proactive Spam Mitigation Shobha Venkataraman, Subhabrata Sen, Oliver Spatscheck, Patrick Haffner, Dawn Song Carnegie Mellon University, AT&T Research shobha@cs.cmu.edu,

### Mimicking human fake review detection on Trustpilot

Mimicking human fake review detection on Trustpilot [DTU Compute, special course, 2015] Ulf Aslak Jensen Master student, DTU Copenhagen, Denmark Ole Winther Associate professor, DTU Copenhagen, Denmark

### Review: A Ranking Fraud Detection System for Mobile Apps

Review: A Ranking Fraud Detection System for Mobile Apps Raghuveer Dagade, Prof. Lomesh Ahire 1 Student, Dept. of M.E.Comp. Network, Nutan Maharashta Institute of Engineering and Technology, Pune, India

### An Efficient Methodology for Detecting Spam Using Spot System

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 1, January 2014,

### A Novel Method to Detect Fraud Ranking For Mobile Apps

A Novel Method to Detect Fraud Ranking For Mobile Apps Geedikapally Rajesh Kumar M.Tech Student Department of CSE St.Peter's Engineering College. Abstract: Ranking fraud in the mobile App market refers

### CAS-ICT at TREC 2005 SPAM Track: Using Non-Textual Information to Improve Spam Filtering Performance

CAS-ICT at TREC 2005 SPAM Track: Using Non-Textual Information to Improve Spam Filtering Performance Shen Wang, Bin Wang and Hao Lang, Xueqi Cheng Institute of Computing Technology, Chinese Academy of

### Music Mood Classification

Music Mood Classification CS 229 Project Report Jose Padial Ashish Goel Introduction The aim of the project was to develop a music mood classifier. There are many categories of mood into which songs may

### The Enron Corpus: A New Dataset for Email Classification Research

The Enron Corpus: A New Dataset for Email Classification Research Bryan Klimt and Yiming Yang Language Technologies Institute Carnegie Mellon University Pittsburgh, PA 15213-8213, USA {bklimt,yiming}@cs.cmu.edu

### Web Spam Identification Through Language Model Analysis

Web Spam Identification Through Language Model Analysis ABSTRACT Juan Martinez-Romo Dpto. Lenguajes y Sistemas Informáticos UNED 28040 Madrid, Spain juaner@lsi.uned.es This paper applies a language model

### Predict the Popularity of YouTube Videos Using Early View Data

000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

### University of Glasgow Terrier Team / Project Abacá at RepLab 2014: Reputation Dimensions Task

University of Glasgow Terrier Team / Project Abacá at RepLab 2014: Reputation Dimensions Task Graham McDonald, Romain Deveaud, Richard McCreadie, Timothy Gollins, Craig Macdonald and Iadh Ounis School

### WEB PAGE CATEGORISATION BASED ON NEURONS

WEB PAGE CATEGORISATION BASED ON NEURONS Shikha Batra Abstract: Contemporary web is comprised of trillions of pages and everyday tremendous amount of requests are made to put more web pages on the WWW.

### Automatic Web Page Classification

Automatic Web Page Classification Yasser Ganjisaffar 84802416 yganjisa@uci.edu 1 Introduction To facilitate user browsing of Web, some websites such as Yahoo! (http://dir.yahoo.com) and Open Directory

### CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques

CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques Chris MacLellan cjmaclel@asu.edu May 3, 2012 Abstract Different methods for aggregating twitter sentiment data are proposed and three

### SURVEY PAPER ON INTELLIGENT SYSTEM FOR TEXT AND IMAGE SPAM FILTERING Amol H. Malge 1, Dr. S. M. Chaware 2

International Journal of Computer Engineering and Applications, Volume IX, Issue I, January 15 SURVEY PAPER ON INTELLIGENT SYSTEM FOR TEXT AND IMAGE SPAM FILTERING Amol H. Malge 1, Dr. S. M. Chaware 2

### Extracting Information from Social Networks

Extracting Information from Social Networks Aggregating site information to get trends 1 Not limited to social networks Examples Google search logs: flu outbreaks We Feel Fine Bullying 2 Bullying Xu, Jun,

### Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks

Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks Yehonatan Cohen, Daniel Gordon, and Danny Hendler Ben Gurion University of the Negev, Be er Sheva, Israel {yehonatc,gordonda,hendlerd}@cs.bgu.ac.il

### Towards better accuracy for Spam predictions

Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 czhao@cs.toronto.edu Abstract Spam identification is crucial

### Ipswitch IMail Server with Integrated Technology

Ipswitch IMail Server with Integrated Technology As spammers grow in their cleverness, their means of inundating your life with spam continues to grow very ingeniously. The majority of spam messages these

Follow the Green: Growth and Dynamics in Twitter Follower Markets Gianluca Stringhini, Gang Wang, Manuel Egele, Christopher Kruegel, Giovanni Vigna, Haitao Zheng, Ben Y. Zhao Department of Computer Science,

### SPSB: Semi-Permeable Spam Banner

SPSB: Semi-Permeable Spam Banner Parth S. Shah 1, Vishal R. Andodariya 2 1 PG Student, 2 Asst. Professor Department of Computer Engineering Shri Pandit Nathulalji Vyas Technical Campus, Wadhwan - India

### Image Spam Filtering Using Visual Information

Image Spam Filtering Using Visual Information Battista Biggio, Giorgio Fumera, Ignazio Pillai, Fabio Roli, Dept. of Electrical and Electronic Eng., Univ. of Cagliari Piazza d Armi, 09123 Cagliari, Italy

### Groundbreaking Technology Redefines Spam Prevention. Analysis of a New High-Accuracy Method for Catching Spam

Groundbreaking Technology Redefines Spam Prevention Analysis of a New High-Accuracy Method for Catching Spam October 2007 Introduction Today, numerous companies offer anti-spam solutions. Most techniques

### Impact of Multimedia in Sina Weibo: Popularity and Life Span

Impact of Multimedia in Sina Weibo: Popularity and Life Span Xun Zhao 1, Feida Zhu 1, Weining Qian 2, and Aoying Zhou 2 1 School of Information Systems, Singapore Management University 2 Software Engineering

### A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE

STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LIX, Number 1, 2014 A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE DIANA HALIŢĂ AND DARIUS BUFNEA Abstract. Then

### Identifying Potentially Useful Email Header Features for Email Spam Filtering

Identifying Potentially Useful Email Header Features for Email Spam Filtering Omar Al-Jarrah, Ismail Khater and Basheer Al-Duwairi Department of Computer Engineering Department of Network Engineering &

### Market Assessment & Campaign SLA Calculator LOGO WE OPEN THE DOOR, SO YOU CAN CLOSE IT.

Market Assessment & Campaign SLA Calculator LOGO WE OPEN THE DOOR, SO YOU CAN CLOSE IT. Your Market Assessment Overview Your Inbound Market Assessment and Campaign SLA Calculator is broken down into several

### Music Classification by Composer

Music Classification by Composer Janice Lan janlan@stanford.edu CS 229, Andrew Ng December 14, 2012 Armon Saied armons@stanford.edu Abstract Music classification by a computer has been an interesting subject

### Research and Implementation of Real-time Automatic Web Page Classification System

3rd International Conference on Material, Mechanical and Manufacturing Engineering (IC3ME 2015) Research and Implementation of Real-time Automatic Web Page Classification System Weihong Han 1, a *, Weihui

### Toward Spam 2.0: An Evaluation of Web 2.0 Anti-Spam Methods

Toward Spam 2.0: An Evaluation of Web 2.0 Anti-Spam Methods Pedram Hayati, Vidyasagar Potdar Digital Ecosystem and Business Intelligence (DEBI) Institute, Curtin University of Technology, Perth, WA, Australia

### Understanding the popularity of reporters and assignees in the Github

Understanding the popularity of reporters and assignees in the Github Joicy Xavier, Autran Macedo, Marcelo de A. Maia Computer Science Department Federal University of Uberlândia Uberlândia, Minas Gerais,

### Measuring User Credibility in Social Media

Measuring User Credibility in Social Media Mohammad-Ali Abbasi and Huan Liu Computer Science and Engineering, Arizona State University Ali.abbasi@asu.edu,Huan.liu@asu.edu Abstract. People increasingly

### Image Spam Filtering by Content Obscuring Detection

Image Spam Filtering by Content Obscuring Detection Battista Biggio, Giorgio Fumera, Ignazio Pillai, Fabio Roli Dept. of Electrical and Electronic Eng., University of Cagliari Piazza d Armi, 09123 Cagliari,

### Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

### Machine Learning model evaluation. Luigi Cerulo Department of Science and Technology University of Sannio

Machine Learning model evaluation Luigi Cerulo Department of Science and Technology University of Sannio Accuracy To measure classification performance the most intuitive measure of accuracy divides the

### COAT : Collaborative Outgoing Anti-Spam Technique

COAT : Collaborative Outgoing Anti-Spam Technique Adnan Ahmad and Brian Whitworth Institute of Information and Mathematical Sciences Massey University, Auckland, New Zealand [Aahmad, B.Whitworth]@massey.ac.nz

### Who are my Audiences? A Study of the Evolution of Target Audiences in Microblogs

Who are my Audiences? A Study of the Evolution of Target Audiences in Microblogs Ruth García-Gavilanes 1, Andreas Kaltenbrunner 2, Diego Sáez-Trumper 3, Ricardo Baeza-Yates 3, Pablo Aragón 2, and David

### So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

### Educational Social Network Group Profiling: An Analysis of Differentiation-Based Methods

Educational Social Network Group Profiling: An Analysis of Differentiation-Based Methods João Emanoel Ambrósio Gomes 1, Ricardo Bastos Cavalcante Prudêncio 1 1 Centro de Informática Universidade Federal

### E-MAIL FILTERING FAQ

V8.3 E-MAIL FILTERING FAQ COLTON.COM Why? Why are we switching from Postini? The Postini product and service was acquired by Google in 2007. In 2011 Google announced it would discontinue Postini. Replacement:

### Machine Learning Final Project Spam Email Filtering

Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE

### Evaluation of Anti-spam Method Combining Bayesian Filtering and Strong Challenge and Response

Evaluation of Anti-spam Method Combining Bayesian Filtering and Strong Challenge and Response Abstract Manabu IWANAGA, Toshihiro TABATA, and Kouichi SAKURAI Kyushu University Graduate School of Information

### Search and Information Retrieval

Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

### On Word-of-Mouth Based Discovery of the Web

On Word-of-Mouth Based Discovery of the Web Tiago Rodrigues Universidade Federal de Minas Gerais, UFMG Belo Horizonte, Brazil tiagorm@dcc.ufmg.br Krishna P. Gummadi Max Planck Institute for Software Systems,

### 131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

### International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering

### Entropy-based Classification of Retweeting Activity on Twitter

Entropy-based Classification of Retweeting Activity on Twitter Rumi Ghosh USC Information Sciences Institute Marina del Rey, CA 90292 {rumig}@isi.edu Tawan Surachawala USC Information Sciences Institute

### Dr. Antony Selvadoss Thanamani, Head & Associate Professor, Department of Computer Science, NGM College, Pollachi, India.

Enhanced Approach on Web Page Classification Using Machine Learning Technique S.Gowri Shanthi Research Scholar, Department of Computer Science, NGM College, Pollachi, India. Dr. Antony Selvadoss Thanamani,

### Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet

### Theme-based Retrieval of Web News

Theme-based Retrieval of Web Nuno Maria, Mário J. Silva DI/FCUL Faculdade de Ciências Universidade de Lisboa Campo Grande, Lisboa Portugal {nmsm, mjs}@di.fc.ul.pt ABSTRACT We introduce an information system

### Mining the Software Change Repository of a Legacy Telephony System

Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,

### Social Networks. Listening and monitoring what happens online Identifying online audiences The main social media tools

Communication Handbook - Factsheet 6 Version 1 April 2012 Social Networks Listening and monitoring what happens online Identifying online audiences The main social media tools Complementary to the website,

### escan Anti-Spam White Paper

escan Anti-Spam White Paper Document Version (esnas 14.0.0.1) Creation Date: 19 th Feb, 2013 Preface The purpose of this document is to discuss issues and problems associated with spam email, describe

### CHAPTER 3 DATA MINING AND CLUSTERING

CHAPTER 3 DATA MINING AND CLUSTERING 3.1 Introduction Nowadays, large quantities of data are being accumulated. The amount of data collected is said to be almost doubled every 9 months. Seeking knowledge

SOCIAL MEDIA TIPS & TRICKS FACEBOOK FOR NONPROFITS 10 Tips to Get the Most Out of Facebook 1. Be helpful. If someone asks a question on your Facebook page, respond. If someone shares feedback, thank them

### Forecasting stock markets with Twitter

Forecasting stock markets with Twitter Argimiro Arratia argimiro@lsi.upc.edu Joint work with Marta Arias and Ramón Xuriguera To appear in: ACM Transactions on Intelligent Systems and Technology, 2013,

### SEO 360: The Essentials of Search Engine Optimization INTRODUCTION CONTENTS. By Chris Adams, Director of Online Marketing & Research

SEO 360: The Essentials of Search Engine Optimization By Chris Adams, Director of Online Marketing & Research INTRODUCTION Effective Search Engine Optimization is not a highly technical or complex task,

### Bayesian Spam Filtering

Bayesian Spam Filtering Ahmed Obied Department of Computer Science University of Calgary amaobied@ucalgary.ca http://www.cpsc.ucalgary.ca/~amaobied Abstract. With the enormous amount of spam messages propagating

### Software Engineering 4C03 SPAM

Software Engineering 4C03 SPAM Introduction As the commercialization of the Internet continues, unsolicited bulk email has reached epidemic proportions as more and more marketers turn to bulk email as

### The Evolvement of Big Data Systems

The Evolvement of Big Data Systems From the Perspective of an Information Security Application 2015 by Gang Chen, Sai Wu, Yuan Wang presented by Slavik Derevyanko Outline Authors and Netease Introduction