FIGHTING SPAM WEBSITES AND SOCIAL I.P. IN ON SOCIAL NEWS

Similar documents

SpotRank: A Robust Voting System for Social News Websites

Search and Information Retrieval

Combating Spam in Tagging Systems: An Evaluation

Social Media Marketing for Small Business Demystified

Social Information Processing in Social News Aggregation

When Reputation is Not Enough: Barracuda Spam & Virus Firewall Predictive Sender Profiling

Pulsar TRAC. Big Social Data for Research. Made by Face

When Reputation is Not Enough: Barracuda Spam Firewall Predictive Sender Profiling. White Paper

Content Security: Protect Your Network with Five Must-Haves

5) Social Media Marketing. Paolo Torchio & Amanda Webb September

Predicting Influentials in Online Social Networks

Threat Intelligence: The More You Know the Less Damage They Can Do. Charles Kolodgy Research VP, Security Products

Cisco Security Intelligence Operations

Search Engine Optimisation

Business Case and Strategy for Executing Mobile SEO Programs in Large Advertisers. Michael Martin and Craig Macdonald

2H 2015 SHADOW DATA REPORT

Top 4 Trends in Digital Marketing 2014

HubSpot's Website Grader

Integrating MSS, SEP and NGFW to catch targeted APTs

7 QUESTIONS TO ASK YOUR PPC AGENCY. We Turn Browsers Into Buyers

Search Engine Optimization Techniques. For Small Businesses

SEARCH UBERSEO SMALL BUSINESS OWNER S GUIDE TO A DIVISION OF EBWAY CREATIVE

Android App Marketing and Google Play

Predictive Coding Defensibility

SOCIAL MEDIA OPTIMIZATION

Software-assisted document review: An ROI your GC can appreciate. kpmg.com

Trust based Peer-to-Peer System for Secure Data Transmission ABSTRACT:

Predictive Coding Defensibility and the Transparent Predictive Coding Workflow

REDUCING COSTS WITH ADVANCED REVIEW STRATEGIES - PRIORITIZATION FOR 100% REVIEW. Bill Tolson Sr. Product Marketing Manager Recommind Inc.

Threat Intelligence. Benefits for the enterprise

Spam & The Power of Social Networks

Applying the National Intelligence Process to Information Security

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) ( ) Roman Kern. KTI, TU Graz

Anti Spam Best Practices

MEASURING AND FINGERPRINTING CLICK-SPAM IN AD NETWORKS

SEO Definition. SEM Definition

May 2011 Report #53. The following trends are highlighted in the May 2011 report:

WSI White Paper. Prepared by: Francois Muscat Search Engine Optimization Expert, WSI

Predictive Coding Defensibility and the Transparent Predictive Coding Workflow

Search engine marketing

Online Marketing Module COMP. Certified Online Marketing Professional. v2.0

Youtube Search Engine Optimization (SEO) - How to Rank a Youtube Video:

Socialbakers Analytics User Guide

ADGEAR TRADER EMPOWERING DIGITAL MEDIA INNOVATORS WITH CROSS-CHANNEL, REAL-TIME DISPLAY SOLUTIONS

SEO Services. Climb up the Search Engine Ladder

Demand Generation vs. Customer Relationship Management David M. Raab Raab Associates Inc.

Oracle Business Intelligence Commenting & Annotations Solution Brief

Internet Marketing for Local Businesses Online

Groundbreaking Technology Redefines Spam Prevention. Analysis of a New High-Accuracy Method for Catching Spam

Security from Above: How Cloud based Security Delivers Up to the Minute Network Protection

Search Engine Marketing Overview LOCAL SEM PROCESS

THE THREE Es OF MODERN SECURITY FOR PHISHING

SEO Services Sample Proposal

Addressing Big Data Security Challenges: The Right Tools for Smart Protection

A Guide to Social Media Marketing

REPUTATION MARKETING PROPOSAL Developed For: Dutchie s Fresh Market From: Screaming Tree Media

Optimizing Network Vulnerability

Workshop Discussion Notes: Housing

90 Digital igaming Reports: SEO Risk in igaming. Risk of being hit by a Google update has doubled in 17 months. June Gillis van den Broeke

Get Found Online. Consumers are searching for your products and services online. Is your website getting found?

1. Introduction to SEO (Search Engine Optimization)

Introduction. Why does it matter how Google identifies these SEO black hat tactics?

Link Building and Practical PR Essentials

Build Your Business with Search Advertising. If you re only doing SEO, You Could Be SOL!

SEO Trends for 2014

Global Reputation Monitoring The FortiGuard Security Intelligence Database WHITE PAPER

Northern Light. Best Practices In Market Research Portals. Presented at CMAG. August Strategic Research Portals

Spam DNA Filtering System

Five Questions to Ask Your Mobile Ad Platform Provider. Whitepaper

Search Privacy Practices: A Work In Progress

Intelligent Vulnerability Management The Art of Prioritizing Remediation. Phone Conference

Extracting and Preparing Metadata to Make Video Files Searchable

Critical Security Controls

How to Protect Your Dealership's Online Reputation

Search Engine Optimization: The Linchpin of a Successful Online Presence

Spam Classification Techniques

The State of Spam A Monthly Report August Generated by Symantec Messaging and Web Security

McAfee Network Security Platform

Top 12 Website Tips. How to work with the Search Engines

Resource 2.19 An Introduction to Social Media for Business Types of social media

Frequency Matters. The keys to optimizing send frequency

Security and Fraud Exceptions Under Do Not Track. Christopher Soghoian Center for Applied Cybersecurity Research, Indiana University

Professional Diploma in Digital Marketing Module 4: Marketing Version 5.0 Lecturer: David Maher

Securing Your Business with DNS Servers That Protect Themselves

The smart buyers guide to Search Engine Optimisation

Groundbreaking Technology Redefines Spam Prevention. Analysis of a New High-Accuracy Method for Catching Spam

TIBCO Cyber Security Platform. Atif Chaughtai

The search engine optimization process involves:

Symantec Cyber Threat Analysis Program Program Overview. Symantec Cyber Threat Analysis Program Team

SECURE YOUR BUSINESS WHEREVER IT TAKES YOU. Protection Service for Business

Bridging the gap between COTS tool alerting and raw data analysis

Introduction 3. What is SEO? 3. Why Do You Need Organic SEO? 4. Keywords 5. Keyword Tips 5. On The Page SEO 7. Title Tags 7. Description Tags 8

Ipswitch IMail Server with Integrated Technology

SCORECARD MARKETING. Find Out How Much You Are Really Getting Out of Your Marketing

CORRALLING THE WILD, WILD WEST OF SOCIAL MEDIA INTELLIGENCE

5 Tips to Turn Your Website into a Marketing Machine

Defending Networks with Incomplete Information: A Machine Learning Approach. Alexandre

2011 Forrester Research, Inc. Reproduction Prohibited

Microsoft Security Intelligence Report volume 7 (January through June 2009)

Social Media Marketing and Search Engine Optimization By: Joe Laratro

Transcription:

FIGHTING SPAM ON SOCIAL WEBSITES AND SOCIAL I.P. IN NEWS

LERMAN: SOCIAL INFORMATION PROCESSING IN SOCIAL NEWS AGGREGATION Premise: The web has now become a participatory medium. Users create, evaluate, and distribute info. Findings: Social networks play an important role in document recommendation. We can mathematically determine what a group s evaluation of a document is based on an aggregate of individual choices. DIGG: http://digg.com/

COMMON CHARACTERISTICS OF SOCIAL MEDIA Users can: Annotate Evaluate Create or Contribute Content Create Social Network

DIGG A social news aggregator Users submit stories and moderate them ~ 1 to 2 news per minute When a story is submitted: Goes to the upcoming stories queue Users can: Digg the story (+) Bury the story With enough bury votes the story is removed

DIGG PARTS Emergent front page Story promoted to front page when it gets enough votes Consensus between many independent users Social networks Users can designate others as their friends Digg tracks user s friends activities Users help each other find interesting news Social filtering Top users Ranking based on how many stories on the front-pg Marketers paid top users for promotion Discontinued after Feb. 2007

SOCIAL FILTERING Out of 2858 stories Submitted by 1570 distinct users Only 98 made it to front page By 60 users Dynamics of votes on select stories Over a period of 4 days Small rectangle In the upcoming Q Dashed line transition to the front page Interestingness When the votes saturate

USER RANK VS. INTERESTINGNESS Top-ranked users are not submitting stories that get the most votes

SOCIAL NETWORKS AND RECOMMENDATION Top-ranked users not submitting the most interesting stories why successful? social filtering, promotes their stories Friend relationship asymmetric User s success defined as: fraction of front-paged stories submitted by user correlated with his social network size

FRIENDS VS. REVERSE FRIENDS Top users have larger social networks Top 1020 users Black circles top 33 users

CORRELATION BTWN SUCCESS & NETWORK The larger the social network the more successful

DIGGING BEHAVIOR OF USERS Users digg stories that: Their friends submit Their friends have dugg # of voters that are among the reverse friends of submitter

CHANGE IN PROMOTION ALGORITHM Digg s goal: feature only the most interesting on front page Takes into account opinions of thousands of users Not just a few dedicated users Problem: top users had a cabal Game system by automatically vote for each other New algorithm Looks at unique digging diversity of the individuals Reduces dominance of top users 3072 stories submitted by 1865 users 71 promoted by 63 users

RESULTS OF NEW ALGORITHM Rank distr. Less skewed towards top-ranked users 50% of stories from users with rank < 300 Rather than rank < 25

MATHEMATICAL MODEL OF RATING Interestingness coefficient r Probability of receiving a positive vote once seen # of votes depends on visibility # of people who can see and follow the story Types of visibility Visibility on the front page (Vf) Visibility in the upcoming stories queue (Vu) Visibility through the Friends interface Friends of the submitter (Vs) Friends of voters (Vm)

VISIBILITY CALCULATION t: time since the story s submission If m(t)>h visible on front page Story stays: In the upcoming Q for 24h In Friends interface for 48h

CONCLUSIONS Social information processing allows users to solve hard information processing problems like document recommendation and filtering, and quality evaluation. Social networks as a good basis for recommendation. Modeled collective voting using ratings on Digg. As we have seen with crowdsourcing, it is possible to exploit user activity to solve hard problems. There are more potential applications in personalization, search, and discovery.

QUESTIONS What are your overall impressions of the paper? Which elements of this paper did you most like or dislike? Can you think of any improvements or future directions? How might this be relevant to our projects?

HEYMANN, ET AL.: FIGHTING SPAM ON SOCIAL WEBSITES, A SURVEY OF APPROACHES AND FUTURE CHALLENGES Problem: With the rise of social websites, we also see a growing amount of spam. Spam undermines the trustworthiness and overall. experience of these sites. Findings: Spam on social websites must be addressed differently from spam in other contexts. Three types of countermeasures were investigated: Detection Demotion Prevention Using network information improves spam detection models. Synthetic spam detection models have been underutilized.

SPAM Spam Characteristics: Goals fall under advertising, self-promotion, disruption/destruction, information gathering, and/or disparaging a competitor. Manifestations include audio, documents, annotations/comments, profiles, images, links. Depends on the attacker s goal. Social Website Characteristics: One controlling entity, well-defined interactions, identity, multiple interfaces. In social media sites, spammers have more avenues for attack, but service providers have more defense strategies available. They have better enforceability and better knowledge of user identity and history than in other contexts. Choice of social bookmarking as an example is interesting, since these sites seem to have died out, or at least have been modified considerably.

IDENTIFICATION-BASED STRATEGIES (DETECTION) 1. Identify material that is likely spam (many possible object types) 1. Manually by users of trusted monitors 2. Automatically through source, text, behavioral, or link analysis. 1. Text analysis of terms, stylistic elements, or with a corpus. 2. Behavioral analysis can be very useful because typical interactions on social website are often quite well-defined. 2. Delete the information or mark that it is probably spam. Methods are commonly used on email as well.

RANK-BASED STRATEGIES (DEMOTION) Reduce the prominence of context likely to be spam. Generally done through an automated system, but can also be done manually (Ex: Reddit). Creating personalized ranks by incorporating elements such as geography can make this more robust. Difficult to implement on time-ranked interfaces. This method is also commonly applied on the web.

INTERFACE- OR LIMIT-BASED (PREVENTION) Make the contribution of spam difficult through secret details or limiting automated interaction. Interface based or limit-based. Create restrictions on access to actions (ex: CAPTCHA, graylisting) Common in other types of sites and email as well. Fairly universal. Place restrictions on number of actions allowed (hard-number limits; charges) Not often implemented Which tactics do you think most effectively minimize social spam? Are there other possibilities?

EVALUATION MODELS AND METRICS Evaluation is important because many of these strategies are costly to implement. The subjectivity of spam and malicious activity can make modeling difficult. Synthetic spam models versus trace-driven models. Both require annotation. Trace-driven is generally preferred and considered more realistic. Authors argue synthetic models should be utilized to avoid sample bias and time consumption, and to account for site growth/change. However, these are more difficult to interpret. Metrics: Classification (precision and recall) Ranking (SpamFactor, mean average precision), broader problems. What did you think about the synthetic and trace-driven models? What about the metrics used by the author?

EXPERIMENT Created a synthetic model on a tagging system. Implemented a tag budget. The system returns ranked lists of objects from single-tag queries. Boolean model Occurrence-based model (counting tuples) Coincidence-based model (user reliability) Leveraging social knowledge such as tag coincidences improved spam detection by a factor of 2 (when compared to SpamFactor). This is only if no collusion takes place. Also, what about users with mixed good and bad behavior? Coincidence-based > Occurrence-based > Boolean

CONCLUSIONS The solutions detailed in the study will likely work better for social media sites than in other contexts due to their highly structured nature. Importance of not annoying users with too much spam prevention. In the future, spammers will probably start coming up with more specialized spam methods, and these techniques may not be specialized enough. Cycle of improved sophistication on both sides. Which combinations of countermeasures will be most effective in which environments? What is the appropriate combination of manual and automated work?

MORE QUESTIONS Which elements of this paper did you most like or dislike? Do you think spammers have evolved since 2007? Can you think of any examples? Is it still appropriate or useful to group all social media sites together? How are these concepts relevant to our projects? Are there any in particular that you think we should consider implementing?