Spam Detection in Voice-over-IP Calls through Semi-Supervised Clustering




Spam Detection in Voice-over-IP Calls through Semi-Supervised Clustering
Yu-Sung Wu, Saurabh Bagchi (Purdue University, USA)
Navjot Singh (Avaya Labs, USA)
Ratsameetip Wita (Chulalongkorn University, Thailand)
Slide 1/29

Voice-over-IP (VoIP) Overview
- Session Initiation Protocol (SIP) or H.323 for signaling
- Real-time Transport Protocol (RTP) for media
- Media flow happens after a successful call setup, which is achieved through signaling
- RTP Control Protocol (RTCP) for feedback
- Other supporting protocols: DNS, DHCP, ICMP
Slide 2/29

Sample Call Flow in VoIP
Participants: A (Phone), S1 (Proxy), S2 (Proxy), B (Phone)
Messages, in order:
F1 Invite
F2 Invite
F3 100 Trying
F4 Invite
F5 100 Trying
F6 180 Ringing
F7 180 Ringing
F8 180 Ringing
F9 200 OK
F10 200 OK
F11 200 OK
F12 ACK
Media Session
F13 BYE
F14 200 OK
Slide 3/29

Outline
1. VoIP Overview
2. Challenges in VoIP Spam Detection
3. System Architecture
4. Semi-supervised Clustering
5. Efficient Clustering for Spam Detection: eMPCK-Means, pMPCK-Means
6. Call Trace and Experiments
7. Conclusions
Slide 4/29

Spam Calls in VoIP Systems
- SPam over Internet Telephony (SPIT)
  - Unsolicited and unwanted phone calls from (malicious) parties
    - Telemarketing calls
    - Harassing calls
    - Survey / polling calls
- Why is this a growing phenomenon?
  - VoIP calls are cheap to make
  - SPIT is very easy to automate
- Comparison with e-mail spam:
  - Motives and impacts are analogous
  - But, more disruptively, VoIP spam intrudes in real time
Slide 5/29

Challenges for Dealing with VoIP Spam
- A spam call in many ways appears like a normal (non-SPIT) call
  - Both follow the same protocols (SIP, H.323, RTP, RTCP)
  - No malformed packets
  - No exploitation of protocol vulnerabilities
  - Existing NIDS systems (Snort, SCIDIVE [1], ...) do not apply
- VoIP is a real-time system
  - Before you pick up the call, can you tell if it's going to be a spam call?
[1] Y.-S. Wu, S. Bagchi, S. Garg, N. Singh, T. Tsai, "SCIDIVE: A Stateful and Cross Protocol Intrusion Detection Architecture for Voice-over-IP Environments," DSN '05, pp. 401-410.
Slide 6/29

Challenges for Dealing with VoIP Spam
- A VoIP system is a dynamic environment
  - Call duration, call frequency, and the words spoken can all vary from one deployment to another
- Different people have different views of what constitutes a SPIT call
  - Some might be interested in buying merchandise from telemarketers while still disliking other, harassing phone calls
- Therefore, fixed threshold-based detection rules are not suitable for filtering spam calls
Slide 7/29

Contribution
- Identify features of a VoIP call for spam detection
- Cluster VoIP calls to identify spam calls
- Use user feedback and a semi-supervised clustering technique to differentiate between spam and legitimate calls
- Adapt the original MPCK-Means [2] algorithm into:
  - eMPCK-Means: an O(N) algorithm for clustering a batch of VoIP calls
  - pMPCK-Means: a real-time algorithm for detecting VoIP spam
[2] M. Bilenko, S. Basu, and R. J. Mooney, "Integrating constraints and metric learning in semi-supervised clustering," in ICML, 2004, pp. 81-88.
Slide 8/29

Outline (same as Slide 4/29)
Slide 9/29

System Architecture
[Figure: system architecture. Components shown: SIP-based VoIP Proxy Server #1, SIP-based VoIP Proxy Server #2, Server-side Detector, Spit Detector, and three Client-side Detectors. Endpoints shown: S, A, B, C, S, E, F. The detectors are marked "Our Contribution".]
Legend: S = spitter; the other endpoints are normal users
Slide 10/29

VoIP Call Features
17 call features, extracted from VoIP signaling and media traffic, are used here for clustering.
Feature groups:
A. Call Establishment
B. Media Stream (RTP/RTCP) / Call Maintenance
C. Call Tear Down
Features:
1-2. From/To URI
3. Start time
4. Duration
5. # of SIP INVITE messages
6. # of SIP ACK messages
7-8. # of SIP BYE messages from caller/callee
9. Time since the last call from the originator of the current call
10-15. # of 1xx, 2xx, 3xx, 4xx, 5xx, and 6xx SIP response messages
16. Call frequency of the originator of the current call
17. Ratio of non-silence duration of the callee to the caller media streams
Slide 11/29

Outline (same as Slide 4/29)
Slide 12/29
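As a rough illustration of how features 5 to 15 from the call-feature slide could be tallied, the sketch below counts SIP requests and response classes in a list of already-parsed messages. The `(sender, text)` tuple format, the `caller`/`callee` labels, and the function name `count_sip_features` are assumptions made for this example; the slides do not describe the actual extraction code.

```python
from collections import Counter


def count_sip_features(messages):
    """Tally SIP requests and response classes for one call.

    `messages` is a hypothetical list of (sender, text) tuples, where
    sender is "caller" or "callee" and text is either a SIP method
    name ("INVITE", "ACK", "BYE") or a 3-digit status code ("180").
    """
    counts = Counter()
    for sender, text in messages:
        if text.isdigit():
            # Group responses by class: 1xx, 2xx, ..., 6xx (features 10-15).
            counts[text[0] + "xx"] += 1
        elif text == "BYE":
            # BYE is counted separately per side (features 7-8).
            counts["BYE_" + sender] += 1
        else:
            # INVITE and ACK counts (features 5-6).
            counts[text] += 1
    return counts


# A made-up single-hop trace, not the trace used in the experiments.
example = count_sip_features([
    ("caller", "INVITE"),
    ("callee", "100"),
    ("callee", "180"),
    ("callee", "200"),
    ("caller", "ACK"),
    ("callee", "BYE"),
    ("caller", "200"),
])
```

A `Counter` returns 0 for keys it has never seen, so a call with no BYE from the caller still yields a well-defined feature value.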

Basic Clustering
- Objective: cluster calls into legitimate and spam calls
- Classic K-Means clustering:
$$\sum_{j=1}^{K} \sum_{x_i \in X_j} \lVert x_i - \mu_j \rVert^2 \quad \text{is minimized}$$
- The objective function puts weight on each feature evenly
- However, there may be only a few call features that can distinguish between the different clusters
- Putting equal weight on all the selected features can drown out the influence of these distinguishing features
Slide 13/29

Semi-supervised clustering: MPCK-Means
$$\tau_{mpckm} = \sum_{x_i \in \mathcal{X}} \left( \lVert x_i - \mu_{l_i} \rVert^2_{A_{l_i}} - \log\left(\det\left(A_{l_i}\right)\right) \right) + \sum_{(x_i, x_j) \in M} w_{ij}\, f_M(x_i, x_j)\, \mathbb{1}[l_i \neq l_j] + \sum_{(x_i, x_j) \in C} w_{ij}\, f_C(x_i, x_j)\, \mathbb{1}[l_i = l_j]$$
is minimized, where
$$\lVert x_i - \mu_{l_i} \rVert^2_{A_{l_i}} = (x_i - \mu_{l_i})^T A_{l_i} (x_i - \mu_{l_i})$$
- First term: distance from centroids (reweighted by the A matrix)
- Second term: cost from violating must-link constraints (pairs of data points which should be put in the same cluster)
- Third term: cost from violating cannot-link constraints (pairs of data points which should be put in different clusters)
Slide 14/29
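To make the effect of the A matrix concrete, here is a minimal sketch of the reweighted squared distance $(x - \mu)^T A (x - \mu)$ used in the MPCK-Means objective. The function name, the two-feature vectors, and the example matrices are illustrative assumptions; with the identity matrix the result reduces to the plain K-Means squared distance.

```python
def weighted_sq_distance(x, mu, A):
    """Return (x - mu)^T A (x - mu) for a list-of-lists matrix A."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    n = len(d)
    return sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))


# Hypothetical two-feature call and centroid (values are made up).
call, centroid = [30.0, 1.0], [20.0, 0.0]

# Identity matrix: every feature weighted evenly, as in classic K-Means.
identity = [[1.0, 0.0], [0.0, 1.0]]
# Reweighted matrix: down-weights feature 0 and up-weights feature 1.
reweighted = [[0.25, 0.0], [0.0, 4.0]]

even = weighted_sq_distance(call, centroid, identity)      # 101.0
learned = weighted_sq_distance(call, centroid, reweighted)  # 29.0
```

Under the identity matrix the first feature dominates the distance (100 of 101), while under the reweighted matrix the second feature contributes a visible share (4 of 29), which is the kind of emphasis on distinguishing features that slide 13 argues for.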

How to Update A matrix
The A matrix $A_h$ for cluster $h$ is acquired by solving
$$\frac{\partial \tau_{mpckm}}{\partial A_h} = 0$$
which gives
$$A_h = |X_h| \left( \sum_{x_i \in X_h} (x_i - \mu_h)(x_i - \mu_h)^T + \sum_{(x_i, x_j) \in M_h} \frac{1}{2} w_{ij} (x_i - x_j)(x_i - x_j)^T\, \mathbb{1}[l_i \neq l_j] + \sum_{(x_i, x_j) \in C_h} w_{ij} \left( (x'_h - x''_h)(x'_h - x''_h)^T - (x_i - x_j)(x_i - x_j)^T \right) \mathbb{1}[l_i = l_j] \right)^{-1}$$
- First sum: covariance of data points in cluster h
- Second sum: cost from violating must-link constraints related to cluster h
- Third sum: cost from violating cannot-link constraints related to cluster h, where $x'_h$ and $x''_h$ are the maximally separated points in cluster h
Slide 15/29

Outline (same as Slide 4/29)
Slide 16/29
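The A matrix update can be sketched for a deliberately simplified case: a diagonal A matrix and no violated constraints, so that only the covariance term remains and each diagonal entry becomes $|X_h|$ divided by the scatter of that feature within the cluster. Both simplifications, the function name `diagonal_metric_update`, and the `1e-9` floor are assumptions for illustration; the slides use a full matrix with constraint terms.

```python
def diagonal_metric_update(points):
    """Return (centroid, diagonal weights) for one cluster.

    Simplified update: weight_d = |X_h| / sum_i (x_id - mu_d)^2,
    i.e. the covariance term only, restricted to a diagonal matrix.
    """
    n = len(points)
    dims = len(points[0])
    mu = [sum(p[d] for p in points) / n for d in range(dims)]
    weights = []
    for d in range(dims):
        scatter = sum((p[d] - mu[d]) ** 2 for p in points)
        # Floor the scatter so a constant feature does not divide by zero.
        weights.append(n / max(scatter, 1e-9))
    return mu, weights


# Two made-up calls with two features each.
mu, weights = diagonal_metric_update([[0.0, 0.0], [2.0, 1.0]])
# mu == [1.0, 0.5]; weights == [1.0, 4.0]
```

The feature that varies less inside the cluster (the second one here) receives the larger weight, so deviations along it count for more in later distance computations.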

Our Contribution: eMPCK-Means
- Batch mode of operation
- Improvement in runtime: an O(N) approximation of MPCK-Means
  - MPCK-Means is O(N^3)
  - O(N)-complexity cluster initialization
    - Skip the pair-wise constraints (which are O(N^2))
    - Use the set of flagged spam calls, the set of flagged legitimate calls, and the set of the remaining calls directly for cluster initialization
  - Efficient estimation of maximally separated points
    - Embed the estimation in the distance calculation
  - Use a constant number of constraints in the cluster assignment step
    - Experiment results from [2] suggest that MPCK-Means can work reasonably well with only a few constraints
Slide 17/29

Our Contribution: eMPCK-Means
- Improvement in clustering quality: pre-metric update on the starting cluster(s)
  - Update the A matrix once before entering the main loop of MPCK-Means
  - Results in an initial A matrix that better reflects the user feedback information
  - In comparison, MPCK-Means uses an identity matrix as the initial A matrix
Slide 18/29

pMPCK-Means
- For real-time spam detection: hang up a suspect call even before media flow starts
- Only allowed to use features available at the call establishment phase
  - From URI, To URI, Start time, and Time since the last call from the originator of the current call
- Most of the time, each new data point (an incoming call) involves only a cluster assignment operation
  - O(1) complexity
- Occasionally, eMPCK-Means is invoked to recondition the clustering
  - Re-compute the clusters, A matrix, etc.
  - Can be carried out asynchronously in the background
Slide 19/29

eMPCK-Means (multi-class)
- With MPCK-Means, eMPCK-Means, and pMPCK-Means, we create only two clusters: a cluster of spam calls and a cluster of legitimate calls
  - Because user feedback only provides a binary predicate on whether a call is spam or legitimate
- eMPCK-Means (multi-class):
  - Use expert knowledge to differentiate types of calls
  - Split each cluster (spam or legitimate) into three sub-clusters based on call type:
    - Calls going to a voice mailbox
    - Calls terminated by the user immediately after the call is established
    - The remaining types of calls
Slide 20/29
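The per-call assignment step of pMPCK-Means (slide 19) can be sketched as follows: with centroids and metrics held fixed, an incoming call is assigned to the cluster with the smallest cost, so the work per call does not grow with the number of past calls. This sketch assumes diagonal A matrices, omits the constraint penalty terms, and uses a hypothetical numeric encoding of two establishment-phase features; none of these details are given in the slides.

```python
import math


def assign_call(x, clusters):
    """Pick the cluster minimizing ||x - mu||^2_A - log det(A).

    `clusters` maps a label to (centroid, diagonal of A). For a
    diagonal matrix, log det(A) is the sum of the logs of its entries.
    """
    best_label, best_cost = None, math.inf
    for label, (mu, a_diag) in clusters.items():
        dist = sum(a * (xi - mi) ** 2 for a, xi, mi in zip(a_diag, x, mu))
        cost = dist - sum(math.log(a) for a in a_diag)
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label


# Hypothetical features: [start hour, minutes since last call by originator].
clusters = {
    "spam": ([2.0, 1.0], [1.0, 1.0]),
    "legitimate": ([14.0, 30.0], [1.0, 1.0]),
}
first = assign_call([3.0, 2.0], clusters)     # "spam"
second = assign_call([13.0, 25.0], clusters)  # "legitimate"
```

In a deployment along the lines described on the slide, the `clusters` table would be refreshed in the background whenever eMPCK-Means reconditions the clustering, while `assign_call` keeps serving incoming calls.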

Outline (same as Slide 4/29)
Slide 21/29

Call Traces for Experiments
Name | Legitimate call length | Legitimate call interarrival time | Spam call length | Spam call interarrival time | Total # of legitimate calls | Total # of spam calls
v4 | 5 | 30 | 1 | 2 | 171 | 212
v5 | 5 | 10 | 1 | 10 | 338 | 45
v6 | 5 | 30 | 1 | 10 | 289 | 94
v7 | 5 | 30 | 5 | 10 | 302 | 81
Common characteristics for spam calls:
- There are 6 spitters in the system
- 10% chance of a call being hung up by the caller
- Non-silence period in the media stream is dominated by the spitter
Common characteristics for legitimate calls:
- There are 90 legitimate users in the system
- 60% chance of a call being hung up by the caller
Slide 22/29
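As a small worked calculation on the call-trace table, the snippet below derives the total number of calls and the share of spam calls in each trace from the two "Total #" columns. Only the counts come from the table; the helper name `spam_share` is an assumption for this example.

```python
# (legitimate calls, spam calls) per trace, copied from the table.
traces = {
    "v4": (171, 212),
    "v5": (338, 45),
    "v6": (289, 94),
    "v7": (302, 81),
}


def spam_share(legit, spam):
    """Fraction of calls in a trace that are spam."""
    return spam / (legit + spam)


totals = {name: legit + spam for name, (legit, spam) in traces.items()}
shares = {name: round(spam_share(*counts), 3) for name, counts in traces.items()}
# totals: every trace has 383 calls
# shares: v4 0.554, v5 0.117, v6 0.245, v7 0.211
```

Every trace contains 383 calls in total, so the traces differ in their mix of spam and legitimate calls rather than in size.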

Experiment: Effect of user feedback
[Figure: eMPCK true positive rate across call traces]
- True Positive: (# of actual spam calls detected) / (# of detected calls)
- v4 is the easiest, followed closely by v6, and then v7. v5 is the hardest.
Slide 23/29

Experiment: Effect of user feedback
- Comparing 4 algorithms (using call trace v4)
- Pre-metric update boosting improves the performance of eMPCK
- A small amount of user feedback is enough to make the detection sufficiently accurate
Slide 24/29
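The true positive rate as defined on slide 23, (# of actual spam calls detected) / (# of detected calls), can be computed as in the sketch below. The call identifiers and the function name are illustrative assumptions. Note that this ratio divides by the number of detected calls, which is what is often called precision elsewhere.

```python
def true_positive_rate(detected, actual_spam):
    """Slide-23 definition: actual spam among detected / all detected."""
    detected = set(detected)
    if not detected:
        # Nothing was flagged, so the ratio is undefined; report 0.0 here.
        return 0.0
    return len(detected & set(actual_spam)) / len(detected)


# Made-up call IDs: four calls flagged, three of them really spam.
rate = true_positive_rate(detected=[1, 2, 3, 4], actual_spam=[2, 3, 4, 5])
# rate == 0.75
```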

Experiment: Noise in user feedback
- Call trace v6, user feedback fixed at 0.3
- pMPCK is not really usable
- The others work at low noise levels
- The pre-metric update hurts the performance of eMPCK once the noise level passes 0.5 (the majority of user feedback is inaccurate)
Slide 25/29

Experiment: Quality and quantity of user feedback
$$\text{Volume} = \int_{n=0}^{1} \int_{f=0.1}^{1} (TP_i - FP_i)\, df\, dn$$
n: noise level, f: feedback ratio
[Figure: eMPCK (TP-FP) for call trace v6]
TP-FP | v6 | avg. across all traces
MPCK | -0.319 | -0.314
eMPCK (Multi Class) | -0.330 | -0.314
eMPCK | -0.272 | -0.287
pMPCK | -0.371 | -0.341
Slide 26/29
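The Volume metric on slide 26 integrates TP - FP over noise level n in [0, 1] and feedback ratio f in [0.1, 1]. One way to approximate such a double integral from measurements on a grid is the trapezoidal rule, sketched below. The grid points and the `score` callback are assumptions; the slides do not say how the integral was evaluated.

```python
def trapezoid(ys, xs):
    """Trapezoidal-rule integral of samples ys taken at points xs."""
    return sum(
        (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
        for i in range(len(xs) - 1)
    )


def volume(score, noise_levels, feedback_ratios):
    """Approximate the double integral of score(n, f) df dn on a grid."""
    inner = [
        trapezoid([score(n, f) for f in feedback_ratios], feedback_ratios)
        for n in noise_levels
    ]
    return trapezoid(inner, noise_levels)


# Sanity check with a constant score of 1: the result is the area of the
# integration region, (1 - 0) * (1 - 0.1) = 0.9 up to rounding.
area = volume(lambda n, f: 1.0, [0.0, 0.5, 1.0], [0.1, 0.55, 1.0])
```

In practice `score(n, f)` would look up the measured TP - FP value for a given noise level and feedback ratio, and a finer grid would give a closer approximation.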

Experiment: Scalability
- Call trace v7 is used for this experiment
- eMPCK is at least 15x faster than MPCK
- eMPCK exhibits linear time complexity
Slide 27/29

Outline (same as Slide 4/29)
Slide 28/29

Conclusion
- Proposed a solution for detecting VoIP spam
  - Our solution is built upon semi-supervised clustering
  - Able to adapt to different environments and needs
- Developed a scalable algorithm for batch detection of VoIP spam
  - Useful and practical for service providers
- Detecting VoIP spam in real time is hard
  - pMPCK-Means is barely usable due to the limited features available during call establishment
Future Work
- Better real-time detection
- Sharing signatures of spam calls across ISPs
Slide 29/29

SPIT Detection Process Flow
[Flowchart. Node labels: START; New phone call detected; Annotate phone call with optional user feedback information; Collection of all detected phone calls; Semi-supervised clustering; Early detection (before media stream is established); Clusters of SPIT calls; Terminate the phone call if detected as a SPIT call; Clusters of non-SPIT calls; Service provider intervention. Step labels: A1, A2, A3, B1, B2, B3, B4, B5.]
Slide 30/29

Experiment: Effect of user feedback
[Figure: true positive rate for call trace v7]
Slide 31/29