HTTPS Traffic Classification




Wazen M. Shbair, Thibault Cholez, Jérôme François, Isabelle Chrisment
Jérôme François
Inria Nancy Grand Est, France
jerome.francois@inria.fr
NMLRG - IETF95, April 7th, 2016

Outline
1. The HTTPS Dilemma
2. SNI-Based Filtering
3. A Multi-Level Framework to Identify HTTPS Services
4. Evaluation
5. Conclusion

The HTTPS Dilemma
Security vs. Privacy:
- HTTPS, or HTTP over TLS, is a protocol for secure communication over a computer network.
- Content providers (Google, Facebook, ...) need to secure their web content by moving to HTTPS.
- Despite the good intentions behind SSL/TLS, it may be used for illegitimate purposes.
The main research question:
- Can we rely on monitoring techniques that do not decrypt HTTPS traffic?

Overview of SNI
What is SNI?
- SNI (Server Name Indication) is an extension inside the Client Hello message, proposed to support virtual hosting for websites that use HTTPS.
Figure: TLS handshake
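
To make the location of the SNI concrete, here is a minimal Python sketch that reads the server name from a raw TLS record carrying a Client Hello. It is an illustration, not the tool used in this work: the names `extract_sni` and `build_client_hello` are hypothetical, and the parser assumes the whole Client Hello fits in one unfragmented record.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name from a TLS Client Hello record, or None."""
    # TLS record header: content type (1 byte), version (2), length (2).
    if len(record) < 5 or record[0] != 0x16:  # 0x16 = handshake
        return None
    pos = 5
    # Handshake header: message type (1 byte), length (3 bytes).
    if len(record) < pos + 4 or record[pos] != 0x01:  # 0x01 = Client Hello
        return None
    pos += 4
    pos += 2 + 32  # skip client version and random
    try:
        pos += 1 + record[pos]  # session id
        (cipher_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + cipher_len  # cipher suites
        pos += 1 + record[pos]  # compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(len(record), pos + ext_total)
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:  # extension type 0 = server_name
                # Layout: list length (2), name type (1), name length (2), name.
                if record[pos + 2] != 0:  # name type 0 = host_name
                    return None
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None


def build_client_hello(hostname: str) -> bytes:
    """Build a tiny synthetic Client Hello for demonstration purposes only."""
    name = hostname.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name
    sni_data = struct.pack("!H", len(entry)) + entry
    extensions = struct.pack("!HH", 0, len(sni_data)) + sni_data
    body = (
        b"\x03\x03" + bytes(32)      # version, random
        + b"\x00"                    # empty session id
        + b"\x00\x02\x00\x2f"        # one cipher suite
        + b"\x01\x00"                # one compression method (null)
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake
```

Because the name travels in clear text in the first handshake message, a monitor can read it without decrypting anything. The same property makes it usable for labelling connections.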

SNI-Based Filtering Evaluation
SNI-based filtering:
- SNI-based filtering has two weaknesses: backward compatibility, and multiple services sharing a single certificate.
- The Escape plug-in is our proof of concept exploiting these SNI weaknesses.
- It was successfully tested against 3 firewalls and the top 20 visited websites, such as Google Search, Facebook, YouTube and Twitter.
Publication:
- W. Shbair, T. Cholez, A. Goichot, I. Chrisment: Efficiently Bypassing SNI-based HTTPS Filtering, IFIP/IEEE IM2015.

Identifying HTTPS Services: Flow-Based Statistical Identification
Improvements:
- One way to improve flow-based statistical identification is to combine it with algorithms from other fields, such as Machine Learning (ML) [1].
- ML has been widely used for the identification of encrypted traffic.
- It is mainly used to identify the type of application (HTTPS, Mail, P2P, VoIP, SSH, Skype, etc.).
New challenges:
- Considering all HTTPS traffic as a single class is not enough for security monitoring, because it groups together very different services.

Identifying HTTPS Services: Website Fingerprinting (WF)
- WF is defined as the process of identifying the URLs of the web pages that are accessed.
- It identifies accessed HTTPS-encrypted web pages based on static object sizes parsed from unencrypted traffic [2].
WF issue:
- It fails with dynamic web pages that use an HTTPS Content Delivery Network (CDN) such as Akamai (too fine-grained).

A Multi-Level Framework to Identify HTTPS Services
The motivation:
- An intermediate identification method that monitors at the service level.
- Identify HTTPS services without relying on header fields.
- Do not decrypt the HTTPS traffic.
The core techniques:
1. Machine Learning techniques.
2. A novel multi-level classification approach.
3. A well-tuned set of features.

Machine Learning Techniques
Figure: Flat classification view
The legacy method:
- Existing methods follow the flat view: they identify websites and applications directly.
- Drawbacks: low scalability, low accuracy and high error rate.

A Novel Multi-Level Classification Approach
Figure: Multi-level presentation
Multi-level method:
- Reorganize the training dataset in a tree-like fashion.
- The top level is referred to as the Class-level (root domain).
- The lower level is referred to as the Folds-level (individual sub-domains).
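
The tree-like reorganization can be sketched in a few lines of Python. This is an illustrative assumption about how host names might be grouped, not the authors' pre-processing code: `root_domain` and `build_label_tree` are hypothetical names, and taking the last two labels as the root domain is a simplification that breaks for suffixes such as `co.uk` (a public suffix list would be needed there).

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def root_domain(hostname: str) -> str:
    """Naive root domain: the last two DNS labels (assumption, see lead-in)."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def build_label_tree(hostnames: Iterable[str]) -> Dict[str, List[str]]:
    """Group sub-domains (lower level) under their root domain (top level)."""
    tree = defaultdict(set)
    for hostname in hostnames:
        tree[root_domain(hostname)].add(hostname.lower().rstrip("."))
    return {provider: sorted(services) for provider, services in tree.items()}


if __name__ == "__main__":
    observed = ["maps.google.com", "mail.google.com", "www.facebook.com"]
    for provider, services in build_label_tree(observed).items():
        print(provider, "->", services)
```

Each key of the resulting dictionary corresponds to one top-level class, and each list holds the candidate labels for that provider's own second-level model.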

A Multi-Level Framework to Identify HTTPS Services
Figure: The workflow of the HTTPS traffic identification framework

Multi-Level Classification Approach
The novel evaluation method, better suited to the multi-level approach:
- If both the service provider and the service name are predicted correctly: Perfect identification.
- If the service provider is predicted correctly but not the service name: Partial identification.
- If neither the service provider nor the service name is predicted correctly: Invalid identification.
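
The three outcomes above can be expressed as a small scoring function. The sketch below is only an illustration: the names `score` and `summarize` are hypothetical, labels are assumed to be `(provider, service)` pairs, and a prediction with the wrong provider is counted as invalid regardless of the predicted service.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Label = Tuple[str, str]  # (service provider, service name)


def score(truth: Label, predicted: Label) -> str:
    """Classify one prediction as 'perfect', 'partial' or 'invalid'."""
    true_provider, true_service = truth
    pred_provider, pred_service = predicted
    if pred_provider != true_provider:
        return "invalid"
    if pred_service == true_service:
        return "perfect"
    return "partial"


def summarize(pairs: Iterable[Tuple[Label, Label]]) -> Dict[str, float]:
    """Return the fraction of perfect, partial and invalid identifications."""
    counts = Counter(score(truth, predicted) for truth, predicted in pairs)
    total = sum(counts.values())
    if total == 0:
        return {"perfect": 0.0, "partial": 0.0, "invalid": 0.0}
    return {kind: counts[kind] / total
            for kind in ("perfect", "partial", "invalid")}
```

Reporting the three fractions separately shows how often the framework is at least right about the provider, which a single flat accuracy figure would hide.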

Methodology Overview
The evaluation of the proposed solution has 3 parts:
- Evaluation of the collected dataset.
- Evaluation of the proposed feature set.
- Evaluation of the multi-level classification approach.
Evaluation of the collected dataset:
- Contains more than 288,901 HTTPS connections.
- Pre-processed to be suitable for the multi-level approach.
- Processed to determine a reasonable threshold for the minimum number of labelled connections per service.

Feature Selection
Evaluation of the proposed feature set:
- 30 classical features from previous work [3, 4].
- 12 new features are proposed over the encrypted payload.
- The 42 features are optimized with a feature selection technique.
- The key benefit is reducing over-fitting by removing irrelevant and redundant features [5].
Feature selection result:
- 18 features are highly relevant: 10 out of 12 from our proposed set and 8 out of 30 from the classical ones.
- This validates the rationale of the proposed features for identifying HTTPS services.

The 18 selected features
Client <-> Server:
- Inter-arrival time (75th percentile)
Client -> Server:
- Packet size (75th percentile, maximum)
- Inter-arrival time (75th percentile)
- Encrypted payload (mean, 25th and 50th percentiles, variance, maximum)
Server -> Client:
- Packet size (50th percentile, maximum)
- Inter-arrival time (25th and 75th percentiles)
- Encrypted payload (25th, 50th and 75th percentiles, variance, maximum)
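
The statistics named in this list (mean, percentiles, variance, maximum) can be computed per direction from the sequence of observed values. The following Python sketch uses only the standard library. It is an assumption-laden illustration rather than the feature extractor used in this work: `direction_features` is a hypothetical name, the "inclusive" percentile interpolation and the population variance are choices made here, and the original pipeline may compute them differently.

```python
import statistics
from typing import Dict, Sequence


def direction_features(values: Sequence[float]) -> Dict[str, float]:
    """Summary statistics for one direction of a connection.

    `values` could be packet sizes, inter-arrival times or encrypted
    payload sizes. At least two observations are required.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Quartiles: 25th, 50th and 75th percentiles (interpolation is an assumption).
    p25, p50, p75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "p25": p25,
        "p50": p50,
        "p75": p75,
        "variance": statistics.pvariance(values),  # population variance (assumption)
        "max": max(values),
    }


if __name__ == "__main__":
    # Hypothetical encrypted payload sizes (bytes) seen from server to client.
    payload_sizes = [100.0, 200.0, 300.0, 400.0, 500.0]
    print(direction_features(payload_sizes))
```

Running the function once per direction and per measured quantity, then keeping only the selected entries, would give one feature vector per connection.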

Experiments and Evaluation Results
Evaluation of the proposed feature set: using the WEKA tool (www.cs.waikato.ac.nz), the feature sets are evaluated with the C4.5 and RandomForest algorithms:
- Classical 30 features: C4.5 achieves 83.4%±1.0 precision, RandomForest achieves 85.7%±0.4 precision.
- Selected 18 features: C4.5 achieves 85.87%±0.64 precision, RandomForest achieves 87.60%±0.10 precision.
- Full 42 features: C4.5 achieves 86.65%±0.7 precision, RandomForest achieves 87.82%±0.68 precision.

Minimal number of connections

Multi-Level Classification: HTTPS Identification Framework Evaluation
The framework has been evaluated in two steps:
- Evaluate each level separately, to measure the performance of each classification model.
- Evaluate the whole framework as one black box.
Evaluation conditions:
- Full feature set (42 features).
- RandomForest as the ML algorithm.
- At least 100 connections per service.
- K-fold cross-validation with k=10.
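
Two of the evaluation conditions, the minimum number of connections per service and the 10-fold split, can be sketched in plain Python. This is not the WEKA procedure used for the reported results: `filter_rare_services` and `k_fold_indices` are hypothetical names, and the split shown is a simple shuffled, unstratified one chosen here for brevity.

```python
import random
from collections import Counter
from typing import Iterator, List, Sequence, Tuple


def filter_rare_services(labels: Sequence[str],
                         min_connections: int = 100) -> List[int]:
    """Indices of connections whose service has enough labelled samples."""
    counts = Counter(labels)
    return [i for i, label in enumerate(labels)
            if counts[label] >= min_connections]


def k_fold_indices(n: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for a shuffled k-fold split."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


if __name__ == "__main__":
    labels = ["maps.google.com"] * 120 + ["rare.example.org"] * 3
    kept = filter_rare_services(labels)
    print(len(kept), "connections kept")
    for train, test in k_fold_indices(len(kept)):
        print(len(train), "training /", len(test), "test")
```

Every connection lands in exactly one test fold, so each sample is used once for testing and k-1 times for training.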

Evaluation Results: Top-Level Evaluation
- Experiments show that we can identify the service provider of HTTPS traffic with 93.6% overall accuracy.
Figure: Top level of the framework

Evaluation Results: Second-Level Evaluation
- A separate classification model is built and evaluated for each service provider, with the same approach as the one used at the top level.
Figure: Second level of the framework
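
The way the two levels chain together at prediction time can be sketched as follows. This is a schematic illustration only: `predict_two_level` is a hypothetical name, models are represented as plain callables, and the toy rules and thresholds stand in for trained classifiers.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Features = Sequence[float]
Model = Callable[[Features], str]


def predict_two_level(features: Features,
                      top_model: Model,
                      second_level: Dict[str, Model]
                      ) -> Tuple[str, Optional[str]]:
    """Predict the provider first, then the service with that provider's model."""
    provider = top_model(features)
    service_model = second_level.get(provider)
    if service_model is None:
        # No dedicated model for this provider: only the provider is known.
        return provider, None
    return provider, service_model(features)


# Toy stand-ins for trained classifiers (hypothetical rules and thresholds).
def toy_top_model(features: Features) -> str:
    return "google.com" if features[0] > 1000 else "facebook.com"


toy_second_level: Dict[str, Model] = {
    "google.com": lambda f: "maps.google.com" if f[1] > 0.5 else "mail.google.com",
}

if __name__ == "__main__":
    print(predict_two_level([1500.0, 0.9], toy_top_model, toy_second_level))
    print(predict_two_level([10.0, 0.9], toy_top_model, toy_second_level))
```

Keeping one small model per provider means each second-level classifier only has to separate that provider's own services.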

Evaluation Results: Second-Level Evaluation
- Of 68 distinct service providers, 51 have more than 95% correct classification of their own services.
- For example, we can differentiate between 19 services run under Google.com, with 93% Perfect identification.
Table: Accuracy of the second-level models (number of service providers per accuracy range)
Accuracy range   Classical features   Full features   Selected features
100-95%          50                   51              51
95-90%           5                    5               5
90-80%           6                    6               6
Less than 80%    7                    6               6

Global Results
Evaluating the framework as a black box (levels 1 and 2):
- Results show that we achieve 93.10% Perfect identification and 2.9% Partial identification.
Figure: The complete classification model

Stability over Time
The classification errors over time:
- Even after 23 weeks without a new learning phase, we still identify 80% (error < 20%) of HTTPS services.
Figure: Effect on classification error over time (x-axis: period in weeks, from 1 to 23 weeks; y-axis: classification error)

Conclusion & Future Work
Conclusion:
- A complete framework to identify HTTPS services, with several innovations (multi-level classification, SNI-labelling, a new set of features).
- Based on real traffic, the results show that, despite the challenging task, a high accuracy of 93.10% is achieved.
Future work:
- Adapt and extend the current framework for real-time identification of HTTPS services.
- Improve the global security of networks, especially by developing an HTTPS firewall.

Publications
1. W. Shbair, T. Cholez, A. Goichot, I. Chrisment: Efficiently Bypassing SNI-based HTTPS Filtering, IFIP/IEEE IM2015.
2. W. Shbair, T. Cholez, J. François, and I. Chrisment, A multi-level framework to identify HTTPS services (to appear in IFIP/IEEE NOMS2016), April 2016.

References
[1] W. de Donato, A. Pescape, and A. Dainotti, "Traffic identification engine: an open platform for traffic classification," Network, IEEE, vol. 28, no. 2, pp. 56-64, 2014.
[2] B. Miller, L. Huang, A. D. Joseph, and J. D. Tygar, "I know why you went to the clinic: Risks and realization of HTTPS traffic analysis," in Privacy Enhancing Technologies, pp. 143-163, Springer, 2014.
[3] Y. Okada, S. Ata, N. Nakamura, Y. Nakahira, and I. Oka, "Comparisons of machine learning algorithms for application identification of encrypted traffic," in Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on, vol. 2, pp. 358-361, IEEE, 2011.
[4] Y. Kumano, S. Ata, N. Nakamura, Y. Nakahira, and I. Oka, "Towards real-time processing for application identification of encrypted traffic," in Computing, Networking and Communications (ICNC), 2014 International Conference on, pp. 136-140, IEEE, 2014.
[5] H. Kim, K. C. Claffy, M. Fomenkov, D. Barman, M. Faloutsos, and K. Lee, "Internet traffic classification demystified: myths, caveats, and the best practices," in Proceedings of the 2008 ACM CoNEXT conference, p. 11, ACM, 2008.