Obfuscation of sensitive data in network flows

Obfuscation of sensitive data in network flows
D. Riboni (2), A. Villani (1), D. Vitali (1), C. Bettini (2), L.V. Mancini (1)
(1) Dipartimento di Informatica, Università di Roma, Sapienza. E-mail: {villani, vitali, mancini}@di.uniroma1.it
(2) Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano. E-mail: {daniele.riboni,claudio.bettini}@unimi.it
20 January 2012
To appear in InfoCom 2012, the 31st Annual IEEE International Conference on Computer Communications.

Table of contents: Internet Infrastructure and Data set definition

Internet Actors
- IP prefix (or network prefix): a representation of a set of IP addresses, e.g. 192.168.1.0/24;
- Autonomous System (AS): a collection of connected Internet Protocol routing prefixes under the control of one or more network operators;
- Internet Service Provider (ISP): a company that provides access to the Internet;
- Internet Exchange Point (IXP): a physical infrastructure through which Internet Service Providers exchange Internet traffic between their networks.
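As a small illustration of what "a set of IP addresses" means for a prefix, the sketch below uses Python's standard ipaddress module on the example prefix from the slide. The two test addresses are chosen only for illustration.

```python
import ipaddress

# 192.168.1.0/24 is the example prefix given on the slide.
prefix = ipaddress.ip_network("192.168.1.0/24")

# Illustrative addresses (not from the slides).
inside = ipaddress.ip_address("192.168.1.42")
outside = ipaddress.ip_address("192.168.2.1")

print(prefix.num_addresses)   # how many addresses the prefix represents
print(inside in prefix)       # membership test: address covered by the prefix
print(outside in prefix)      # membership test: address not covered
```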

Internet Infrastructure: Border Gateway Protocol (BGP)
Hierarchical infrastructure:
- Tier 1: full mesh network
- Tier 2: national Internet providers
- Tier 3: local Internet Service Providers
- ...
Internet today: about 40,000 autonomous systems and 400,000 IP prefixes.

Internet routing protocol: BGP
- AS1 announces IP prefix X;
- AS2 tells AS3: in order to reach X, packets cross AS2 and then AS1;
- each topology change causes new updates or prefix withdrawals.
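The announcement chain above can be sketched as follows. This is a toy model, not a BGP implementation: the functions announce and propagate are hypothetical names, and the prefix 203.0.113.0/24 stands in for "X". It only shows how each AS prepends itself to the path before passing the route on.

```python
def announce(prefix, origin_as):
    """The origin AS announces a prefix; the AS path contains only itself."""
    return {"prefix": prefix, "as_path": [origin_as]}

def propagate(route, via_as):
    """An AS re-announces a route to a neighbour, prepending its own name."""
    return {"prefix": route["prefix"], "as_path": [via_as] + route["as_path"]}

# AS1 announces prefix X (a documentation prefix is used as a placeholder).
route_from_as1 = announce("203.0.113.0/24", "AS1")

# AS2 tells AS3 that X is reachable through AS2, then AS1.
route_from_as2 = propagate(route_from_as1, "AS2")

print(route_from_as2["as_path"])
```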

Data set definition: Cisco NetFlow
NetFlow is a network protocol developed by Cisco Systems for collecting IP traffic information. Its main properties:
- real-time collection;
- active and passive timeouts;
- lightweight representation of network traffic;
- highly representative of the traffic.
NetFlow data can support traffic and attack detection, network monitoring, QoS and other network activities.

Data set fields definition
A network flow has been defined in many ways. The traditional definition uses a 7-tuple key: a flow is a unidirectional sequence of packets that all share the following 7 values:
- Source IP address
- Destination IP address
- Source port for UDP or TCP, 0 for other protocols
- Destination port for UDP or TCP, type and code for ICMP, or 0 for other protocols
- IP protocol
- Ingress interface (SNMP ifIndex)
- IP Type of Service
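A minimal sketch of how packets sharing the same 7-tuple collapse into one flow record. The FlowKey type, the aggregate function and the sample packets are illustrative assumptions; a real exporter also applies the active and passive timeouts, which are omitted here.

```python
from collections import defaultdict
from typing import NamedTuple

class FlowKey(NamedTuple):
    """The traditional 7-tuple flow key."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int
    in_if: int
    tos: int

def aggregate(packets):
    """packets: iterable of (FlowKey, size_in_bytes) pairs."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for key, size in packets:
        flows[key]["packets"] += 1
        flows[key]["bytes"] += size
    return dict(flows)

# Two directions of one TCP conversation are two distinct unidirectional flows.
k1 = FlowKey("10.0.0.1", "10.0.0.2", 40000, 80, 6, 1, 0)
k2 = FlowKey("10.0.0.2", "10.0.0.1", 80, 40000, 6, 2, 0)
flows = aggregate([(k1, 100), (k1, 1500), (k2, 60)])
print(flows[k1], flows[k2])
```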

ExtrABIRE project: network flows probe
A large set (more than 1 year) of network flows gathered from BGP routers of commercial and institutional Internet Service Providers.
Data set expressiveness: 2 GBytes of full NetFlow entries contain 110 million flows and 2 billion packets, corresponding to 5 TBytes of exchanged data.

The role of network flow data sets in network communities
Logs of network flows are a fundamental tool for modeling network behavior, identifying security attacks, and validating research results. Security and privacy concerns inhibit the release of network data. As a result, research experiments and evaluations of proposed algorithms use:
- synthetic data: random network data generated from stochastic distributions often differs from real data;
- old and short data sets: new protocols, new network paradigms and new attack strategies do not appear in these data sets.

Effects of the lack of shared network flows
Dark side effects:
- research results become hard to evaluate;
- research results are inconsistent;
- experiments are not reproducible;
- applying a proposed strategy to real data gives unexpected results;
- ...

Anonymity, meaning "without a name" or "namelessness", typically refers to the state of an individual's personal identity, or personally identifiable information, being publicly unknown.
Attacks on anonymized data sets aim at:
- de-anonymizing the data sets;
- inferring private information;
- obtaining useful information about the networks targeted by an attack.

Taxonomy

Network flow data set attacks: grouping by precondition [2]
[2] J. King, K. Lakkaraju, and A. J. Slagell, "A taxonomy and adversarial model for attacks against network log anonymization," in Proc. of ACM SAC. ACM, 2009, pp. 1286-1293.

Network flow data set attacks: Fingerprint and Injection
- Fingerprint: identification is performed by matching the values of flow fields against the characteristics of the target environment, e.g. knowledge of the network topology or of the services running on target hosts;
- Injection: the adversary injects into the network to be logged a sequence of flows that are easily recognized by their specific characteristics, e.g. flows marked with uncommon TCP flags or following particular patterns.

Network flow data set attacks: Web Fingerprint
"In this paper we attempt to quantify the risks of publishing anonymized packet traces. [...], we examine whether statistical identification techniques can be used to uncover the identities of users and their surfing activities from anonymized packet traces. Our results show that such techniques can be used by any Web server that is itself present in the packet trace and has sufficient resources to map out and keep track of the content of popular Web sites to obtain information on the network-wide browsing behavior of its clients." [3]
[3] D. Koukis, S. Antonatos, and K.G. Anagnostakis, "On the Privacy Risks of Publishing Anonymized IP Network Traces," in Proceedings of Communications and Multimedia Security.

Previous approaches
Previous approaches encrypt the identity fields (IP addresses) and apply different techniques to the quantitative fields (e.g. TCP flags, traffic statistics, etc.):
- permutation
- truncation
- generalization
No formal proof of the obfuscation properties of the proposed solutions is provided!
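To make two of the listed techniques concrete, here is a hedged sketch of truncation and permutation applied to IP addresses. The function names truncate_ip and permute, the 24-bit cut and the fixed seed are assumptions chosen for illustration; they are not taken from any specific tool discussed in the talk.

```python
import ipaddress
import random

def truncate_ip(ip, keep_bits=24):
    """Truncation: keep only the first keep_bits bits, zero the rest."""
    net = ipaddress.ip_network(f"{ip}/{keep_bits}", strict=False)
    return str(net.network_address)

def permute(values, seed=0):
    """Permutation: map each distinct value to another value of the same set."""
    distinct = sorted(set(values))
    shuffled = distinct[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(distinct, shuffled))

truncated = truncate_ip("129.19.133.199")
mapping = permute(["129.19.133.199", "213.16.92.171", "194.15.20.101"])
print(truncated)
print(mapping)
```

Neither transformation comes with a formal guarantee: truncation still leaks the network part, and a permutation preserves every other field of the flow unchanged.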

Data anonymity approaches
Definition (Fingerprint Quasi-Identifier, fp-qi): a field of a network flow is a fingerprint quasi-identifier (fp-qi) if its value, possibly combined with external knowledge about the characteristics of the network hosts, can reduce the cardinality of the candidate set for the source or destination IP address of the flow in L* (the obfuscated NetFlow data set).
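The definition can be illustrated with a toy example in which the destination port acts as an fp-qi. The table known_services is hypothetical external knowledge an adversary might hold; the addresses and ports are invented for the sketch.

```python
# Hypothetical external knowledge: services known to run on each host.
known_services = {
    "10.0.0.1": {80, 443},
    "10.0.0.2": {25},
    "10.0.0.3": {80},
    "10.0.0.4": {22, 80},
}

def candidates(dst_port, knowledge):
    """Hosts that could be the destination of a flow towards dst_port."""
    return {ip for ip, ports in knowledge.items() if dst_port in ports}

# Without looking at the port, every host is a candidate destination.
before = set(known_services)

# Observing destination port 25 shrinks the candidate set.
after = candidates(25, known_services)

print(len(before), len(after))
```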

fp-qi fields in a NetFlow entry
- Source IP address
- Destination IP address
- Source port for UDP or TCP, 0 for other protocols
- Destination port for UDP or TCP, type and code for ICMP, or 0 for other protocols
- IP protocol
- Ingress interface (SNMP ifIndex)
- IP Type of Service (flags)

Data anonymity approaches: k-anonymity
k-anonymity: every record is made indistinguishable, based on its quasi-identifier (QI) values, within a group of at least k records.
Example: you try to identify a man in a released data set, but the only information you have is his birth date and gender. If at least k people in the release match that birth date and gender, the release is k-anonymous with respect to these attributes.
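A minimal check of the property, assuming records are dictionaries and the quasi-identifiers are birth date and gender as in the example. The function name is_k_anonymous and the sample records are illustrative.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of QI values occurs in at least k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

# Invented sample release.
records = [
    {"birth_date": "1985-03-02", "gender": "M", "disease": "Flu"},
    {"birth_date": "1985-03-02", "gender": "M", "disease": "Cancer"},
    {"birth_date": "1990-07-15", "gender": "F", "disease": "Flu"},
    {"birth_date": "1990-07-15", "gender": "F", "disease": "Flu"},
]

print(is_k_anonymous(records, ["birth_date", "gender"], 2))
print(is_k_anonymous(records, ["birth_date", "gender"], 3))
```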

Data anonymity approaches: k-anonymity attacks
k-anonymity does not provide privacy if:
- the sensitive values in an equivalence class lack diversity (homogeneity attack, e.g. Bob, 27 years);
- the attacker has background knowledge.
A. Machanavajjhala, D. Kifer, J. Gehrke, M. Venkitasubramaniam, "l-Diversity: Privacy Beyond k-Anonymity," ACM Transactions on Knowledge Discovery from Data (TKDD).

Data anonymity approaches: l-diversity
Each equivalence class has at least l well-represented sensitive values.
Example: one equivalence class contains ten tuples. In the Disease attribute, one of them is Cancer, one is Heart Disease and the remaining eight are Flu. This satisfies 3-diversity, but the attacker can still affirm that the target person's disease is Flu with a confidence of 80% (eight tuples out of ten).
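The arithmetic behind the example can be checked directly: with ten tuples of which eight share one value, the class has three distinct sensitive values, and the attacker's best guess is right for eight tuples out of ten. The sketch uses the simplest ("distinct") reading of l-diversity, which is an assumption about what "well-represented" means here.

```python
from collections import Counter

# The equivalence class from the example: 1 Cancer, 1 Heart Disease, 8 Flu.
diseases = ["Cancer", "Heart Disease"] + ["Flu"] * 8

counts = Counter(diseases)
distinct_l = len(counts)                       # number of distinct sensitive values
top_value, top_count = counts.most_common(1)[0]
confidence = top_count / len(diseases)         # attacker's confidence in the best guess

print(distinct_l, top_value, confidence)
```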

Data anonymity
Drawbacks: database anonymity strategies are effective only under the assumption that each individual is the respondent of at most one record in the released microdata. In a network flow data set, each IP address (identity) can appear many times.
Data anonymity techniques are not directly suitable!

Idea: Goal
In this work, we propose a novel obfuscation technique for network flows that provides formal guarantees under realistic assumptions about the adversary's knowledge (fingerprint or injection attacks).

Idea
[Figure: an original flow f between IP group α (IP A: 129.19.133.199, IP B: 213.16.92.171, IP C: 194.15.20.101) and IP group β (IP D: 66.200.181.12, IP E: 72.149.130.8, IP F: 194.158.20.101), and an original flow g between IP group α and IP group δ (IP G: 68.120.47.25, IP H: 123.163.32.80, IP I: 203.48.4.172). They are released as obfuscated flows f* and g* with f*[fp-qi] = g*[fp-qi], so f* is indistinguishable from g* based on the hosts' fingerprint.]
- Build IP groups of K addresses based on their behavior affinity;
- Group flows so that at least J distinct IP addresses share the same flow values.

Algorithm details 1/3
Input:
- L: original set of network flows;
- fp-qi: set of fingerprint quasi-identifiers;
- K: minimum group size.
Output:
- L*: obfuscated data set.

Algorithm details 2/3
Input:
- L: original set of network flows;
- fp-qi: set of fingerprint quasi-identifiers;
- K: minimum group size.
Output:
- IP groups: G_1, G_2, ..., G_j;
- IP group identifiers: GID_1, GID_2, ..., GID_j.
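The slides do not say how behavior affinity is measured, so the following is only a sketch under a strong assumption: each address is summarized by one numeric score (for example, its number of flows), addresses with close scores are considered similar, and groups are cut from the sorted list. The function make_groups and the scores are hypothetical; the paper's actual grouping procedure may differ.

```python
def make_groups(ip_scores, k):
    """Split addresses into groups of at least k members with close scores.

    ip_scores: dict mapping an IP address to a numeric behaviour score
    (a stand-in for the affinity measure, which is not given on the slides).
    """
    if len(ip_scores) < k:
        raise ValueError("need at least k addresses to form one group")
    ordered = sorted(ip_scores, key=lambda ip: (ip_scores[ip], ip))
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    # A short last chunk is merged into the previous group to respect the minimum size.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    return {f"GID{n}": g for n, g in enumerate(groups, start=1)}

# Invented scores for seven addresses.
scores = {f"10.0.0.{i}": s for i, s in enumerate([5, 7, 6, 120, 130, 125, 900], start=1)}
groups = make_groups(scores, 3)
print(groups)
```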

Algorithm details 3/3
Input:
- L: original set of network flows;
- fp-qi: set of fingerprint quasi-identifiers;
- j: minimum number of fp-indistinguishable flows;
- τ: time granularity.
Output:
- L*: obfuscated data set.
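One possible reading of the roles of j and τ, sketched under stated assumptions: flows are bucketed by time granule and by the IP group of their source, a bucket is kept only if it contains flows from at least j distinct addresses, and the remaining flows are suppressed. The function bucket_and_suppress, the flow dictionaries and the bucketing rule are illustrative simplifications, not the paper's algorithm.

```python
from collections import defaultdict

def bucket_and_suppress(flows, ip_to_gid, j, tau_seconds):
    """Bucket flows by (time granule, source IP group); suppress small buckets."""
    buckets = defaultdict(list)
    for f in flows:
        granule = f["ts"] // tau_seconds
        buckets[(granule, ip_to_gid[f["src_ip"]])].append(f)
    kept, suppressed = {}, []
    for key, members in buckets.items():
        distinct_ips = {f["src_ip"] for f in members}
        if len(distinct_ips) >= j:
            kept[key] = members          # enough distinct addresses to hide among
        else:
            suppressed.extend(members)   # cannot be made indistinguishable
    return kept, suppressed

# Invented flows: timestamps in seconds, both addresses in the same group.
ip_to_gid = {"10.0.0.1": "GID1", "10.0.0.2": "GID1"}
flows = [
    {"ts": 0, "src_ip": "10.0.0.1"},
    {"ts": 10, "src_ip": "10.0.0.2"},
    {"ts": 70, "src_ip": "10.0.0.1"},
]
kept, suppressed = bucket_and_suppress(flows, ip_to_gid, j=2, tau_seconds=60)
print(len(kept), len(suppressed))
```

In this toy model a larger τ puts more flows in each bucket, so fewer flows are suppressed, at the price of coarser timing information.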

Obfuscated data sets
Each non fp-qi field changes as follows:
- src, dst IP: group IP
- bytes, packets: (min, max) interval
- tos, proto: set of values
- flags: XOR-ed values
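The field transformations listed above can be sketched as a function that collapses a set of flows into one released record. The function generalize and the sample flows are assumptions; the flags field is left out because the slide does not say how the XOR is formed.

```python
def generalize(bucket, gid):
    """Collapse the flows of one bucket into a single generalized record."""
    return {
        "src_ip": gid,  # the address is replaced by its group identifier
        "bytes": (min(f["bytes"] for f in bucket), max(f["bytes"] for f in bucket)),
        "packets": (min(f["packets"] for f in bucket), max(f["packets"] for f in bucket)),
        "tos": sorted({f["tos"] for f in bucket}),      # set of observed values
        "proto": sorted({f["proto"] for f in bucket}),  # set of observed values
    }

# Invented flows belonging to the same bucket.
bucket = [
    {"bytes": 100, "packets": 2, "tos": 0, "proto": 6},
    {"bytes": 300, "packets": 5, "tos": 8, "proto": 6},
]
record = generalize(bucket, "GID1")
print(record)
```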

Suppressed flows
[Figure: suppressed flows (%) versus time granule τ (1, 2, 4, 8, 16, 32 minutes), one curve for each j from 2 to 7.]

Obfuscated data set quality
There are no universally accepted criteria for evaluating an obfuscated or anonymized data set. Many network data analysis techniques rely on information-theoretic approaches or on statistical information, so quality is evaluated in two ways:
- Entropy based
- Query based

Experiment results: Information theory based analysis
Entropy of the source IP address distribution (over one hour and over one week):
$H(x) = -\sum_i p_i \log p_i$
[Figures: entropy on source IP addresses versus time of the day (12:00 to 13:00) and versus day of the week, for the original flows and for k=5, k=10, k=20.]
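A short sketch of the entropy measure applied to a list of source addresses, before and after replacing each address by a group identifier. The base-2 logarithm, the function name entropy and the tiny data set are assumptions; the slides do not state the logarithm base.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Invented example: four distinct sources, then the same flows after grouping.
original = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
ip_to_gid = {"10.0.0.1": "GID1", "10.0.0.2": "GID1", "10.0.0.3": "GID2", "10.0.0.4": "GID2"}
obfuscated = [ip_to_gid[ip] for ip in original]

h_original = entropy(original)
h_obfuscated = entropy(obfuscated)
print(h_original, h_obfuscated)
```

Grouping merges probability mass, so the entropy of the obfuscated distribution cannot exceed that of the original one; the gap between the two is one way to read the information lost.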

Experiment results: Information theory based analysis
Entropy of the destination IP address distribution (over one hour and over one week).
[Figures: entropy on destination IP addresses versus time of the day (12:00 to 13:00) and versus day of the week, for the original flows and for k=5, k=10, k=20.]

Experiment results: statistical analysis
We executed queries for each possible value/range, and for each minute in a one-hour time window, for a total of about 120,000 queries. For each query, we calculated the error rate by the following formula:
$e = \dfrac{\left| \frac{r}{t} - \frac{r^*}{t^*} \right|}{\frac{r}{t}}$
where r (resp. r*) is the result of the query on the original (resp. obfuscated) flows, and t (resp. t*) is the total number of original (resp. obfuscated) flows.
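A sketch of the per-query error computation, assuming the error rate is the relative difference between the query result normalized by the number of original flows and the result normalized by the number of obfuscated flows. That reading of the formula, the function name error_rate and the sample counts are assumptions.

```python
def error_rate(r, t, r_obf, t_obf):
    """Relative difference between normalized query results.

    r, t: query result and total number of flows on the original data set.
    r_obf, t_obf: the same quantities on the obfuscated data set.
    """
    original = r / t
    obfuscated = r_obf / t_obf
    return abs(original - obfuscated) / original

# Suppression that removes matching and non-matching flows in proportion causes no error.
e_same = error_rate(50, 100, 40, 80)

# Losing matching flows only shifts the normalized result.
e_shift = error_rate(50, 100, 30, 100)
print(e_same, e_shift)
```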

Experiment results: tos, proto and flag bucketization
[Figures: average error (%) versus time granule τ (1, 2, 4, 8, 16, 32 minutes), one curve for each j from 2 to 7. Panels: proto field, flag field, tos field.]

Experiment results: bytes and packets bucketization
[Figures: average error (%) versus time granule τ (1, 2, 4, 8, 16, 32 minutes), one curve for each j from 2 to 7. Panels: query on byte field, query on packet field.]

K-J obfuscation benefits
- Makes flows indistinguishable under a fingerprint attack;
- Preserves traffic diversity and data quality;
- Comes with formal guarantees (refer to the paper for more details).

Questions? Thanks