Design and Implementation of an Interactive DBMS-supported Network Traffic Analysis and Visualization System




1 Hyun-chul Kim, *2 Jihoon Lee
1, First Author: Dept. of Computer Software Engineering, Sangmyung Univ., hyunchulk@gmail.com
*2, Corresponding Author: Dept. of Information and Telecommunication Engineering, Sangmyung Univ., vincent@smu.ac.kr

International Journal of Digital Content Technology and its Applications (JDCTA), Volume 7, Number 13, Sep 2013

Abstract

Measurement, analysis, and visualization of traffic data are critical day-to-day tasks for effective and efficient network management and operation. In this paper, we design and implement a novel interactive DBMS (Database Management System)-supported network traffic analysis and visualization system. To this end, we first survey and compare existing tools for network traffic measurement and analysis in terms of their supported features and functionalities. We then extend FlowScan, one of the most widely used software tools for network traffic analysis and visualization, with DBMS support and a Web-based interactive query interface. We describe the overall system architecture and implementation, and demonstrate that our system can be used to easily identify heavy hitters (e.g., malicious attackers) in near real time.

Keywords: Traffic Measurement, Visualization, FlowScan

1. Introduction

Measurement, analysis, and visualization of traffic data are very important tasks for effective and efficient network management and administration [1, 2, 7]. The two common approaches to network measurement are passive and active. Passive techniques watch target traffic as it passes by, for example at routers, switches, or hosts, without generating (or injecting) additional packets into the network. In contrast, active measurement techniques inject packets into the network or transmit packets to other servers or hosts in order to test, diagnose, or measure network delay, connectivity, path, etc.

For measuring traffic volume and for per-application, per-AS, and top-user behavior analysis, the passive measurement approach is used. Table 1 lists existing passive measurement tools and the useful features they support. As the table shows, no tool currently supports all of these features. FlowScan [1], one of the most widely used network traffic measurement tools, supports many useful features, such as the time series graph shown in Figure 1, through which users can easily understand the overall traffic trend. However, it does not provide interactive interfaces for more detailed analysis, e.g., exactly identifying heavy hitters.

In this paper, our goal is to design and implement the first passive measurement software in which all the important features in Table 1 are supported. To this end, after a careful and in-depth analysis of the nine existing tools, we decided to extend FlowScan with a DBMS and a Web-based query interface.

The rest of this paper proceeds as follows. Section 2 introduces FlowScan. Section 3 describes the architecture design, implementation, and usage examples of our proposed system. Section 4 concludes the paper.

2. FlowScan

2.1. Architecture

As the first step, we analyzed the FlowScan system architecture and its code base. FlowScan mainly consists of four architectural components: (1) a NetFlow [9] collection module, (2) a NetFlow analysis module, (3) a NetFlow storage module, and (4) a visualization module. Three of the four modules (the collection, storage, and visualization modules) were borrowed from other open source software: FlowScan's NetFlow collection module uses CAIDA's cflowd, while the storage and visualization modules use RRD (Round Robin Database). In this paper, from the next subsection on, we mainly focus on the NetFlow analysis module, because we design our system to put the various NetFlow analysis results and related data into a database system and handle them there. In Figure 2, cflowdmux and cflowd together form FlowScan's collection module, while flowscan, rrd, and rrdtool are the analysis, storage, and visualization modules, respectively.

Table 1. Passive Measurement Tools: Comparison
Tools compared: MRTG, FlowScan, Cflowd with arts, tcpdump, MADAS, Flowtools, Flow Collector, Flow Analyzer.
Features compared: raw data collection, data analysis, visualization, time series graph, raw flow dump, query interface, GUI, traffic volume, per-application analysis, top user, top AS.
[Per-tool support marks are not reproduced here.]

Figure 1. FlowScan's time series graph

Figure 2. FlowScan Architecture

2.2. NetFlow analysis module

Using cflowd, FlowScan dumps to files the NetFlow data that routers collect and send every 5 minutes. The flowscan module is then invoked, also every 5 minutes, to analyze the received flow data files, after which it sleeps for another five minutes. The flowscan module reads the dumped data files and applies the analysis module to every single collected flow, classifying each one according to its port number, protocol number, output interface, etc. Among the analysis modules, SubNetIO.pm and CampusIO.pm are the representative ones: they classify each flow (e.g., whether it is TCP, FTP, and/or inbound traffic) and put the analysis results into the RRD database. Figure 3 shows the pseudo-code of the CampusIO.pm module, where the wanted function is applied to each target flow to classify it according to its destination address, interface number, etc.

3. Proposed system

Figure 4 shows our proposed system in comparison with its predecessor, FlowScan. We have added three modules to FlowScan: a flow database management module, a data aggregation module, and a query module, as described in this section.
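To make the per-flow classification of Section 2.2 concrete, the following is a minimal Python sketch, not FlowScan's actual Perl code. The port map, the protocol map, and the dictionary-based flow records are assumptions made for illustration only. It labels each flow by protocol and well-known port and accumulates byte counts for one dump file.

```python
from collections import Counter

# Hypothetical lookup tables; FlowScan's real mapping lives in its Perl
# report modules and configuration and is not reproduced here.
WELL_KNOWN_PORTS = {20: "ftp-data", 21: "ftp", 25: "smtp", 80: "http", 443: "https"}
PROTOCOL_NAMES = {1: "icmp", 6: "tcp", 17: "udp"}


def classify(flow):
    """Return (protocol_name, application) for one flow record (a dict)."""
    proto = PROTOCOL_NAMES.get(flow["protocol"], "other")
    app = "unknown"
    if proto in ("tcp", "udp"):
        # A flow matches if either endpoint uses a well-known port.
        for port in (flow["src_port"], flow["dst_port"]):
            if port in WELL_KNOWN_PORTS:
                app = WELL_KNOWN_PORTS[port]
                break
    return proto, app


def summarize(flows):
    """Accumulate byte counts per (protocol, application) for one dump file."""
    totals = Counter()
    for flow in flows:
        totals[classify(flow)] += flow["bytes"]
    return totals


# Illustrative records standing in for one 5-minute dump file.
sample = [
    {"protocol": 6, "src_port": 51000, "dst_port": 80, "bytes": 1200},
    {"protocol": 6, "src_port": 21, "dst_port": 40000, "bytes": 300},
    {"protocol": 17, "src_port": 5000, "dst_port": 6000, "bytes": 90},
    {"protocol": 1, "src_port": 0, "dst_port": 0, "bytes": 64},
]
totals = summarize(sample)
```

In FlowScan the per-category totals are written to RRD files; here they simply stay in a `Counter` so the sketch remains self-contained.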

3.1. Database management module

The NetFlow analysis module in FlowScan determines whether the flow being processed (i) is inbound or outbound, and (ii) can be classified using its layer-4 port number. Right after this step, our proposed system inserts all the analyzed information about the flow into the database. Our database system records the following information for every flow: source IP, destination IP, input interface, output interface, source port, destination port, ICMP type, number of packets, byte volume, next hop, flow start time, flow end time, protocol, source AS, destination AS, TCP flags, inbound_or_outbound, and known_application_or_unknown (by port number). We used MySQL, one of the world's most popular open source databases, as the DBMS.

    package CampusIO;

    sub perfile {
        # Remember the time-stamp from the filename.
        whence = filename2time_t(filename)
    }

    sub wanted {
        if (exporter_hop(flow::next_hop)) {
            # This flow is destined for another "local" flow-exporting router.
            # We'll catch it later, if and when it's exported by that router.
            return 0
        }
        if (outbound_hop(flow::next_hop) or outbound_interface(flow::output_if)) {
            # This flow is outgoing.
            outbound_total++
        } elsif (inbound_address(flow::destination_address)) {
            # This flow is incoming.
            inbound_total++
        } else {
            # This flow (an "intranet" flow) is unwanted.
            return 0
        }
        return 1
    }

    sub report {
        update_rrd_files(whence, inbound_total, outbound_total)
    }

Figure 3. CampusIO.pm's pseudo-code
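As an illustration of how the direction test in Figure 3 and the database insertion of Section 3.1 could fit together, here is a minimal Python sketch. It is not the authors' implementation. SQLite (via the standard `sqlite3` module) stands in for MySQL, the table holds only a subset of the recorded fields, and the site configuration (`EXPORTER_HOPS`, `OUTBOUND_HOPS`, `OUTBOUND_INTERFACES`, `LOCAL_PREFIX`) is hypothetical.

```python
import sqlite3

# Hypothetical site configuration; the addresses are documentation ranges.
EXPORTER_HOPS = {"192.0.2.2"}    # next hops that are other local flow-exporting routers
OUTBOUND_HOPS = {"192.0.2.1"}    # next hops that lead off campus
OUTBOUND_INTERFACES = {12}       # interface indexes facing the outside
LOCAL_PREFIX = "198.51.100."     # crude stand-in for a real local-subnet match


def direction(flow):
    """Mirror the `wanted` logic: return 'outbound', 'inbound', or None (unwanted)."""
    if flow["next_hop"] in EXPORTER_HOPS:
        return None  # will be seen again when the other router exports it
    if flow["next_hop"] in OUTBOUND_HOPS or flow["output_if"] in OUTBOUND_INTERFACES:
        return "outbound"
    if flow["dst_ip"].startswith(LOCAL_PREFIX):
        return "inbound"
    return None  # intranet flow


def open_db():
    """Create an in-memory table with a subset of the per-flow fields."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE flows (
               src_ip TEXT, dst_ip TEXT, output_if INTEGER, next_hop TEXT,
               pkts INTEGER, bytes INTEGER, inbound_or_outbound TEXT)"""
    )
    return conn


def store(conn, flow):
    """Insert a wanted flow together with its direction; skip unwanted flows."""
    tag = direction(flow)
    if tag is None:
        return False
    conn.execute(
        "INSERT INTO flows VALUES (?, ?, ?, ?, ?, ?, ?)",
        (flow["src_ip"], flow["dst_ip"], flow["output_if"], flow["next_hop"],
         flow["pkts"], flow["bytes"], tag),
    )
    return True


conn = open_db()
flows = [
    {"src_ip": "198.51.100.7", "dst_ip": "203.0.113.9", "output_if": 12,
     "next_hop": "192.0.2.1", "pkts": 3, "bytes": 180},
    {"src_ip": "203.0.113.9", "dst_ip": "198.51.100.7", "output_if": 5,
     "next_hop": "198.51.100.254", "pkts": 1, "bytes": 58},
    {"src_ip": "198.51.100.7", "dst_ip": "198.51.100.8", "output_if": 5,
     "next_hop": "192.0.2.2", "pkts": 1, "bytes": 40},
]
stored = [store(conn, f) for f in flows]
rows = conn.execute(
    "SELECT inbound_or_outbound, bytes FROM flows ORDER BY rowid"
).fetchall()
```

The third sample flow is skipped because its next hop is another exporting router, which matches the first branch of `wanted` in Figure 3.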

3.2. Web-based query interface

Once all the flow data are stored in our database system, users can extract useful information from it through a Web-based query interface (implemented with the Perl CGI module). The interface provides a set of predefined queries, such as top 10 users, per-AS analysis, per-port analysis, and per-protocol analysis, which are typically the most useful and informative for network operators. Once users set parameter values through the Web interface, the chosen values are handed over to Perl CGI scripts that (i) parse them, (ii) generate SQL queries based on the parsed results, (iii) submit the SQL queries to the NetFlow database, and (iv) return the query results to the users through the Web interface. Figure 5 shows a screenshot of the Web query interface, tested with the predefined Top 10 Users query.

At the time of this writing, our system supports the following predefined queries: (i) overall traffic statistics, (ii) all flows in a specific time period, (iii) flow tracing per user, (iv) per-protocol usage statistics, (v) per-port usage statistics, (vi) next hop statistics, (vii) per-AS information and traffic matrix, (viii) packet size and flow distribution, and (ix) top 10 users, protocols, ports, and ASes. All of this information is extracted from raw NetFlow flows, which contain a variety of fields, as shown in Figure 6.

Figure 4. FlowScan (left) and our proposed system (right)

3.3. Data aggregation

One notable problem we confronted while building our NetFlow database system is the sheer volume of data. In our experience, at a mid-sized university campus with on the order of 1,000 people, a border router exports around 100 K flows every 15 minutes on average, which means that around 9.6 M flows per day (96 fifteen-minute intervals) and 3.5 G flows per year would have to be stored in and handled by our database system. In fact, when we tested a query over one day of data (approximately 9.6 M flows), it took 1 minute to get the results. The longer the time span a user wants to process, the longer they have to wait. Besides, storing all the raw flow information in the database takes up too much disk space.

To alleviate this limitation, our system performs data aggregation, which has been designed specifically to speed up the predefined queries. Every 15 minutes, it aggregates flows along five dimensions that reflect the predefined queries network operators are most likely to use: per port, per protocol, per top user, per AS matrix, and per next hop IP, as shown in Table 2. The aggregation scheme has significantly reduced both disk storage usage and query response times. For example, it took only a few seconds to handle a query over one week's flow data. Also, because storing and archiving full raw flow dumps for a long time takes too much disk space, network operators may prefer to store only aggregated data for long-term data collection. Designing a data aggregation scheme that saves the most storage space with the least information loss is left as future work.

Figure 5. Web-based query interface: a screenshot (IP addresses and date information are masked out due to privacy concerns)

3.4. Deployment and Usage Examples

ISPs and network operators are often interested in finding the hosts or flows that consume most of the network resources (e.g., heavy hitters, scanners, spammers, etc.). Our system's user interface makes it easy and convenient to quickly identify heavy hitters by providing both a time series graph and an interactive query system, as shown in Figures 7 and 8. We have deployed our system at a border router of a university campus network.
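The 15-minute aggregation of Section 3.3 and a predefined top-users query of the kind described in Section 3.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual code: SQLite stands in for MySQL, the `flows` and `topuser` tables carry only a few columns loosely modeled on Table 2, times are plain integer seconds, and the function names are hypothetical. User-supplied values are passed as bound parameters instead of being pasted into the SQL text.

```python
import sqlite3

BIN_SECONDS = 15 * 60  # aggregation interval used in the paper


def make_db():
    """Create simplified raw and aggregated tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE flows (src_ip TEXT, dst_ip TEXT, pkts INTEGER,
                            bytes INTEGER, start_time INTEGER);
        CREATE TABLE topuser (src_ip TEXT, dst_ip TEXT, pkts INTEGER,
                              bytes INTEGER, flows INTEGER, time INTEGER);
        """
    )
    return conn


def aggregate(conn):
    """Roll raw flows up into one row per (src, dst, 15-minute bin)."""
    conn.execute(
        """
        INSERT INTO topuser (src_ip, dst_ip, pkts, bytes, flows, time)
        SELECT src_ip, dst_ip, SUM(pkts), SUM(bytes), COUNT(*), bin
        FROM (SELECT src_ip, dst_ip, pkts, bytes,
                     (start_time / :w) * :w AS bin
              FROM flows)
        GROUP BY src_ip, dst_ip, bin
        """,
        {"w": BIN_SECONDS},
    )


def top_users(conn, start, end, limit=10):
    """Return (src_ip, total_bytes) for the heaviest senders in [start, end)."""
    return conn.execute(
        """
        SELECT src_ip, SUM(bytes) AS total_bytes
        FROM topuser
        WHERE time >= ? AND time < ?
        GROUP BY src_ip
        ORDER BY total_bytes DESC
        LIMIT ?
        """,
        (start, end, limit),
    ).fetchall()


conn = make_db()
conn.executemany(
    "INSERT INTO flows VALUES (?, ?, ?, ?, ?)",
    [
        ("10.0.0.1", "10.0.1.1", 1, 100, 0),
        ("10.0.0.1", "10.0.1.1", 2, 200, 60),
        ("10.0.0.1", "10.0.1.1", 1, 50, 900),
        ("10.0.0.2", "10.0.1.1", 5, 1000, 30),
    ],
)
aggregate(conn)
aggregated = conn.execute(
    "SELECT src_ip, pkts, bytes, flows, time FROM topuser ORDER BY src_ip, time"
).fetchall()
ranking = top_users(conn, 0, 2 * BIN_SECONDS)
```

Because the query reads the small aggregated table instead of the raw flows, its cost grows with the number of distinct address pairs per interval and not with the raw flow count, which is the effect the aggregation is meant to achieve.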
During operation, when FlowScan generates a time series traffic plot like Figure 7 showing sudden abnormal traffic spikes, network operators can use our Web-based query interface to identify the hosts responsible (i.e., their IP addresses), along with detailed information such as which protocols and ports they use, how large the flows are, how many there are, and when they start and end, as shown in Figure 8. Using our tool, we were thus able to exactly identify a DDoS attack destined to a machine (whose IP address was x.204.97.179) on our network.

    FLOW
      index: 0xc7ffff
      router: A.B.C.D
      src IP: E.F.G.H
      dst IP: I.J.K.L
      input ifindex: 13
      output ifindex: 12
      src port: 16000
      dst port: 2036
      pkts: 1
      bytes: 58
      IP nexthop: M.N.O.P
      start time: Sat Dec 8 20:34:41
      end time: Sat Dec 8 20:34:41
      protocol: 6
      tos: 0
      src AS: 9848
      dst AS: 1781
      src masklen: 13
      dst masklen: 24
      TCP flags: 0x18 (PUSH ACK)
      engine type: 1
      engine id: 3

Figure 6. Sample flow information (NetFlow ver. 5)

Table 2. Aggregated Data Table: Schema
Table Port: srcport, dstport, pkts, bytes, flows, time
Table Protocol: protocol, pkts, bytes, flows, time, protocol, tcpflag
Table Topuser: srcip, dstip, pkts, bytes, flows, time, in/out
Table Nexthop: input_if, output_if, pkts, bytes, nexthop, flows, time
Table Asmatrix: input_as, output_as, protocol, srcport, dstport, src_as, pkts, protocol, bytes, dst_as, in/out, unknown, in/out, in/out, flows, nexthop, time, in/out
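A flow dump like the one in Figure 6 is a list of "name: value" lines, so it can be turned into a record with a few lines of code. The following Python sketch is illustrative only: the function names are hypothetical, and the flag table uses the standard TCP header bit assignments. It parses such a dump and decodes the TCP flags field, so that 0x18 yields PUSH and ACK as shown in the figure.

```python
# Standard TCP header flag bits (low six bits of the flags byte).
TCP_FLAG_BITS = [
    (0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"),
    (0x08, "PUSH"), (0x10, "ACK"), (0x20, "URG"),
]


def decode_tcp_flags(value):
    """Return the names of the flag bits set in `value`, lowest bit first."""
    return [name for bit, name in TCP_FLAG_BITS if value & bit]


def parse_flow(text):
    """Parse 'name: value' lines into a dict; lines without a colon are skipped."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only, so time values keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    return record


# A few fields copied from Figure 6 (addresses are masked in the original).
dump = """FLOW
  index: 0xc7ffff
  src port: 16000
  dst port: 2036
  bytes: 58
  start time: Sat Dec 8 20:34:41
  protocol: 6
  TCP flags: 0x18 (PUSH ACK)
"""
record = parse_flow(dump)
flags = decode_tcp_flags(int(record["TCP flags"].split()[0], 16))
```

Values are kept as strings here; a loader feeding the database would convert numeric fields such as ports and byte counts before insertion.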

Figure 7. Abnormal traffic peak observed at around 23:00 by FlowScan

Figure 8. Web-based query results, identifying the source addresses of a coordinated attack and their target address (x.204.97.179), as well as various information about the attack flows (exact date and IP address information are masked out due to privacy concerns).

4. Conclusion

In this paper, we have proposed, developed, deployed, and tested our own software, which extends the widely used tool FlowScan with DBMS support and a Web-based interactive query interface. Future work includes further database and query performance optimization, visualization of query results (in a more graphical way; results are currently shown only as tables), platform independence (the system currently works only on FreeBSD with MySQL), and detection of application flows that pass through dynamically allocated ports, such as those of P2P applications [4, 5, 6, 8].

5. Acknowledgements

This research was supported by a 2013 Research Grant from Sangmyung University.

6. References

[1] D. Plonka, "FlowScan: A network traffic flow reporting and visualization tool", In Proceedings of the USENIX LISA, 2000.
[2] J. Lee, H. Kim, H. Hwang, K. Chon, Y. Chin, "Report on the costs and benefits of cache hierarchy in Korea", In Proceedings of International WWW Caching Workshop, 1998.
[3] H. Kim, D. Lee, J. Lee, J. Suh, K. Chon, "A Measurement Study of Storage Resource and Multimedia Contents on a High Performance Research and Educational Network", Lecture Notes in Computer Science, vol. 2720, pp. 108-117, 2003.
[4] M. Balitanas, T. Kim, "Using Incentives for Heterogeneous peer-to-peer Network", International Journal of Advanced Science and Technology, vol. 15, pp. 23-36, 2012.
[5] R. Mghirbi, H. Majdoub, K. Arour, B. Defude, "A Controlled Knowledge Base Evolution Approach for Query Hits Merging in P2P Systems", International Journal of Advanced Science and Technology, vol. 47, pp. 101-114, 2012.
[6] J. Kim, D. Kim, S. Maeng, "Document Management Monitoring System in P2P Environment", Journal of the Korea Academia-Industrial cooperation Society, vol. 14, no. 3, pp. 1402-1408, 2013.
[7] M. Beck, T. Moore, "The Internet2 Distributed Storage Infrastructure Project: An Architecture for Internet Content Channels", Computer Networks and ISDN Systems, vol. 30, no. 22-23, pp. 2141-2148, 1998.
[8] I. Clarke, O. Sandberg, "Freenet: A Distributed Anonymous Information Storage and Retrieval System", Lecture Notes in Computer Science, vol. 2009, pp. 46-66, 2001.
[9] Cisco NetFlow, http://www.cisco.com/web/go/netflow.