ONTIC Project (GA number 619633)
Deliverable D2.4: Big Data Network Traffic Summary Dataset Provisioning Subsystem
Dissemination Level: PUBLIC

Authors:
Alexandru Mara, Álex Martínez, Alberto Mozo, Bo Zhu (UPM)
Jose María Ocón, José Julian Quirós (SATEC)
Sotiria Chatzi, Evangelos Iliakis (ADAPTIT)
Fernando Arias, Vicente Martin (EMC)

Version: ONTIC_D2.4.2014.01.26.2.2
Version History:
- 2014.12.26.V.1, modified 2014.12.26 by Adaptit: compilation of partners' contributions and corrections
- 2014.12.27.1.1, modified 2015.01.15 by Adaptit: compilation of partners' corrected contributions and corrections
- 2015.01.15.1.1, modified 2015.01.19 by UPM: acronyms, references and some changes to content
- 2015.01.19.2, modified 2015.01.20 by Adaptit: changes in format, incorporation of remarks
- 2015.01.20.2.1, modified 2015.01.26 by UPM: updated references, format, and some sections

Quality Assurance:
- Quality Assurance Manager: Mozo Velasco, Alberto (UPM)
- Reviewer #1: López, Miguel Angel (SATEC)
- Reviewer #2: Ordozgoiti, Bruno (UPM)
Copyright 2015, ONTIC Consortium. The ONTIC Consortium (http://ict-ontic.eu/) grants third parties the right to use and distribute all or parts of this document, provided that the ONTIC project and the document are properly referenced.

THIS DOCUMENT IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DOCUMENT, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Table of Contents

1. Acronyms and Definitions
2. Executive Summary
3. Scope
4. Intended Audience
5. Suggested Previous Readings
6. Introduction
7. Provisioning Subsystem Requirements
   7.1 Constraints, assumptions and dependencies
   7.2 External interface requirements
      7.2.1 User interfaces
      7.2.2 Software interfaces
      7.2.3 Hardware interfaces
   7.3 Functional requirements
   7.4 Performance requirements
   7.5 Logical database requirements
   7.6 Software system attributes
8. Provisioning Subsystem Architecture
   8.1 Hardware architecture
      8.1.1 Riverbed TurboCap
      8.1.2 Capture and storage server
      8.1.3 HDD bays
      8.1.4 Pre-processing server
      8.1.5 FTP servers
   8.2 Software architecture
9. Implementation, Testing and Deployment
   9.1 ISP subsystem deployment
   9.2 Hardware and software constraints and solutions
   9.3 Data captured
10. Storage Module
   10.1 Physical storage architecture and constraints for ONTS (ONTIC dataset)
   10.2 Public access to ONTS
11. Data Migration Considerations
   11.1 Migration to EMC cluster
   11.2 Migration to Google Cloud
12. References
List of figures

Figure 1: Hardware architecture of data capturing process
Figure 2: TurboCap connection modes
Figure 3: Software architecture of data capturing process
Figure 4: Connections of capture hardware in Interhost's network
Figure 5: Complexity of Interhost's network
Figure 6: ONTS data formats
Figure 7: ONTS dataset data transformation for WP3 and WP4
List of tables

Table 1: D2.4 Acronyms
1. Acronyms and Definitions

Table 1: D2.4 Acronyms

Acronym: Defined as
AWB: Pivotal Analytics Workbench
BGP-4: Border Gateway Protocol Version 4
CPU: Central Processing Unit
DNS: Domain Name System
DoW: Description of Work
FTP: File Transfer Protocol
GB: Gigabyte
Gbps: Gigabits per second
GUI: Graphical User Interface
HDD: Hard Disk Drive
HDFS: Hadoop Distributed File System
HSRP: Hot Standby Router Protocol
IP: Internet Protocol
ISP: Internet Service Provider
JBOD: Just a Bunch Of Disks
LAN: Local Area Network
LOD: Last Order Date
MB: Megabyte
MTTR: Mean Time To Repair
NAS: Network-Attached Storage
ONTIC: Online Network Traffic Characterization
ONTS: ONTIC Network Traffic Dataset
OS: Operating System
OSPF: Open Shortest Path First
PB: Petabyte
RAID: Redundant Array of Independent Disks
RAM: Random Access Memory
TB: Terabyte
UDP: User Datagram Protocol
USB: Universal Serial Bus
VRRP: Virtual Router Redundancy Protocol
WP: Work Package
2. Executive Summary

Deliverable D2.4 describes the requirements of the provisioning subsystem, as well as its design. The provisioning subsystem is the component that feeds the Big Data platform with network traffic summary records obtained from network traffic flows in real time.

This deliverable starts with a presentation of the constraints of the provisioning subsystem, as well as the assumptions made in order to start designing it. Afterwards, the requirements of the interfaces are introduced; more specifically, the user interface and the requirements of the software and hardware interfaces are presented. In subsequent sections, the functional and performance requirements of the provisioning subsystem are examined, the constraints on the design of the system are analyzed and the requirements of the logical database are presented. Additionally, the attributes of the provisioning system software are listed.

This deliverable also addresses the design and architecture of the subsystem. Initially, the hardware architecture is described and some suggested technologies are reported. Then, the software architecture is introduced. Finally, the architecture implementation, the tests carried out and the constraints and limitations met during deployment, both in software and hardware, are presented. The data capture and storage modules are described and analyzed. The deliverable ends with an analysis of the options for data migration.
3. Scope

This document is part of WP2 of the ONTIC project, which deals with designing, deploying, managing and provisioning a Big Data Network Traffic Summary dataset (ONTS) with real ISP network flows, as stated in the DoW of the project. The main purpose of Deliverable D2.4 is to identify the requirements and constraints of the provisioning subsystem, which has to capture and store actual network traffic in real time. Based on these requirements, a description of a first design and implementation of this module is given. This document is, as a consequence, the starting point for deliverable D2.5 (Progress on Provisioning Subsystem), where the evolution of the requirements and design of this module will be updated.
4. Intended audience

The intended audience of this document includes all the partners of the ONTIC consortium, especially those involved in tasks related to data collection, management and maintenance. It also includes any reader interested in knowing more about the ONTS dataset, for example where and how it is being captured.
5. Suggested previous readings

The information provided in this deliverable is explained in such a way that no specific previous knowledge is needed apart from a basic background in the fields of Computer Science, Hardware Engineering and Requirements Analysis. Nevertheless, the following document may be found useful: ONTIC. Deliverable D2.2. Big Data Architecture. Jan. 2015.
6. Introduction

One of the cornerstones of the ONTIC project is the ONTS dataset, which will be collected over a period of 24 months. This dataset, which will be made publicly available to the scientific community in order to allow researchers worldwide to validate their work against up-to-date network information, will first be used by the consortium to validate cutting-edge scalable algorithms focused on QoS management, intrusion detection and congestion control.

Obtaining, storing and exploiting such a huge dataset poses a series of problems and requires a robust and efficient provisioning system. Its architecture needs to be robust and fault-tolerant to avoid data loss, and dynamic in order to be easily adapted to any changes in the requirements. This subsystem has three main parts: the capturing module, which obtains the information from an ISP; the storage module, which saves the data in a Big Data storage platform; and the migration module, which allows moving the information to a distributed computational framework in order to perform feature management and aggregation tasks or to directly execute and validate our algorithms.

The ONTS dataset contains the first 64 bytes (the headers) of each IP network packet that crosses the core network of Interhost, a subsidiary of SATEC. Considering the traffic volume of 1.2 Gbps on the link where the raw data is being captured, the system is currently gathering more than 500 GB of compressed information per day (1.6 TB of raw uncompressed data). The process has been running, with some interruptions, for five months now, and the ONTS dataset's size is currently 45 TB. Physically, due to budget constraints, this information is stored in HDD bays with a total capacity of 15 TB each. Stored data can optionally go through a pre-processing step where the network traffic is aggregated into flows.
Finally, the information is made available to the rest of the consortium, upon prior request, using internal FTP servers. The design, implementation and deployment of this provisioning subsystem, as well as a precise description of the ONTIC network traffic dataset, are presented in this document. Additionally, the constraints and requirements for both the provisioning subsystem and the dataset are outlined in the sections of this deliverable.
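The volumes quoted in the introduction can be cross-checked with a short back-of-envelope calculation. The sketch below is illustrative only: the mean packet size of 480 bytes is an assumption chosen to match the 20 MB/s storage rate mentioned later in this document, and real traffic mixes vary.

```python
# Back-of-envelope sizing of the ONTS capture (illustrative, not measured).
LINK_GBPS = 1.2          # traffic rate at the capture point
HEADER_BYTES = 64        # bytes stored per packet
AVG_PKT_BYTES = 480      # ASSUMED mean packet size on the wire

wire_Bps = LINK_GBPS * 1e9 / 8            # ~150 MB/s on the wire
pkts_per_s = wire_Bps / AVG_PKT_BYTES     # ~312,500 packets per second
stored_Bps = pkts_per_s * HEADER_BYTES    # ~20 MB/s of headers to store
raw_per_day_TB = stored_Bps * 86400 / 1e12

print(f"{pkts_per_s:,.0f} pkt/s, {stored_Bps/1e6:.0f} MB/s stored, "
      f"{raw_per_day_TB:.2f} TB/day raw")
```

With these round numbers the estimate comes out at roughly 1.7 TB of raw headers per day, the same order as the 1.6 TB/day quoted above for a capture process that is not fully continuous.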
7. Provisioning Subsystem Requirements

The data provisioning subsystem needs to cope with huge numbers of incoming data packets per second and process and store them in real time. Additionally, it has to absorb the constant changes and peaks in network traffic during the capturing process. Therefore, the provisioning subsystem has to be very efficient and adaptable in terms of both hardware and software characteristics. Moreover, given that significant data loss is unacceptable, the system also needs to provide error management, fault resilience and recovery mechanisms. Some additional requirements are intended to ensure data security and integrity, so they specify who can manage and interact with this data. It is also required that the provisioning subsystem itself remain completely independent and avoid any interaction with the captured information; this is meant to reduce the probability of data corruption or contamination.

This provisioning subsystem will be used during the whole project, which means it has to strictly fulfill all the above-mentioned constraints, as well as others related to usability, software and hardware. These constraints are presented in detail in the following subsections. Prior to this, the initial assumptions, constraints and dependencies considered for the architectural design and subsystem deployment are described.

7.1 Constraints, assumptions and dependencies

The main goal of the system is to provide a full set of real network traffic data, similar to the one described in [1], for testing the different algorithms that will be released throughout the project. The main assumption at this stage of the project is that the type and quality level of the selected network are representative of a standard ISP. Given the topology of the network operated by Interhost, the selected traffic is appropriate for the intended testing. There are no external dependencies that might delay or impact the development.
The main constraint the provisioning subsystem could face is that the capture card, selected to fit the budget, is near its last order date (LOD), which could pose limitations in case of hardware failure. Other constraints related to the capture card are its proprietary drivers, which are only supported on a few old (approximately seven-year-old) OSs; this definitely impacts the selection of the server hardware and software.

Regarding the dataset, the main constraint is that it cannot be provided in its entirety to the consortium, or to the public in general, at any given moment. Due to hardware limitations and budget restrictions, only a subset of the dataset will be accessible at a certain moment. To access the rest of the information, an electronic request procedure has been created.

7.2 External interface requirements

Regarding the interface that the provisioning subsystem presents, three main sections can be distinguished.

7.2.1 User interfaces

User interaction with the provisioning system is based on BASH scripting, as it was required to be simple and practical, to provide enough flexibility and to allow fast changes and development. Moreover, the system presents all the available options in textual menus, as the operators are advanced technicians who do not require any kind of GUI.
The capture and storage modules of the provisioning subsystem can be accessed, modified and monitored remotely. This characteristic responds to the data management team's request to have the captured data and the subsystem itself under control at any time.

7.2.2 Software interfaces

The provisioning subsystem's software has been developed and runs under a Linux operating system. In addition to the user interface coded in BASH, the C programming language is used to provide the required efficiency and security while keeping the code easy to read and understand.

Incoming and outgoing items managed by the software: incoming data consists of all the traffic that goes through the ISP network, sniffed in promiscuous mode; outgoing data consists of files with the information captured from the network.

Services and libraries required: the system makes use of the libpcap library in order to capture packets [2].

7.2.3 Hardware interfaces

The server used to obtain the information has to meet security requirements imposed by the ISP, so this hardware was provided by Interhost. The HDD bays were required to be robust in order to avoid data loss, and to provide enough read/write operations per second to cope with the information flows.

7.3 Functional requirements

The functional requirements for the provisioning subsystem are the following:
- The system is able to gather all the packet traffic going through the ISP's network.
- The system is customizable in terms of the amount of data to be captured from each packet.
- The system saves all the collected packets into files.
- The size of the output files is configurable, based on the ability to change the number of packets stored in each file.
- The system can save the packets into compressed files to optimize storage space.
- The system is able to convert the per-packet traffic into per-flow aggregated information.
- The system can export the outputs to a distributed filesystem.
- The system minimizes its own impact on the traffic that it tries to capture.

7.4 Performance requirements

The performance requirements are the following:
- The system is able to cope with a total amount of network traffic of, at least, 1.2 Gbps.
- The system can process this information flow and store part of it without any data loss.
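The requirements on a configurable per-packet capture length and configurable file sizes map directly onto the standard pcap trace format, whose global header carries a snaplen field and whose per-packet records carry both the stored and the original packet length. The sketch below (illustrative Python, not the project's actual C implementation) writes a minimal pcap global header with a 64-byte snaplen plus one truncated packet record, then reads the snaplen back:

```python
import struct, io

SNAPLEN = 64       # bytes stored per packet, as in the ONTS capture
LINKTYPE_ETH = 1   # pcap link-layer type for Ethernet

def pcap_global_header(snaplen):
    # magic, version 2.4, thiszone, sigfigs, snaplen, linktype
    return struct.pack('<IHHiIII', 0xa1b2c3d4, 2, 4, 0, 0,
                       snaplen, LINKTYPE_ETH)

def pcap_record(ts_sec, ts_usec, packet, snaplen):
    data = packet[:snaplen]  # truncate the packet to snaplen bytes
    # ts_sec, ts_usec, included length, original length
    hdr = struct.pack('<IIII', ts_sec, ts_usec, len(data), len(packet))
    return hdr + data

buf = io.BytesIO()
buf.write(pcap_global_header(SNAPLEN))
buf.write(pcap_record(1420000000, 123456, b'\x00' * 1500, SNAPLEN))

buf.seek(0)
magic, _, _, _, _, snaplen, _ = struct.unpack('<IHHiIII', buf.read(24))
print(hex(magic), snaplen)
```

The included-length/original-length pair in each record is what lets downstream analysis tools recognize that packets were truncated to their first 64 bytes.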
7.5 Logical database requirements

The requirements related to the logical database are:
- The proposed system does not use any kind of traditional database. All the outputs go into files in order to maximize the overall capture throughput.

7.6 Software system attributes

The software system attributes are:
- Reliability: The system has an MTTR of less than 24 hours in case of a hardware failure and less than two hours in case of software-related failures.
- Availability: The capture system exhibits an availability of not less than 99.999% during the time that the system is scheduled to run, which means an estimated downtime of about 5 minutes per year (8760 hours).
- Security: The system ensures that the data gathered is protected from unauthorized access, and it prevents operation by unauthorized personnel.
- Maintainability: The system is documented sufficiently to allow new technicians to learn to use and modify the system on their own in less than 3 days.
- Portability: The system is intended to be deployed only on the hardware and with the OS finally chosen. There is no requirement for further hardware or OS support.
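The availability target can be checked with one line of arithmetic: 99.999% of a scheduled year of 8760 hours leaves roughly five minutes of permitted downtime.

```python
availability = 0.99999   # "five nines" target from the attributes above
hours_per_year = 8760    # scheduled running time per year

downtime_minutes = (1 - availability) * hours_per_year * 60
print(f"allowed downtime: {downtime_minutes:.2f} minutes/year")
```

This comes out at about 5.26 minutes per year.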
8. Provisioning Subsystem Architecture

The ONTIC network traffic dataset is one of the key results that the project will provide to the scientific community when finalized. Therefore, the data provisioning architecture is one of the main pieces designed, developed and deployed during the project's initial months. A very robust architecture is required in order to ensure that the data is being correctly captured, to prevent information loss and to obtain a very clean and continuous dataset.

Another issue that this architecture must face is the volume of data to be managed per second. These amounts of information, which to our knowledge are much higher than those captured in any currently active project (for example, those described in [1] and [3]), pose a major challenge and require a scalable and modular architecture. Moreover, the traffic volume requires very high storage capacity; in order to reduce these needs, online compression of the information has to be performed.

The architecture proposed by the ONTIC consortium to face the above-mentioned constraints and fulfill the requirements presented in section 7 of this document is described in the next paragraphs, separated into its hardware and software components.

8.1 Hardware architecture

The hardware architecture proposed for the whole data capturing process, and all the elements it is comprised of, can be seen in the following image.

Figure 1: Hardware architecture of data capturing process.
This architecture is designed as a low-cost, efficient and effective solution where network data traces are first captured, stored on specific hardware and then prepared to be used by the algorithms developed in WP3 and WP4 for online and offline data analysis. The architectural design has undergone major modifications since the first versions were proposed, mainly in the data capture and transport aspects. For instance, the initial approach considered sending the information in real time from the ISP facilities to the final storage location at the Universidad Politécnica de Madrid over the Internet. This design, among other drawbacks, would have enormously increased the network traffic across the ISP's core network and contaminated the packet captures with huge amounts of artificial data (a high number of packets carrying the captured data would have crossed the core network on their way to UPM and been captured by the system like the rest of the traffic).

Currently, network traffic data is being obtained from the core network of Interhost-SATEC, a medium-size ISP and partner of the ONTIC project. The traffic rate at the capture point is approximately 1.2 Gbps. Out of this huge data flow only the first 64 bytes of each packet are stored, which still represents around 20 MB per second. In order to cope with this amount of streaming information commodity hardware is not enough, so specially tailored hardware is needed. An example of this situation is assigning very precise timestamps to each packet, a key task in order to obtain a well-ordered dataset and to be able to precisely simulate data injection into the online traffic analysis algorithms in the future. To this aim the Riverbed TurboCap [5], described in the following section, was acquired.

8.1.1 Riverbed TurboCap

Most network interface controllers are not able to timestamp the packets with enough accuracy to fulfill ONTIC's requirements.
Therefore, the consortium decided to use two Riverbed TurboCap Copper 2 LAN adapters. Among other advantages, these devices provide microsecond-accuracy timestamps, a built-in network protocol stack and the ability to connect them in pass-through or packet mirroring modes. The built-in network protocol stack speeds up packet processing by bypassing the OS kernel's stack; in other words, the packets do not need to be forwarded to the OS to be processed, as this can be done on the card itself [5]. The Riverbed TurboCap at SATEC's facilities is currently being used in mirroring mode, which means that network packets are replicated and then processed. Another available option is pass-through mode, in which the network card is serially connected to the core network so that no packet replication is needed. The major drawback, and the main reason for not using this connection type, is that any problem in the capturing server could easily result in a complete failure of the ISP's core network. Figure 2 below shows both options.
Figure 2: TurboCap connection modes.

8.1.2 Capture and storage server

The Riverbed TurboCap is installed on one of SATEC's high-performance servers. In addition to capturing data, this server stores temporary captures on its local HDD, compresses them and sends the results to the HDD bays. The hardware specifications of the capture and storage server are:
- Manufacturer and model: Dell PowerEdge 2950
- Processor: 2x Dual Core Intel Xeon processors
- RAM: 12 GB
- HDD: 4 TB for local storage
- Ethernet interface: Gigabit Extreme

8.1.3 HDD bays

HDD bays are used to increase the total data storage capacity. With this method, approximately a month of uninterrupted network traffic data can be gathered. The exact time needed to fill each bay depends on the compression rate used and the data captured per packet. Once a bay is full, it is manually substituted by an empty one. This process usually takes less than two minutes to complete, and no data is lost, as it is stored on the capturing server's local HDD in the meantime (as explained in section 8.2). Full bays are then taken to UPM's facilities and stored as explained in section 10 of this document.

Each bay holds five 3 TB top-quality off-the-shelf Western Digital Red NAS disks, for a total capacity of 15 TB. The RAID bay boxes implement both USB3 and eSATA interfaces, which provide enough write capacity even if the system were set to capture whole packets. The write capacities for each of them are:
- USB3: sustained rate of approximately 1 Gbps, with peaks up to 5 Gbps [6].
- eSATA: sustained rate of approximately 3 Gbps [7].

Currently the bays are connected to the data capturing and storage server using the eSATA interface.
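As noted above, the time to fill a bay depends on the compression rate and on the bytes captured per packet. With the round numbers used elsewhere in this document (20 MB/s of raw headers and a roughly 5:1 gzip ratio, both assumptions for this sketch), a rough estimate is:

```python
BAY_TB = 15        # capacity of one HDD bay
RAW_MBps = 20      # MB/s of packet headers written (assumption, see section 8.1)
GZIP_RATIO = 5     # compressed size ~1/5 of raw (assumption, see section 8.2)

compressed_MBps = RAW_MBps / GZIP_RATIO            # ~4 MB/s to the bay
fill_days = BAY_TB * 1e6 / compressed_MBps / 86400 # seconds -> days
print(f"~{fill_days:.0f} days to fill one {BAY_TB} TB bay")
```

With fully continuous capture this comes out somewhat above the month quoted in the text; the actual figure moves with the traffic load, the compression level and the capture length per packet.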
8.1.4 Pre-processing server

Some of the algorithms considered for WP3 and WP4 of the ONTIC project require as input per-flow statistics that have to be obtained from the raw data. Gathering these features requires a pre-processing phase, and a server at the UPM facilities is used exclusively for this task. The software selected and the process of obtaining the features can be found in section 8.2. The server used has the following characteristics:
- Manufacturer and model: Dell PowerEdge 2950
- Processor: Intel Xeon processor
- RAM: 16 GB
- HDD: 3 TB for local storage
- Ethernet interface: 10 Gigabit Ethernet

The information generated in this step is stored in other HDD bays.

8.1.5 FTP servers

To make the network traffic data available to the whole ONTIC consortium, three FTP servers are used. These servers, currently online, provide access to the information stored in the bays. The same mechanism will be used to provide public access to the information obtained once the project has ended.

8.2 Software architecture

The provisioning subsystem's software architecture can be seen in Figure 3 below. The OPC (ONTIC Project Capturer) is the module that captures data by controlling the TurboCap. This raw data is compressed by another specific module and then stored in an HDD bay. In case the bay is not detected, the system compresses and stores the information temporarily on the server's hard disk. Additionally, the software informs the data management team about problems by generating log entries.

The output of the provisioning subsystem will be used as input for the scalable algorithms to be developed in WP3 and WP4 of the project. These work packages address the offline and online traffic characterization scenarios, respectively, and require data in different formats. For WP3, the raw data captured from each packet needs to be transformed into per-flow aggregated information.
The definition of flow used here is the same as in [8], that is, the 5-tuple (Source IP, Destination IP, Source Port, Destination Port, Protocol). The system groups into a single flow all the packets that share these five values. This representation method enormously reduces the processing overhead and storage requirements with no information loss; it is therefore convenient for the offline algorithms. As the data in WP3 is going to be analyzed with no strict time constraints, the aggregation step can be performed in an offline fashion by the provisioning subsystem, with the results then provided to the offline algorithms. This aggregation process is performed using TSTAT [9], a tool developed at POLITO. On the other hand, WP4 requires online analysis of the information, so the aggregation would have to be performed as the data is captured: the online algorithms will need to transform the raw data into aggregated information and use it to make decisions almost at the same time. For this reason, the provisioning subsystem provides raw data to the WP4 algorithms. To fulfill WP3's aggregated information requirements and WP4's raw data capture needs, specially tailored software modules have been created, as shown in Figure 3.
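The 5-tuple grouping can be illustrated with a minimal sketch (illustrative Python; the project itself uses TSTAT for this step, and the field names below are made up for the example):

```python
from collections import defaultdict

def aggregate_flows(packets):
    """Group packets by their 5-tuple and accumulate per-flow counters."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        key = (pkt["src_ip"], pkt["dst_ip"],
               pkt["src_port"], pkt["dst_port"], pkt["proto"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += pkt["length"]
    return dict(flows)

# Two packets of the same TCP flow plus one unrelated UDP packet:
pkts = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
     "src_port": 40000, "dst_port": 80, "proto": "TCP", "length": 64},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
     "src_port": 40000, "dst_port": 80, "proto": "TCP", "length": 64},
    {"src_ip": "10.0.0.3", "dst_ip": "10.0.0.2",
     "src_port": 53, "dst_port": 53, "proto": "UDP", "length": 64},
]
flows = aggregate_flows(pkts)
print(len(flows))  # 2
```

Packets with identical 5-tuples collapse into one record carrying counters only, which is the source of the storage reduction described above.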
Figure 3: Software architecture of data capturing process

In the following, the three software modules that constitute the provisioning subsystem are described in more detail; these are the OPC, the Compressor and the Watchdog service.

OPC: The OPC is a fully configurable capture module that uses the libpcap library to save traces from the network interface. The captured traces are fully compatible with most network analysis tools, such as tshark/wireshark, tcpdump, netmate, TSTAT or softflowd, among others compatible with the pcap format. It works by opening the network interface, capturing packets of any size and saving the first N bytes (configurable at the start of the execution) of each trace. The default capture length is set to 64 bytes in order to obtain the headers at the link, internet and transport levels, plus possible optional headers, as explained in section 9.3. The number of traces captured in each file can also be configured, as well as a temporary local folder where the files are saved as an extra security mechanism. The default capture file size is set to 10 GB, which corresponds to around ten minutes of continuous packet header capture. This size was selected as a good trade-off between the number of files in the final dataset and the amount of information lost if a file gets corrupted.

Compressor: The Compressor works alongside the OPC module. It waits for new capture files to appear in the temporary local folder assigned to the OPC module. When a new file appears, it is compressed and moved or copied to its final location in the RAID bays, depending on the configuration of the Compressor module. Gzip is used to compress files by default, given its very high performance: tests show that for network traffic data gzip clearly outperforms other common compression tools such as tar and p7zip in terms of time and compression rate. Moreover, the resulting file format after compression, .pcap.gz, is a standard format used by the main pcap analysis tools
(netmate, TSTAT, wireshark, etc.). Nevertheless, the module is flexible enough to allow the use of any other data compression software. Gzip provides 9 compression levels (1-9, from lower to higher compression rate), which determine the processing time and output size. Currently, the system uses compression level 3 in order to be able to process all the packets. Although this may seem very low, the size of the compressed files is about a fifth of the original size; the main reason for this is the low entropy of the data.

Watchdog: The Watchdog module provides fault resilience capabilities to the OPC. Every 5 seconds it checks whether the OPC process is running on the machine and, if not, executes the OPC module again with the latest parameters used. This module is integrated into the OS boot sequence, meaning that it is tolerant to crash errors and power outages. It also generates log entries in case of software malfunction for later analysis.
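The Compressor's core step, compressing a finished capture file at gzip level 3 and removing the raw original, can be sketched as follows (illustrative Python, not the project's actual BASH/C tooling; file names are made up for the demo):

```python
import gzip, os, shutil, tempfile

GZIP_LEVEL = 3   # low level keeps up with the capture rate (see text)

def compress_capture(src_path, dst_dir, level=GZIP_LEVEL):
    """Compress one finished capture file into dst_dir as <name>.gz."""
    dst_path = os.path.join(dst_dir, os.path.basename(src_path) + ".gz")
    with open(src_path, "rb") as src, \
         gzip.open(dst_path, "wb", compresslevel=level) as dst:
        shutil.copyfileobj(src, dst)
    os.remove(src_path)  # the raw capture file is no longer needed
    return dst_path

# Demo on a low-entropy file (packet headers also compress very well):
tmp = tempfile.mkdtemp()
raw = os.path.join(tmp, "capture_0001.pcap")
with open(raw, "wb") as f:
    f.write(b"\x00\x45\x00\x40" * 250_000)   # 1 MB of repetitive data

out = compress_capture(raw, tmp)
print(os.path.getsize(out) < 1_000_000)  # True
```

In a production loop this function would be driven by a watcher on the OPC's temporary folder, writing to the mounted RAID bay instead of a temporary directory.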
9. Implementation, Testing and Deployment

The data capture and compression modules of the provisioning subsystem have been deployed at the ISP facilities in order to start gathering the ONTS dataset. The deployment tasks are presented here in detail, as well as the characteristics of Interhost, where this process has been carried out. The hardware and software constraints faced while building a functional capture platform are also shown. Finally, the solutions to the above-mentioned problems, and how a more polished version of this capturing architecture was obtained, are described.

9.1 ISP subsystem deployment

INTERHOST has core routers in high availability with several IP transit operators and connections within its own facilities. The first switching level (core switches in high availability) distributes traffic to the next level, which is crossed by all traffic to or from the Internet. These switches are configured with a dedicated port in SPAN mode, so that port receives all the traffic traversing the switch. To one of these ports, a computer in promiscuous mode (with an interface without an IP address) is connected in order to access and capture all the network traffic. This server has external administrative access that allows managers to connect remotely and securely carry out installation, setup and/or maintenance tasks. This administrative connection is authorized only from certain IP addresses and across INTERHOST's secure tunnel services. The picture below shows the first approach to data capturing, where two servers were used to perform this task. Currently only one is installed, so it is connected to both core switches.

Figure 4: Connections of capture hardware in Interhost's network.
INTERHOST is a hosting services company. It provides a set of basic services such as DNS, email, backup, monitoring, computer hosting, Web services, internal application management, its own mail systems, etc., with a variety of volumes, operating systems, hardware and software. It is a large and complex environment.

Figure 5: Complexity of Interhost's network.

Figure 5 depicts a logical view of the network. Different areas can be observed:

Communication area: This is a section of service equipment, all of which is branded INTERHOST. Here are located the core routers connected to other INTERHOST sites, as well as shared communications equipment supporting dedicated customer lines and IP transit operators. This whole area is interconnected by core routing protocols such as BGP-4 (internal and external) and OSPF between the different INTERHOST offices, providing them with high-availability routing. The internal connections of these core routers also implement high-availability protocols at the interfaces (HSRP or VRRP, depending on the equipment) for each connected device. The servers are connected to the Internet and protected by specific security settings at the core, without firewalls at the access layer.

High-performance area: This is another section of INTERHOST service equipment. In this area one can find the mail relays (outbound delivery), streaming proxies, DNS caches, monitoring, etc. Hardware with high bandwidth consumption, which does not use firewalls (in order to avoid degrading performance), is located in this area as well. Although it is exposed to the Internet without firewall protection, this equipment retains a very carefully hardened configuration.
Front-End area: This too is a section of INTERHOST service equipment. In this area are located the authentication database servers for services, mailbox servers, delegated-administration mail servers, DNS managers, INTERHOST cloud servers, etc. These are the entry points for the Internet services offered. This hardware is protected by a combination of firewalls and load balancers on dedicated equipment.

Back-End area: Likewise a section of INTERHOST service equipment. Here are located the storage servers, the backup and monitoring servers of the Data Processing Centre, and some customer servers. This hardware does not have direct Internet connectivity.

Management area: Also a section of INTERHOST service equipment. Here are located the servers used for management, configuration and provisioning services. These devices are completely isolated and have no Internet connection.

Client area: Though shown in this picture as a single point, it is clearly the widest area in terms of equipment and traffic volume. Equipment of several types is used: servers with dedicated or shared firewalls; client servers on shared or dedicated networks; and servers and communications equipment managed by INTERHOST, either exclusively for the client or jointly. The client types also vary: from simple web sites to banking, e-commerce customers, internal company management systems, etc.

9.2 Hardware and Software constraints and solutions
Data capturing and management tasks were the biggest challenges the ONTIC project had to face during its first year. The main reason was that such an ambitious system, able to deal with gigabytes of network data per second, surpassed both current software and hardware capacities. Next, some of the most important constraints faced during the deployment process, and their solutions, are presented.
TurboCap: We suspect we are close to surpassing Riverbed's hardware capacity, although with extremely low percentages of capture traces lost at the interface level: we lose approximately half a second of traces every 7 days. We were obliged to use Fedora 10, a 7-year-old operating system, because the proprietary Riverbed TurboCap drivers are embedded in a Fedora 10 kernel and no open source version is available. This constraint produced a number of problems down the line, described in the Operating System paragraph below.

Operating System: Because we had to use an old OS, we only had access to a deprecated source and software repository, and in turn had to work with out-of-date versions of necessary software, namely libpcap and its dependencies. The OS is also incompatible with most modern hardware.

Storage: During the first months of the project, the captured network traffic was stored in RAID bays configured in JBOD mode and formatted in ext4. Unfortunately, the outdated ext4 driver shipped with Fedora 10 failed to guarantee the integrity of both the data and the filesystem itself under high-performance simultaneous read/write requests.
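As a sanity check on the TurboCap loss figure, the half-second-per-week loss translates into the following fraction (a back-of-the-envelope sketch, using only the numbers quoted above):

```python
# Back-of-the-envelope check of the reported capture loss:
# roughly half a second of traces lost every 7 days.

SECONDS_LOST = 0.5               # reported loss per window
WINDOW_SECONDS = 7 * 24 * 3600   # 7-day observation window (604,800 s)

loss_fraction = SECONDS_LOST / WINDOW_SECONDS
print(f"loss fraction: {loss_fraction:.2e}")          # ~8.27e-07
print(f"loss percentage: {loss_fraction * 100:.5f}%")
```

That is, less than one millionth of the capture window is lost, which supports describing the loss as extremely low.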
Tstat: We reached the limits of Tstat [9], and the maximum number of simultaneous TCP flows was largely surpassed. When executed in parallel, the output files were not correctly formed.

Modifications to these tools and to the OS were needed in order to cope with the network traffic flowing through the core network of Interhost-SATEC.

Tstat: The software was recompiled with additional changes to accommodate the needs of the project. The number of TCP and UDP flow entries was increased from the default to 1,000,000 each, and the waiting time used to detect interrupted TCP connections was raised from 600 to 1800 seconds, close to the supported limit. The size of the array for still-open connections was also increased, and the parallel execution library was disabled to fix the creation of output files. With these changes the system works well when aggregating traffic, but the numbers of complete and incomplete TCP flows are not reasonable and differ from the results given by other tools. For this reason, and in order to be completely sure about the results of the aggregation process, the consortium is currently studying alternatives to Tstat.

Storage: The disks mounted in the RAID bays were formatted with an up-to-date, freshly compiled xfs filesystem instead of the old ext4. The JBOD mode was replaced by RAID0, which provides parallel access to all the disks in the bay. In addition, other changes were made, such as creating temporary pcap files on the local drive before sending the compressed files to the disks in the RAID bay.

9.3 Data captured
The core network of our ISP is crossed by a huge variety of network traffic, but the project is specifically focused on the TCP/IP model; other frames, such as Ethernet jumbo frames, are ignored. In addition, only IPv4 is considered, given the negligible percentage of IPv6 traffic present at Interhost. The capture length has been set to 64 bytes per packet in order to include all the headers at the link, Internet and transport levels.
Although these headers total 54 bytes, as shown in Figure 6 below, the limit was set at 64 bytes in case optional header fields were used. In addition, powers of two are more efficiently processed by computers.
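The 54-byte figure and the slack left by the 64-byte snaplen can be verified with a few lines (a simple check, not project code; the sizes assume untagged Ethernet and option-free IPv4 and TCP headers):

```python
# Fixed header sizes for an untagged Ethernet / IPv4 / TCP packet
# with no optional fields, reproducing the 54-byte figure above.

ETHERNET_HEADER = 14   # dst MAC (6) + src MAC (6) + EtherType (2)
IPV4_HEADER = 20       # minimum IPv4 header, no options
TCP_HEADER = 20        # minimum TCP header, no options

base_headers = ETHERNET_HEADER + IPV4_HEADER + TCP_HEADER
print(base_headers)    # 54 bytes of mandatory headers

SNAPLEN = 64           # bytes captured per packet
print(SNAPLEN - base_headers)   # 10 spare bytes for optional header fields
```

The 10 spare bytes absorb small optional fields (e.g. a few TCP options) without enlarging the per-packet storage cost beyond a power of two.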
Figure 6: ONTS data formats

The amount of data to be managed per second has been mentioned frequently throughout this document. Here, for the sake of clarity, we present the main numbers. The core network of Interhost-SATEC, where data is being captured, is crossed by around 300,000 packets/s. Taking an average value of 500 bytes/packet, the traffic we are facing is:

300,000 packets/s x 500 bytes/packet x 8 bits/byte = 1.2 Gbps

Out of each packet only 64 bytes are stored, so each day we obtain:

300,000 packets/s x 64 bytes/packet x 3,600 s/hour x 24 hours/day ≈ 1.6 TB/day

The resulting data is compressed with a ratio of approximately 4 to 1, so the final space on disk is:

1.6 TB/day x 0.25 ≈ 0.4 TB/day
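For reference, the same arithmetic as a short script (the inputs are the reported averages; note that the 0.25 factor used in the text corresponds to roughly 4:1 compression):

```python
# Sketch of the traffic-volume arithmetic above, using the reported
# averages: ~300,000 packets/s, ~500 bytes/packet, 64-byte snaplen.

PACKETS_PER_SECOND = 300_000
AVG_PACKET_BYTES = 500
SNAPLEN_BYTES = 64
COMPRESSION_FACTOR = 0.25        # ~4:1, the factor applied in the text
SECONDS_PER_DAY = 24 * 3600

link_gbps = PACKETS_PER_SECOND * AVG_PACKET_BYTES * 8 / 1e9
raw_tb_per_day = PACKETS_PER_SECOND * SNAPLEN_BYTES * SECONDS_PER_DAY / 1e12
disk_tb_per_day = raw_tb_per_day * COMPRESSION_FACTOR

print(f"{link_gbps:.1f} Gbps on the wire")        # 1.2 Gbps
print(f"{raw_tb_per_day:.2f} TB/day captured")    # 1.66 TB/day (~1.6 in the text)
print(f"{disk_tb_per_day:.2f} TB/day on disk")    # 0.41 TB/day (~0.4 in the text)
```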
619633 ONTIC. Deliverable D2.4

10. Storage Module
This section presents all the tasks related to the storage of the ONTS dataset. Detailed descriptions are given of how the information is managed at the ISP facilities, how it is moved to its final destination at UPM, and how it is made available to the consortium. The first approach to how the resulting dataset will be publicly offered is also discussed.

10.1 Physical storage architecture and constraints for ONTS (ONTIC dataset)
The network data captured at the ISP facilities is first stored on the capturing machine's HDD. Once the compression process has been run over these data, the results are stored in the HDD bays presented in section 8.1.3. Every month a UPM team travels to the ISP facilities and substitutes the full bay with a new empty one. This bay-swapping mechanism is a low-cost process that does not overload the ISP's network with remote file copies, which would pollute the captures by crossing the very core network where the information is captured. The bay containing data is then taken to UPM's facilities, where it is connected to one of the three secured FTP servers and made available to the consortium. Given the servers' characteristics, only two bays can be connected to each of them at the same time. If any partner requires data not available in the connected bays, a formal request must be sent to the team in charge of storage management, and the required information is provided as soon as possible. The unconnected bays are safely stored in lockers, so access to them is strictly controlled.

10.2 Public access to ONTS
The ONTIC dataset will be made publicly available at the end of the project. To access these data, users will need to fill in an online registration form including information such as name, institution, email address and intended use of the data, among others.
This information will be used by the consortium to control access, keep records and compile various statistics.
11. Data Migration Considerations
Once the main algorithms developed in WP3 and WP4 are ready to be tested in a real environment, the network data obtained from Interhost-SATEC will be transferred to a Map-Reduce/Stream-Reduce cluster. The selected cluster of machines will be used to test both the efficiency and the scalability of the algorithms. Two clusters are currently being considered; the choice will depend on the results of the performance tests of the distributed computational platforms. These tests are currently under development and are described in detail in deliverable D2.2 of the project. If Hadoop is selected as the best option, the super-cluster in which the offline algorithms will be tested is the AWB provided by EMC; otherwise, the Google Cloud service will be used for both online and offline algorithm testing. The two platforms are described in more detail in the following sections.

11.1 Migration to EMC cluster
Figure 7: ONTS dataset data transformation for WP3 and WP4

In the final stages of the ONTIC project, the algorithms developed in WP3 and WP4 could be tested in the AWB, a Hadoop cluster built to provide a collaborative, community-focused, open and innovative platform for rapid discovery and demonstration of solutions to the world's biggest data challenges. This Pivotal AWB Hadoop cluster has the following main characteristics:

1000-node Hadoop cluster, containing the entire Hadoop stack (HDFS, Pig, Hive, HBase, Mahout)
Physical hosts: more than 1,000 nodes
Processors: over 24,000 CPUs
RAM: over 48 TB of memory
Disk capacity: more than 24 PB of raw storage

In order to move the ONTS dataset to the AWB cluster, the following should be taken into consideration: due to network bandwidth considerations, transferring no more than about 4 TB of data across the network is recommended; some tests will be necessary to determine the exact amount of data to be sent over a network connection. Data compression techniques will also help to maximize the amount of data that can be moved to the AWB cluster.

In the context of the ONTIC project, different frameworks are being tested: Hadoop, Spark, HAWQ, Storm and Cisco OpenSOC. As the AWB cluster does not have all of these frameworks available, final testing on this cluster will be restricted to the Hadoop and HAWQ technologies.

11.2 Migration to Google Cloud
If the results of the benchmarks currently being run show that Spark or HAWQ perform better than Hadoop, the massively distributed platform used will be Google Cloud. This platform will in any case be used to determine the behavior of the online algorithms. As a partner of the consortium, Adaptit will provide access to the service so that the algorithms and dataset can be deployed and tested. The Google Cloud platform can run any kind of Linux virtual machine, so our massively distributed cluster with the selected framework can be deployed and tested at massive scale.
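To give a feel for the 4 TB per-transfer recommendation, the following sketch estimates how much capture time one such chunk covers and how long it would take to ship. The 1 Gbps link speed is an illustrative assumption, not a figure from the deployment; the 0.4 TB/day compressed volume is the estimate from section 9.3:

```python
# Rough sizing of ONTS transfers to the AWB cluster. The link speed
# below is a hypothetical assumption for illustration only.

CHUNK_BYTES = 4e12               # recommended maximum per network transfer (4 TB)
LINK_GBPS = 1.0                  # ASSUMED effective link speed, not measured
COMPRESSED_TB_PER_DAY = 0.4      # compressed capture volume (section 9.3)

link_bytes_per_s = LINK_GBPS * 1e9 / 8
transfer_hours = CHUNK_BYTES / link_bytes_per_s / 3600
days_per_chunk = CHUNK_BYTES / (COMPRESSED_TB_PER_DAY * 1e12)

print(f"one 4 TB chunk holds ~{days_per_chunk:.0f} days of compressed captures")
print(f"and takes ~{transfer_hours:.1f} h to move at {LINK_GBPS} Gbps")
```

Under these assumptions a chunk covers about ten days of captures and takes most of a working day to transfer, which is consistent with preferring compression and batched transfers over continuous remote copying.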
12. REFERENCES
[1] K. Cho, K. Mitsuya, and A. Kato, "Traffic data repository at the WIDE project," in USENIX 2000 Annual Technical Conference: FREENIX Track, Jun. 2000.
[2] L. M. Garcia, "Programming with Libpcap: Sniffing the Network From Our Own Application," Hakin9 Computer Security Magazine, Feb. 2008.
[3] CAIDA dataset [Online]. Available: www.caida.org/home/about/annualreports/2013/#data
[4] Riverbed TurboCap [Online]. Available: www.cacetech.com/documents/CACE%20turbocap_flyer.pdf
[5] Riverbed Technology, TurboCap User's Guide, May 2013.
[6] J. Saade, F. Petrot, A. Picco, J. Huloux, and A. Goulahsen, "A system-level overview and comparison of three High-Speed Serial Links: USB 3.0, PCI Express 2.0 and LLI 1.0," in Design and Diagnostics of Electronic Circuits & Systems (DDECS), 2013 IEEE 16th International Symposium on, pp. 147-152, April 2013.
[7] eSATA specifications [Online]. Available: www.serialata.org/esata
[8] T. T. T. Nguyen and G. Armitage, "A survey of techniques for internet traffic classification using machine learning," IEEE Communications Surveys & Tutorials, vol. 10, no. 4, pp. 56-76, Fourth Quarter 2008.
[9] Tstat, TCP STatistic and Analysis Tool, Politecnico di Torino, http://tstat.polito.it/