Comprehensive IP Traffic Monitoring with FTAS System
Tomáš Košňar
CESNET, association of legal entities
Prague, Czech Republic

Abstract
The FTAS system is designed for large-scale, continuous, flow-based IP traffic monitoring. It is primarily developed and operated for the needs of the CESNET e-infrastructure (the national ICT infrastructure for research and development in the Czech Republic) and of the infrastructures and networks connected to it. This contribution presents selected examples of how typical user requests are solved: finding and visualising traffic of interest, its statistical post-processing and periodical reporting, as well as on-the-fly and ex-post anomaly and attack detection.

Keywords: flow-based monitoring, IP traffic monitoring.

1 Introduction
Contemporary ICT infrastructures are complex systems. Although they are built on the same principles and standards, each of them has its own specific aspects, usually tied to its purpose: the architecture and technologies applied, the strategy and manner of administration, a user community with specific behaviour, and similar. These small differences may imply rather diverse demands on solutions and tools in the area of security and incident handling. Consequently, it makes no sense to apply a fixed or closed all-in-one security solution in an era of highly dynamic ICT development, growing variability and changing behaviour of the global user community, when new types of security threats appear frequently. We have to be able to defend our infrastructures and users against known attacks and threats, but we also have to be able to analyse, understand and eliminate any new, unknown or atypical attempt to break in as fast as possible.
What I am trying to express (at least for the traffic monitoring area) is that the key role in ensuring a secure environment for users and efficient incident handling, with minimal impact on the infrastructure we are responsible for, is played by a) skilled, motivated and faithful personnel supported by b) open and flexible tools that are able to provide, among other things, complex traffic analysis on demand. The FTAS system is one such attempt: it primarily aims to be a generic platform for complex traffic analysis while remaining open to customisation, which may lead to automated statistical reporting as well as automated attack detection based on parameters specific to a particular network environment.

2 CESNET e-infrastructure and large-scale flow-based traffic monitoring
The CESNET e-infrastructure is a complex ICT infrastructure that focuses on direct support of the research and development community in the Czech Republic and its specific needs. This type of infrastructure is globally known as an NREN (National Research and Education Network). NRENs are built, developed and operated on a different basis than commercial networks. They are driven by user communities, which develop and deploy many specific, demanding applications and also require dedicated single-purpose networks at different layers besides the shared IP network. The network infrastructure itself must be non-blocking and offer plenty of free capacity at any time. Natural NREN behaviour, from the traffic course perspective, includes for example frequent jumps of tens of Gbps, which would be considered attacks or anomalies in commercial networks. For these and other reasons, NREN network behaviour must be as transparent as possible, without any traffic regulation (except for the elimination of extreme attacks). From the security perspective, this approach requires careful and consistent monitoring, anomaly detection and the availability of traffic analysis tools across the whole infrastructure at all relevant layers; otherwise the network may become very dangerous (because of its capacity) as a source or mirror of massive network attacks. Cooperation with administrators and security teams in connected networks also has to be very close and well organised. Flow-based IP traffic monitoring has a long history at CESNET.
We started developing dedicated software systems as soon as flow-based traffic information became available: "ip-accounting" at the beginning, then all generations of the so-called NetFlow data. Our main software system for large-scale flow-based monitoring is called FTAS (Flow-Based Traffic Analysis System). Besides that, we have also focused on monitoring based on accelerated programmable hardware, and we develop and operate multi-purpose probes on our external lines. From the service perspective, the CESNET e-infrastructure is a multi-layer hybrid network. We build and operate the optical layer (based on DWDM, currently at speeds up to 100 Gbps) as well as the IP/MPLS layer above it (running a 100 GE core). A simplified topology of the IP/MPLS backbone, with the flow information sources for large-scale monitoring (blue: PE and CE routers) and the hardware-accelerated probes (yellow) on all external lines, is shown in Figure 1.
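To make concrete what a flow information source such as one of the routers in Figure 1 exports, the following Python sketch decodes the simplest of these formats, a NetFlow v5 export datagram, using the publicly documented fixed v5 layout (a 24-byte header followed by 48-byte records). It is an illustration only, not FTAS code: the names FlowV5 and parse_netflow_v5 are hypothetical, and template-based formats such as NetFlow v9 and IPFIX would additionally need template handling, which is omitted here.

```python
import socket
import struct
from typing import List, NamedTuple

# NetFlow v5 layout: a 24-byte header followed by `count` fixed 48-byte records,
# all fields in network byte order.
HEADER = struct.Struct("!HHIIIIBBH")
RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")


class FlowV5(NamedTuple):
    """Hypothetical subset of v5 fields used for illustration."""
    src: str
    dst: str
    input_if: int    # SNMP ifindex of the input interface
    output_if: int   # SNMP ifindex of the output interface
    packets: int
    octets: int
    src_port: int
    dst_port: int
    tcp_flags: int   # cumulative OR of the TCP flags seen in the flow
    proto: int
    src_as: int
    dst_as: int


def parse_netflow_v5(datagram: bytes) -> List[FlowV5]:
    """Decode one NetFlow v5 export datagram into flow records (sketch)."""
    version, count = struct.unpack_from("!HH", datagram, 0)
    if version != 5:
        raise ValueError(f"not a NetFlow v5 datagram (version {version})")
    if len(datagram) < HEADER.size + count * RECORD.size:
        raise ValueError("truncated datagram")
    flows = []
    for i in range(count):
        (src, dst, _nexthop, in_if, out_if, pkts, octets, _first, _last,
         sport, dport, _pad1, flags, proto, _tos, src_as, dst_as,
         _src_mask, _dst_mask, _pad2) = RECORD.unpack_from(
            datagram, HEADER.size + i * RECORD.size)
        flows.append(FlowV5(socket.inet_ntoa(src), socket.inet_ntoa(dst),
                            in_if, out_if, pkts, octets, sport, dport,
                            flags, proto, src_as, dst_as))
    return flows
```

A collector would typically call such a function on each UDP payload received from an exporter; keeping the input interface index is what later allows the traffic origin to be traced to a particular router port.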
Figure 1: Flow information sources in the CESNET e-infrastructure IP/MPLS core.

3 FTAS System
The FTAS system has been developed and operated for a long time (the first flow-based tools before 2000, the first generation in 2002). For the reasons mentioned in the introduction (specific demands of different user groups), it may nowadays be considered a user-driven system: most new features and functions are implemented according to user community requests. The main purposes of the FTAS system are:
- To provide detailed traffic analysis over a short history (weeks) without any traffic conditions known in advance.
- To provide statistical post-processing of traffic of interest, aggregating big volumes of data while keeping the characteristics of the traffic for a long time (months, years).
- To provide periodical reporting based on statistical or security-oriented flow information processing.
- To provide traffic anomaly detection in several ways, from on-the-fly (from the perspective of the input flow information stream) to post-processed (based on retrieval of stored flow information).
The FTAS system logically consists of several more or less linked components. The basic component collects, processes and visualises flow information received from the network (i.e. from primary flow information sources). FTAS may be operated in a single-host (Figure 2) or multi-host (Figure 3) setup.

Figure 2: FTAS single-host architecture example.
Figure 3: FTAS multi-host architecture example.

FTAS is IP version transparent, with full IPv6 support both at the flow information transport layer and in the whole chain of flow information processing. It is able to process almost all known data formats in this area, such as NetFlow export versions 1, 5, 7 and 9 [1], IPFIX (or v10) [2, 3, 4, 5] and sFlow [6]. In the case of sFlow, it basically parses the packet samples in sFlow records. The internal data structure currently represents a set of the most requested flow information fields and is easily extendable (while keeping backward compatibility), including variable-length fields from IPFIX, significant fields defined in NetFlow Flexible Extension, NetFlow Secure Event Logging and similar. The basic flow information processing chain is shown in Figure 4, with one addition not shown there: in the case of robust flow information sources, the input stream can be multiplexed into several parallel streams (even to different FTAS nodes) to spread resource utilisation as needed.

Figure 4: FTAS input flow information processing chain.

Processed and stored flow information may be statistically post-processed (Figure 5) in order to keep the characteristics of the traffic while reducing the volume of data (the average aggregation ratio is usually better than 1:10-e2).

Figure 5: FTAS post-processing schema.

There are multiple ways to access data collected and processed with FTAS. The first group is web-based access. The basic interactive web-based user interface serves for traffic information selection and visualisation as well as for system administration, and it focuses on skilled users: network and service administrators, CSIRTs and security specialists. It is designed for two-phase work: query once & visualise multiple. The query form offers either a simplified (structured) or a comprehensive lookup interface that allows query conditions to be set without limits.
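The statistical post-processing described above (Figure 5) trades per-flow detail for long-term retention. A minimal Python sketch of that trade-off, assuming a hypothetical Flow record, 5-minute time bins and a (protocol, destination port) aggregation key; FTAS's real aggregation schema is configurable and is not reproduced here:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical stored flow record used only for this illustration."""
    start: int      # flow start, Unix seconds
    src: str
    dst: str
    proto: int
    dst_port: int
    packets: int
    octets: int


def aggregate(flows, bin_seconds=300, key=lambda f: (f.proto, f.dst_port)):
    """Collapse individual flows into per-bin counters for a chosen key.

    Addresses are dropped (unless the key keeps them), so many flows map to
    one summary row while flow, packet and byte totals per bin are preserved.
    Returns {(bin_start, key): (flows, packets, octets)}.
    """
    table = defaultdict(lambda: [0, 0, 0])
    for f in flows:
        row = table[(f.start - f.start % bin_seconds, key(f))]
        row[0] += 1
        row[1] += f.packets
        row[2] += f.octets
    return {k: tuple(v) for k, v in table.items()}
```

With 100 HTTPS flows from different hosts falling into one bin, the result holds a single summary row for all of them, so the reduction ratio is the number of input flows divided by the number of rows; the coarser the key, the higher the ratio and the less detail survives.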
A new interface that queries and visualises in a single step, for requests generated by devices (specific URL construction), is under testing. The interactive UI scheme is shown in Figure 6.

Figure 6: FTAS interactive user interface scheme.

For what we might call ordinary users, for users who aim at statistical overviews and manager-type reports, and for users who need a click-and-see interface, there is a standalone module called FTAS-reporter. It simulates real user behaviour (given by its configuration) in a never-ending loop and creates trees of static HTML documents bound together with vertical and horizontal links and indexes. It uses the interactive user interface in the background (Figure 7).

Figure 7: FTAS-reporter and interactive UI schema.

The second group of ways to access results processed by FTAS is notifications, typically transported by e-mail. Notifications may occur in different parts of the flow information processing. The first place where a notification can occur is an event detected by the anomaly detection module in the FTAS flow information processing core. This anomaly detection is tied to a traffic filter that identifies the traffic of interest, with traffic bursts expressed as a flow
count limit per period. This is an immediate notification based on the actual state, without any knowledge of the traffic history; therefore the limits here should be safe, and thus high. A simplified scheme of this processing is shown in Figure 8.

Figure 8: FTAS anomaly module behaviour and notification schema.

The second place where a notification can occur is within FTAS-reporter. It is usually based on flows stored by the anomaly detection module (as in the previous case, but without notification), which are periodically post-processed over a longer (configured) period, say 10 minutes. Here we can set softer limits in the "anomaly detection" module (as it is only a prerequisite), but we set hard summary limits on the behaviour during the whole period; in this case we look at the persistence of such an event. An example of anomaly detection processing in FTAS-reporter is shown in Figure 9.

Figure 9: Anomaly detection in FTAS-reporter.

The last, specific kind of notification is the aggregated summary report covering several observed traffic characteristics (usually top-lists of something), delivered all in one for a calendar period, typically the last day. This functionality lies entirely in the FTAS-reporter module, which prepares plain-text reports as configured (instead of creating structures of HTML documents), joins them together and finally sends them to the configured destinations. It is an example of alternate visualisation of the same data to fit the local habits of our users (we provide similar reports with different visualisation styles for different groups of users). Typical sub-reports are:
- top-list of local downloaders,
- top-list of users of Microsoft ports,
- top-list of nodes accessing the SMTP port (no real user writes that many e-mails per day),
- top-lists of traffic from/to SSH, SNMP or DNS ports from/to the local network, etc.

4 FTAS practical examples
In the previous section I have tried to describe the basic principles on which FTAS is based. To make them easier to understand, I give a few simple practical examples of how it can be used in everyday practice.

Example 1, incident ex-post verification: our CSIRT received a message, without any detailed information, about a TCP SYN flood from our AS against a particular network in period X, with packet lengths greater than 800 bytes. In this case we use the FTAS interactive UI to analyse whether this flood a) originates in our AS and b) originates in a network with the appropriate prefix allocation (let us assume for this case that we applied neither BCP 38 nor any other reverse path checking technique). An example of the query condition for traffic selection is shown in Figure 10. It can be applied to data from all backbone edge routers (we also retrieve the flow source and its interface indexes to discover the traffic origin). We found approximately 212k flow records. In the first step we may observe a detailed sample (Figure 11), then its course in time (Figure 12), and finally aggregated summary information including interfaces, to verify the attack origin (Figure 13). With the help of automated internal interface index translation (a service of G3, another of our monitoring systems), SNMP ifindex values are translated into interface descriptions and addressing (Figure 13). All output examples are visualisations of the same query result in the FTAS interactive UI.

Example 2, on-the-fly anomaly detection and notification: automated on-the-fly detection, with notification, of potential sources of a TCP SYN flood from our AS. This is an example of an FTAS traffic filter configuration that acts as an anomaly detector.
The filter configuration consists of two parts: the traffic selection condition (Figure 14) and the setup of how such traffic information is processed and stored (Figure 15). A corresponding notification example is shown in Figure 16. From the event and content perspective, it demonstrates the same TCP SYN anomaly as the first example.
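The selection used in Examples 1 and 2 can be sketched in Python as one filter applied once, whose result is then presented in several ways, in the spirit of the query once & visualise multiple approach of the interactive UI. This is not FTAS query syntax: the Flow fields, the function names and the "SYN without ACK" test are illustrative assumptions, and because flow records carry only byte and packet totals, the 800-byte condition is approximated by each flow's average packet length.

```python
import ipaddress
from collections import Counter
from typing import NamedTuple

SYN, ACK = 0x02, 0x10


class Flow(NamedTuple):
    """Hypothetical flow record; exporter and input_if locate the entry point."""
    exporter: str   # router that exported the record
    input_if: int   # SNMP ifindex where the flow entered the backbone
    start: int      # Unix seconds
    src: str
    dst: str
    proto: int
    tcp_flags: int  # cumulative OR of TCP flags seen in the flow
    packets: int
    octets: int


def select_syn_flood(flows, victim, t_from, t_to, min_avg_len=800):
    """Query once: TCP flows with SYN but no ACK towards `victim` in [t_from, t_to)
    whose average packet length exceeds `min_avg_len` bytes."""
    net = ipaddress.ip_network(victim)
    return [f for f in flows
            if f.proto == 6
            and (f.tcp_flags & (SYN | ACK)) == SYN
            and t_from <= f.start < t_to
            and ipaddress.ip_address(f.dst) in net
            and f.packets > 0 and f.octets / f.packets > min_avg_len]


def time_course(result, step=60):
    """View 1: number of selected flows per time step."""
    return Counter(f.start - f.start % step for f in result)


def origin_summary(result):
    """View 2: selected flows per (exporter, input ifindex), largest first."""
    return Counter((f.exporter, f.input_if) for f in result).most_common()
```

Both views are derived from the same selected list, just as Figures 11 to 13 are all visualisations of one query result; translating the ifindex values into interface descriptions would be a further lookup step not shown here.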
Figure 10: FTAS interactive UI, query condition to find attack.
Figure 11: FTAS interactive UI, results - attack verification.
Figure 12: FTAS interactive UI, results - attack verification.
Figure 13: FTAS interactive UI, results - attack verification.
Figure 14: FTAS traffic filtering condition configuration.
Figure 15: FTAS filter as traffic anomaly detector configuration.
Figure 16: FTAS on-the-fly anomaly notification example.
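The on-the-fly detector of Example 2 combines a traffic filter with a flow count limit per period, without any knowledge of traffic history. A minimal Python sketch of that logic, assuming a roughly time-ordered stream, caller-supplied match and key functions and a hypothetical FlowCountDetector class; in FTAS this behaviour is set up through filter configuration (Figures 14 and 15), not through code like this:

```python
from collections import defaultdict
from typing import Any, Callable, Hashable, Optional


class FlowCountDetector:
    """Hypothetical detector: flag a key (e.g. a source address) once it produces
    `limit` matching flows within one fixed window of `period` seconds.

    Only the current window is kept, i.e. there is no knowledge of traffic
    history, which is why such limits have to be set safely high.
    Assumes flows arrive roughly in time order.
    """

    def __init__(self, match: Callable[[Any], bool], key: Callable[[Any], Hashable],
                 limit: int, period: int):
        self.match = match
        self.key = key
        self.limit = limit
        self.period = period
        self.window: Optional[int] = None
        self.counts: defaultdict = defaultdict(int)

    def feed(self, ts: int, flow: Any) -> Optional[str]:
        """Process one flow; return a notification text or None."""
        if not self.match(flow):
            return None
        window = ts - ts % self.period
        if window != self.window:           # a new period starts: reset counters
            self.window = window
            self.counts = defaultdict(int)
        k = self.key(flow)
        self.counts[k] += 1
        if self.counts[k] == self.limit:    # notify once per key and period
            return f"period {window}: {k} reached {self.limit} matching flows"
        return None
```

Notifying only when the counter hits the limit exactly, rather than on every flow above it, keeps a long burst from producing one message per flow.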
Figure 17: FTAS reporter, anomaly detection - overview page.
Figure 18: FTAS reporter, anomaly detection - detailed report.
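The ex-post detection in FTAS-reporter differs from the on-the-fly case mainly in that it judges the persistence of an event over a whole period, on top of records pre-selected with softer limits. A minimal Python sketch, assuming stored (timestamp, source) pairs, a 10-minute period split into 1-minute slots and hypothetical thresholds; the function name and defaults are illustrative, not FTAS-reporter configuration:

```python
from collections import defaultdict


def persistent_sources(stored, period_start, period=600, slot=60,
                       hard_total=1000, min_active_slots=8):
    """Hypothetical ex-post check over one reporting period.

    `stored` holds (timestamp, source) pairs already kept by a soft-limit
    filter. A source is reported only if it reaches `hard_total` flows in the
    whole period AND was active in at least `min_active_slots` slots, i.e.
    the behaviour persisted instead of being a single short burst.
    """
    totals = defaultdict(int)
    slots = defaultdict(set)
    for ts, src in stored:
        if period_start <= ts < period_start + period:
            totals[src] += 1
            slots[src].add((ts - period_start) // slot)
    return sorted(src for src in totals
                  if totals[src] >= hard_total
                  and len(slots[src]) >= min_active_slots)
```

The two thresholds play the roles described in Section 3: the soft prerequisite limit only decides what is stored, while the hard summary limits decide what is reported.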
Figure 19: FTAS reporter, anomaly detection - alternate overview page.
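The daily summary reports described in Section 3 (several top-lists joined into one plain-text message) can be sketched in Python as follows. The dictionary flow fields, the 10.0.0.0/8 local network, the choice of the two sub-reports and all function names are illustrative assumptions; delivering the result to the configured destinations is left out.

```python
import ipaddress
from collections import Counter


def top_list(flows, key, weight, n=10):
    """Return the n largest (key, summed weight) pairs; flows whose key is None are skipped."""
    totals = Counter()
    for f in flows:
        k = key(f)
        if k is not None:
            totals[k] += weight(f)
    return totals.most_common(n)


def render(title, rows, unit):
    """Format one plain-text sub-report."""
    lines = [title, "-" * len(title)]
    lines += [f"{rank:>3}. {k:<40} {v:>14} {unit}"
              for rank, (k, v) in enumerate(rows, 1)]
    return "\n".join(lines)


def daily_report(flows, local_net="10.0.0.0/8"):
    """Join several hypothetical sub-reports into one plain-text message body."""
    net = ipaddress.ip_network(local_net)

    def is_local(addr):
        return ipaddress.ip_address(addr) in net

    downloaders = top_list(flows,
                           key=lambda f: f["dst"] if is_local(f["dst"]) else None,
                           weight=lambda f: f["octets"])
    smtp_users = top_list(flows,
                          key=lambda f: f["src"] if f["dst_port"] == 25 else None,
                          weight=lambda f: 1)
    return "\n\n".join([render("Top local downloaders", downloaders, "bytes"),
                        render("Top nodes accessing SMTP port", smtp_users, "flows")])
```

Each sub-report is just a different key and weight over the same day of flows, which is what makes it cheap to offer the same data in several visualisation styles for different user groups.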
Example 3, anomaly detection in FTAS-reporter: an example of ex-post periodical detection of internal IP addresses that might be attacking port numbers 135, 445, [...]. We demonstrate the overview page (chronological, Figure 17), a single anomaly detail report (Figure 18) and an alternate overview page (top-list, Figure 19).

The examples presented above demonstrate only a fragment of FTAS functionality. Their purpose is to show the "there's more than one way to do it" principle (a Perl programming motto), which we try to incorporate into the FTAS system.

5 FTAS as a service in CESNET e-infrastructure
In the CESNET e-infrastructure we naturally provide FTAS-based services internally for network and service administrators and CSIRTs. Besides that, we also provide flow-based monitoring services powered by FTAS to our users. There are several typical architectures for delivering such services. The key points are a) which FTAS installation and b) which flow information data (topology aspects) to use. The simplest architecture is to use the primary FTAS installation in the CESNET e-infrastructure backbone with filtered flow information exported from the nearest router (from the user network perspective) (Figure 20). Users do not need to take care of anything in this case. On the other hand, they have no information about traffic that does not cross the backbone border.

Figure 20: FTAS service architecture 1.

The second option is to use the primary FTAS installation in the CESNET e-infrastructure backbone and export flow information from local devices to it (Figure 21). The user only has to take care of proper flow information export and also gets information about internal traffic. In many cases users use a mix of architectures 1 and 2, exporting flow information from critical internal devices only.
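In service architecture 1 the user's view comes from filtered flow information exported by the nearest backbone router. A minimal Python sketch of such a selection, assuming the institution is described by a list of IPv4 and IPv6 prefixes and flow records are dictionaries; the function name for_institution is hypothetical, and where exactly FTAS applies this filtering is not specified here:

```python
import ipaddress


def for_institution(flows, prefixes):
    """Keep only flows with at least one endpoint inside the institution's prefixes.

    IPv4 and IPv6 prefixes may be mixed in one list; an address is only
    compared with networks of its own IP version.
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]

    def inside(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in nets if net.version == ip.version)

    return [f for f in flows if inside(f["src"]) or inside(f["dst"])]
```

Matching either endpoint keeps both the incoming and the outgoing traffic of the institution, and accepting IPv4 and IPv6 prefixes in one list reflects the IP version transparency of FTAS.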
Figure 21: FTAS service architecture 2.
Figure 22: FTAS service architecture 3.

The third service model (Figure 22) is based on a dedicated FTAS installation in the user network (with shared administration of the node or nodes). This gives users full freedom in how FTAS is configured (independent classification maps etc.). On the other hand, they have to take care of both the hosting hardware and the flow information export. Last but not least, we offer our users ad hoc traffic analysis on demand; this makes sense for users who do not want to use traffic monitoring regularly (only for incident handling). There are currently more than 50 institutions in the CESNET e-infrastructure user community that use FTAS in at least one of these service architectures, and many of them also use additional services such as dedicated reporting, various kinds of anomaly detection and others. The primary FTAS installation in the CESNET e-infrastructure backbone currently consists of 17 nodes with 340 CPUs. The volume of flow-based information data processed (including internal redistribution) in this installation during 2014 is shown in Figure 23.
Figure 23: FTAS in CESNET e-infrastructure backbone, volume of processed data in 2014.

References
[1] Claise, B., Cisco Systems NetFlow Services Export Version 9, IETF, RFC 3954, October 2004.
[2] Boschi, E. and B. Trammell, Bidirectional Flow Export Using IP Flow Information Export (IPFIX), IETF, RFC 5103, January 2008.
[3] Boschi, E., Mark, L., Trammell, B. and T. Zseby, Exporting Type Information for IP Flow Information Export (IPFIX) Information Elements, IETF, RFC 5610, July 2009.
[4] Claise, B., Trammell, B. and P. Aitken, Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information, IETF, RFC 7011, September 2013.
[5] Claise, B. and B. Trammell, Information Model for IP Flow Information Export (IPFIX), IETF, RFC 7012, September 2013.
[6] Phaal, P., Panchen, S. and N. McKee, InMon Corporation's sFlow: A Method for Monitoring Traffic in Switched and Routed Networks, IETF, RFC 3176, September 2001.