Extracting Performance Metrics from NetFlow in Enterprise Networks
2nd EMANICS Workshop on NetFlow/IPFIX Usage
Jochen Kögel, jochen.koegel@ikr.uni-stuttgart.de
8 October 2009
Universität Stuttgart, Institute of Communication Networks and Computer Engineering (IKR)
Prof. Dr.-Ing. Andreas Kirstädter
Overview
- Motivation and scenario
  - Monitoring in enterprise networks
  - What could be extracted from NetFlow?
- Metrics extraction process
- Evaluation and results
- Conclusion and outlook
Motivation and scenario
Monitoring in global enterprise networks
- A global MPLS cloud connects several locations
- Network metrics (RTT, delay, loss, ...) are monitored by active probes (partial coverage, no full mesh)
- Unsampled NetFlow (v5) is exported by many routers (own routers and customer edge)
- Application level: response time is measured by active probing (E2E probes)
Correlating network metrics with application response times: can this be NetFlow-based?
- Can metrics extracted from NetFlow data enrich, validate, or replace active measurements?
Motivation and scenario
What could be extracted from NetFlow records?
- Round-trip time (RTT): data from one router (depends on routing)
- One-way delay: data from several routers (plus synchronized clocks)
- Packet loss
- Flow contention (roughly)
Partly covered in the QoS monitoring section of RFC 5472 (IPFIX Applicability), but no investigation at flow level so far (?)
Constraints
- Incomplete NetFlow data (table contention, packet loss)
- Rerouting, ECMP, disjoint paths
Metrics extraction process
Preprocessing steps
1. Join flow records of the same forward flow based on the 5-tuple (JoinedFlow)
2. Build FlowAcrossExporters: associate the JoinedFlows of all exporters
3. Associate forward and reverse flows (BiFlow)
RTT extraction
1. Take complete BiFlows
2. Calculate the start-of-flow record offset (the mid-flow offset also contains the server response time)
Delay extraction
1. Step through the JoinedFlows of a FlowAcrossExporters
2. Calculate the delay between every exporter pair based on the start time
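The steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation used for the talk: the `FlowRecord` fields, the function names, and the reading of "start-of-flow record offset" as "start of the reverse flow minus start of the forward flow at the same exporter" are assumptions. The sketch also joins purely on exporter and 5-tuple, so it ignores 5-tuple reuse over time.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# (src IP, dst IP, src port, dst port, protocol)
FiveTuple = Tuple[str, str, int, int, int]


@dataclass
class FlowRecord:
    """Hypothetical, reduced view of one exported NetFlow record."""
    exporter: str   # router that exported the record
    key: FiveTuple
    first_ms: int   # timestamp of the first packet
    last_ms: int    # timestamp of the last packet


def join_records(records: List[FlowRecord]) -> Dict[Tuple[str, FiveTuple], FlowRecord]:
    """Preprocessing step 1: merge records of the same exporter and 5-tuple."""
    joined: Dict[Tuple[str, FiveTuple], FlowRecord] = {}
    for r in records:
        k = (r.exporter, r.key)
        if k not in joined:
            joined[k] = FlowRecord(r.exporter, r.key, r.first_ms, r.last_ms)
        else:
            j = joined[k]
            j.first_ms = min(j.first_ms, r.first_ms)
            j.last_ms = max(j.last_ms, r.last_ms)
    return joined


def flows_across_exporters(joined) -> Dict[FiveTuple, Dict[str, FlowRecord]]:
    """Preprocessing step 2: group JoinedFlows of all exporters by 5-tuple."""
    across: Dict[FiveTuple, Dict[str, FlowRecord]] = defaultdict(dict)
    for (exporter, key), j in joined.items():
        across[key][exporter] = j
    return across


def reverse_key(key: FiveTuple) -> FiveTuple:
    src, dst, sport, dport, proto = key
    return (dst, src, dport, sport, proto)


def rtt_ms(joined, exporter: str, key: FiveTuple) -> Optional[int]:
    """RTT at one exporter from a complete BiFlow (both directions present)."""
    fwd = joined.get((exporter, key))
    rev = joined.get((exporter, reverse_key(key)))  # preprocessing step 3
    if fwd is None or rev is None:
        return None  # incomplete BiFlow: no RTT sample
    return rev.first_ms - fwd.first_ms


def pairwise_delays_ms(per_exporter: Dict[str, FlowRecord]) -> Dict[Tuple[str, str], int]:
    """Delay between every exporter pair, based on flow start times.

    Only meaningful if the exporter clocks are synchronized.
    """
    order = sorted(per_exporter, key=lambda e: per_exporter[e].first_ms)
    delays = {}
    for i, a in enumerate(order):
        for b in order[i + 1:]:
            delays[(a, b)] = per_exporter[b].first_ms - per_exporter[a].first_ms
    return delays
```

A real implementation would additionally have to split flows that reuse a 5-tuple and cope with missing records, which are the constraints listed earlier.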
Evaluation scenario
NetFlow data
- Three days of NetFlow data used for this evaluation
- Filter: TCP connections of the E2E probe and IP SLA flows
IP SLA (active measurement between routers)
- Three UDP measurement flows per minute
- Reports the average RTT every five minutes
Evaluation 1: NetFlow-RTT/E2E vs. IP SLA RTT
- Comparison of active measurement results and NetFlow-RTT
- RTT measurements in different directions, but on the same path
Evaluation 1: NetFlow-RTT/E2E vs. IP SLA RTT
IP SLA vs. TCP flows of E2E probes
[Figure, left panel: "IP SLA/Netflow RTT Comparison"; series: Netflow RTT (E2E Probe), IPSLA RTT; y-axis: RTT/ms; without further processing]
[Figure, right panel: "Netflow/IPSLA RTT,E2E Comparison"; series: IPSLA RTT smoothed, Netflow RTT (E2E Probe) smoothed; y-axis: RTT/ms; smoothed (window 15)]
- 1-4 NetFlow-RTT/E2E samples per IP SLA sample
- The NetFlow-RTT timestamp may differ by several seconds from the IP SLA measurement time
- Next: a closer look at the flows from the IP SLA measurement
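The slide gives only the window size of the smoothing, not the filter itself. As one plausible reading, the sketch below uses a trailing moving average; the function name and the choice of a trailing (rather than centered) window are assumptions.

```python
from typing import List


def moving_average(samples: List[float], window: int) -> List[float]:
    """Trailing moving average.

    Output i is the mean of the last `window` samples up to and including
    sample i; the first outputs average over fewer samples.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out: List[float] = []
    running = 0.0
    for i, x in enumerate(samples):
        running += x
        if i >= window:
            running -= samples[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out
```

For the right-hand plot this would be called as `moving_average(rtt_samples, 15)`, where `rtt_samples` is a hypothetical list of RTT values in milliseconds.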
Evaluation 2: NetFlow-RTT/IP SLA vs. IP SLA RTT
- Compare the RTT reported by IP SLA with the RTT calculated from the measurement flows
Evaluation 2: NetFlow-RTT/IP SLA vs. IP SLA RTT
[Figure, left panel: "Netflow/IP SLA RTT Comparison"; series: Netflow RTT, IP SLA RTT; y-axis: RTT/ms; without further processing]
[Figure, right panel: "Netflow/IP SLA RTT Comparison"; series: IPSLA RTT smoothed, Netflow RTT smoothed; y-axis: RTT/ms; smoothed (window 20)]
- ~15 NetFlow-RTT samples per IP SLA sample
- IP SLA flows reuse the 5-tuple, which creates a matching problem; is this the reason for the difference?
Evaluation 3: NetFlow-RTT vs. NetFlow-delay
1. Calculate NetFlow-delays independently of NetFlow-RTT
2. Match NetFlow-delay samples based on timestamp and sum them up
Direct comparison is possible (if R3 and R4 were inaccurate, we would notice)
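Step 2 can be sketched as a nearest-timestamp match between forward and reverse delay samples. The slide does not say how close two timestamps must be to count as a match, so the tolerance `max_gap_s`, the sample layout, and the function names below are assumptions.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, delay in ms)


def nearest(samples: List[Sample], t: float) -> Optional[Sample]:
    """Return the sample whose timestamp is closest to t.

    `samples` must be sorted by timestamp.
    """
    if not samples:
        return None
    times = [s[0] for s in samples]  # rebuilt per call; fine for a sketch
    i = bisect_left(times, t)
    candidates = samples[max(0, i - 1): i + 1]
    return min(candidates, key=lambda s: abs(s[0] - t))


def delay_sums(forward: List[Sample], reverse: List[Sample],
               max_gap_s: float = 1.0) -> List[Sample]:
    """Sum each forward delay with the reverse delay closest in time.

    Forward samples without a reverse sample within `max_gap_s` are skipped.
    """
    sums: List[Sample] = []
    for t, fwd_delay in forward:
        match = nearest(reverse, t)
        if match is not None and abs(match[0] - t) <= max_gap_s:
            sums.append((t, fwd_delay + match[1]))
    return sums
```

The resulting sums can then be compared sample by sample with the NetFlow-RTT values taken at matching times.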
Evaluation 3: NetFlow-RTT vs. NetFlow-delay
Delay of complete path: R1 - R4
[Figure, left panel: "Netflow RTT/Netflow delay sum (R1 R4)"; series: Netflow delay sum, Netflow RTT/E2E; y-axis: RTT/ms]
[Figure, right panel: "Difference between RTT, delay sum (6 outliers dropped)"; x-axis: Difference/ms; y-axis: samples]
- The sum of forward and reverse delay is almost equal to the NetFlow RTT
- The delay contribution of the segment from R4 to the reverse proxy is negligible
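The difference plot can be reproduced in spirit with a few lines of Python. The slide does not state how outliers were identified or which bin width was used, so the cutoff `max_abs_diff_ms`, the bin width `bin_ms`, and the sign convention (RTT minus delay sum) are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def difference_histogram(rtt_ms: List[float], delay_sum_ms: List[float],
                         max_abs_diff_ms: float = 10.0,
                         bin_ms: float = 1.0) -> Tuple[Dict[float, int], int]:
    """Histogram of (RTT - delay sum) over matched sample pairs.

    Returns the histogram, keyed by the left edge of each bin, and the
    number of pairs dropped as outliers.
    """
    if len(rtt_ms) != len(delay_sum_ms):
        raise ValueError("inputs must contain matched sample pairs")
    hist: Counter = Counter()
    dropped = 0
    for rtt, dsum in zip(rtt_ms, delay_sum_ms):
        diff = rtt - dsum
        if abs(diff) > max_abs_diff_ms:
            dropped += 1  # hypothetical outlier rule: fixed absolute cutoff
            continue
        hist[math.floor(diff / bin_ms) * bin_ms] += 1
    return dict(hist), dropped
```

A histogram concentrated around zero corresponds to the observation that the delay sum over the complete path nearly equals the NetFlow RTT.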
Evaluation 3: NetFlow-RTT vs. NetFlow-delay
Delay of partial path: R1 - R3
[Figure, left panel: "Netflow RTT/Netflow delay sum (R1 R3)"; series: Netflow delay sum, Netflow RTT/E2E; y-axis: RTT/ms]
[Figure, right panel: "Difference between RTT, delay sum (15 outliers dropped)"; x-axis: Difference/ms; y-axis: samples]
- The delay contribution of the segment between R3 and R4 is measurable
Conclusion and Outlook
Conclusion
- NetFlow as a basis for performance metrics: "QoS Monitoring" and server response
- Comparison with the RTT from IP SLA data: there are differences, but the trends are the same
- Comparison of NetFlow-delay and NetFlow-RTT: the delay contribution of network segments is measurable
Outlook
- Comparison on a large scale
- Improved algorithms to deal with missing or inaccurate records
- Compensation of record loss by combining information from several routers
- Take knowledge about record loss into account (IPFIX Reliability Statistics?)
- Is active per-packet measurement required at all?