ntopng: Realtime Network Traffic View Luca Deri <deri@ntop.org> 3/28/14 1
ntop in 1998
In 1998, the original ntop was created. It was available for Unix and Windows under the GPL. Unlike many tools available at that time, ntop used a web GUI to report traffic activities. It was modern enough to be usable from 1990s mobile phones.
It Was Time For a New ntop
Clean separation between the monitoring engine and the reporting facilities. Robust, crash-free engine (ntop was not really so). Platform scriptability for enabling extensions and changes at runtime without restart. Realtime: most monitoring tools aggregate data (usually over 5 minutes) and present it when it is too late. Many new features, including an HTML 5-based dynamic GUI, categorization, (generic) flow collection, system drill-down, and DPI.
Welcome to ntopng [1/2]
Welcome to ntopng [2/2]
The C++ monitoring engine is designed to be fast (10 Gbit line rate), resource-savvy, and accessible via Lua scripts. Scriptability enables the creation of dynamic HTML 5 pages without having to understand or modify the inner ntopng engine or low-level monitoring concepts. In ntopng every object is serialisable in JSON (JavaScript Object Notation), a format that modern web browsers handle natively. Because every activity in ntopng is asynchronous, you can use HTTP and JavaScript to create dynamic web pages for realtime monitoring.
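As a rough illustration of what JSON serialisation means for a client, the sketch below parses one JSON-serialised host object and sums its counters. The payload and the field names (`ip`, `bytes_sent`, `bytes_rcvd`) are assumptions made for this example, not ntopng's actual schema.

```python
import json

# Hypothetical payload: the field names are illustrative assumptions,
# not ntopng's actual JSON schema.
SAMPLE_HOST_JSON = '{"ip": "192.168.1.10", "bytes_sent": 1500, "bytes_rcvd": 4500}'


def total_bytes(host_json: str) -> int:
    """Return sent + received bytes from a JSON-serialised host object."""
    host = json.loads(host_json)
    return host["bytes_sent"] + host["bytes_rcvd"]


print(total_bytes(SAMPLE_HOST_JSON))  # prints 6000
```

Any HTTP client that can decode JSON could consume such an object in the same way.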
Lua-based ntopng Scriptability
A design principle of ntopng has been the clean separation of the GUI from the engine (in ntop it was all mixed). This means that ntopng can (also) be used (via HTTP) to feed data into third-party applications such as Nagios or OpenNMS. All data export from the engine happens via Lua. Lua methods invoke the ntopng C++ API in order to interact with the monitoring engine. PS: Lua is a simple-to-use, fast, crash-free scripting language that is used to script many popular applications, ranging from Wireshark to network-based games.
Using ntopng as Live Data Source
In essence, ntopng is your source of traffic monitoring information. Data sources include captured packets (native in ntopng), collected flows (NetFlow/sFlow sent by nprobe), and collected events received via ØMQ (e.g. firewall events or syslog). Because ntopng natively speaks JSON, it can export monitoring data to applications such as Splunk and Kibana/ElasticSearch.
Integrating Live ntop Apps with Splunk
We have developed a free (GPLv3) Splunk application, available on the Splunk store, that shows how to collect generic (e.g. flows) or specific (e.g. HTTP) traffic and visualise it.
Integrating Live ntop Apps with LogStash/ElasticSearch/Kibana 3/NetEye
As with Splunk, it is possible to export live traffic reports to LogStash/ElasticSearch/Kibana/NetEye. nprobe (and soon ntopng too) can stream live JSON to such applications as follows: --tcp <elasticsearch host>:<port>
For those brave enough to move to the next level, we are working on a direct ntop -> Hadoop Distributed File System (HDFS) integration so that you can store all events and flows on a big data system. Currently we support Apache Kafka (distributed messaging), and we are planning to add native support for Flume in the foreseeable future.
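To make the idea of live JSON streaming over TCP concrete, here is a minimal sketch of a sender. It assumes one JSON object per line (newline-delimited framing), which is a common convention but not something the slides specify. The function names and the flow fields are hypothetical.

```python
import json
import socket


def encode_flow(flow: dict) -> bytes:
    """Serialise one flow record as a single newline-terminated JSON line.

    Newline-delimited framing is an assumption of this sketch.
    """
    return (json.dumps(flow, sort_keys=True) + "\n").encode("utf-8")


def stream_flows(flows, host: str, port: int) -> int:
    """Send each flow as one JSON line over TCP; return the bytes sent."""
    sent = 0
    with socket.create_connection((host, port)) as sock:
        for flow in flows:
            line = encode_flow(flow)
            sock.sendall(line)
            sent += len(line)
    return sent


# Hypothetical flow record, encoded but not sent anywhere.
print(encode_flow({"src": "10.0.0.1", "dst": "10.0.0.2", "bytes": 512}))
```

A receiver such as a LogStash TCP input would need to be configured for the same framing.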
Everything is in Realtime
In ntopng all counters can be polled by the browser (or by any other application via HTTP) while they are being updated. All charts, graphs, and counters report the current value without delay.
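A client that polls a byte counter can turn two successive readings into a current throughput. The sketch below shows that arithmetic. The function name and the sample values are assumptions made for illustration.

```python
def throughput_bps(prev_bytes: int, prev_ts: float,
                   cur_bytes: int, cur_ts: float) -> float:
    """Return the throughput in bit/s between two counter samples."""
    elapsed = cur_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing timestamps")
    return (cur_bytes - prev_bytes) * 8 / elapsed


# Two hypothetical samples taken one second apart.
print(throughput_bps(1_000_000, 10.0, 1_250_000, 11.0))  # prints 2000000.0
```

The shorter the polling interval, the closer the computed rate is to the instantaneous one.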
Why Is Realtime So Important?
Most monitoring tools are not able to show what is happening while it is happening. Paradigms such as flow-based monitoring are inherently non-realtime, as they accumulate packets for some time (e.g. 1 or 2 minutes) and then report average values. SNMP-based monitoring tools poll counters every 5 minutes, so you will always see average values. So, is a realtime view just a plus or a compulsory feature?
Realtime vs Non-Realtime
Using average counters, you miss many details that might explain why your network performance is poor.
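A small worked example shows how an average can hide a burst. The numbers are invented for illustration: a link carries 10 Mbit/s for 295 seconds and 1000 Mbit/s for 5 seconds within one 5-minute window.

```python
# Hypothetical per-second samples (Mbit/s) over a 5-minute window:
# 295 quiet seconds plus a 5-second burst.
samples = [10] * 295 + [1000] * 5

average = sum(samples) / len(samples)
peak = max(samples)

print(average)  # prints 26.5
print(peak)     # prints 1000
```

A tool reporting only the 5-minute average would show 26.5 Mbit/s, while per-second data shows that the link was saturated for 5 seconds.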
Generating Traffic-based Alarms
ntopng is scriptable even for generating alarms based on traffic conditions. Lua can be used to extend the alarms that ntopng provides, so that ntopng can trigger events à la carte, since every network administrator knows their own network best.
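The logic of a traffic-based alarm can be sketched as a simple threshold check. In ntopng such a check would be written in Lua. The version below is in Python only to illustrate the idea, and the function name, host addresses, and threshold are all hypothetical.

```python
def hosts_over_threshold(rates_bps: dict, threshold_bps: int) -> list:
    """Return (host, rate) pairs whose rate exceeds the threshold, busiest first."""
    offenders = [(host, rate) for host, rate in rates_bps.items()
                 if rate > threshold_bps]
    return sorted(offenders, key=lambda item: item[1], reverse=True)


# Hypothetical current per-host rates in bit/s.
current = {"10.0.0.1": 2_000_000, "10.0.0.2": 95_000_000, "10.0.0.3": 120_000_000}

for host, rate in hosts_over_threshold(current, threshold_bps=90_000_000):
    print(f"ALERT {host}: {rate} bit/s")
```

An administrator would choose the threshold and the monitored metric to match their own network.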
Is Packet Analysis Always A Good Option?
Analysing network packets is definitely a simple way to see what happens on a network. But some protocols are encrypted (and more and more will be). Sniffing traffic is not always an option (privacy, the need to set up network taps or span ports). How can you capture traffic on your cloud-based VMs? You are unable to understand what is really happening on your core servers, where even serving a simple HTML page is complex (reverse proxy <-> HTTP server <-> PHP script <-> Database). In essence: can we finally monitor our services at high granularity without sniffing traffic, by watching process interactions and pinpointing resource waste (CPU, memory, I/O, network)?
Say hello to sprobe
At ntop we have decided that it was time to complement our traditional network-based tools with an (optional) agent, installed on every monitored system we want to drill down into, that can see in realtime what a network probe will never be able to see. Leveraging sysdig, an open-source technology developed by the creators of WinPcap and Wireshark, we are working on a system probe (hence the name sprobe) that is able to view what is happening on a system. sprobe is initially available for Linux, but it will be ported to other platforms (including Windows) in the near future.
What is sprobe?
It is a system probe that can track all activities in realtime: network I/O, CPU usage, and memory. In essence, it tells you where inside your system the bottleneck is, which user and application caused the bottleneck, what latency is introduced by each and every active application, which network activities are performed by which application, and what the real network protocol is (complementing DPI). All live, in realtime, while the bottleneck is happening.
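Once flows carry user and process information, attributing traffic to the application that generated it becomes a simple aggregation. The sketch below assumes hypothetical flow records with `user`, `process`, and `bytes` keys, which are not the actual sprobe output format.

```python
from collections import defaultdict

# Hypothetical flow records: the keys are illustrative assumptions,
# not the actual sprobe output format.
FLOWS = [
    {"user": "www-data", "process": "nginx", "bytes": 1200},
    {"user": "mysql", "process": "mysqld", "bytes": 800},
    {"user": "www-data", "process": "nginx", "bytes": 300},
]


def bytes_by_process(flows):
    """Sum bytes per (user, process) pair, largest total first."""
    totals = defaultdict(int)
    for flow in flows:
        totals[(flow["user"], flow["process"])] += flow["bytes"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(bytes_by_process(FLOWS))
```

The same grouping could be applied to latency or CPU figures instead of bytes.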
sprobe Processes Drill Down [1/2]
Client vs Server: Live Traffic View and Live Latency View
sprobe Processes Drill Down [2/2]
Flows View (with Details You Have Never Seen Before)
Who is Doing What; System Load; Live Memory Usage; Application Latency (usec not msec)
Users vs Processes vs Traffic
Realtime System + Network View
Combining the system view with the network view allows you to spot where your bottlenecks are. You can drill down in realtime to users and processes to identify exactly what is happening where. We can measure traffic activities as well as latency with an accuracy the network cannot offer (microseconds) and with high reliability. You can install sprobe on those systems where capturing traffic would not be possible or feasible (VMs and cloud services). sprobe migrates with your elastic services when servers dynamically grow, move, or shrink.
Final Remarks
Monitoring realtime activities is compulsory today. Periodic activity monitoring does not allow bottlenecks to be spotted properly (we know we have a problem, but we are unable to say exactly who is responsible for it). Thanks to the asynchronous and multithreaded ntopng monitoring platform, it is possible to report live activities while triggering alerts, analysing network traffic, and exporting data to third-party products via HTTP/JSON. All at 10 Gbit, using the open source software ntop has created.