UNIVERSITY OF CALIFORNIA 100G Network Monitoring with Bro and Time Machine Vincent Stoffer Cyber Security Engineer CENIC Conference March 11th, 2015 Irvine, CA
Agenda Intro / overview 100G monitoring challenges Bro! Time Machine Questions
Overview Lawrence Berkeley National Laboratory Located in Berkeley, CA "Bringing science solutions to the world" Unclassified DoE research facility operated by University of California Functions much like a research university
Computing overview ~5000 users ~10,000 hosts Distributed computing resources Many guests and visitors Open network to enable collaboration and research
100G monitoring challenges Expensive hardware No product solution Overall traffic volume overwhelming sensors log volumes Elephant flows Scaling up and down Maintain same visibility and protections
Overview Optical taps 100G, 10G, 1G Collect at packet broker Previously expensive proprietary hardware Merchant silicon changed the game Send out to monitoring devices
cpacket cvu, 10G monitor devices installed @LBL 2011 Apcon, 10G monitor devices installed @LBL 2007 Arista, 100G monitor devices installed @LBL 2015
10G @ LBL since 2007 Mostly flat network Simple tapping setup External & Internal Dynamic firewall in the middle Apcon -> cpacket tapping infrastructure
100G Berkeley Lab approach Scale up our setup on 10G Moving from duplication to advanced aggregation New device needed Based on previous work from Scott Campbell at NERSC
100G Device requirements 100G and 10G ports Filtering at ingress & egress Port speed agnostic Aggregation, symmetric load-balancing No oversubscription limits API for dynamic filtering/shunting Filtering for arbitrary IP headers and TCP flags Every port can be input/output Create port groups Send output to load-balanced groups and single ports IPv6 support
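The symmetric load-balancing requirement above can be illustrated with a minimal sketch. It assumes a software hash over the sorted endpoints of a flow; the function name `symmetric_bucket` and the use of SHA-1 are illustrative choices, not how the Arista hardware computes its hash.

```python
import hashlib


def symmetric_bucket(src_ip, src_port, dst_ip, dst_port, proto, n_outputs):
    """Pick an output port so both directions of a flow land on the same port."""
    # Sorting the two endpoints makes (A -> B) and (B -> A) produce the same key.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{proto}|{a[0]}|{a[1]}|{b[0]}|{b[1]}".encode()
    digest = hashlib.sha1(key).digest()
    # Use the first 4 bytes of the digest as an integer and reduce it to a port index.
    return int.from_bytes(digest[:4], "big") % n_outputs


forward = symmetric_bucket("192.0.2.10", 45191, "207.62.80.166", 80, "tcp", 10)
reverse = symmetric_bucket("207.62.80.166", 80, "192.0.2.10", 45191, "tcp", 10)
print(forward == reverse)
```

Keeping both directions of a connection on one worker matters because a stateful monitor needs to see the originator and the responder side together.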
100G Monitoring device options Commercial / Appliance Commodity network (proprietary / hybrid) Commodity network + SDN (scipass/flowscale)
100G Monitoring device eval Endace Access Brocade MLXe Arista 7504+7150
We chose Arista Flexible interface including GUI High density - 6 port 100G line card (supports LR-4) plus 144 10G ports! Easy to use API dynamic shunting! Relatively low cost Lots of peers using
Arista 7504 Arista 7150
Cluster-in-a-box (Arista + Myricom + 1 Supermicro) 10G Cluster (cpacket + Force10 + 12 Supermicros) LBL since 2007
General Architecture Split 100Gb link into 5 (or more) streams of 10G to feed each node Further divide each 10G stream into 10x1Gb so each of the worker nodes sees 1/50th of the traffic When our sustained traffic is 20Gbps (high estimate), each worker sees about 400 Mbps of the traffic Scale up as necessary
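The per-worker figure on this slide follows from simple division. A small sketch of the arithmetic, assuming traffic is spread evenly across workers (the function name and defaults are illustrative):

```python
def per_worker_mbps(sustained_gbps, streams=5, workers_per_stream=10):
    """Average traffic seen by one worker, assuming a perfectly even split."""
    workers = streams * workers_per_stream  # 5 x 10 = 50 workers
    return sustained_gbps * 1000 / workers  # convert Gbps to Mbps, then divide


print(per_worker_mbps(20))  # 400.0
```

Real traffic is never perfectly even, so individual workers will see more or less than this average.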
Network cards - Myricom Sniffer10G Support for Linux, FreeBSD Myricom 10G cards only Supports only one tool in 2.0 (multiple tools in 3.0) Company/IP in some flux
Traffic Distribution to the Cluster
Traffic per node
Shunting Heavy Tail Effect* is the observation that a small number of network flows will dominate the overall volume of data transferred for a given time. By detecting and removing the data component of these heavy tail flows, analysis load is dramatically reduced without sacrificing security. *Scott Campbell's work
Filters for Shunting Exclusions (IP pairs, netblocks, ports/protocols) Research networks / affiliates Resnet? Identify elephant flows, allow control traffic Dynamic - Holy Grail Bro, API, near real time
Dumbno Python program for shunting Written by Justin Azoff Uses Arista JSON API to add ACLs which allow only control packets Bro's reaction framework feeds data real-time Connection details are preserved
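The shunting idea can be sketched in a few lines. This is not Dumbno's code and does not call the Arista API; the `Flow` type, the byte threshold, and the rule tuples are hypothetical, and treating SYN, FIN, and RST as the "control packets" is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src: str
    sport: int
    dst: str
    dport: int
    bytes_seen: int


def shunt_candidates(flows, threshold_bytes):
    """Return the heavy-tail flows: those that already moved at least threshold_bytes."""
    return [f for f in flows if f.bytes_seen >= threshold_bytes]


def shunt_rules(flow):
    """Build hypothetical rule tuples: keep control packets, drop the bulk data."""
    rules = []
    directions = (
        (flow.src, flow.sport, flow.dst, flow.dport),
        (flow.dst, flow.dport, flow.src, flow.sport),
    )
    for s, sp, d, dp in directions:
        # Assumed control flags; these still reach the IDS so connection state is kept.
        for flag in ("syn", "fin", "rst"):
            rules.append(("permit", "tcp", s, sp, d, dp, flag))
        # Everything else on this flow is payload and is no longer forwarded.
        rules.append(("deny", "tcp", s, sp, d, dp, None))
    return rules


flows = [
    Flow("192.0.2.10", 50000, "198.51.100.7", 2811, 40_000_000_000),
    Flow("192.0.2.11", 45191, "207.62.80.166", 80, 10_237),
]
heavy = shunt_candidates(flows, threshold_bytes=1_000_000_000)
rules = shunt_rules(heavy[0])
print(len(heavy), len(rules))
```

Because the connection setup and teardown packets are still forwarded, the monitor can keep recording when the flow started and ended even though the payload is filtered.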
This table provides alternative tools and technologies for various parts of a 100G monitoring system.
Component            LBL setup                                           Alternatives
Load balancer        Arista (7504+7150)                                  Arista, Brocade, Endace, Gigamon, OpenFlow, others?
Traffic split/node   Myricom 10GPCIE2-8C2, Myricom 10G sniffer drivers   PF_RING, Packet Bricks + netmap, Endace DAG
IDS                  Bro                                                 Snort, Suricata
UNIX OS              FreeBSD-10.1                                        Linux, FreeBSD
BROverview Questions??
Open Source Network Monitoring Philosophy Know thy network Focus on people not products Commodity hardware UNIX/Linux focused Free & open source software Super adaptable
What is Bro? www.bro.org Not your typical IDS/IPS A monitoring platform A standalone network monitor A programmable framework An ecosystem
Bro History
Hardware Commodity servers (Supermicro) Linux/FreeBSD Network cards (Intel, Myricom, high end DAG)
Bro platform Apps Bro Platform Tap Log Recording Intrusion Detection Vuln Mgmt Programming Language File Analysis Custom Logic Standard Library Packet Processing Network Traffic
Bro log types Connection logs Protocol logs Custom logs Alerting and debug logs Log formats: ASCII (plain text, default) Elasticsearch SQLite Dataseries (HP) binary output
>ls *.log app_stats.log communication.log conn.log dhcp.log dns.log dpd.log files.log ftp.log http.log irc.log known_certs.log known_hosts.log weird.log notice.log reporter.log smtp.log socks.log software.log ssh.log ssl.log stderr.log stdout.log syslog.log traceroute.log tunnel.log modbus.log
Bro connection logs (conn.log) Netflow ++ Stateful connection records Includes originator and responder Total byte counts, connections times, history and more
conn.log Mar 3 16:35:36 128.3.x.x 45191 http 0.023945 ShADadfF 6 671 worker-2-5 ClmuHr1gC6p76JbdVl 207.62.80.166 80 tcp 351 9886 SF T 0 11 10466 (empty)
Field        Value                Description
ts           1425429336.809148    UNIX timestamp
uid          ClmuHr1gC6p76JbdVl   Unique ID
id.orig_h    128.3.x.x            Originator IP
id.orig_p    45191                Originator port
id.resp_h    207.62.80.166        Responder IP
id.resp_p    80                   Responder port
proto        tcp                  IP protocol
service      http                 Application protocol
duration     0.023945             Duration
orig_bytes   351                  Bytes by originator
resp_bytes   9886                 Bytes by responder
history      ShADadfF             State history
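As an illustration of working with these fields, here is a minimal sketch of a reader for Bro's default ASCII log format, which is tab-separated and names its columns in a `#fields` header line. The function name `parse_bro_log` is illustrative, and the sample record reuses the values from the table above with only a subset of the real conn.log columns.

```python
def parse_bro_log(lines):
    """Yield one dict per record from a Bro ASCII (tab-separated) log."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # Column names follow the "#fields" tag, separated by tabs.
            fields = line.split("\t")[1:]
        elif line.startswith("#") or not line:
            # Other header lines (#path, #types, ...) and blank lines are skipped.
            continue
        else:
            yield dict(zip(fields, line.split("\t")))


sample = [
    "#path\tconn",
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto\tservice\tduration",
    "1425429336.809148\tClmuHr1gC6p76JbdVl\t128.3.x.x\t45191\t207.62.80.166\t80\ttcp\thttp\t0.023945",
]
rows = list(parse_bro_log(sample))
print(rows[0]["uid"], rows[0]["service"], rows[0]["id.resp_p"])
```

All values come back as strings; a real consumer would convert timestamps, ports, and byte counts using the `#types` header.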
Bro application logs Full protocol level details Configurable Unique ID consistent across all logs Contents based on protocol
dns.log Mar 3 16:35:36 CHlGTa39L4ViNKf5wb 128.3.x.x 32609 131.243.5.1 53 udp 52600 cenic2015.cenic.org 1 C_INTERNET 1 A 0 NOERROR F F T T 0 207.62.80.166 7973.000000 F
http.log Mar 3 16:35:36 ClmuHr1gC6p76JbdVl 128.3.x.x 45191 207.62.80.166 80 1 GET cenic2015.cenic.org / Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36 0 9695 200 OK - - - (empty) - - -- FrQ9Ct3IucTKymFao7 text/html HOST, CONNECTION,ACCEPT,USER-AGENT,DNT, ACCEPT-ENCODING,ACCEPT-LANGUAGE - - /
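The conn.log and http.log examples above share the uid ClmuHr1gC6p76JbdVl, which is what makes cross-log correlation straightforward. A minimal sketch, assuming each log has already been parsed into dictionaries keyed by field name (the helper name `join_by_uid` is illustrative):

```python
def join_by_uid(conn_records, app_records):
    """Attach connection-level details to each application-log record with the same uid."""
    conns = {c["uid"]: c for c in conn_records}
    # Application fields are merged last so they win on any overlapping key.
    return [{**conns[a["uid"]], **a} for a in app_records if a["uid"] in conns]


conn_records = [
    {"uid": "ClmuHr1gC6p76JbdVl", "proto": "tcp", "service": "http", "duration": "0.023945"},
]
http_records = [
    {"uid": "ClmuHr1gC6p76JbdVl", "method": "GET", "host": "cenic2015.cenic.org", "status_code": "200"},
    {"uid": "CnoMatchingConn0000", "method": "GET", "host": "example.org", "status_code": "404"},
]
joined = join_by_uid(conn_records, http_records)
print(joined[0]["host"], joined[0]["duration"])
```

The second http record is dropped here because no connection record with its uid was supplied.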
Great, but what do I need all that for? Ground truth for your network (Know thy network) Troubleshooting Analytics / reporting DFIR Use to build alerts and take actions
Know thy network - examples Basic logs Connections HTTP SMTP DNS
Notices / Alerts Bro is event based Almost any event can trigger a notice (notice.log) Then you can take action More typical IDS function
Some example notices Address_Seen Scan::Address_Scan Scan::Port_Scan SSH::Password_Guessing Traceroute::Detected NTP::NTP_Monlist_Queries SSL::Invalid_Server_Cert SMTPurl::SMTP_Link_in_EMAIL_Clicked SMTPurl::SMTP_WatchedFileType SMTPurl::SMTP_Embeded_Malicious_URL HTTP::HTTP_SensitiveURI HTTP::SQL_Injection_Attacker Software::Vulnerable_Version TeamCymruMalwareHashRegistry::Match
Alert actions Notify via email/sms/etc. Shell scripts Firewall/device integration ACLd Total flexibility
Bro policy Core - Generates events Scripting - Does stuff with them Not a signature though of course there is a way to do that :)
Bro policy philosophy Don't ask what Bro can do, better to ask what do you want to do? NTP monlist SIP scanners Tor ban SMTP URL SSH foreign login
Beyond Bro? But Bro can do everything??!! Bro provides us amazing metadata and beyond, but we sometimes need more Enter Time Machine
Time machine??
Time Machine background Stefan Kornexl Graduate thesis project Technische Universität München Stefan Kornexl, Vern Paxson, Holger Dreger, Anja Feldmann, and Robin Sommer. 2005. Building a time machine for efficient recording and retrieval of high-volume network traffic. In Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement (IMC '05). USENIX Association, Berkeley, CA, USA
Time Machine Creates pcap files with indexes Killer feature: "connection cutoff" Cutoffs defined per port Assumption: interesting stuff in the first N bytes
Time Machine config
class "smtp" {
    filter "port 25 or port 587";
    cutoff 25m;
    filesize 2000m;
}
class "encrypted" {
    filter "port 22 or port 443";
    cutoff 500k;
    filesize 2000m;
}
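The effect of a per-class `cutoff` can be sketched as follows. This is a simplified model and not Time Machine's implementation: it assumes packets arrive as (connection id, length) pairs and that a packet is stored whenever the connection has not yet reached the cutoff.

```python
def apply_cutoff(packets, cutoff_bytes):
    """Yield the packets that would be kept under a per-connection byte cutoff."""
    seen = {}  # bytes observed so far, per connection
    for conn_id, length in packets:
        so_far = seen.get(conn_id, 0)
        if so_far < cutoff_bytes:
            # Still inside the first cutoff_bytes of this connection: keep it.
            yield conn_id, length
        seen[conn_id] = so_far + length


packets = [("a", 600), ("a", 600), ("a", 600), ("b", 100)]
stored = list(apply_cutoff(packets, cutoff_bytes=1000))
print(stored)  # [('a', 600), ('a', 600), ('b', 100)]
```

Connection "a" is truncated after it passes the cutoff, while the short connection "b" is captured in full, which is the behavior that keeps small flows complete and bounds the cost of large ones.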
Traffic numbers Average 2-4 Gb/s Spikes to 10-20 Gb/s Roughly 25 TB / day full traffic 750 TB / month!
Storage Our goal was 6 months of packet capture With full traffic, we could do <1 week After multiple iterations/tuning of our buckets
March 2015 config
Bucket       Capture cutoff (MB unless noted)   Daily GB
http         5                                  500
smtp         25                                 50
encrypted    500k                               200
udp          5                                  20
icmp         64k                                1
53 tcp/udp   5                                  15
else         5                                  150
TOTAL                                           936
6 months: 170 TB. From 750 TB/month!
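The totals on this slide can be checked with a few lines of arithmetic. The sketch assumes a six-month period of 182 days and decimal units (1 TB = 1000 GB); the variable names are illustrative.

```python
# Daily capture volume per bucket in GB, taken from the March 2015 config slide.
daily_gb = {
    "http": 500,
    "smtp": 50,
    "encrypted": 200,
    "udp": 20,
    "icmp": 1,
    "53 tcp/udp": 15,
    "else": 150,
}

total_daily_gb = sum(daily_gb.values())
six_months_tb = total_daily_gb * 182 / 1000  # assumed 182 days, 1000 GB per TB

print(total_daily_gb)        # 936
print(round(six_months_tb))  # 170
```

Under the same assumptions, uncut capture at roughly 25 TB per day would need about 4,550 TB for the same six months.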
But it's not full packet capture... Unless you are under regulatory requirements, doing full packet capture is probably wrong Once tuned, we want more horizontal but not more vertical (shallow TM) Incidents (SIP)
Bucket   Number of conns   Threshold   Conns < threshold   Conns > threshold   Capture coverage with threshold (%)   Capture size   Actual traffic on the wire
udp      13,149,143        5M          13,142,093          7,050               99.94                                 20 G           400 G
http     21,586,940        5M          21,568,519          18,421              99.91                                 480 G          6100 G
https    8,332,603         500K        8,207,340           125,263             98.49                                 200 G          2300 G
icmp     5,168,723         64K         5,168,004           719                 99.98                                 935 M          984 M
smtp     1,005,569         25M         1,005,400           169                 99.98                                 60 G           66 G
dns      53,450,492        5M          53,450,434          58                  99.99                                 17 G           9 G
ssh      4,445,375         500K        4,443,373           2,002               99.95                                 2 G            2100 G
Time machine - retrieval Indexes may be helpful TCPdump as the retrieval interface (BPF) Command line find in your buckets Off to wireshark or whatever
Time machine - Bro Bro connects to Time Machine Bro can request data from TM to pass to an analyst or to perform retroactive processing
Time machine - shortcomings IPv6 support (LBL branch) Indexes don't persist between restarts (Fix coming?) Searching and collating can be a pain No searching above layer 4
Time machine - future Persistent indexes Shunted traffic Load-balanced TM?
How to get started Download Bro: www.bro.org Check out Security Onion: www.securityonion.net Time Machine: www.bro.org/community/time-machine.html Berkeley Lab 100G technical doc
Discussion / Questions? Vincent Stoffer - vstoffer@lbl.gov or security@lbl.gov