Six Days in the Network Security Trenches at SC14. A Cray Graph Analytics Case Study

WP-NetworkSecurity-0315 www.cray.com

Table of Contents

Introduction
Analytics Mission and Source Data
Analytics Approaches
Analytics Successes
    Outbound Scanning
    Outbound SYN Flooding
    Linux SSH Brute-Forcing
    Wireless Network Failure
Conclusion
Acknowledgements

Introduction

Through engagements with its customers and partners, Cray has applied graph analytics to computer network analysis, helping identify threats and risks to enterprise-scale networks. One of these networks is SCinet, the high-bandwidth network that supports SC, the annual supercomputing technical conference and exhibition. SCinet is operational for approximately seven days every year during the conference, linking the convention center to research and commercial networks around the world.

Supercomputing 2014 (SC14) had the largest network to date. With 1.2 Tbits/s reaching the show floor, it supported 11,000 devices. The network had a /17 (32,768 IP addresses) of publicly routable IPv4 space, plus some IPv6 space. SCinet is staffed by more than 300 volunteers from national labs, universities and vendors who work together throughout the year to design, construct, operate and dismantle the network. Cray has participated in SCinet on the network security team in various forms since 2012.

Analytics Mission and Source Data

At SC14, the SCinet network security team had two primary missions:

1. Detect and mitigate any outbound scanning or attacking behavior.
2. Identify and quantify likely compromised hosts on the network, and notify their users.

Cray supported these missions with Discover, a 2-TB Cray Urika-GD graph discovery appliance. Cray also supported an additional mission, identifying rogue Domain Name System (DNS) and Dynamic Host Configuration Protocol (DHCP) servers, using Spark Streaming, software that is part of the Urika-XA extreme analytics system stack. This provided a streaming, alerting analytics capability that highlighted new servers as soon as the network security sensors observed their traffic.

Data is key when performing analysis. At SCinet, the network security team used Bro, an open-source deep packet inspection system, to monitor nearly 300 GB/s of network bandwidth. These sensors generated nearly 1.6 billion records, which Discover parsed into 18.6 billion RDF triples for analysis.

Analytics Approaches

Cray used three analytics approaches/algorithms with Discover to develop answers to the network security team's analytics questions: basic search queries ("Find and visualize patterns of behavior that look like this"), Jaccard scoring ("What are these systems downloading malware likely using as command and control channels?") and betweenness centrality ("Where should we start cleaning up this mess?").

Using Spark Streaming, Cray monitored DHCP and DNS logs, maintaining for each a histogram of uses (count, earliest and latest) and emitting an event when a new DHCP or DNS server was encountered. These analytics could be run either in a continuous streaming mode or in a batch mode for retrospective analysis.

Analytics Successes

These approaches were very fruitful for Cray and the SC14 network security team. At SC14, the SCinet security team observed several network security events, including:

• Outbound scanning
• Outbound SYN flooding
• Linux SSH brute-forcing
• Wireless network failure

Cray detected, correlated and analyzed these events.

Outbound Scanning

Outbound scanning can be detected using a graph concept called dispersion: the out-degree of a given node. In computer network analysis, dispersion is computed per port and protocol combination to reveal nodes exhibiting unusual behavior in a specific area. Cray used dispersion at SC13 and SC14 to successfully identify outbound scanning. At SC14, Cray identified infected clients on the wireless network scanning for vulnerable servers or sending malware out of the network.

Outbound SYN Flooding

SYN flooding is a computer network attack in which a client opens partial connections to a host and leaves them open. This attack is designed to remove a host from the network by exhausting its ability to accept inbound connections. A host that cannot accept connections is effectively unavailable in today's always-connected computing environment.

At SC14, the network security team observed two clients on the network participating in SYN flood attacks. These two network events accounted for 86 percent of all network flows observed on the network. Figure 1 shows the flow count profile.

Figure 1. Flow count from SC14, with SYN floods highlighted. Note the log-scale Y axis.

All of the network security team's tools identified these SYN floods. The team then looked for their root cause. Cray combined alerts from one intrusion detection system (IDS), showing a malware download on an odd port, with network flow and Jaccard similarity scoring to identify likely infected client behaviors. Jaccard similarity scoring finds entities that are not directly connected to each other but are linked through many shared neighboring nodes.
Figure 4 illustrates the connection between TCP port 9162 (the port on which the IDS detected the malware download) and TCP port 7668 (the port that Jaccard scoring identified as showing behavior similar to 9162). These identified behaviors revealed additional SCinet clients that may have been infected by the same type of malware. Figure 2 shows an example of this analysis. No other tool at SCinet performed this analysis, which then guided the rest of the security team in their follow-up examinations.

The root cause of the infection was believed to be malware targeting Linux systems running SSH servers. The network security team spent the entire conference chasing the infections from this malware.
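The two graph measures used in the Outbound Scanning and Outbound SYN Flooding analyses can be sketched on toy data. The following is a minimal Python illustration, not the SPARQL queries Cray ran on Discover. It assumes flow records reduced to (source, destination, protocol/port) tuples and assumes that two ports are compared by the sets of clients that used them. Every host name and every port other than tcp/9162 and tcp/7668 is hypothetical.

```python
from collections import defaultdict

# Hypothetical flow records: (source host, destination host, protocol/port).
# Only the port labels tcp/9162 and tcp/7668 come from the case study;
# the hosts and all traffic patterns are made up for illustration.
FLOWS = [
    ("client-a", "ext-1", "tcp/9162"),
    ("client-a", "ext-2", "tcp/7668"),
    ("client-b", "ext-1", "tcp/9162"),
    ("client-b", "ext-2", "tcp/7668"),
    ("client-c", "ext-3", "tcp/7668"),
    ("client-d", "ext-4", "tcp/443"),
    ("scanner", "ext-1", "tcp/22"),
    ("scanner", "ext-2", "tcp/22"),
    ("scanner", "ext-3", "tcp/22"),
    ("scanner", "ext-4", "tcp/22"),
]


def dispersion(flows):
    """Out-degree per (source, port): count of distinct destinations contacted."""
    targets = defaultdict(set)
    for src, dst, port in flows:
        targets[(src, port)].add(dst)
    return {key: len(dsts) for key, dsts in targets.items()}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ports_similar_to(seed_port, flows):
    """Rank every other port by the overlap of its client set with the seed's."""
    clients = defaultdict(set)
    for src, _dst, port in flows:
        clients[port].add(src)
    seed_clients = clients[seed_port]
    scores = {
        port: jaccard(seed_clients, users)
        for port, users in clients.items()
        if port != seed_port
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # A high out-degree on a single port is the scanning signature.
    for (src, port), degree in sorted(dispersion(FLOWS).items()):
        print(f"{src:10s} {port:9s} dispersion={degree}")
    # Ports whose clients overlap most with the seed port are listed first.
    for port, score in ports_similar_to("tcp/9162", FLOWS):
        print(f"{port:9s} jaccard={score:.2f}")
```

On this toy data the hypothetical scanner host has a dispersion of 4 on tcp/22, and tcp/7668 ranks first against the seed port tcp/9162 because two of its three clients also used the seed port. A real deployment would need thresholds tuned to the network's normal traffic.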

Linux SSH Brute-Forcing

In SSH brute-forcing, an adversary connects to an SSH server and attempts to guess the password for an account on the system. The adversary repeats this behavior thousands of times. These attacks are mitigated by strong passwords and by restrictions on the external hosts from which the server accepts connections. SCinet enacted neither of these mitigations; every booth and attendee at the conference chose their own passwords for their accounts, and all hosts were allowed to receive SSH connection requests from any internal or external client.

With the Linux malware in the environment, the network security team spent the four days of the conference identifying infected hosts using IDS tools and then notifying infected users. The team identified multiple hosts per day and visited three to four booths each day to inform them of their infection and recommend mitigations.

Figure 2. Betweenness centrality. Scinet:245.140 is the most central host in this graph.

Cray participated in this analysis by identifying internal hosts that were likely compromised, using graph search and visualization techniques over a set of merged datasets (flow, IDS alerts, DHCP events and HTTP activity) to enable forensic analysis. Figures 2 and 3 show sample visualizations of the SSH connection networks identified in this analysis. Cray applied betweenness centrality (a Urika-GD platform-specific SPARQL extension) to the extracted SSH connection networks to prioritize hosts for network security mitigation. Figure 2 shows an example of betweenness centrality.
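To make the prioritization idea concrete, the sketch below computes betweenness centrality on a small, hypothetical SSH connection graph. It is a plain Python rendering of Brandes' algorithm for unweighted directed graphs, not the Urika-GD SPARQL extension, and the host names and edges are invented for illustration.

```python
from collections import deque

# Hypothetical SSH connection graph: each key connected over SSH to the
# hosts in its set. None of these names appear in the SC14 data.
SSH_EDGES = {
    "ext-1": {"hub-host"},
    "ext-2": {"hub-host"},
    "hub-host": {"booth-1", "booth-2"},
    "booth-1": {"booth-3"},
}


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted directed graph.

    adjacency maps each node to a set of successor nodes. Returns
    unnormalized scores: for each node, the sum over ordered pairs (s, t)
    of the fraction of shortest s->t paths that pass through it.
    """
    nodes = set(adjacency)
    for successors in adjacency.values():
        nodes.update(successors)
    score = {node: 0.0 for node in nodes}

    for source in nodes:
        # Breadth-first search that counts shortest paths from source.
        order = []
        predecessors = {node: [] for node in nodes}
        path_count = {node: 0 for node in nodes}
        distance = {node: -1 for node in nodes}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            current = queue.popleft()
            order.append(current)
            for neighbor in adjacency.get(current, ()):
                if distance[neighbor] < 0:
                    distance[neighbor] = distance[current] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[current] + 1:
                    path_count[neighbor] += path_count[current]
                    predecessors[neighbor].append(current)

        # Accumulate dependencies from the farthest nodes back to source.
        dependency = {node: 0.0 for node in nodes}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    return score


if __name__ == "__main__":
    scores = betweenness_centrality(SSH_EDGES)
    # Hosts that sit on the most shortest connection paths come first.
    for host in sorted(scores, key=scores.get, reverse=True):
        print(f"{host:9s} {scores[host]:.1f}")
```

On this toy graph, hub-host scores 6.0 and booth-1 scores 3.0, while every other host scores 0.0, so hub-host would be the first candidate for cleanup. Using the ranking to order remediation visits is the interpretation suggested by the text; the scoring itself makes no claim about which hosts are actually infected.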

Wireless Network Failure

One afternoon during SC14, the SCinet network went down briefly and then restarted. Once the network came back up, clients were unable to reconnect to it. The wireless and network security teams worked together to identify the cause of the failure. Cray aided this analysis by using Spark Streaming to identify the DHCP servers present on the network. Cray's Spark Streaming analytics workflow allowed the combined team to quickly rule out a rogue DHCP server as a cause of the network failure.

Figure 3. SSH connection chain visualization.

Figure 4. Jaccard scoring. Urn:9162 is the seed port, and urn:7668 is the likely candidate target port.

Conclusion

Cray participated for another year in the SCinet network security team and used the Urika-GD graph discovery appliance and Spark Streaming to develop and perform a number of analyses in the four days of the SC14 conference. Working with 1.6 billion records converted into 18.6 billion triples, Cray applied various graph techniques to build new analytics workflows in minutes to hours; once built, the workflows executed within seconds.

About the Urika-GD Graph Data Analytics Appliance

Cybersecurity is one of the top use cases for the Urika-GD appliance. The appliance enables enterprises to:

• Discover unknown and hidden relationships and patterns in big data.
• Build a relationship warehouse, supporting inferencing/deduction, pattern-based queries and intuitive visualization.
• Perform real-time analytics on the largest and most complex graph problems.

The Urika-GD system features a large shared memory and a massively multithreaded custom processor designed for graph processing and scalable I/O. With its industry-standard, open-source software stack enabling reuse of existing skill sets and no lock-in, the Urika-GD appliance is easy to adopt. The Urika-GD appliance complements an existing data warehouse or Hadoop cluster by offloading graph workloads and interoperating within the existing enterprise analytics workflow.

Acknowledgements

Written by Eric Dull, formerly of Cray Inc. Cray's Peter Himmelfarb, system administrator, provided configuration and support during SC14.

© 2015 Cray Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of the copyright owners. Cray is a registered trademark, and the Cray logo and Cray Urika-GD are trademarks of Cray Inc. Other product and service names mentioned herein are the trademarks of their respective owners.