Delivering 160Gbps DPI Performance on the Intel Xeon Processor E5-2600 Series using HyperScan



Similar documents
Securing the Intelligent Network

How to Build a Massively Scalable Next-Generation Firewall

Intel Network Builders: Lanner and Intel Building the Best Network Security Platforms

Intel DPDK Boosts Server Appliance Performance White Paper

Bricata Next Generation Intrusion Prevention System A New, Evolved Breed of Threat Mitigation

Next-Generation Firewalls: Critical to SMB Network Security

Achieve Deeper Network Security

Cisco Integrated Services Routers Performance Overview

Comparing Multi-Core Processors for Server Virtualization

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

The Dirty Secret Behind the UTM: What Security Vendors Don t Want You to Know

IS IN-MEMORY COMPUTING MAKING THE MOVE TO PRIME TIME?

Achieve Deeper Network Security and Application Control

PRODUCTS & TECHNOLOGY

Different NFV/SDN Solutions for Telecoms and Enterprise Cloud

Getting More Performance and Efficiency in the Application Delivery Network

Boosting Data Transfer with TCP Offload Engine Technology

Virtualized Security: The Next Generation of Consolidation

SonicWALL Clean VPN. Protect applications with granular access control based on user identity and device identity/integrity

The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology

IBM Proventia Network Intrusion Prevention System With Crossbeam X80 Platform

How Solace Message Routers Reduce the Cost of IT Infrastructure

Game changing Technology für Ihre Kunden. Thomas Bürgis System Engineering Manager CEE

Advanced Core Operating System (ACOS): Experience the Performance

Accelerating High-Speed Networking with Intel I/O Acceleration Technology

Informatica Ultra Messaging SMX Shared-Memory Transport

Enhance Service Delivery and Accelerate Financial Applications with Consolidated Market Data

Content-ID. Content-ID URLS THREATS DATA

Developing High-Performance, Flexible SDN & NFV Solutions with Intel Open Network Platform Server Reference Architecture

VMWARE WHITE PAPER 1

Managing Latency in IPS Networks

Bivio 7000 Series Network Appliance Platforms

Content-ID. Content-ID enables customers to apply policies to inspect and control content traversing the network.

Unified Threat Management Throughput Performance

Symantec Enterprise Firewalls. From the Internet Thomas Jerry Scott

Meeting the Challenges of Virtualization Security

Understanding the Benefits of IBM SPSS Statistics Server

Stingray Traffic Manager Sizing Guide

Solution Recipe: Improve PC Security and Reliability with Intel Virtualization Technology

WHITE PAPER. How To Compare Virtual Devices (NFV) vs Hardware Devices: Testing VNF Performance

Enabling Real-Time Sharing and Synchronization over the WAN

Using Palo Alto Networks to Protect the Datacenter

How Network Transparency Affects Application Acceleration Deployment

What can DDS do for You? Learn how dynamic publish-subscribe messaging can improve the flexibility and scalability of your applications.

INCREASE NETWORK VISIBILITY AND REDUCE SECURITY THREATS WITH IMC FLOW ANALYSIS TOOLS

High-Performance Network Data Capture: Easier Said than Done

COUNTERSNIPE

White Paper. Enhancing Website Security with Algorithm Agility

Frequently Asked Questions

How To Get To A Cloud Storage And Byod System

Accelerating Business Intelligence with Large-Scale System Memory

evm Virtualization Platform for Windows

REAL-TIME WEB APPLICATION PROTECTION. AWF SERIES DATASHEET WEB APPLICATION FIREWALL

White Paper Abstract Disclaimer

Going Beyond Deep Packet Inspection (DPI) Software on Intel Architecture

Radware ADC-VX Solution. The Agility of Virtual; The Predictability of Physical

Network Security Platform 7.5

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

TCP Offload Engines. As network interconnect speeds advance to Gigabit. Introduction to

Integrating Web Messaging into the Enterprise Middleware Layer

Cisco Application Networking for Citrix Presentation Server

Overcoming Security Challenges to Virtualize Internet-facing Applications

HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief

Leading Virtualization 2.0

Moving Beyond Proxies

SQL Server 2008 Performance and Scale

Cut Network Security Cost in Half Using the Intel EP80579 Integrated Processor for entry-to mid-level VPN

Load Balancing Security Gateways WHITE PAPER

WHITE PAPER. Extending Network Monitoring Tool Performance

Comparison of Firewall, Intrusion Prevention and Antivirus Technologies

First Line of Defense to Protect Critical Infrastructure

Consolidating Multiple Network Appliances

Intel Network Builders Solution Brief. Intel and ASTRI* Help Mobile Network Operators Support Small Cell Networks

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Open Source in Government: Delivering Network Security, Flexibility and Interoperability

Benefits of multi-core, time-critical, high volume, real-time data analysis for trading and risk management

Integrated Grid Solutions. and Greenplum

Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture

Why Choose Integrated VPN/Firewall Solutions over Stand-alone VPNs

Technical Brief. DualNet with Teaming Advanced Networking. October 2006 TB _v02

Accelerating Business Intelligence with Large-Scale System Memory

Firewall Migration. Migrating to Juniper Networks Firewall/VPN Solutions. White Paper

How To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory)

Benchmarking Cassandra on Violin

Going Beyond Deep Packet Inspection (DPI) Software on Intel Architecture

TIME TO RETHINK NETWORK SECURITY

Radware ADC-VX Solution. The Agility of Virtual; The Predictability of Physical

Windows TCP Chimney: Network Protocol Offload for Optimal Application Scalability and Manageability

Alteon Switched Firewall

Transcription:

SOLUTION WHITE PAPER Intel processors Pattern Matching Library Software Delivering 160Gbps DPI Performance on the Intel Xeon Processor E5-2600 Series using HyperScan HyperScan s runtime is engineered for high-performance from the ground up. EXECUTIVE SUMMARY The requirement for advanced security equipment to scan data streams for malicious content at line rate presents significant challenges to network security vendors. Security equipment, overwhelmed by the data rates of present-day networks, is more likely to miss attacks, leading to increased risks of security breaches. While highspeed content scanning, also known as deep packet inspection (DPI), exists today, this technology typically requires purpose-built hardware to operate at high speeds, a solution that often leads to increases in development and manufacturing costs. A software-based approach that takes advantage of multi-core Intel architecture can provide a cost-effective and scalable solution that also has the flexibility to evolve with the changing needs of the system. The Intel HyperScan software pattern matching library, designed for both single-core and multi-core processors, offers a software-based DPI solution that can scale from under 1Gbps up to 160Gbps, depending on the number of cores used. By integrating DPI technology optimized for multi-core processors, HyperScan provides a cost-effective software solution for scanning data content at line rate in security equipment ranging from small network appliances to large network elements. Deep Packet Inspection Newer processor architectures with higher clock rates, larger caches and other advances have enabled network security equipment to incorporate capabilities previously found only in end systems. No longer limited to just bridging and forwarding, network elements are now scanning packets as they pass through. Taking advantage of this technology, network administrators now deploy intrusion detection/prevention systems (IDS/IPS) and network anti-virus and malware scanners alongside the traditional firewall. In some cases, several security functions are incorporated into a universal threat management (UTM) appliance. Unlike the traditional firewall that looks at packet headers only, this more advanced type of equipment uses DPI technology to examine the content of each packet to detect threats. Instead of basing security decisions solely on fields in the packet header, DPI technology allows a security application to peer deep into the contents of a data stream to try to identify harmful intent. This more detailed examination of the data stream has an added cost, however, as scanning the content of data streams is CPU-intensive and can become unfeasible as the traffic load increases.

4500 4000 3500 Mbps 3000 2500 2000 1500 1000 500 0 Model 1 Model 2 Model 3 Model 4 Model 5 Basic Firewall Performance Deep Inspection Performance Figure 1 Firewall Performance: Basic versus Deep Inspection The efficacy of DPI as a tool, therefore, depends largely on how well it performs under load. A system that performs flawlessly under contrived theoretical conditions but fails under heavy traffic presents a major security problem. Consider the analogy of a country s border services agent who is given the new mandate to pull over every automobile crossing the border to conduct a detailed search of its contents. While this approach may provide added security at remote crossings where traffic is sparse, it would cause severe congestion at major crossings and result in chaos as border agents arbitrarily turn back automobiles to relieve congestion, or worse, to blindly allow entry without any controls. This is exactly what can happen in security equipment that cannot keep up with the incoming data rates. Packets can drop off queues causing end systems to retransmit, further exacerbating the situation, or packets can be admitted blindly in a fail-open condition. To prevent this from occurring and to increase performance, network security vendors have typically had to rely on hardware-assisted DPI technology to perform detailed scanning at network speeds. This purpose-built silicon was expensive, difficult to use, and often imposed a different programming paradigm on the software, making it frustrating to maintain and costly to deploy across the product line. The availability of multi-core processors, however, has created the opportunity for well-designed DPI software to approach and even exceed previous hardware-based solutions in performance. Pattern Matching Central to many DPI implementations is the notion of pattern matching, the ability to compare and match incoming byte streams against a database of known offending patterns called signatures. These signatures represent potential malicious content and can take the form of simple literal strings or more complex patterns such as a specific arrangement of bytes broken up by a variable amount of other possibly irrelevant data. This latter type of signature is often described using some form of regular expression syntax, sometimes combined with proprietary grammar. While literal searches can, themselves, be quite intensive, regular expression searches require significant CPU resources such that they become problematic to perform at high speeds, especially when searching for thousands of such signatures. Figure 1 shows the marked differences in performance of a 2

representative system running a basic firewall performing basic packet filtering, and the same system running deep packet inspection against a set of rules and patterns. A good pattern matcher should be able to perform comparisons of the incoming byte stream against the signature database with deterministic performance regardless of the number of signatures. In other words, the pattern matcher should perform much better than the brute-force method of comparing the incoming byte stream with each signature sequentially. Going back and forth repeatedly in a data stream is extremely cache-inefficient, and does not scale as the number of patterns grows. Some software-based pattern matchers overcome this brute force approach by using trie-based algorithms where the set of patterns being searched is organized into data structures that map the relationship among patterns. When an incoming packet arrives, the software merely walks the map to look for matches, similar to how some IP address lookups are performed. While this represents an improvement over brute force approaches as the number of patterns grows, it also has its deficiencies. Depending on the number of signatures and the size of the processor cache, walking a trie in this way may require quite a few memory accesses per packet, with the worst case being one memory access per byte received, in which case performance may not turn out to be an improvement over the brute force method it is trying to replace. Therefore, while DPI technology is becoming increasingly sophisticated, it s also creating the challenge of effectively scaling performance. The deeper and more granular the inspection, the more processing is needed, and the more important it is to find alternatives to current published state-of-the-art mechanisms for scanning. HyperScan Pattern Matching Library While many pattern matchers in the industry are implemented with simple sequential approaches, with cacheunfriendly algorithms, or as pieces of specialized hardware that are often challenged by latency and scalability issues, equipment manufacturers can now take advantage of Intel software pattern matching solutions to cost-effectively drive and scale DPI performance. HyperScan is an OS-independent, multithreaded software pattern matching library. Easy to integrate, HyperScan is a drop-in replacement for libpcre, not only supporting a large subset of Perl Compatible Regular Expression (PCRE) syntax, but providing performance orders of magnitude better than libpcre. When deployed on an Intel architecture platform, HyperScan takes advantage of features such as hyperthreading, receive side scaling, and SIMD instructions to provide up to 160Gbps in scanning performance. Aside from classic regular expressions, HyperScan supports a wide variety of other signatures required for most security and data networking applications, including anchors, character classes, and bounded repeats. Unlike sequential approaches that go back and forth over the data stream repeatedly for each pattern in the database, HyperScan performance is not directly dependent on the number of patterns being searched. The data stream is scanned for all regular expressions in the signature set simultaneously and matches are returned to the application as they are found. HyperScan performance is deterministic and does not exhibit drastic swings in performance usually found in traditional scanning systems. HyperScan is capable of scanning each incoming packet independently or as a recombined data stream to detect attacks that span packets. A security application, for example, can reassemble a TCP stream and invoke the HyperScan library for scanning. HyperScan keeps track of any partial matches so that it can restart scanning from where it left off when more data arrives on that stream. Small Signature Footprint A key component of any regular expression searching solution is the compiler, which reduces the set of signatures to byte code that can be understood by the underlying pattern matching engine. Some compilers produce a small signature footprint, but these tend to be for engines that support sequential searches where the benefit of a small database is offset by poor scalability. Other compilers build large and complex databases, which contain pattern relationship information to allow searches of thousands of signatures to be executed in parallel, but at the expense of a large memory footprint. A database that allows parallel scanning is inherently larger since it must store the relationships among all the patterns, which could lead to exponential growth depending on the nature of the signatures. The downside is that a large database can overflow from processor cache leading to excessive memory accesses, sometimes as often as one access per byte processed. While both of these approaches may produce good results in contrived scenarios, they may lead to sub-optimal performance when running in a real-world environment with thousands of signatures and thousands of data streams. The HyperScan regular expression compiler and engine, on the other hand, support parallel scanning for thousands of signatures without the associated large memory footprint inherent to traditional parallel scanning databases. Using proprietary techniques, the HyperScan compiler builds a database that separates out the key portions of the signature set, resulting in a database footprint small enough to fit entirely into a processor cache for most use cases. By keeping the database compact, external memory accesses are rarely required under normal operation. 3

Tier-1 Vendor IPS Signatures with HTTP test traffic Signature Set Type HyperScan Throughput (Gbps) 1 Thread 2 Threads 4 Threads 8 Threads 16 Threads 18 Threads Streaming, 69 complex signatures 11.9 23.3 46.3 74 132.4 159.5 Streaming, 142 complex signatures 6.2 12.4 24.5 43.1 81.8 95.2 Streaming, 43 complex signatures 3.8 7.5 14.9 25.7 48.5 56.7 Streaming, 235 complex signatures 1.4 2.8 5.5 10 19.1 20.8 Non-streaming, 13K medium-complexity signatures 1.2 2.3 4.7 8.6 16.5 19.9 Non-streaming, 8K medium-complexity signatures 2.2 4.4 8.7 16 30.6 34.4 Figure 2 HyperScan Performance on Intel Xeon Processor E5-2600 Series Depending on configuration, an external memory access often only takes place when a match occurs, which is the ideal behavior for a pattern matching engine. Linear Scaling in Performance HyperScan s multi-threaded architecture takes advantage of symmetric multithreading to scale performance linearly with the number of hardware threads used. Each scan runs independently from each other scan, allowing for concurrent processing of different data streams without adverse performance impact. Multi-threading by itself is not sufficient to guarantee good scanning performance. While other software pattern matchers may indeed be multi-threaded, the contention by each thread for the shared pattern database may limit scalability. HyperScan s database, with its low memory footprint, coupled with the large caches in Intel processors, means that each thread scans data against a database that resides in its local cache. This dramatically reduces the amount of shared memory contention in multi-core systems, leading to a more linear progression without the traditional flattening of the performance curve as the number of threads increase. Throughput (Gbps) 180 160 140 120 100 80 60 40 20 0 Benchmarking the Intel Xeon Processor E5-2600 Series Pattern matching performance measurements can be influenced by a number of factors. The types and numbers of signatures, the content of the incoming traffic, and the number of matches or partial matches found in the data can all affect the benchmarking results. For the results to be meaningful, therefore, the tests must use real signatures and real network traffic. Performance benchmarking was conducted on the HyperScan library running on a new dual-socket, quad-core (total eight cores) Intel Xeon processor E5-2600 series-based platform using a complete set of current IPS signatures sourced from a leading security equipment vendor. The input was taken from real HTTP traffic captured and played back from a PCAP file. A simple application was HyperScan Performance on Intel Xeon Processor E5-2600 Product Family, 2.7 GHz 1 2 4 8 16 32 Thread Count written to read the PCAP file into memory and invoke the HyperScan APIs packet by packet, simulating the behavior of a real network application such as an IPS or a web proxy. Data was matched in streaming mode for cases where the threats might span multiple packets, and in nonstreaming mode for threats that would be contained within a single chunk of data. The signature set used included multiple variants of both to-client / to-server and URI sets. All signatures were compiled into their runtime database in less than 3 seconds. The benchmarking application specifically measured the raw pattern matching performance, excluding the time spent in reading the PCAP file and in pre- and post-scan processing. All the data used for pattern matching was resident in-memory for this benchmark. 4

The results (figure 2) show near linear scalability up to 8 threads, and a slight leveling off when approaching 32 threads where raw DPI scan performance tops out at 160Gbps. Figure 3 shows the performance progression when running the HyperScan library against a common set of signatures and data stream but varying the actual processor itself. All platforms were dual-socket, quad-core systems (total 8 cores). By taking the same HyperScan software library with the same APIs onto different Intel Xeon processors, raw scan performance can scale up and down with relative ease to match the desired equipment performance and price point. Packet Processing on Intel Architecture Platforms There are a number of reasons why software running on Intel architecture platforms can achieve superior packet processing performance. Intel s ticktock strategy of rapid introduction of new micro-architectures coupled with improvements in process technology has enabled multi-core Intel processors to extend their lead in both control and data plane processing. Recent innovations include the replacement of the front-side bus (FSB) with the point-to-point Intel QuickPath Interconnect, symmetric multi-threading, support for non-uniform memory access (NUMA), embedded and wider memory controllers, as well as new streaming instructions for even faster CRC calculations. This tighter integration directly results in packets being brought in from the NIC and deposited into local memory more efficiently than ever before. Once the packets are in memory, application software can use Intel Data Plane Development Kit (Intel DPDK) and Intel QuickAssist Technology to speed up packet processing with features such as interrupt-free packet reception and transmission, pre-fetching and cache warming, NUMA awareness, real-time buffer management and zero-copy buffers, lockless rings, and optimized longest prefix match and flow classification. These features, along with the high clock rates and large caches make platforms based on Intel processors well-suited for both low-touch and high-touch applications, enabling market-leading packet processing capabilities as well as accelerating intensive datapath workloads such as cryptography, compression, and deep packet inspection. Backed by a strong set of supporting tools, these architecture advancements together with Intel DPDK and Intel QuickAssist Technology are enabling a new generation of products that can deliver the most scalable and best performing solutions quickly and efficiently to the market. 180 160 140 120 100 80 60 40 20 0 Throughput (Gbps) Intel Xeon Processor E5400 Series Intel Xeon Processor E5500 Series Intel Xeon Processor E5-2600 Product Family Figure 3 Scaling HyperScan Performance with Multi-core Intel Processors 5

Conclusion Network security equipment vendors often have to make compromises when trying to consolidate their products onto a single platform. Hardware and software designed for light-touch processing and simple packet filtering in a SOHO appliance can be significantly different from those designed for advanced security applications in a large enterprise. Yet the savings in initial development cost and ongoing software maintenance make consolidation a worthwhile goal. Having parallel development organizations create similar products on different platforms is no longer viable due to market pressures to reduce costs while continuing to quickly offer new products. Vendors are consequently looking for agile platforms that provide predictable performance and higher levels of scalability and flexibility so that the same software can be used on all products in the product family. This is only possible by adopting software-only solutions that take advantage of multi-core processing to scale up and down the product line from small sub-gbps appliances to large multi- Gbps network equipment. The HyperScan pattern matching library enables advanced security applications to scale their DPI performance linearly up to 160Gbps. The same library can be used on small appliances just as cost effectively as on large enterprise-scale network equipment merely by adjusting the number of hardware processor threads recruited. The combination of Intel architecture processors and HyperScan library enables security solution vendors to create a single platform that readily scales network throughput and satisfies the needs of different market segments without the added cost of working with multiple platforms. For more information about Intel processors, visit www.intel.com. Copyright 2013 Intel Corporation. All rights reserved. Intel, the Intel logo and Xeon are trademarks of Intel Corporation in the United States and/or other countries. *Other names and brands may be claimed as the property of others. Printed in USA 1013/MS/SD/PDF Please Recycle 326830-002US