Cut Network Security Cost in Half Using the Intel EP80579 Integrated Processor for Entry- to Mid-Level VPN

By Paul Stevens, Advantech

Network security has become a concern not only for large businesses, but also for medium-sized firms. As threats to the network grow more prevalent and destructive, medium-sized businesses need enhanced security for access control, user authentication, and attack protection. This enhanced security requires a leap in performance, particularly in VPN performance. VPN performance is critical due to the growing number of tunnels required to support remote access users, backhaul regional office connections, or secure wireless access points. This is bad news for medium-sized businesses, which have typically been priced out of VPN acceleration and end up compromising on features and performance.

To address these needs, a new breed of platforms based on Intel's new EP80579 Integrated Processor delivers untouchable network security performance for at least half the price of previous platforms. With as much as 1600 Mbps of VPN throughput, they deliver a no-compromises approach to security for medium-sized business customers.

This article presents a technology overview of an Intel EP80579-based Network Application Platform design. It compares it with previous solutions and shows why the Intel EP80579 will remake the network appliance market. Finally, it reviews packet processing acceleration methodologies, and shows how to use these methodologies on the Intel EP80579 platform.

Intel EP80579-based Platform vs. 4-chip Solution

Let us begin by reviewing previous solutions. Until now, most entry- to mid-range network application platforms used a 4-chip solution. For example, Advantech's FWA-3700 uses:

A 1.8 GHz Intel Pentium M processor,
An Intel 915GM Graphics and Memory Controller Hub,
An Intel ICH6-M I/O Controller Hub, and
An add-on PCI crypto accelerator card for IPsec or SSL VPN solutions.
This platform achieves a typical IPsec VPN throughput of 200 Mbps in a setup using 256-byte packets and 2048 IPsec VPN tunnels. However, at this throughput the CPU operates at 100% capacity, and CPU power consumption is as high as 31 W.

The Intel EP80579 Integrated Processor with Intel QuickAssist Technology replaces all three chips plus the accelerator card with a single System-on-a-Chip (SoC). This provides the following improvements:

2x improvement in cost
8x improvement in throughput
10x improvement in headroom
20% power savings
45% decrease in board space

Advantech's new Intel EP80579-based FWA-3240 platform (Figure 1) illustrates these advantages. Initial simulated results for this platform yield 1600 Mbps of IPsec VPN throughput with as little as 10% CPU utilization, all with a power reduction of almost 20%. (The Intel EP80579-based FWA-3240 figures use the fast path model, while the 4-chip FWA-3700 uses the look-aside model. We describe both models later in this article.)
With the highly integrated Intel EP80579 SoC, security appliance OEMs can forgo specialized co-processors and dedicated security hardware while remaining cost-effective (as much as a 50% cost reduction for an equivalent configuration) and extremely power-efficient. Board size also decreases by nearly 45 percent thanks to the reduced real-estate requirements of the Intel EP80579 SoC. Best of all, the SoC is backward code compatible with earlier Intel processors. Many security vendors already build on Intel x86 processors, so they can run their existing software applications on the Intel EP80579.

Figure 1 Advantech FWA-3240 System and Block Diagram

Intel EP80579 Architecture Overview

Let's take a closer look at the Intel EP80579 architecture to see what makes it so compelling. We'll focus in particular on the acceleration layers.

As shown in Figure 2, the Intel EP80579 is a SoC integrating an Intel Architecture processor with memory and I/O controllers. It also has integrated Intel QuickAssist Technology, which provides acceleration of cryptographic and packet processing. The Intel EP80579 is priced at $54 to $95, and has a thermal design power (TDP) of 13 to 21 W. (Pricing and power depend on the speed grade.)

The four main components of the Intel EP80579 are as follows:

The IA-32 core is based on the Intel Pentium M processor. It runs at 600-1200 MHz, with a 256-kilobyte, 2-way level 2 (L2) cache.

The Integrated Memory Controller Hub (IMCH, sometimes known as the "north bridge") provides the main path to memory for the IA core and all peripherals that perform coherent I/O. Coherent I/O includes PCI Express and the IICH (south bridge), as well as transactions from the Acceleration and I/O Complex to coherent memory.
The Integrated I/O Controller Hub (IICH, sometimes known as the "south bridge") provides a set of PC-compatible I/O devices. These include two SATA 1.0/2.0 controllers, two USB 1.1/2.0 host controllers supporting two USB ports, and two 16550-compatible serial UART interfaces.

The fourth and most significant component from a network appliance platform perspective is the Acceleration and I/O Complex (AIOC). This complex includes Intel QuickAssist Technology, which provides the following components:

The Acceleration Services Unit (ASU) provides acceleration of packet processing for common protocols (IP forwarding, IPsec) as well as a fast packet classification engine with support for firewall, NAT and IPsec-based actions.

The Security Services Unit (SSU) provides acceleration of common symmetric cryptography algorithms such as AES, 3DES, DES and RC4, as well as asymmetric algorithms like RSA, Diffie-Hellman and DSA. It supports message digest/hash functions such as MD5, SHA-1, SHA-2 and HMAC. It also supports true random number generation.

Other components within the Acceleration and I/O Complex include:

Three Gigabit Ethernet (GbE) media access controllers (MACs).
Three High Speed Serial (HSS) interfaces, which support up to 12 T1/E1 TDM interfaces.

Although not shown explicitly in Figure 2, the AIOC also contains logic to allow agents to access on-chip SRAM and external DRAM. Based on BIOS configuration, this logic routes requests to external DRAM either directly to the memory controller, or through the Memory Controller Hub (MCH) for coherency with the IA processor's L2 cache. There is also a ring controller, which provides 64 rings (circular buffers) that can be used for message passing between software running on the IA core and firmware running on the ASU.
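To make the ring mechanism concrete, the following is a minimal sketch of one circular buffer used for message passing between a producer and a consumer. It is purely illustrative: the class name `MessageRing`, its methods, and the ring size are assumptions made for this example, and it does not represent the EP80579 ring controller's actual register or driver interface.

```python
class MessageRing:
    """Fixed-size circular buffer; a hypothetical stand-in for one hardware ring."""

    def __init__(self, size):
        if size < 1:
            raise ValueError("ring size must be positive")
        self._slots = [None] * size
        self._size = size
        self._head = 0    # index of the next slot to read
        self._tail = 0    # index of the next slot to write
        self._count = 0   # number of messages currently queued

    def put(self, message):
        """Producer side: enqueue a message, or return False if the ring is full."""
        if self._count == self._size:
            return False
        self._slots[self._tail] = message
        self._tail = (self._tail + 1) % self._size  # wrap around at the end
        self._count += 1
        return True

    def get(self):
        """Consumer side: dequeue the oldest message, or return None if empty."""
        if self._count == 0:
            return None
        message = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % self._size  # wrap around at the end
        self._count -= 1
        return message
```

In this picture, software on the IA core would call `put` on a request ring and `get` on a response ring, while the ASU firmware does the reverse. A full ring signals back-pressure to the producer rather than overwriting unread messages.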
[Block diagram. IA-32 core with 256 KB L2 cache, connected over the FSB to the Memory Controller Hub, which includes the DDR2 400/533/667/800 memory controller, EDMA and the PCI Express interface. I/O Controller Hub with APIC, DMA, timers, watchdog timer, RTC, HPET, SPI, LPC 1.1, SATA 2.0 x2, USB 2.0 x2, UART x2 and GPIO x36. Acceleration and I/O Complex with the Acceleration Services Unit, Security Services Unit, 256 KB ASU SRAM, three GbE MACs, TDM interface, Local Expansion Bus (16 bits @ 80 MHz), MDIO x1, CAN x2, SSP x1 and IEEE 1588.]

Figure 2 Key components of the Intel EP80579 Integrated Processor
Acceleration Models

Security software supports multiple usage models of the acceleration capability, called acceleration models. The supported models are look-aside, fast path, and inline.

Look-aside Model

Figure 3 illustrates the look-aside model. In the look-aside model, every packet goes directly from the Gigabit Ethernet MAC to the IA core. This model involves little or no acceleration of the packet processing. Once the IA core receives packets, it can send them to the SSU for cryptographic processing. The crypto functions include encryption, decryption, and authentication support for symmetric (bulk) and asymmetric (public/private key) algorithms. The IA core invokes these functions using an API that supports algorithm chaining. With chaining, a single call to the API carries out one cipher and one hash (in either order), thereby reducing the number of function calls and the associated latency.

Figure 3 Look-aside Model

The advantage of the look-aside model is its ease of implementation. Many vendors already use PCI-based crypto accelerator devices that rely on the look-aside model. Vendors can easily replace these PCI devices with the Intel EP80579's integrated security acceleration features. The downside of the look-aside model is that its lack of packet acceleration limits it to the low end of the SMB market.

Fast Path Model

The look-aside model does not scale well to gigabit rates on the single-core Intel EP80579, especially in the case of small packets. The CPU cycles required to process each packet, and to handle the interrupts associated with its arrival, constitute a serious bottleneck. The fast path acceleration model addresses this limitation by processing packets entirely in the fast path (that is, on the ASU) without ever sending the packet to the IA core. Figure 4 illustrates the logical system-level view of a fast path configuration.
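A back-of-the-envelope calculation shows why per-packet CPU work becomes the bottleneck for small packets. The sketch below assumes standard Ethernet framing (each frame occupies an extra 20 bytes on the wire for the preamble and inter-frame gap) and the top 1200 MHz speed grade. The function names are illustrative, and the result is an upper bound on the cycle budget, not a measured figure.

```python
WIRE_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD plus 12-byte inter-frame gap


def packets_per_second(link_bps, frame_bytes):
    """Maximum frame rate on a link for a given Ethernet frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def cycles_per_packet(cpu_hz, link_bps, frame_bytes):
    """CPU cycles available per frame if the core must touch every frame."""
    return cpu_hz / packets_per_second(link_bps, frame_bytes)


if __name__ == "__main__":
    for frame in (64, 256, 1518):
        pps = packets_per_second(1e9, frame)
        budget = cycles_per_packet(1.2e9, 1e9, frame)
        print(f"{frame:>5}-byte frames: {pps:,.0f} packets/s, "
              f"{budget:,.0f} cycles per packet")
```

At 1 Gbps with minimum-size 64-byte frames this works out to about 1.49 million packets per second, which leaves a 1200 MHz core roughly 800 cycles per packet for interrupt handling, classification, and forwarding combined.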
Figure 4 Fast Path Model
In Figure 4, one of the Gigabit ports connects to the external network and the other to the internal network. An IPsec acceleration engine sits between the Gigabit ports. The IPsec acceleration engine runs on the ASU and works with the crypto engine, or SSU. It encrypts packets going into the external network on an IPsec VPN tunnel and decrypts packets coming from the external network on an IPsec VPN tunnel.

In a strict fast path model, all packets are processed entirely in the fast path, meaning that they enter the system and are processed (including IPsec processing, NAT processing, route lookup, etc.) without ever interrupting the IA core. Therefore, this model allows scaling up to gigabit-per-second line rates.

Figure 4 does not show the Internet Key Exchange (IKE) processing. The IA core performs this processing, using the look-aside model to accelerate the public key cryptography required by the protocol. IKE processing is a relatively low-frequency event, so it does not significantly impact the scalability of the fast path model.

Inline Model

The inline model describes those cases where packets are sent to the IA core after an accelerator performs some amount of packet processing, cryptographic processing, or other accelerated processing. A typical example is the case where an SSL-encrypted TCP stream terminates on the host. In this scenario, accelerators handle the TCP processing, SSL record processing and cryptographic processing (including encryption/decryption and authentication) and send the plaintext stream to the host. This offloads a significant number of processing cycles from the OS stack, freeing up the IA core to do other things.

Figure 5 illustrates the inline acceleration model. The TCP/SSL engine implements TCP termination on the fast path. Denial-of-service (DoS) attack prevention mechanisms include the use of SYN cookies to prevent TCP SYN flood attacks.
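The SYN cookie mechanism mentioned above can be sketched in a few lines. The idea is that the server derives its initial sequence number from a keyed hash of the connection identifiers and a coarse time counter, so it stores no state until the client's final ACK proves the handshake is genuine. This is a simplified illustration and not the EP80579 engine's implementation: the function names, the key, and the counter handling are assumptions, and real implementations also encode the negotiated MSS in the cookie.

```python
import hashlib
import hmac
import struct

SECRET = b"example-secret-key"  # assumption: in practice a random per-boot key


def make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, counter):
    """Derive a 32-bit server initial sequence number from the connection."""
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    msg += struct.pack("!IQ", client_isn, counter)
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def check_ack(src_ip, src_port, dst_ip, dst_port, client_isn, ack_number, counter):
    """Accept the final ACK if it acknowledges a cookie issued in the current
    or the previous time slot. No per-connection state is consulted."""
    for slot in (counter, counter - 1):
        if slot < 0:
            continue
        cookie = make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, slot)
        if ack_number == (cookie + 1) % 2**32:
            return True
    return False
```

A server using this scheme would reply to each SYN with `make_cookie(...)` as its sequence number and allocate connection state only when `check_ack(...)` succeeds, which is what keeps a flood of spoofed SYNs from exhausting memory.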
The TCP/SSL engine also provides a complete fast path implementation of SSL record processing. The SSL handshake is implemented on the IA core and uses the look-aside model to accelerate the cryptographic functions. Using the TCP/SSL engine, applications can implement transparent inline acceleration of an SSL VPN.

Figure 5 Inline Model

Combining the Models

Real-world applications typically combine the models above by creating policies whose actions depend on matching a set of classifiers. For example, IPsec traffic may be handled using either the look-aside or fast path acceleration models:

In the look-aside model, packets are routed to the IA core. A software IPsec implementation (such as Openswan) uses the Look-aside Crypto API to accelerate the encryption/decryption and authentication aspects of the protocol.

In the fast path model, packets are routed through the fast path IPsec implementation. Only the first packet in the first flow of every tunnel will result in events to the IA core. These packets can be routed to an IKE stack to initiate security association negotiation.
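The division of labor in the fast path case can be sketched as a table lookup with an exception path. This is a toy model only: the names `FastPath`, `handle_packet` and `install_sa` are hypothetical, real security associations carry keys, SPIs and lifetimes that are omitted here, and dropping packets while negotiation is pending is an assumption of this sketch rather than documented behavior.

```python
class FastPath:
    """Toy model of fast path IPsec handling with an exception path to the IA core."""

    def __init__(self):
        self.sa_table = {}     # tunnel id -> security association (opaque here)
        self.pending = set()   # tunnels for which negotiation was already requested
        self.ike_events = []   # events raised to the IA core's IKE stack

    def handle_packet(self, tunnel_id, payload):
        """Return where the packet goes: fast path, IA core, or dropped."""
        if tunnel_id in self.sa_table:
            # Common case: an SA exists, so the IA core is never involved.
            return "fast_path"
        if tunnel_id not in self.pending:
            # First packet for this tunnel: raise a single event to the IA core.
            self.pending.add(tunnel_id)
            self.ike_events.append(tunnel_id)
            return "to_ia_core"
        # Assumption: later packets are dropped until negotiation completes.
        return "dropped_pending"

    def install_sa(self, tunnel_id, sa):
        """Called by the IKE stack on the IA core once negotiation succeeds."""
        self.sa_table[tunnel_id] = sa
        self.pending.discard(tunnel_id)
```

Because only the first packet of a tunnel raises an event, the load on the IA core grows with the number of tunnels being set up, not with the packet rate.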
IP forwarding can be done entirely in the fast path, as can simple firewall actions such as dropping, rejecting and TTL scrambling. TCP splicing can also be done in the fast path. TCP termination and SSL can be implemented using the inline acceleration model. Other traffic can be routed to the OS stack without any packet processing. Regardless of the model used, cryptographic operations can be accelerated using the Look-aside Crypto API.

Conclusion

The Intel EP80579 platform delivers performance without sacrificing programmability. It provides enough CPU margin to respond to dynamic threats while offering the capacity for additional value-added software services. This means that medium-sized businesses can now benefit from VPN acceleration without having to compromise on features and performance. Compared to past solutions, the Intel EP80579 offers dramatic improvements in cost, power, and board space, all while offering major advances in throughput and headroom. With all of these advantages, the Intel EP80579 is set to revolutionize the network appliance market.

Refs:
[1] The Intel EP80579 Software for Security Applications on Intel QuickAssist Technology Programmer's Guide.