Putting it on the NIC: A Case Study on application offloading to a Network Interface Card (NIC)




This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE CCNC 2006 proceedings.

Yaron Weinsberg, Elan Pavlov, Yossi Amir, Gilad Gat, Sharon Wulff
Department of Computer Science
The Hebrew University Of Jerusalem
Email: {wyaron,elan}@cs.huji.ac.il

Abstract

We have implemented a firewall application on a Network Interface Card (NIC). We have tested the CPU utilization and the bandwidth in a variety of scenarios. The benefits of offloading code are most pronounced when rejecting packets. Our results suggest significant benefits of offloading applications, and in particular firewall logic, to a NIC.

I. INTRODUCTION

There are many communication applications that act on every incoming packet. Offloading such applications to the network interface card (NIC) has many potential advantages. Utilizing the onboard computational power of the NIC can reduce the demands put on the CPU. If the NIC can process incoming information, it can avoid costly interrupts to the CPU. In addition, the NIC can serve as a gatekeeper, thus avoiding potential threats to the CPU. Furthermore, applications on a NIC can be built such that they are system and OS independent.

An application of particular promise for offloading is a firewall application. Since a firewall is an application that filters packets by a user-defined security policy, earlier filtering (especially discarding packets) has a potential for significant improvements in performance. A firewall application on a NIC also has the additional advantage that it is harder for an adversary to modify than a software application running at the host.

We have designed and implemented a firewall application, which we call SCIRON (Secure-Communication IntegRated Over NIC), on a NIC.
The system consists of three elements: the firewall logic, a management console and a policy builder. This paper presents SCIRON and shows that offloading full applications has significant advantages and market potential, more so than TCP offload engines (TOEs) [9] or protocol-specific offloaded extensions.

II. RELATED WORK

Numerous firewall applications exist in the market today. The objective of firewall applications is to protect the network from external and internal attacks. Firewall applications range from commercial products developed by leading companies in the industry (such as Checkpoint [2] and Cisco [3]), to lightweight free tools such as Linux IP-Chains [10] and more advanced open source solutions such as Snort [7] and ClamAV [4]. These products share a common philosophy of filtering communication at the network stack layer. The packet filtering module is integrated into the operating system's kernel and intercepts each incoming and outgoing packet. The packet is evaluated against the firewall's security policy and is either discarded or allowed access to (or from) the protected computer.

Recently, 3Com has independently developed [1] an intrusion detection system (IDS) suite, which is a distributed firewall bundle for servers and desktops, priced at around $800. Our philosophy of offloading firewall logic to the NIC is similar to their product but has significant cost advantages. Furthermore, although there is evidently commercial potential, no research has been released on the quality of such a solution.

III. SYSTEM DESCRIPTION

The system is divided into three main parts. The firewall logic controls which packets are accepted and which are filtered. For purposes of comparison we implemented the firewall both on the NIC and as a driver. Section IV compares the results of the NIC implementation with those of the driver implementation.
The second part is a policy builder, which creates the policy that regulates the firewall logic. Since the number of rules affects the speed of the system, we attempt to find a small set of rules that implements the given policy. As this problem is NP-complete, we utilize several heuristics (detailed in the full paper). In addition, we warn of conflicting or redundant rules. Finally, there is a management console, which controls the loading of rules to a specific computer in the network as well as logging and other management activities.

1-4244-0086-4/05/$20.00 2005 IEEE.

A. Overview and Motivation

Offloading firewall logic to a NIC offers several benefits as an off-the-shelf, OS-independent implementation. It is also harder to tamper with a hardware implementation than with a software implementation. Finally, it allows flexibility in accessing a remote host without jeopardizing the whole segment.

Firewall applications are computationally expensive for several reasons:
- The host's CPU is repeatedly interrupted by the NIC on incoming packets. The processing power required to handle the interrupts is wasted if the packet is doomed to be discarded.

- An adversary can try to perform a denial of service (DoS) attack by sending packets from many computers in an attempt to overload the system.
- The networking stack has significant overhead.
- The PCI [6] bus is a major bottleneck, especially given today's trend towards faster networking fabrics (this should be reduced in the future by PCI Express).

Offloading firewall activities to the NIC can evade some of these issues. We have implemented a static 5-tuple (protocol, IP-source-address, IP-destination-address, source-port, destination-port) firewall at the NIC. Our motivation was to examine the benefits of such an offload and to measure the expected performance in CPU utilization and overall throughput.

B. Environment

Our programmable interface card is based on the Tigon chipset. The Tigon programmable Ethernet controller, released in 1997, is used in the family of 3Com's Gigabit NICs. The Tigon controller supports a PCI host interface and a full-duplex Gigabit Ethernet interface. Figure 1 shows a block diagram of the Tigon. The Tigon has two 88 MHz MIPS R4000-based processors which share access to external SRAM. Each processor has a one-line (64-byte) instruction cache to capture spatial locality for instructions from the SRAM. In the Tigon, each processor also has a private on-chip scratch pad memory, which serves as a low-latency software-managed cache. Hardware DMA and MAC controllers enable the firmware to transfer data to and from the system's main memory and the network, respectively.

Our card's external SRAM is 1 MB. This NIC does not provide a CPU interrupt mechanism. Although in our architecture the NIC's operating system is designed as a non-preemptive kernel, our work does provide the OS specification for interrupt-enabled NIC hardware.

C. Programming Model

SCIRON [11] is based on a previous project conducted in our lab called STORM [12]. STORM provides a framework on top of the original NIC's firmware which enables a developer to install predefined hooks.
Hooks can be installed at the firmware level, at the kernel level (inside the NIC's driver), or both. STORM's framework also enables adding custom events which are triggered by the driver. These triggers can be used as a communication method between the firmware and the network driver. Figure 2 presents the modules and hooks provided by STORM's framework. The Rx and Tx stubs provide the hooks necessary for intercepting traffic on the NIC. These hooks enable SCIRON's firewall to filter traffic according to a predefined security policy. The firmware's trace module provides development and debugging capabilities for firmware code. We utilize this capability for transmitting packets from within the NIC for remote logging.

Fig. 2. STORM's modules

Fig. 1. Tigon Controller Block Diagram [components: CPU A and CPU B, each with a scratch pad; memory bus and memory bus arbiter; external RAM; read DMA and write DMA; PCI interface; MAC with full-duplex Gigabit Ethernet]

The Tigon Chipset: The Tigon controller uses an event-loop approach instead of interrupt-driven logic. The motivation is to increase the NIC's runtime performance by lowering the overhead imposed by interrupting the host's CPU each time a packet arrives or a DMA request is ready. Furthermore, on a single processor the need for synchronization and its associated overhead is eliminated. Our system is comprised of the Netgear GA620T NIC, which uses the Tigon version II chipset with external SRAM.

Figure 3 presents a sample hook invocation performed by STORM's framework. The storm_pre_recv hook is invoked in the receive control flow of the NIC's firmware.

Fig. 3. Installing a Hook

    /* call the hook and get packet verdict */
    BOOL allowed = storm_pre_recv(pkt);
    if (!allowed) {
        /* discard packet */
        storm_discard(pkt);
    }
    /* allow packet */
    ...

The storm_pre_recv hook receives a pointer to the beginning of the communication packet, i.e. the Ethernet header, and

returns false if the packet is to be discarded or true if it is to be allowed (forwarded to the operating system). SCIRON's firewall is implemented as a set of such firmware hooks that are installed using STORM's API. These hooks are compiled and linked with the firmware image and are installed during NIC initialization. In order to simulate common kernel-based firewalls for performance evaluation, we have also installed hooks at the driver layer (also using STORM's framework). All comparisons shown in Section IV compare the same firewall code (with the same filtering policy) between the driver-based firewall and the NIC-based firewall. Currently the firewall code is fully stateless; thus, state is not saved between successive hook invocations.

Fig. 4. SCIRON's Management Console Architecture

D. SCIRON Architecture

This section presents the main components of the SCIRON runtime. The runtime is comprised of two main components: the SCIRON enforcement module and SCIRON's management console.

1) Enforcement module: The enforcement module is the engine of SCIRON's firewall that actively enforces the security policy upon incoming and outgoing packets. SCIRON's firewall is an ordered 5-tuple firewall. When a packet arrives, a sequential pass over the rules is performed. The action (accept or reject) associated with the first rule that matches the packet header is performed. If there is no match, the default policy action (reject all) is performed.

2) Management Console: SCIRON's management console provides remote administration and logging capabilities. Administrators can remotely install security policies at enforcement modules of machines in their domain. This is done by communicating with SCIRON's embedded enforcement module using a proprietary protocol called SRPP (SCIRON Remote Policy Protocol). An administrator can also determine the policy for monitoring and logging events to the management console. This is done by marking specific rules as log-rules.
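The ordered first-match evaluation and the log-rule flag described above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not SCIRON's firmware code: the type and function names (five_tuple, fw_rule, fw_evaluate) are hypothetical, and the use of address masks and port ranges to express wildcards is an assumption, since the paper does not specify the rule encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 5-tuple extracted from a packet header (host byte order). */
typedef struct {
    uint8_t  protocol;   /* e.g. 6 = TCP, 17 = UDP */
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
} five_tuple;

typedef enum { ACTION_REJECT = 0, ACTION_ACCEPT = 1 } fw_action;

/* Hypothetical rule layout. Masks and port ranges are an assumed way of
 * expressing wildcards; mask 0 matches any address. */
typedef struct {
    bool      any_protocol;
    uint8_t   protocol;
    uint32_t  src_ip, src_mask;
    uint32_t  dst_ip, dst_mask;
    uint16_t  src_port_lo, src_port_hi;
    uint16_t  dst_port_lo, dst_port_hi;
    fw_action action;
    bool      log;        /* rule is marked as a log-rule */
} fw_rule;

/* Returns true if every field of the 5-tuple satisfies the rule. */
static bool rule_matches(const fw_rule *r, const five_tuple *t)
{
    if (!r->any_protocol && r->protocol != t->protocol) return false;
    if ((t->src_ip & r->src_mask) != (r->src_ip & r->src_mask)) return false;
    if ((t->dst_ip & r->dst_mask) != (r->dst_ip & r->dst_mask)) return false;
    if (t->src_port < r->src_port_lo || t->src_port > r->src_port_hi) return false;
    if (t->dst_port < r->dst_port_lo || t->dst_port > r->dst_port_hi) return false;
    return true;
}

/* Sequential pass over the ordered rules: the first matching rule decides.
 * If no rule matches, the default policy (reject all) applies.
 * *should_log is set when the deciding rule is a log-rule. */
fw_action fw_evaluate(const fw_rule *rules, size_t n_rules,
                      const five_tuple *pkt, bool *should_log)
{
    assert(pkt != NULL && should_log != NULL);
    assert(rules != NULL || n_rules == 0);

    *should_log = false;
    for (size_t i = 0; i < n_rules; i++) {
        if (rule_matches(&rules[i], pkt)) {
            *should_log = rules[i].log;
            return rules[i].action;
        }
    }
    return ACTION_REJECT; /* default policy: reject all */
}
```

Because evaluation is a linear scan, the cost per packet grows with the number of rules, which is consistent with the policy builder's goal of finding a small rule set. Rule order matters: a broad accept rule placed before a narrower reject rule would hide the latter.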
Packets caught by log-rules generate a log packet containing the packet's information. The log packet is then sent to the management console. Real-time monitoring and tracking of network activity enables the administrator to act immediately upon potential attacks. The SCIRON management console is comprised of the following modules:
(1) Management console GUI - a tool used for defining and managing the security policy.
(2) Log viewer - a server application which receives log packets sent by the various enforcement modules and displays them graphically to the administrator.
(3) Policy builder - a tool for verifying the correctness of the security policy defined by the administrator, by searching for shadowed and redundant rules. The verifier implements the algorithm presented in [8].

IV. EXPERIMENTAL RESULTS

There are many parameters that influence firewall performance. Factors such as the number of rules, current CPU utilization, packet size, ratio of incoming to outgoing packets, total number of packets, and number of packets accepted vs. number rejected can all potentially influence performance. Performance is measured using two parameters. The first is the load on the CPU and the second is the throughput. In this section we present and discuss several typical results.

In the first scenario we present (Figure 5), the firewall discards all the packets it receives. During this scenario the CPU is only running system processes. CPU utilization is shown on the left of the graph and throughput on the right.

Fig. 5. Receiving - Discarding all packets

As we expect, in this scenario the CPU utilization when using the firewall implemented on the NIC is approximately zero, whilst for the same firewall on the host it is quite high. The second scenario presented is the complementary scenario in which the firewall accepts all received packets.
In this case we see that both CPU utilization and throughput are much higher when the SCIRON firewall is deployed (see Figure 6). The host CPU utilization is very high (98.12%), leaving very little CPU time for host applications. Most of the CPU cycles are consumed in the networking stack and interrupt handlers. Although the CPU has more computational power than the NIC, it has to do less work for each incoming packet (as rule matching is done by the NIC), leading to higher throughput.

Fig. 6. Receiving - Accepting all packets

Fig. 8. Sending - Accepting all packets

The third scenario, given in Figure 7, is probably a more realistic behavior for a typical host machine.

Fig. 7. Receiving - 50% acceptance rate

It is evident that the NIC-based firewall has better performance both in CPU utilization and throughput.

Finally, the last scenario is somewhat less typical. In this scenario all user packets are compliant with the firewall rules, hence all packets are forwarded. As we can see in Figure 8, SCIRON firewall performance is inferior to the host's firewall. Since all outgoing packets have to be processed, the computational power of the firewall becomes a bottleneck, as the host CPU is faster than the NIC's processor. We expect that this difference will be less pronounced in high-end NICs. In practice, most machines are driven by incoming bandwidth and not outgoing bandwidth.

A. Extensions

Offloading code can potentially exacerbate the security problem by adding more opportunities for bugs. Unfortunately, even if an offloaded protocol design can be shown to be secure, this does not imply that all of its implementations would be secure. In fact, many (if not most) security holes are implementation bugs, not specification bugs. Hackers actively find and exploit bugs, and a bug in offloaded code could be much more severe than a bug in a traditional user-level application, because it might allow unbounded and unchecked access to host memory. An additional minor problem is that in some scenarios a firewall implemented on a NIC suffers drawbacks compared to standard implementations. These scenarios typically involve a heavy load of outgoing packets.

These two problems lead us to consider a mixed paradigm: utilize a NIC firewall for preliminary filtering of incoming traffic along with a conventional firewall for additional filtering. Any bugs in the code offloaded to the NIC can then be dealt with by the conventional firewall.
Since most of the performance gain is due to faster discarding of unwanted packets, this solution preserves most of the benefits of offloading the firewall logic to the NIC. In addition, the conventional firewall can filter outgoing packets, thereby eliminating the bottleneck associated with NIC filtering.

In the future we intend to look at a Stateful Packet Inspection (SPI) firewall, in which filtering depends on prior packets received, to avoid such attacks as denial of service (DoS). We expect that the advantages of an SPI firewall on a NIC will be substantial, although less pronounced than for stateless packet inspection. We also intend to port this work to the NICOS operating system, which was also developed in our lab [5].

V. CONCLUSIONS

We learned that offloading firewall logic to a NIC has many advantages. In scenarios with a heavy incoming packet load (especially if packets need to be discarded), a firewall offloaded to a NIC significantly improves both CPU utilization and packet throughput. In the less likely scenario of heavy outgoing packet traffic, a firewall offloaded to a NIC is slower than a conventional firewall. It is important to note that our implementation is based on an obsolete NIC. We expect that the performance gain will be more pronounced when utilizing an advanced NIC. Although current NIC hardware

is continuously improving, the host CPU will likely continue to be faster than NIC hardware. In order to further improve the performance of the sending flow, a mixed paradigm can be used. In this model, outgoing packets are processed at the host while incoming packets are processed in the NIC. We will further study this kind of solution in future research.

REFERENCES

[1] 3Com embedded firewall. http://www.3com.com/prod.
[2] Checkpoint. http://www.checkpoint.com/.
[3] Cisco Systems Inc. http://www.cisco.com/.
[4] ClamAV. http://www.clamav.net/.
[5] Network Interface Card Operating System (NICOS). Homepage at http://www.cs.huji.ac.il/ wyaron/.
[6] PCISIG industry organization, PCI specification. http://www.pcisig.com/specifications.
[7] Snort. http://www.snort.org/.
[8] E. S. Al-Shaer and H. H. Hamed. Discovery of policy anomalies in distributed firewalls. In INFOCOM, 2004.
[9] A. Currid. TCP offload to the rescue. Queue, 2(3):58-65, 2004.
[10] L. Journal. Building a firewall with ip chains. www2.linuxjournal.com/article/3622.
[11] Y. Weinsberg. SCIRON, Secure Communication IntegRated over NIC. Homepage at http://www.cs.huji.ac.il/ wyaron/sciron.html/.
[12] Y. Weinsberg. STORM, Super-fast Transport Over Replicated Machines. Homepage at http://www.cs.huji.ac.il/ wyaron/storm.html/.