Algorithms for Advanced Packet Classification with Ternary CAMs

Similar documents

ECE 578 Term Paper Network Security through IP packet Filtering

Scalable Prefix Matching for Internet Packet Forwarding

DRAFT Gigabit network intrusion detection systems

CHAPTER 5 FINITE STATE MACHINE FOR LOOKUP ENGINE

Principle and Implementation of. Protocol Oblivious Forwarding

Open Flow Controller and Switch Datasheet

A Fast Pattern-Matching Algorithm for Network Intrusion Detection System

IP Address Structure

Using Fuzzy Logic Control to Provide Intelligent Traffic Management Service for High-Speed Networks ABSTRACT:

FINAL ASSESSMENT/EXAMINATION JULY 2015 PLEASE READ ALL INSTRUCTIONS CAREFULLY BEFORE YOU BEGIN THIS EXAMINATION

EMIST Network Traffic Digesting (NTD) Tool Manual (Version I)

Transport Layer Protocols

How Router Technology Shapes Inter-Cloud Computing Service Architecture for The Future Internet

Flow Monitoring With Cisco Routers

Wireshark Developer and User Conference

Software Defined Networking

Symantec Event Collector for Cisco NetFlow version 3.7 Quick Reference

Network Traffic Monitoring an architecture using associative processing.

Data Streaming Algorithms for Estimating Entropy of Network Traffic

Application Note. Stateful Firewall, IPS or IDS Load- Balancing

Compiling PCRE to FPGA for Accelerating SNORT IDS

Enhancing High-Speed Telecommunications Networks with FEC

RAM & ROM Based Digital Design. ECE 152A Winter 2012

Memory Basics. SRAM/DRAM Basics

Programmable Networking with Open vswitch

Stateful Inspection Firewall Session Table Processing

(Refer Slide Time: 02:17)

Large-Scale TCP Packet Flow Analysis for Common Protocols Using Apache Hadoop

UNITE: Uniform hardware-based Network Intrusion detection Engine

Module 5. Broadcast Communication Networks. Version 2 CSE IIT, Kharagpur

MIDeA: A Multi-Parallel Intrusion Detection Architecture

Software Defined Networks

SDN. WHITE PAPER Intel Ethernet Switch FM6000 Series - Software Defined Networking. Recep Ozdag Intel Corporation

Memory Allocation. Static Allocation. Dynamic Allocation. Memory Management. Dynamic Allocation. Dynamic Storage Allocation

Network Measurement. Why Measure the Network? Types of Measurement. Traffic Measurement. Packet Monitoring. Monitoring a LAN Link. ScienLfic discovery

Authenticated encryption

Lecture 8. IP Fundamentals

How to Build a Massively Scalable Next-Generation Firewall

CPU Organisation and Operation

An Algorithm for Performing Routing Lookups in Hardware

CCNA Tutorial Series SUBNETTING

IP address lookup for Internet routers using cache routing table

Configurable String Matching Hardware for Speeding up Intrusion Detection. Monther Aldwairi*, Thomas Conte, Paul Franzon

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng

Hardware Performance Optimization and Tuning. Presenter: Tom Arakelian Assistant: Guy Ingalls

SDN and Streamlining the Plumbing. Nick McKeown Stanford University

Index Terms Domain name, Firewall, Packet, Phishing, URL.

CS268 Exam Solutions. 1) End-to-End (20 pts)

Firewall Compressor: An Algorithm for Minimizing Firewall Policies

Hardware and Software

Midterm. Name: Andrew user id:

NetFlow & BGP multi-path: quo vadis?

A low-cost, connection aware, load-balancing solution for distributing Gigabit Ethernet traffic between two intrusion detection systems

Limitations of Packet Measurement

AsicBoost A Speedup for Bitcoin Mining

Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors. Lesson 05: Array Processors

Architecture of distributed network processors: specifics of application in information security systems

The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology

We Are HERE! Subne\ng

Symantec Endpoint Protection 11.0 Architecture, Sizing, and Performance Recommendations

Network Layer: Network Layer and IP Protocol

Fast Analytics on Big Data with H20

How to Make the Client IP Address Available to the Back-end Server

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011

Cisco CCNP Optimizing Converged Cisco Networks (ONT)

A Comparison of Ruleset Feature Independent Packet Classification Engines on FPGA

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1

EFFICIENT DATA STRUCTURES FOR LOCAL INCONSISTENCY DETECTION IN FIREWALL ACL UPDATES

Subnetting Study Guide

Bricata Next Generation Intrusion Prevention System A New, Evolved Breed of Threat Mitigation

Securing EtherNet/IP Using DPI Firewall Technology

NetFlow & BGP multi-path: quo vadis?

Exploiting Stateful Inspection of Network Security in Reconfigurable Hardware

Outline. The Problem BGP/Routing Information. Netflow/Traffic Information. Conclusions

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!

QUEUES. Primitive Queue operations. enqueue (q, x): inserts item x at the rear of the queue q

HARDWARE ACCELERATION IN FINANCIAL MARKETS. A step change in speed

Muse Server Sizing. 18 June Document Version Muse

TCP/IP Cheat Sheet. A Free Study Guide by Boson Software, LLC

The Load Balancing System Design of Service Based on IXP2400 Yi Shijun 1, a, Jing Xiaoping 1,b

How do I get to

OPERATING SYSTEM - VIRTUAL MEMORY

Scalable High-Speed Prefix Matching

Cisco Networking Academy CCNP Multilayer Switching

Network Based Intrusion Detection Using Honey pot Deception

Network Intrusion Simulation Using OPNET

Plaxton routing. Systems. (Pastry, Tapestry and Kademlia) Pastry: Routing Basics. Pastry: Topology. Pastry: Routing Basics /3

Transcription:

Algorithms for Advanced Packet Classification with Ternary CAMs Karthik Lakshminarayanan UC Berkeley Joint work with Anand Rangarajan and Srinivasan Venkatachary (Cypress Semiconductor)

Packet Processing Environment Rule: acl-id src-addr src-port dst-addr dst-port proto (e.g. acl1231 128.32.0.0/8 0-1023 32.12.1.1/16 1024 tcp) Hdr Payload Search Key Rule Action ACL Database permit/deny, update counter Packet matches a set of rules based on the header Examples: routers, intrusion detection systems

Packet Processing Environment Rule: acl-id src-addr src-port dst-addr dst-port proto (e.g. acl1231 128.32.0.0/8 0-1023 32.12.1.1/16 1024 tcp) Hdr Payload Search Key Rule Action ACL Database How are the rules stored? permit/deny, update counter TCAMs gaining widespread deployment 6 million TCAM devices deployed Used in multi-gigabit systems that have O(10,000) rules

Ternary Content Addressable Memory RAM: input = address, output = value CAM: input = value, output = address

Ternary Content Addressable Memory Memory device with fixed-width arrays Each bit is 0, 1 or x (don t care) Search is performed against all entries in parallel and the first result is returned Search key 011101xx001100x10 width = W bits TCAM 00100x1x001110x0x 01110xxx001100xxx 1111101x1101000xx row 1 row 2 row n Output is 2 width = W bits

Ternary Content Addressable Memory Benefits: Deterministic Search Throughput single cycle search irrespective of search key TCAM Search key 011101xx001100x10 width = W bits 00100x1x001110x0x 01110xxx001100xxx 1111101x1101000xx row 1 row 2 row n Output is 2 width = W bits

Problems Range Representation Problem Multimatch Classification Problem No modifications to TCAMs and simple Easy to deploy

Problems Range Representation Problem Multimatch Classification Problem

Range Representation Problem (Recall that rules contain prefixes and ranges) Representing prefixes in ternary is trivial IP address prefixes present in rules e.g. 128.32.136.0/24 would contain 8 x s at the end Representing arbitrary ranges is not easy though port fields might contain ranges e.g. some security applications may allow ports 1024-65535 only Problem Statement: Given a range R, find the minimum number of ternary entries to represent R

Why is efficient range representation an important problem? Number of range rules has increased over time

Why is efficient range representation an important problem? Number of unique ranges have increased over time

Earlier Approaches I Prefix expansion of ranges: express ranges as a union of prefixes have a separate TCAM entry for each prefix Example: the range [3,12] over a 4-bit field would expand to: 0011 (3), 01xx (4-7), 10xx (8-11) and 1100 (12) expansion: the number of entries a rule expands to Worst-case expansion for a W-bit field is 2W-2 example: [1,14] would expand to 0001, 001x, 01xx, 10xx, 110x, 1110 16-bit port field expands to 30 entries

Why is efficient range representation an important problem? Two range fields multiplicative effect

Earlier Approaches II Database-dependent encoding: observation: TCAM array has some unused bits use these additional bits to encode commonly occurring ranges in the database TCAMs with IP ACLs have ~ 36 extra bits 144-bit wide TCAMs 104-bits + 4-bits typically used for IP ACL rules

Earlier Approaches II Database-dependent encoding: observation: TCAM array has some unused bits use these additional bits to encode commonly occurring ranges in the database Example: Address Port 12.123.0.0/16 20-24 32.12.13.0/24 1024-128.0.0.0/8 20-24 Set extra bit to 1 Set extra bit to x Set extra bit to 1 If search key falls in 20-24, set extra bit to 1, else set it to 0

Earlier Approaches II Database-dependent encoding: observation: TCAM array has some unused bits use these additional bits to encode commonly occurring ranges in the database Improved version: Region-based Range Encoding Disadvantages: database dependent incremental update is hard

Database-Independent Range Pre- Encoding (DIRPE) Key insight: use additional bits in a database independent way wider representation of ranges reduce expansion in the worst-case

DIRPE: Fence Encoding Fence encoding (W-bit field) total of 2 W -1 bits Encoding(0) = 0000000 Encoding(2) = 0000011 Encoding(4) = 0001111 Encoding[2,4] = 000xx11 Using 2 W -1 bits, fence encoding achieves an expansion of 1 Theorem: For achieving a worst-case row expansion of 1 for a W-bit range, 2 W -1 bits are necessary

DIRPE: Using the Available Extra Bits Two extremes: no extra bits worst case expansion is 2W 2 2 W W 1 extra bits worst case expansion is 1 Is there something in between? appropriate worst-case based on number of extra bits available

DIRPE: Splitting the Range Field Procedure: split W-bit field into multiple chunks encode each chunk using fence encoding combine the chunks to form ternary entries k 0 bits k 1 bits W bits k 2 bits Combining chunks: analogous to multi-bit tries

Unibit view of DIRPE (Prefix expansion) W=3, split into three 1-bit chunks; Range=[1,6] Each level can contribute to at most 2 prefixes (but for the top level) [0-7] x x x [0-3] [4-7] 0xx 1xx [0-1] [2-3] [4-5] [6-7] 00x 01x 10x 11x 000 001 010 011 100 101 110 111

Multi-bit view of DIRPE Width of each encoded chunk = 2 3-1 = 7 bits 0-7 0-7 0-7 0-0 0-7 0-7 [16,47] 0-0 1-1 0-7 0-0 2-5 0-7 0-0 6-6 0-7 000 00xxx11 xxxxxxx [11,15] [48,54] 0-0 1-1 3-7 0-0 1-1 0-6 000 0000001 xxxx111 000 0111111 0xxxxxx 9-bit field (W=9) 3 chunks, 3 bits wide Range = [11,54] = [013, 066] Worst case expansion = 2W/k 1 Number of extra bits needed = (2 k -1)W/k - W

Comparison of Expansion Worst-case expansion Real-life expansion

Metric Extra bits Worst-case capacity degradation Cost of an incremental update Prefix Expansion Region-based Encoding (with r regions) 0 F(log 2 r + 2n-1 r DIRPE (with k-bit chunks) ) F( W(2k -1) k - W) DIRPE + Region-based F( (2k -1) log 2 r k 2n-1 + ) r (2W-2) F (2log 2 r) F ( 2W ) F k - 1 ( 2log 2 r ) F k W O(W F ) O(N) O(( ) F ) O(N) k Overhead on the packet processor None Pre-computed table of size: O((log 2 r+ 2n-1 ) F.2 W ) r ( or ) O(nF) comparators of width W bits O( W.2 k ) k logic gates Both pieces of logic from previous two columns

DIRPE: Summary Database independent Scales well for large databases Good incremental update properties Additional bits needed Small logic needed for modifying search key Does not affect throughput

Problems Range Expansion Problem Multimatch Classification Problem

Multimatch Classification Problem TCAM search primitive: return first matching entry for a key Multimatch requirement: return k matches (or all matches) for a key security applications where all signatures that match this packet need to be found accounting applications where counters have to be updated for all matching entries

Earlier Approaches Entry Invalidation scheme: maintain state of multimatch using an additional bit in TCAM called valid bit Search key 011101xx001100x10 1 valid bit TCAM array 00100x1x001110x0x 01110xxx001100xxx 1111101x1101000xx x 0x match x valid bit

Earlier Approaches Entry Invalidation scheme: maintain state of multimatch using an additional bit in TCAM called valid bit Disadvantage: ill-suited for multi-threaded environments

Earlier Approaches Geometric intersection scheme: construct geometric intersection (crossproducts) of the fields and place in TCAM pre-processing step is expensive search is fast Disadvantage: does not scale well in capacity for router dataset: expansion of 25 100

Multimatch Using Discriminators (MUD) Observation: after index j is matched, the ACL has to be searched for all indices >j Basic idea: store a discriminator field with each row that encodes the index of the row to search rows with index >j, the search key is expanded to prefixes that correspond to >j multiple searches are then issued

MUD: Example TCAM array Search key 011101xx00 xxxx rule 0 rule 1 rule 2 0000 0001 0010 match discriminator discriminator field

MUD: Example TCAM array Search key 011101xx00 001x 01xx 1xxx rule 0 rule 1 rule 2 0000 0001 0010 match discriminator discriminator field

Metric Entry Invalidation Geometric Intersection-based MUD Multi-threading support Worst-case TCAM entries for N rules Cycles for k multi-matches No Yes Yes N 7k O(N F ) Update cost O(N F ) O(N) O(N) k N 1 + d + (d-1)(k-2) with DIRPE: 1 + d(k-1) r Extra bits 0 0 without DIRPE: d with DIRPE: log 2 (d/r) + (d-r) + (2 r -1) Overhead on the packet processor Small state machine logic; can be implemented using a few hundred gates or a few microcode instructions None Small state machine logic; can be implemented using a few hundred gates or a few microcode instructions

MUD: Summary No per-search state in TCAM suitable for multi-threaded environments Incremental updates fast Scales well to large databases Additional bits needed Extra search cycles Can still support Gbps speeds

Conclusion Range expansion problem: DIRPE, a database independent range encoding scales to large number of ranges good incremental update properties Multimatch classification problem: MUD suitable for multithreaded environments scales to large databases No change to TCAM hardware and simple easy to deploy