Scalable Prefix Matching for Internet Packet Forwarding
Marcel Waldvogel
Computer Engineering and Networks Laboratory
Institut für Technische Informatik und Kommunikationsnetze
Background
- Internet growth
  - Bandwidth
  - Size
  - Complexity
- From classful addressing to Classless Inter-Domain Routing (CIDR)
- IP version 6
- Demand for QoS
Routers
- Hop-by-hop processing: TTL decrement, checksum update
- Forwarding decision
- (Fair) queueing
[Figure: router architecture with input interfaces, routing protocol, forwarding engine(s), IP processing, switching fabric, and output interfaces]
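The checksum update listed above does not require re-summing the whole header. A minimal sketch, assuming the incremental update of RFC 1624 (Eqn. 3) and an IPv4 header given as raw bytes; the function names are hypothetical and not taken from the slides.

```python
def ones_complement_add(a, b):
    """Add two 16-bit values with end-around carry (one's-complement sum)."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def header_checksum(header):
    """Full IPv4 header checksum (RFC 791); the checksum field must be zero."""
    total = 0
    for i in range(0, len(header), 2):
        total = ones_complement_add(total, (header[i] << 8) | header[i + 1])
    return ~total & 0xFFFF


def decrement_ttl(ttl, proto, checksum):
    """Decrement TTL and patch the checksum incrementally (RFC 1624, Eqn. 3).

    TTL and protocol share one 16-bit header word, so only that word m
    changes to m': HC' = ~(~HC + ~m + m').
    The caller is assumed to have checked ttl > 1 (otherwise the packet is dropped).
    """
    old_word = (ttl << 8) | proto
    new_word = ((ttl - 1) << 8) | proto
    s = ones_complement_add(~checksum & 0xFFFF, ~old_word & 0xFFFF)
    s = ones_complement_add(s, new_word)
    return ttl - 1, ~s & 0xFFFF
```

For the sample header 4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7 (checksum field zeroed), `header_checksum` yields 0xB861, and `decrement_ttl(0x40, 0x11, 0xB861)` yields `(0x3F, 0xB961)`, the same value a full recomputation gives.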
Motivation
- Higher link speeds
- Higher data throughput
- Fair queueing
- Faster forwarding decision?
- Packet classification for QoS?
Overview
- Current Routing Techniques
  - Routing Database
  - Patricia Tries
  - Faster Forwarding
- Binary Search on Prefix Lengths
- Build and Update
- Fast Hashing
- Analysis
- Conclusions
Routing Database
- Information spread through routing protocols
- Per network, or a default route
- Old (pre-CIDR): hash tables
  - 3 prefix lengths (class A, B, C: 8, 16, 24 bits)
  - Length determined from the address itself
- CIDR
  - Arbitrary prefix length
  - Best matching prefix (BMP)
  - Also for IPv6
- Examples:
  - 128.252.0.0/16 = 10000000 11111100*
  - 129.132.66.64/26 = 10000001 10000100 01000010 01*
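The bit strings above, the classful length rule, and the BMP definition can be checked with a short sketch. It uses Python's standard `ipaddress` module; the helper names are hypothetical, and the BMP function is a naive linear scan that only restates the definition, not a fast method.

```python
import ipaddress


def prefix_bits(cidr):
    """Significant bits of an IPv4 or IPv6 prefix such as '128.252.0.0/16'."""
    net = ipaddress.ip_network(cidr)   # strict: host bits must be zero
    width = net.max_prefixlen          # 32 for IPv4, 128 for IPv6
    return format(int(net.network_address), f"0{width}b")[: net.prefixlen]


def classful_length(addr):
    """Pre-CIDR rule: the prefix length follows from the leading address bits."""
    first = int(ipaddress.IPv4Address(addr)) >> 24
    if first < 128:   # leading bit 0: class A
        return 8
    if first < 192:   # leading bits 10: class B
        return 16
    if first < 224:   # leading bits 110: class C
        return 24
    return None       # class D/E: no network prefix length


def best_matching_prefix(addr_bits, prefixes):
    """Longest prefix (bit string) that matches the address; naive scan."""
    return max((p for p in prefixes if addr_bits.startswith(p)), key=len, default=None)
```

`prefix_bits("129.132.66.64/26")` returns `10000001100001000100001001`, the 26-bit string from the example, and `classful_length("128.252.0.0")` returns 16 (class B).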
Patricia Tries
- Binary trie
- Entries vs. plain nodes
[Figure: binary trie with 0/1 branches; example lookup of 110011]
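A plain binary trie shows the basic lookup: walk one bit per level and remember the last entry node passed. This is a simplified sketch, assuming prefixes and addresses are bit strings; a real Patricia trie additionally skips one-child nodes (path compression), which is omitted here. The entries used below are borrowed from the example table of the basic-idea slide and are illustrative only.

```python
class TrieNode:
    """Node of a plain (uncompressed) binary trie."""

    __slots__ = ("children", "is_entry")

    def __init__(self):
        self.children = [None, None]  # child for bit 0 and for bit 1
        self.is_entry = False         # True if a prefix ends at this node


def trie_insert(root, prefix):
    node = root
    for bit in prefix:
        b = int(bit)
        if node.children[b] is None:
            node.children[b] = TrieNode()
        node = node.children[b]
    node.is_entry = True


def trie_lookup(root, addr):
    """Walk the address bit by bit; the last entry passed is the BMP."""
    node, best = root, None
    for depth, bit in enumerate(addr, start=1):
        node = node.children[int(bit)]
        if node is None:
            break
        if node.is_entry:
            best = addr[:depth]
    return best
```

With those entries, looking up 110011 passes the entries 1* and 11* and then falls off the trie, so the result is 11*. Every bit can cost a memory access, i.e. up to 32 accesses for IPv4.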
Faster Forwarding: Alternatives to Patricia
- Multi-level tries
- Binary search on prefixes
- Hardware
  - Content Addressable Memories (CAMs)
  - Hardware Patricia search
- Protocol solutions
  - Label switching
  - ATM
- Caching
Overview
- Current Routing Techniques
- Binary Search on Prefix Lengths
  - Basic Scheme
  - Refinements
- Build and Update
- Fast Hashing
- Analysis
- Conclusions
Fast Searching: Basic Idea
- One hash table per prefix length
- Result: linear search of the hash tables
- Example entries, by increasing prefix length:
  - 1: 1*
  - 2: 11*
  - 3: 111*
  - 4: 1000*
  - 5: 10001*
  - 6: 100011*
  - 7: 1000111*, 1110111*
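The basic idea, one table per length probed from the longest length down, can be sketched with Python sets standing in for the hash tables (an assumption made for brevity); prefixes and addresses are bit strings and the function names are hypothetical.

```python
from collections import defaultdict


def build_tables(prefixes):
    """One hash table (here: a set) per prefix length."""
    tables = defaultdict(set)
    for p in prefixes:
        tables[len(p)].add(p)
    return tables


def linear_lookup(tables, addr):
    """Probe lengths from longest to shortest; the first hit is the BMP."""
    for length in sorted(tables, reverse=True):
        if addr[:length] in tables[length]:
            return addr[:length]
    return None
```

The first hit is the BMP, but a lookup that keeps missing probes every distinct length, up to W = 32 tables for IPv4.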
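Binary Search on Hash Tables
- Binary search needs a less-than/greater-than comparison, but a hash probe only answers found/not found
- Example: search for 1110111; over lengths 1 to 7 the search probes length 4 first, then 2 or 6, then 1, 3, 5, or 7
- Entries: those of the basic example plus 1000110* (length 7)
- Result: more information is needed; the probe for 1110 at length 4 misses, so the search turns to shorter lengths and returns 111* instead of 1110111*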
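To see why plain binary search is not enough, here is a hypothetical sketch of binary search over the prefix lengths with no markers at all: a hit moves the search to longer lengths, a miss to shorter ones.

```python
def naive_binary_lookup(prefixes, addr):
    """Binary search over prefix lengths WITHOUT markers (deliberately incomplete)."""
    tables = {}
    for p in prefixes:
        tables.setdefault(len(p), set()).add(p)
    lengths = sorted(tables)
    lo, hi, best = 0, len(lengths) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        key = addr[: lengths[mid]]
        if key in tables[lengths[mid]]:
            best, lo = key, mid + 1   # hit: try longer prefixes
        else:
            hi = mid - 1              # miss: assume only shorter ones can match
    return best
```

With the example entries, a search for 1110111 misses 1110 at length 4, hits 11 and 111, and returns 111* although 1110111* is in the table; a search for 1000111 happens to succeed because 1000 and 100011 are themselves entries.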
Marker Placement
- Simple approach: a marker at every shorter prefix length
- Better approach: markers only at the lengths the search will traverse
- Result: O(log2 W) markers per prefix, where W is the address length in bits
- Reality: far fewer
- Example: 1110111* gets the markers 1110* (length 4) and 111011* (length 6); a marker is an entry that is not a prefix
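Marker placement for the better approach can be sketched as follows: a prefix of length L needs a marker at exactly those lengths shorter than L that its own binary search path visits. The function names are hypothetical, and `lengths` is assumed to be the sorted list of lengths present in the table.

```python
def marker_lengths(length, lengths):
    """Shorter lengths visited on the binary search path to `length`."""
    lo, hi, path = 0, len(lengths) - 1, []
    while lo <= hi:
        mid = (lo + hi) // 2
        if lengths[mid] == length:
            return path
        if lengths[mid] < length:
            path.append(lengths[mid])  # search must continue at longer lengths here
            lo = mid + 1
        else:
            hi = mid - 1
    raise ValueError("length is not among the table lengths")


def markers_for(prefix, lengths):
    """Marker keys needed so that a search can reach `prefix`."""
    return [prefix[:m] for m in marker_lengths(len(prefix), lengths)]
```

For lengths 1 to 7, `markers_for("1110111", ...)` gives 1110 and 111011, matching the example, whereas the simple approach would place six markers. For lengths 1 to 32, no prefix needs more than 5 markers (log2 32).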
Misleading Markers
- Markers may require backtracking
- Example: 1110110 hits the markers 1110* and 111011*, so the search moves to longer lengths, yet the best matching prefix is 111* (length 3)
- Fix: precomputation; store the BMP in each marker
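Combining markers with precomputed BMPs gives a complete lookup. The sketch below is one possible implementation under simplifying assumptions (bit strings, Python dicts as hash tables, BMPs computed by brute force at build time); it illustrates the technique and is not the author's code.

```python
def build(prefixes):
    """Per-length hash tables holding prefixes plus markers.

    Each stored key maps to its precomputed best matching prefix (BMP),
    so a search never backtracks after a misleading marker.
    """
    prefixes = set(prefixes)
    lengths = sorted({len(p) for p in prefixes})
    tables = {length: {} for length in lengths}

    for p in prefixes:
        # Follow p's binary search path; each shorter level on it gets a marker.
        lo, hi = 0, len(lengths) - 1
        while True:
            mid = (lo + hi) // 2
            level = lengths[mid]
            if level == len(p):
                break
            if level < len(p):
                tables[level].setdefault(p[:level], None)  # marker
                lo = mid + 1
            else:
                hi = mid - 1
        tables[len(p)][p] = None  # the prefix itself

    def bmp(key):
        # Longest real prefix that is a prefix of key (None if there is none).
        for n in range(len(key), 0, -1):
            if key[:n] in prefixes:
                return key[:n]
        return None

    for table in tables.values():
        for key in table:
            table[key] = bmp(key)  # precompute the BMP for prefixes and markers
    return lengths, tables


def lookup(lengths, tables, addr):
    """Binary search on prefix lengths; a hit continues at longer lengths."""
    lo, hi, best = 0, len(lengths) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        level = lengths[mid]
        key = addr[:level]
        if key in tables[level]:
            best = tables[level][key]  # BMP stored with the prefix or marker
            lo = mid + 1
        else:
            hi = mid - 1
    return best
```

Searching 1110110 hits 1110 and 111011, both of which store 111* as their BMP, then misses at length 7 and returns 111* without backtracking.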
Asymmetric Binary Search
- Backbone routers: non-uniform prefix length distribution
- Improve the average search time
[Figure: prefix length histogram (prefix length 0 to 32 vs. frequency up to 30,000) for AADS, MaeEast, MaeWest, PAIX, PacBell, and MaeEast 1996]
Specializing Further
- Prefix lengths region-dependent?
- Improve after each successful step
- Compactly encode the remaining lengths: bitmap vs. search tree vs. rope
Mutating Binary Search
- Improve the search tree after each match
- Rope search
  - Only keep track of the skeleton of the tree
  - Reduces search time
  - Reduces markers
  - Ropes point at prefix lengths, not at entries!
[Figure: default search order 4, 2, 1 over increasing prefix length; entries (with possible follow-ups) 1*, 11*, 111*, 1110*, 1000*, 10001*, 100011*, 111011*, 1000110*, 1000111*, 1110111*]
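Rope search can be sketched by storing, with every key that a search can hit, the rope of its mutated search tree: the lengths on the left spine (the run of misses) of a balanced tree over the lengths of the prefixes that extend the key. This is an illustrative sketch under assumptions (bit strings, one dict keyed by bit string, brute-force BMPs, balanced trees built by halving); the bottom-up build from the slides is not reproduced here, and all names are hypothetical.

```python
def build_ropes(prefixes):
    """Return (default rope, table) where table maps key -> (BMP, rope)."""
    prefixes = set(prefixes)
    table = {}

    def bmp(key):
        # Longest real prefix of key, computed by brute force.
        for n in range(len(key), 0, -1):
            if key[:n] in prefixes:
                return key[:n]
        return None

    def rope_for(key):
        # Balanced tree over the lengths of prefixes extending key;
        # the rope is its left spine, i.e. the probes taken on misses.
        cands = sorted({len(q) for q in prefixes
                        if len(q) > len(key) and q.startswith(key)})
        rope, lo, hi = [], 0, len(cands) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            rope.append(cands[mid])
            hi = mid - 1
        return rope

    def place(key, rope):
        # A longer prefix q under key is first hit at the first rope
        # length <= len(q); that hit becomes a prefix entry or a marker.
        for q in prefixes:
            if len(q) > len(key) and q.startswith(key):
                m = next(m for m in rope if m <= len(q))
                hit = q[:m]
                if hit not in table:
                    table[hit] = (bmp(hit), rope_for(hit))
                    place(hit, table[hit][1])

    default = rope_for("")
    place("", default)
    return default, table


def rope_lookup(default, table, addr):
    rope, i, best = default, 0, None
    while i < len(rope):
        key = addr[: rope[i]]
        if len(key) == rope[i] and key in table:
            best, rope = table[key]  # hit: take over the entry's own rope
            i = 0
        else:
            i += 1                   # miss: next (shorter) length on the rope
    return best
```

For the example entries the default rope is 4, 2, 1, as on the slide, and only one marker (1110) is needed, compared with two (1110 and 111011) for plain binary search with markers. A lookup of 1110110 hits 1110, takes over its rope [7], misses there, and returns 111*.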
Overview
- Current Routing Techniques
- Binary Search on Prefix Lengths
- Build and Update
  - Build for Binary Search
  - Build for Rope Search
  - Updating Markers' BMP Entry
  - Search Tree Restructuring
  - Hash Collisions
- Fast Hashing
- Analysis
- Conclusions
Build for Binary Search
- Insert the prefix into the appropriate hash table
- Walk the binary search tree backwards, placing markers
[Figure: example table over prefix lengths 1 to 7 with the prefixes 1*, 11*, 111*, 1000*, 10001*, 100011*, 1000110*, 1000111*, 1110111* and the markers 1110*, 111011*]
Build for Rope Search
- Bottom-up merging
[Figure: individual mini-tries merged into aggregate mini-tries up to the root; start, end, and processed sub-trie marked]
Updating Markers' BMP Entry
- Updating can be O(N)
- Solution: group into √N partitions
- Significantly improves update times (40,000 → 200)
- Search cost: one memory access
- Generalizes to higher roots
Search Tree Restructuring
- Adding a prefix with a new length
- Only a single rope change
[Figure: search trees over prefix lengths 1 to 7 before and after adding a new length]
Overview
- Current Routing Techniques
- Binary Search on Prefix Lengths
- Build and Update
- Fast Hashing
  - (Dynamic) perfect hashing: too expensive for lookup
  - Limiting collisions
  - Causal collision resolution
- Analysis
- Conclusions
Limiting Collisions
- Hash into buckets (more than 1 entry each)
- Bucket size: up to the cache line size
- Sparse array
[Figure: maximum collisions (3 to 11) vs. hash table size (20,000 to 100,000) for MaeEast and PacBell, each with multiplicative (Mult) and CRC hashing]
Full Bucket Count
- Observation: very few buckets require the worst-case size
[Figure: number of full hash buckets (0 to 20) vs. hash table size (20,000 to 100,000) for MaeEast with multiplicative hashing]
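The two statistics plotted above (worst-case bucket load and the number of buckets that exceed their capacity) can be computed as sketched below. The hash functions are assumptions: a Knuth-style multiplicative hash and zlib's CRC-32 stand in for "Mult" and "CRC", whose exact variants the slides do not specify; `capacity` is a hypothetical parameter for the number of entries that fit in a cache line.

```python
import zlib
from collections import Counter

GOLDEN = 2654435761  # Knuth's multiplicative constant, close to 2**32 / golden ratio


def mult_hash(key, size):
    """Multiplicative hashing of a 32-bit key into [0, size)."""
    return (((key * GOLDEN) & 0xFFFFFFFF) * size) >> 32


def crc_hash(key, size):
    """CRC-32 of the 4 key bytes, reduced modulo the table size."""
    return zlib.crc32(key.to_bytes(4, "big")) % size


def bucket_stats(keys, size, hash_fn, capacity):
    """Return (max entries in one bucket, number of buckets over capacity)."""
    load = Counter(hash_fn(k, size) for k in keys)
    worst = max(load.values(), default=0)
    overfull = sum(1 for n in load.values() if n > capacity)
    return worst, overfull
```

Running `bucket_stats` over a real routing table for a range of table sizes would yield curves of the kind plotted; no such data is included here.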
Causal Collision Resolution
- Goal: move entries into a hash bucket with space
- How: split an entry into two longer entries
[Figure: "expand" turns 1* (length 1) into 10* and 11* (length 2); "contract" merges them back]
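Expanding and contracting can be sketched on a hypothetical dict that maps bit-string prefixes to next hops. The sketch only shows that both steps preserve forwarding results; choosing which bucket to relieve and updating the per-length hash tables and markers are left out.

```python
def expand(table, prefix):
    """Replace prefix by its two children, one bit longer, with the same next hop.

    A child that already exists is more specific and keeps its own next hop.
    """
    hop = table.pop(prefix)
    for bit in "01":
        table.setdefault(prefix + bit, hop)


def contract(table, prefix):
    """Inverse step: merge two sibling entries that share a next hop."""
    left, right = prefix + "0", prefix + "1"
    if left in table and right in table and table[left] == table[right]:
        table[prefix] = table.pop(left)  # a parent entry, if any, is fully shadowed
        del table[right]
        return True
    return False
```

Expanding 1* in {1*: A, 11*: B} yields {10*: A, 11*: B}: the existing, more specific 11* keeps its own next hop.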
Overview
- Current Routing Techniques
- Binary Search on Prefix Lengths
- Build and Update
- Fast Hashing
- Analysis
  - Lookup Speed for IPv4
  - Projections for IPv6
  - 2-D Packet Classification
- Conclusions
Lookup Speed for IPv4
[Figure: percentage of prefixes found (0 to 100%) per routing database (MaeEast 1996, MaeEast, MaeWest, PacBell, AADS, PAIX), broken down by number of search steps (S1 to S4)]
Projections for IPv6
- 4× longer addresses
- More networks and nodes
- Hope: backbone routers will be able to use small routing tables, at the price of suboptimal routing
- Hierarchy boundaries at more prefix lengths
- Policy routing will still force ISPs to have bigger routing tables
- For our approach: only 2 more memory lookups
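The "2 more memory lookups" follow from the logarithmic search. Assuming a binary search over all prefix lengths 1 to W that costs one hash probe per step, the worst case is:

```latex
\lceil \log_2 (W+1) \rceil:\qquad
W = 32 \;\Rightarrow\; \lceil \log_2 33 \rceil = 6, \qquad
W = 128 \;\Rightarrow\; \lceil \log_2 129 \rceil = 8, \qquad
8 - 6 = 2
```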
2-D Packet Classification
- Source and destination prefixes
- Winding paths of increasing specificity
- O(W log W)
- Sparse matrices
[Figure: grid of prefix lengths, x vs. y]
Conclusions
- Fast, space-efficient, and scalable lookup algorithm
- New class of search algorithms
- Fast update
- No need for hardware, yet cheap hardware is possible
- No need to proliferate protocol changes
Extensions
- Extend to two-/multi-dimensional packet classification
  - Preliminary results available
- Other uses
  - Flexible memory management
  - Access control lists
  - Substring searching (databases)
Future Work
- Lightweight protocols for secure group communication
- Secure distributed storage
- Distributed key storage
- Protocols for bandwidth fairness enforcement