IPv6 Lookups using Distributed and Load Balanced Bloom Filters for 100Gbps Core Router Line Cards

This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE INFOCOM 2009 proceedings.

Haoyu Song, Fang Hao, Murali Kodialam, T.V. Lakshman
Bell Labs, Alcatel-Lucent, Holmdel, NJ, 07733, USA
{haoyusong, fangh, muralik, lakshman}@alcatel-lucent.com

Abstract

Internet line speeds are expected to reach 100Gbps in a few years. To match these line rates, a single router line card needs to forward more than 150 million packets per second, which requires a corresponding number of longest prefix match operations. Furthermore, the increased use of IPv6 requires core routers to perform the longest prefix match on several hundred thousand prefixes varying in length up to 64 bits. It is a challenge to scale existing algorithms simultaneously in the three dimensions of increased throughput, table size, and prefix length. Recently, Bloom filter-based IP lookup algorithms have been proposed. While these algorithms can take advantage of hardware parallelism and fast on-chip memory to achieve high performance, they have significant drawbacks (discussed in the paper) that impede their use in practice. In this paper, we present Distributed and Load Balanced Bloom Filters to address these drawbacks, and we develop a practical IP lookup algorithm for use in 100Gbps line cards. The regular and modular hardware architecture of our scheme maps directly to state-of-the-art ASICs and FPGAs with reasonable resource consumption. Our scheme also outperforms TCAMs on most metrics, including cost, power dissipation, and board footprint.

I. INTRODUCTION

To keep up with ever-increasing optical transmission rates, Internet core routers need to forward packets as fast as possible. This requires ever faster implementations of packet-processing functions such as IP lookup, the Longest Prefix Matching (LPM) operation needed to determine the next hop of incoming packets. The next hop is determined by first performing a longest prefix match of the destination IP address of an incoming packet against a set of prefixes stored in a prefix table. Once a match is found, the next-hop information associated with the matched
prefix is retrieved. The prefix table can typically hold a few hundred thousand prefixes, with prefix lengths varying from 8 to 32 bits for IPv4 addresses. For IPv6, the prefix lengths can vary from 16 to 64 bits. The challenges in implementing the lookup operation lie in accommodating large table sizes and achieving the 150 million or more lookups per second needed for the new 100Gbps interfaces, while keeping memory, power consumption, and board footprint low.

The number of IPv4 prefixes in a core router BGP table has recently exceeded 250K and is increasing at a rate of a few tens of thousands of prefixes a year [3]. While the current IPv6 table is still relatively small, the envisaged large-scale deployment of IPv6 will result in table sizes at least as large as that of IPv4. High-end core routers such as Cisco's CRS-1 [4] and Juniper's T640 [5] currently can have 40G line cards with a packet forwarding rate of about 50Mpps. Meanwhile, the 100GbE standards are expected to be completed by 2010. To meet ever-growing traffic needs, pre-standard 100GbE products (needing a forwarding rate of 150Mpps) are expected to be available in the market in about the same time frame.

One possibility for IP lookups is the use of TCAMs, which support 250M+ lookups per second. However, their high power dissipation and large footprint are major disadvantages. To limit overall system power consumption, only a few hundred watts are typically budgeted for each line card, and a TCAM chip alone easily consumes more than 20W. For 100Gbps line cards, it is conceivable that many of the necessary components, such as the high-speed transceivers, will inevitably consume more power, necessitating a corresponding reduction in power consumption elsewhere to stay within the allotted power budget. In addition, more and more functions and modules need to fit on a line card, making footprint an important metric; it is desirable to avoid low-density chips when possible. It has long been argued that the brute-force way of searching prefixes in a TCAM results in very inefficient storage and uses too many bits per prefix. For example, a TCAM uses 16 transistors to store a bit, while an SRAM uses only six. IPv6 results in a twofold worsening over IPv4 because of the much longer
prefixes that need to be stored. In addition, an incremental prefix update in a TCAM involves as many memory operations as there are unique prefix lengths, which imposes an extra performance penalty for IPv6. Because of these disadvantages, TCAMs are better suited to tasks where algorithmic alternatives cannot match their performance, as in packet classification and deep packet inspection. For IP lookup, algorithmic approaches can achieve high rates with compact storage. Compact storage permits the economical use of fast memories such as SRAMs, and also the effective use of on-chip memory to achieve high throughput.

The first IP lookup algorithms were implemented in software running on host-based routers, with slow SDRAMs used for storage. The next generation of chassis-based routers had dedicated line cards for packet forwarding, with faster but smaller SRAMs on the line cards used to store the prefix table.

978-1-4244-3513-5/09/$25.00 ©2009 IEEE

As throughput needs outpaced SRAM speeds, on-chip memory began to be used as a cache for the prefix table
to facilitate faster IP lookups. Currently, a few tens of megabits of fast on-chip memory is feasible, and the effective use of on-chip memory has proved critical in satisfying the throughput requirements of next-generation network applications [8].

In this paper, we present a new lookup scheme based on a new data structure called Distributed and Load Balanced Bloom Filters (DLBBF). The scheme scales to very high line speeds and can handle large tables and many prefix lengths. Its hardware architecture maps directly to state-of-the-art ASICs and FPGAs with reasonable resource consumption, and it supports the non-stop line-speed forwarding needed in next-generation 100Gbps core router line cards.

The paper is organized as follows. Section II discusses related work, the drawbacks of some existing schemes, and the motivation for our work. Section III describes our new data structure and lookup algorithm in detail. The performance of the algorithm is analyzed in Section IV. Practical hardware implementation issues are discussed and an FPGA prototype is evaluated in Section V. Concluding remarks are in Section VI.

II. BACKGROUND

A. Conventional IP Lookup Algorithms

IP lookup algorithms have been well studied. Trie-based algorithms, such as LC-trie [22], Lulea [7], Multibit Trie [27], Tree Bitmap [10], and Shape Shifting Trie [26], are simple and memory efficient. However, their performance degrades linearly as the tree depth increases, which makes them unsuitable for 100Gbps IPv6 lookups.

Another approach to fast IP lookups is the use of memory pipelines [2], [8], [9]. A deep pipeline can produce one lookup result every clock cycle, but this approach has several drawbacks. Although the problem of imbalanced memory sizes across pipeline stages has been solved recently, the aggregate memory bandwidth needed by all the pipeline stages is very high, making the approach difficult to implement with commodity memory devices. The longer prefix lengths of IPv6 worsen this problem, and even the use of a dedicated lookup engine
with embedded memory in the pipeline stages does not reduce the complexity sufficiently.

Caching recent lookup results in on-chip memory is discussed in [6]. This approach works well if there is sufficient temporal locality in the lookups. However, such locality is low in core routers, where flows are well intermixed, so the cache hit rate is low. In addition, it is preferable to use schemes that are deterministic and not subject to the variability in lookup times caused by cache misses. Although caching in on-chip memory may not be desirable, the idea of using fast on-chip memory to achieve speedup has motivated the use of Bloom filters as compact data structures for implementing fast lookups.

B. Bloom Filter

The use of Bloom filters [] in networking applications has attracted much recent research interest [3], and numerous variations of the basic Bloom filter have been proposed for different applications [4], [5], []. The basic Bloom filter is a memory-efficient data structure that stores a signature of an item using just a few bits, regardless of the size of the item itself. Given a set of n items and an m-bit array, each item sets up to k bits in the array, using k independent hash functions to index into the array. Due to hash collisions, a bit can be set by multiple items. To query the membership of an item, the item is first hashed with the same set of hash functions, and then the k bits to which the hash values point are examined. If any bit is zero, the item is certainly not a member. If all k bits are one, the item can be claimed to be a member, with a small false positive probability. The false positive probability $p_f$ is a function of m, n, and k and can be computed as follows:

$$p_f = \left(1 - e^{-kn/m}\right)^k \tag{1}$$

Bloom filters can represent a set of items very compactly, which makes them amenable to on-chip storage. However, several issues must be considered in making Bloom filters viable for fast IP lookups. First, to minimize the false positive probability, the number of hash functions needed by the Bloom filter can be large (specifically, the optimal value is $k = (m \ln 2)/n$). If the Bloom filter is implemented using a
single-port memory block, one Bloom filter lookup takes as many memory accesses as there are hash functions, which can push the achievable throughput far below what is desired. Multi-port memories can resolve this issue by permitting multiple simultaneous accesses: an N-port memory yields an N× speedup over a single-port memory. However, several practical constraints limit the number of memory ports. Multiple ports increase the pin count, the power consumption, and the footprint of the memory module, so N cannot be large. Although memories with three or more ports are possible, the most practical option is a 2-port memory. This port constraint has to be accounted for in the design of fast Bloom filter-based lookup schemes.

Second, a good universal hash function is computationally intensive and can lower throughput. Ideally, the hash function should be computable in one clock cycle with little logic and without pipelining, which introduces latency. Fast computation of a large number of good, independent hash functions is a challenge.

Third, the possibility of false positives requires a second verification step. This step is also needed to access the next hop and its associated information, since the Bloom filter itself cannot store this information. It must be performed efficiently as well, to avoid bottlenecks that would offset the gains from using the Bloom filter.

We address all these issues in our proposed Bloom filter-based method for fast IP lookups.
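The membership test and Equation (1) described in Section II-B can be made concrete with a small software sketch. It is illustrative only: salting SHA-256 with the hash-function number is an assumption standing in for the hardware hash functions of Section V-B, and the names `BloomFilter`, `false_positive_rate`, and `optimal_k` are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """A basic m-bit Bloom filter with k hash functions (illustrative sketch)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability rather than density

    def _indexes(self, item: bytes):
        # Assumption: k hash functions simulated by salting SHA-256 with the function number.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(2, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def might_contain(self, item: bytes) -> bool:
        # A single zero bit proves absence; all k bits set means "member, or false positive".
        return all(self.bits[idx] for idx in self._indexes(item))


def false_positive_rate(m: int, n: int, k: float) -> float:
    """Equation (1): p_f = (1 - e^{-kn/m})^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> float:
    """The k that minimizes p_f for fixed m and n: k = (m ln 2) / n."""
    return m * math.log(2) / n
```

For example, with m/n = 10 the optimal k is about 6.9, at which point each bit is set with probability 1/2 and Equation (1) gives roughly 0.0082. Inserted items are always reported present; only absent items can be misreported.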

Fig. 1. Using Bloom Filters for IP Lookup

C. IP Lookup using Bloom Filters

The idea of using Bloom filters for IP lookup was first proposed in [8]; we refer to this scheme as PBF in the rest of the paper. PBF is essentially a hash-based algorithm. IP lookup using hash tables has been proposed before [2], [24], but Bloom filters make much better use of on-chip memory and exploit intrinsic hardware parallelism. Figure 1 shows the basic architecture of PBF. First, IP prefixes are partitioned into groups based on their prefix lengths. Next, each group is assigned a Bloom filter that stores the prefixes in that group. For an IP lookup, the Bloom filters of all groups are checked in parallel, and a subset of them may report a match, either because they contain a matching prefix or because of a false positive. To verify that there is an actual match, we start with the Bloom filter corresponding to the longest prefix (since we are interested in the longest matching prefix) and check the match against an off-chip prefix table. If a true match is found, we retrieve the necessary next-hop information. Otherwise, we continue the check with the next longest match indicated by the Bloom filters. Typically, we expect only one off-chip hash table lookup to find the real match, since the false positive probabilities are low by design.

This scheme is simple and can have good average-case performance. However, as we discuss below, it has several drawbacks that make practical use difficult.

One issue is that when multiple Bloom filters indicate false positives, the off-chip prefix table has to be searched multiple times (as many times as there are prefix lengths, in the worst case). Although this may happen very infrequently, the resulting variability in forwarding rate could be a problem, since worst-case performance is an important metric in router design. One method of improving worst-case performance is to reduce the number of prefix-length groups. This requires aggregating prefixes of different lengths into a single Bloom filter so that the number of groups (or Bloom
filters) is reduced. This aggregation can be done using prefix expansion [27]. The tradeoff is that the improvement in worst-case performance comes at the cost of higher memory use, because prefix expansion can significantly increase the size of the prefix table [27]. Experiments show a greater than fivefold table size expansion when the number of prefix-length groups is reduced to 3 (with thresholds of 20, 24, and 32) for a moderately sized IPv4 table with about 100K prefixes [8], and the expansion is worse for larger tables. In addition, prefix expansion makes routing updates, which happen fairly frequently in core routers, much more time-consuming and complex: multiple expanded prefixes must be modified when a single original prefix is inserted or deleted. Hence, the algorithm is not well suited to large tables and longer prefixes.

Another issue is that the number of prefixes in each prefix-length group is highly variable and changes dynamically with routing updates. A recent snapshot of the IPv4 BGP table contains about 134K prefixes in the largest group (prefix length 24) but only 9 in the smallest group (prefix length 9). Meanwhile, the average prefix length keeps shifting between 21 and 24, and the percentage of prefixes in the largest group (length 24) can vary from 40% to 70% [3]. To reduce the false positive rate and make the best use of scarce on-chip memory, we need to customize the size of each Bloom filter, as well as its number of hash functions, based on the number of prefixes in that group. We also need to adapt to the current prefix distribution by adjusting the memory allocation dynamically. Engineering such a system is difficult and expensive, because it requires either overprovisioning of memory or the capability to reconfigure hardware. Neither is feasible: on-chip memory is scarce, and reconfiguring FPGAs in practice takes several seconds. Also, forwarding functions are often hard-coded in ASICs, which cannot be reconfigured.

A third issue is that one-cycle lookups assume the Bloom filters are implemented with k-port memory, where k is the number of hash functions. As discussed earlier, this is impractical unless k is 2 or 3.

In the next section, we propose a new data structure, Distributed and
Load Balanced Bloom Filters (DLBBF), which we use to address all these issues. Our proposed lookup scheme is well suited to hardware implementation and can be used in 100Gbps line cards performing IPv6 lookups.

III. DLBBF FOR IP LOOKUPS

A. Basic Architecture

While the total number of prefixes is relatively stable, their distribution over prefix lengths is highly dynamic, and overprovisioning multiple variable-sized Bloom filters results in inefficient memory use. It is therefore desirable to use just one optimized Bloom filter to store all n prefixes. With a single Bloom filter, we have neither the prefix expansion problem nor the problem of managing memory allocation for Bloom filters whose element counts vary widely (from a few prefixes to several hundred thousand). Figure 2(a) illustrates the scheme using a single Bloom filter (SBF). Even though a single Bloom filter is used, each prefix group has its own set of hash functions. A nice feature is that
all prefix groups need the same number of hash functions, k, which is optimal for the SBF. Let there be $x_i$ prefixes in prefix group i. Each prefix is hashed k times and sets at most k bits in the Bloom filter, so overall at most $\sum_{i=1}^{g} x_i k = nk$ bits can be set. Although this SBF architecture has many desirable properties, it has a serious drawback: it does not permit parallel searches, so a sequential search is needed for each prefix length. In the worst case, up to g sequential searches are needed if g distinct prefix-length groups are present.

Fig. 2. Distributed and Load Balanced Bloom Filters: (a) a single Bloom filter; (b) distributed Bloom filters 1 to k

To overcome this drawback while retaining the advantages of SBF, we develop the DLBBF architecture. As shown in Figure 2(b), we partition the Bloom filter into k equal-sized distributed Bloom filters. We call each partition a distributed Bloom filter, even though by itself it is no longer a Bloom filter in the classical sense. We further regroup the g hash function groups, each with k hash functions, into k new hash function groups, each with g hash functions, as follows: the ith hash function of each original group joins the ith new group, where i ranges from 1 to k. Each group of g hash functions is associated with one distributed Bloom filter. Within any new hash function group, each hash function maps to a different prefix length. Consequently, when a prefix is stored, it sets exactly one bit in each distributed Bloom filter, so at most n bits can be set in any distributed Bloom filter. Since there are k distributed Bloom filters, at most kn bits can be set overall, the same number as in the SBF case.

We call the new architecture DLBBF because all the distributed Bloom filters must be considered as a whole to check prefix membership, and because all of them carry the same load. This load balancing is important for a regular and modular implementation. In
Section IV, we formally prove that the false positive probability of DLBBF is identical to that of SBF, and to that of [8] in its ideal configuration.

The new data structure allows IP lookups to be performed in parallel on all the DLBBFs. Each DLBBF search outputs a g-bit vector; if bit i is set, it indicates a prefix match (or possibly a false positive) in prefix-length group i. A bitwise AND over all k of these g-bit vectors yields a bit vector that indicates the real matches with very high probability, just as in the SBF case.

B. Further Improvements

Since all the DLBBFs can be searched in parallel, the system throughput is determined by the lookup rate of a single DLBBF, which in turn is determined by the hash function computation speed and the number of memory ports, as discussed in Section II. We defer the hash function issue to Section V. Here, we focus on the DLBBF access speed problem, assuming that a single-cycle hash function computation is possible.

Assuming that only 2-port SRAM blocks are available, we further partition a DLBBF into t partial DLBBFs, each implemented in one 2-port SRAM block. To store prefixes in the DLBBF, each value generated by a hash function comprises two parts: the SRAM block ID and the bucket address within that block. In this way, the load of each partial DLBBF is balanced statistically, with each receiving about n/t hashes. The shaded area in Figure 2 is enlarged in Figure 3 to show the details. For IP lookups, the g hash values of each DLBBF are first sent to the scheduler, whose task is to maximize the number of SRAM blocks and ports used in one cycle. The collector is responsible for reorganizing the SRAM outputs and generating the g-bit matching vector. Ideally, if each SRAM block receives no more than two access requests, the search on each DLBBF completes in one clock cycle. However, in the worst case, when all g accesses to a DLBBF are concentrated on the same SRAM block, g/2 cycles are needed. We analyze the average number of accesses needed to finish one DLBBF lookup in Section IV and show that this scheme indeed
significantly improves lookup performance.

Fig. 3. Partition DLBBF for Speedup (a scheduler dispatches requests to SRAM blocks #1 to #t, whose outputs are gathered by a collector)

Note that the functionality of this module mimics a g × t switch scheduler with a speedup of two, where each input port has only one request to an output port. Implementing such a switch scheduler is not trivial. In Section V we introduce a simplified scheduling scheme that still provides satisfactory Bloom filter lookup accuracy while being easy to implement.

C. Ad Hoc Prefix Expansion

One major argument against the use of Bloom filters for IP lookups is their poor worst-case performance when packets arrive at the highest possible rate and all the Bloom filters show false positives. Although this is highly unlikely, we still have to address it to comply with the performance requirements imposed on routers. The real concern is that once a packet (or, more specifically, an address prefix) hits a bad case (i.e., multiple false positives are generated), all subsequent packets with the same address prefix are also problematic, unless a longer prefix hits a real match. These packets can slow down the lookup rate and might eventually cause packet drops. We now present a scheme that reduces consecutive false positives regardless of the packet arrival pattern.

Our design assumes a certain margin of tolerance for a few false positives. For example, with a 400MHz clock, the lookup budget is 400M/150M ≈ 2.7 cycles per packet at the maximum packet rate of a 100GbE port, which leaves 1.7 cycles per packet to deal with Bloom filter false positives in the worst case. If a particular packet results in many false positives, we need to prevent subsequent packets of the same flow from causing the same false positives, since the forwarding rate drops if such packets arrive consecutively. We use an ad hoc prefix expansion scheme for this. When a packet causes several false positives and the longest false positive match is of
length k, we extract the k-bit prefix of the packet's IP address and insert it into the off-chip prefix table along with the next-hop information. For example, suppose we allow only two false positive matches, but a packet with address 192.168.1.4 results in three false matches, and the first (also the longest) false match occurs at length 24. To cope with this bad case, we insert a new expanded prefix 192.168.1.0/24 into the prefix table, associated with the same next-hop information as the real matching prefix. Any subsequent packets from the same flow are then guaranteed to find the correct next-hop information with just one Bloom filter access. This scheme has three advantages:

(1) Unlike the original prefix expansion scheme, which is predetermined and can increase the table size exponentially, our prefix expansion is invoked dynamically, generates only one new prefix, and is used only when absolutely necessary.
(2) More importantly, the on-chip Bloom filters remain intact. We only need to insert one new prefix into the off-chip prefix table; the new prefix neither changes the Bloom filter load nor affects its false positive probability.
(3) The observed false positive rate is a function not of the arriving packets but of the unique flows (i.e., unique destination IP addresses). When no new flow is seen, there are no more false positives.

The entire expansion scheme is managed by system software, and an expanded prefix can be revoked at any time once it is no longer necessary. The scheme significantly reduces the overall table size, simplifies the workload of incremental updates, and supports faster packet lookups. If the number of expanded prefixes becomes too large (after the system has been operational for a long time) and degrades off-chip table lookup performance, reprogramming the DLBBFs with the current set of prefixes (excluding the expanded ones) resets the state. However, given the small Bloom filter false positive probability, this is unlikely ever to be needed.

D. Off-chip Prefix Table Optimization

After the on-chip DLBBF is searched, the off-chip prefix table also needs to be searched to
verify the match and to fetch the next-hop information. The off-chip prefix table is typically organized as a hash table. PBF assumes that a hash table lookup takes just one memory access [8], but this is not always true because of hash collisions, and unbounded collisions can cause serious performance degradation. We can exploit the high memory bandwidth of SRAMs to alleviate this problem: we arrange multiple colliding nodes in one hash bucket so that they can be retrieved with one memory access, avoiding linked-list traversal when collisions happen. Currently, 500+ MHz QDR-III SRAMs that support 72-bit read and write operations per clock cycle are available. A burst read spanning two clock cycles retrieves 144 bits, which is sufficient to pack three IPv4 prefixes or two IPv6 prefixes together with their next hops. With a 144-bit bucket size, a 72Mbit memory contains 500K buckets, which can hold 1.5 million IPv4 prefixes or one million IPv6 prefixes.
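The bucket-capacity figures above follow from simple arithmetic, sketched below under the paragraph's own assumptions (72-bit QDR words, two-cycle bursts, three IPv4 or two IPv6 entries per bucket). The helper names are hypothetical, and "72Mbit" is taken here as 72 × 10^6 bits; with binary megabits the bucket count would be 524,288 rather than 500K. The same arithmetic also reproduces the lookup rates derived later in this subsection for the two-choice hash table.

```python
QDR_WORD_BITS = 72   # bits transferred per clock cycle (from the text)
BURST_CYCLES = 2     # one burst read spans two clock cycles
BUCKET_BITS = QDR_WORD_BITS * BURST_CYCLES  # 144-bit buckets

MEMORY_BITS = 72_000_000  # assumption: 72Mbit read as decimal megabits


def buckets_in(memory_bits: int, bucket_bits: int = BUCKET_BITS) -> int:
    """Number of whole buckets that fit in an SRAM of the given size."""
    return memory_bits // bucket_bits


def prefix_capacity(memory_bits: int, prefixes_per_bucket: int) -> int:
    """Total prefixes storable if every bucket is filled to capacity."""
    return buckets_in(memory_bits) * prefixes_per_bucket


def lookups_per_second(clock_hz: int, accesses_per_lookup: int,
                       cycles_per_access: int = BURST_CYCLES) -> int:
    """Lookup rate of one SRAM device when each lookup needs several burst accesses."""
    return clock_hz // (accesses_per_lookup * cycles_per_access)
```

For instance, `prefix_capacity(MEMORY_BITS, 3)` gives the 1.5 million IPv4 figure, and `lookups_per_second(500_000_000, 2)` gives the 125M lookups per second of a single 500MHz device accessed twice per lookup.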

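The DLBBF organization of Sections III-A and III-B can be modeled in software to make the data flow concrete. This is a sketch under stated assumptions, not the hardware design: each of the k distributed filters owns one hash function per prefix-length group, a prefix sets exactly one bit in every distributed filter, and a lookup ANDs the k g-bit vectors before verifying candidates longest-first against the off-chip table (modeled as a Python dict). The salted SHA-256 hash and all names (`DLBBF`, `lookup`, `cycles_for_filter`) are hypothetical; `cycles_for_filter` models only the unscheduled port-contention bound for 2-port blocks, not the simplified scheduler of Section V.

```python
import hashlib
from collections import Counter


def _h(j: int, i: int, key: int, size: int) -> int:
    """Hypothetical stand-in for hash function i of distributed filter j."""
    data = j.to_bytes(2, "big") + i.to_bytes(2, "big") + key.to_bytes(17, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % size


class DLBBF:
    def __init__(self, lengths, k: int, bits_per_filter: int, width: int = 32) -> None:
        self.lengths = sorted(lengths)   # the g prefix-length groups
        self.k = k                       # number of distributed Bloom filters
        self.size = bits_per_filter      # m' = m / k bits per distributed filter
        self.width = width               # 32 for IPv4, 128 for IPv6
        self.filters = [bytearray(bits_per_filter) for _ in range(k)]

    def _key(self, addr: int, length: int) -> int:
        return addr >> (self.width - length)  # the top `length` bits of the address

    def insert(self, addr: int, length: int) -> None:
        i = self.lengths.index(length)
        for j in range(self.k):  # exactly one bit per distributed filter
            self.filters[j][_h(j, i, self._key(addr, length), self.size)] = 1

    def match_vector(self, addr: int) -> int:
        result = (1 << len(self.lengths)) - 1
        for j in range(self.k):  # in hardware, the k filters are probed in parallel
            vec = 0
            for i, length in enumerate(self.lengths):
                if self.filters[j][_h(j, i, self._key(addr, length), self.size)]:
                    vec |= 1 << i
            result &= vec  # bitwise AND of the k g-bit vectors
        return result

    def lookup(self, addr: int, table: dict):
        """Verify candidate lengths longest-first against the off-chip table."""
        vec = self.match_vector(addr)
        for i in reversed(range(len(self.lengths))):
            if (vec >> i) & 1:
                length = self.lengths[i]
                hop = table.get((self._key(addr, length), length))
                if hop is not None:  # a miss here was a Bloom filter false positive
                    return hop
        return None


def cycles_for_filter(block_ids, ports: int = 2) -> int:
    """Cycles to serve one filter's g reads without scheduling: busiest block dominates."""
    per_block = Counter(block_ids)
    return max(-(-c // ports) for c in per_block.values())  # ceiling division
```

The off-chip check is what makes the result exact: a false positive in the ANDed vector costs one extra table probe but can never return a wrong next hop. When all g reads land in one 2-port block, `cycles_for_filter` returns g/2 cycles, matching the worst case described in Section III-B.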
A key problem is avoiding bucket overflow entirely, or keeping its occurrence to a minimum. The Fast Hash Table [25] and Peacock Hash [20] were designed to improve hash table performance. To simplify the design, we adopt the scheme in [2], where each prefix is hashed with two hash functions and stored in the less loaded of the two buckets. Consequently, a prefix lookup accesses the hash table twice, once per hash function, and all prefixes stored in the two accessed buckets are compared to find the match. Although each prefix has two choices and each bucket can store 2 or 3 prefixes, bucket overflow can still happen. However, analysis and simulation show that overflows are extremely rare. In our configuration, inserting 250K IPv6 prefixes into the table causes only 3 prefixes to overflow. These overflow prefixes can be placed in a small on-chip CAM with a handful of entries. For 250K IPv4 prefixes, a 36Mbit memory achieves the same performance.

Since each lookup accesses memory twice and each access takes two clock cycles, a 500MHz SRAM supports 125M lookups per second. This is a little short of the worst-case 150Mpps lookup rate required for 100GbE line cards. We can get around this problem in two ways: (1) We can use faster SRAM devices; a 600MHz SRAM device satisfies the worst-case requirement. (2) We can use two 36Mbit or 18Mbit SRAM devices in parallel, each addressed by one of the hash functions. This provides 250M lookups per second, well beyond the worst-case requirement, leaving about 67% more memory bandwidth than the worst case needs for handling DLBBF false positive matches. Note that this scheme doubles the memory bandwidth but does not increase the memory size.

E. Non-stop Forwarding

Changing network conditions or configuration changes cause the routing information to be updated. The control process, which runs the routing protocols, computes the new prefix and next-hop information and updates the forwarding data structures on the line cards accordingly. The updates must happen without interrupting packet forwarding or causing packet
misrouting. The update procedure of our scheme is simple: we first insert the prefix into, or delete it from, the off-chip hash table, and then modify the on-chip DLBBFs. This order guarantees error-free updates. A prefix update requires at most one memory access to each DLBBF, and all these accesses can be performed in parallel, so the impact on system throughput is minimal. The off-chip hash table is stored in QDR SRAM, where a separate write port is dedicated to table updates. Modifying the on-chip DLBBFs for prefix insertion and deletion requires offline mirror Counting Bloom Filters, just as described in [8].

IV. PERFORMANCE ANALYSIS

All previous IP lookup solutions based on TCAM, trie, and pipeline architectures are sensitive to the longest prefix length. When used for IPv6, they therefore suffer a performance degradation of up to a factor of two relative to IPv4, and their storage also increases significantly. A key feature of our algorithm is that it is insensitive both to the prefix length and to the number of unique prefix lengths; only the prefix table size matters. Hence our algorithm works equally well for IPv6 and IPv4 and is well suited to IPv6 lookups in core routers.

We now prove that our scheme is equivalent to PBF in its ideal configuration. The proof is split into two parts.

Theorem 1: The SBF is identical to the DLBBF in terms of false positive probability.

Proof: In SBF, although prefixes of different lengths use different sets of hash functions, each prefix is hashed k times. So for any prefix lookup, the false positive probability is exactly that of Equation (1):

$$p_f^{SBF} = \left(1 - e^{-kn/m}\right)^k,$$

where n is the total number of prefixes and m is the total number of Bloom filter buckets. In DLBBF there are k distributed Bloom filters, and each prefix is hashed exactly once in each of them, using one of the g hash functions. Therefore, for a prefix lookup, a probed bit in a distributed Bloom filter is set with probability $1 - e^{-n/m'}$, where $m'$ is the size of a distributed Bloom filter. Since there are k independent distributed Bloom filters, the false positive probability of DLBBF is

$$p_f^{DLBBF} = \left(1 - e^{-n/m'}\right)^k.$$

Since $m' = m/k$, we get
$$p_f^{SBF} = p_f^{DLBBF}.$$

Similarly, we can prove the following theorem.

Theorem 2: The DLBBF is identical to the PBF in its ideal configuration in terms of both the false positive probability and the number of hash functions used.

Proof: In the PBF algorithm, assume there are g unique prefix lengths, with $n_i$ prefixes of length group i, and that each prefix length is assigned a Bloom filter with $m_i$ buckets and $k_i$ hash functions. PBF assigns $m_i$ in proportion to the prefix distribution, that is, $m_i/n_i = m/n$, with $\sum_{i=1}^{g} n_i = n$ and $\sum_{i=1}^{g} m_i = m$. The false positive probability of each Bloom filter is

$$p_{f_i} = \left(1 - e^{-k_i n_i/m_i}\right)^{k_i} = \left(1 - e^{-k_i n/m}\right)^{k_i}.$$

The optimal value of $k_i$ is therefore the same for all the Bloom filters, namely $m \ln 2/n$, so in total $g m \ln 2/n$ hash functions must be implemented. We have shown that the SBF achieves the same false positive probability. In SBF, the optimal number of hash functions for each prefix length is also $m \ln 2/n$, so the total number of hash functions is $g m \ln 2/n$ as well. Since DLBBF has the same number of hash functions and the same false positive probability as SBF, the theorem follows.

Although at first glance our algorithm appears to have the same resource consumption and false positive probability as the PBF algorithm in its ideal configuration, in practice it performs much better, because it avoids large prefix expansion and dynamic resource allocation. The modular architecture of DLBBF also greatly simplifies the actual implementation.

We also analyze the performance of the partitioned DLBBF implementation using multiple 2-port memory blocks. Figure 4
shows the probability distribution of the number of sequential accesses to the same memory block. Each curve corresponds to a combination of prefix type (IPv6 or IPv4) and number of memory blocks (32 or 64), and the figure also gives the expected number of accesses (E) for each case. In general, we expect slightly more than one clock cycle to finish a DLBBF lookup, and the results are in line with this expectation. In the next section, we show how a simplified scheduler can complete each DLBBF lookup in just one clock cycle.

Fig. 4. Distribution of the Number of Sequential Block Memory Accesses for Different Scenarios (y-axis: probability, log scale from 1 down to 10^-10; x-axis: number of memory accesses, 1 to 5; curves for IPv6 and IPv4 with 32 and 64 blocks)

A. Comparison with TCAMs

The cost per bit of TCAM is about 15× that of SRAM, and a TCAM consumes more than 50× as much power per access as an SRAM [9]. Our algorithm uses about 320 SRAM bits (on-chip and off-chip combined) per IPv6 prefix, whereas the equivalent TCAM-based solution uses 72 TCAM bits plus 8 SRAM bits. Hence, compared to the TCAM solution, the DLBBF algorithm has a more than 3× advantage in both cost and power. Because of the low cell density and the heat dissipation requirements of TCAMs, a typical TCAM chip measures 31mm × 31mm [17], while a QDR SRAM chip can be as small as 13mm × 15mm [16]. The TCAM-based solution requires one TCAM chip plus one SRAM chip, whereas our algorithm requires one or two SRAM chips. Hence, the footprint of the TCAM-based solution is at least 3× larger than that of the DLBBF algorithm.

V. IMPLEMENTATION CONSIDERATIONS

A. Reference Design

A reference design uses 8Mbit of on-chip SRAM to implement 16 DLBBFs. Each DLBBF is further realized with 32 or 64 2-port SRAM blocks, for 512 or 1024 blocks in total. Each block is 16Kb or 8Kb in size, configured as a 1-bit-wide array. This design is feasible in FPGA devices available in 2008. For example, Altera's Stratix IV FPGA includes more than 22Mb of embedded memory, including 1280 9Kbit modules. One
or two 9Kbit modules can be combined to form one memory block. With this configuration, each prefix is hashed 16 times, and we need 16 × 48 = 768 hash functions for IPv6 and 16 × 24 = 384 hash functions for IPv4. For 250K prefixes, we achieve a false positive probability as low as $3.3 \times 10^{-7}$. With ad hoc prefix expansion, this means fewer than 4 expanded prefixes for every 10 million flows.

B. Hardware Hash Functions

A Bloom filter requires k independent hash functions to be computed for each lookup, so its performance depends critically on the speed of hash function computation. The H3 family of hash functions studied in [23] has been shown to be suitable for fast hardware implementation, with performance close to the theoretical bound. However, its logic requirements can be large. If the input key has w bits and the hash table has s buckets, the hash function needs to register $w \times \log_2 s$ bits and perform logic AND and XOR operations on them. When w is large, the logic operations must be broken into multiple pipeline stages to improve circuit timing, which introduces an extra $\log_2 s$ flip-flops per pipeline stage. Our architecture requires hundreds of hash functions (i.e., hundreds of independent hash values for each key), and implementing the Bloom filters with existing hash functions would require too many resources. Hence, we develop an area-efficient hash scheme that produces n hash values for the same key using just $O(\lg n)$ seed hash functions. The hash operations use only simple logic and are fast enough for the lookup application.

Given a key λ, we can generate n hash values $H_1, \ldots, H_n$ using only m seed universal hash functions $S_1, \ldots, S_m$, as if n distinct hash functions were used, where

$$m = \begin{cases} \log_2 n + 1, & \text{if } n = 2^k,\ k \in \mathbb{N} \\ \lceil \log_2 n \rceil, & \text{if } n \neq 2^k,\ k \in \mathbb{N} \end{cases}$$

and each $S_j$ generates an address between 0 and t − 1, where t is a power of 2 (i.e., the hash result can be represented as a bit vector of $\log_2 t$ bits). $H_i$, for $i \in [1, n]$, is constructed as follows. Every i has a unique binary representation

$$i = b_m 2^{m-1} + b_{m-1} 2^{m-2} + \cdots + b_2 \cdot 2 + b_1, \qquad b_j \in \{0, 1\}.$$

We let

$$H_i = (b_m S_m) \oplus (b_{m-1} S_{m-1}) \oplus \cdots \oplus (b_1 S_1),$$

where ⊕ is the bitwise XOR operation and $b_j S_j$ denotes $S_j$ if $b_j = 1$ and 0 otherwise. $H_i$ has exactly the same address space as the seeds $S_j$. The
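following is an example that uses three seed hash functions to produce seven new hash values:

$$
\begin{aligned}
H_1 &= S_1, & H_2 &= S_2, & H_3 &= S_2 \oplus S_1, & H_4 &= S_3, \\
H_5 &= S_3 \oplus S_1, & H_6 &= S_3 \oplus S_2, & H_7 &= S_3 \oplus S_2 \oplus S_1.
\end{aligned}
$$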
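The construction can be checked in a few lines of software. The Python sketch below is illustrative only, not the authors' hardware design: it assumes H3 seed functions over 64-bit keys with 16-bit outputs (t = 2^16), and the names make_h3, num_seeds, and derived_hashes, as well as all parameter values, are hypothetical. It evaluates the m seed hashes once per key and derives each H_i by XOR-ing the seeds S_j whose binary digit r_j of i is 1.

```python
import random


def make_h3(key_bits, out_bits, rng):
    """Build one H3 hash: XOR together the random rows q_j selected by the 1-bits of the key.

    The rows form the r x log2(t) random matrix that a hardware version must register.
    """
    rows = [rng.getrandbits(out_bits) for _ in range(key_bits)]

    def h3(key):
        acc = 0
        for j in range(key_bits):
            if (key >> j) & 1:
                acc ^= rows[j]
        return acc  # always in [0, 2**out_bits - 1]

    return h3


def num_seeds(n):
    """Smallest m with 2**m - 1 >= n, i.e. ceil(log2(n + 1))."""
    return n.bit_length()


def derived_hashes(seeds, key, n):
    """Return [H_1, ..., H_n], where H_i XORs the seeds S_j whose bit r_j of i is set."""
    if n > 2 ** len(seeds) - 1:
        raise ValueError("not enough seed functions for n hash values")
    s = [seed(key) for seed in seeds]  # each seed is evaluated once per key
    out = []
    for i in range(1, n + 1):
        acc = 0
        for j, s_j in enumerate(s):  # j = 0 corresponds to r_1 and S_1
            if (i >> j) & 1:
                acc ^= s_j
        out.append(acc)
    return out


# Hypothetical parameters: 64-bit keys, t = 2**16 buckets, n = 16 hash values per key.
rng = random.Random(2009)
n = 16
seeds = [make_h3(64, 16, rng) for _ in range(num_seeds(n))]  # 5 seed functions
key = 0x20010DB800000000  # an arbitrary 64-bit example key
hashes = derived_hashes(seeds, key, n)
print(len(seeds), hashes)
```

Because H3 is linear over XOR, each derived H_i is itself an H3 function whose random matrix is the XOR of the selected seed matrices, so it keeps the seeds' address space; the derived matrices are correlated rather than independent, which is why their quality has to be checked by simulation. Each extra hash value costs at most m − 1 word-wide XORs instead of a new r × log2 t matrix.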

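As a quick check on the false positive figures in this section, the standard Bloom filter approximation f = (1 − e^(−kn/m))^k reproduces both the 3.3 × 10^-7 value quoted for the reference design in Section V-A and, to within one unit in the last printed digit, the theoretical column of Table I. The Python sketch below is ours, not the authors' code. It assumes that the 8 Mbit of on-chip SRAM is counted as 8 × 10^6 bits (counting it as 2^23 bits gives about 1.8 × 10^-7 instead), and that hashing each prefix once into each of the 16 DLB-BFs behaves like a single partitioned Bloom filter with k = 16.

```python
import math


def bloom_fp(k, m_over_n):
    """Standard false positive approximation f = (1 - exp(-k * n / m)) ** k."""
    return (1.0 - math.exp(-k / m_over_n)) ** k


# Theoretical column of Table I: m/n in {2, 4, 8, 16, 32}, k in {2, 4, 8, 16}.
for ratio in (2, 4, 8, 16, 32):
    print(ratio, ["%.2e" % bloom_fp(k, ratio) for k in (2, 4, 8, 16)])

# Reference design of Section V-A, assuming 8 Mbit = 8e6 bits,
# 250K prefixes, and 16 hashes per prefix (one in each DLB-BF).
ref = bloom_fp(16, 8e6 / 250_000)
print("%.1e" % ref)  # prints 3.3e-07
print(ref * 10_000_000)  # expected false positives per 10 million flows
```

The last line gives about 3.3, consistent with the statement that ad-hoc prefix expansion adds fewer than 4 expanded prefixes per 10 million flows.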
Fig. 5. Hash Table Bucket Load with 64K Buckets and 256 Hash Functions (percentage of buckets at each load level vs. # items, 0 to 1024).

Note that we can choose any fast hash function as the seed functions. In this paper we use H3 hash functions for the implementation and simulations. The correlation performance of these new hash values cannot be analyzed theoretically. Therefore, we use simulations to compare their performance with the theoretical results.

First, we test each new hash function to see whether it is universal and random by inserting a number of keys into the hash table and measuring the hash collisions and the average load of non-empty buckets. Specifically, we implement a counting Bloom filter with different numbers of hash functions, insert some number of items into the filter, and track the percentage of buckets with different loads. We compare these percentages with the theoretical values calculated from random insertions. Figure 5 shows one of the results. The theoretical values match the experimental values so well that the corresponding curves are almost indistinguishable from each other.

Second, we build a Bloom filter with these hash functions to see whether they are independent. Each key is 64 bits long and the Bloom filter has m = 64K buckets. We vary the number of keys n programmed into the Bloom filter. Table I summarizes the Bloom filter false positive rate for different m/n ratios and different numbers of hash functions (k). The simulation results match the theoretical values very well. In our reference design, we can use just five seed hash functions to generate a group of 16 hash functions for each prefix length, which accounts for a 69% hardware resource saving overall.

C. DLB-BF Memory Port Scheduling

Mapping the 24 or 48 DLB-BF read requests to different memory ports turns out to be the most resource-consuming part of the design. Since we have only 2-port memory, when more than two read requests go to the same memory block, we need more than one clock cycle to schedule
these requests. This can significantly increase the design complexity and negatively impact the system throughput. Instead, we use only one clock cycle to schedule the requests. When more than two requests are for the same memory block, only the two requests for the two longest prefixes are granted. The remaining requests are simply discarded; however, we assume they all find a match, possibly false, as though they had been granted memory accesses. The corresponding bits in the bit vector are thus set directly to one without actually looking up the DLB-BF. Recall that we have 16 DLB-BFs working in parallel, and each prefix length generates one read request in each DLB-BF using independent hash functions. Even when a request for a prefix length is discarded in one DLB-BF, the requests in the other DLB-BFs are likely to be granted. The net effect is that, for a given prefix length, a reduced number of hash functions is used to search the Bloom filter. Although the false positive rate is then not as good as when all the hash functions are used, we trade this off for a smaller and faster implementation. In addition, the simplified scheduler shows some preference for longer prefixes: generally, the longer the prefix, the more hash functions are applied and, in turn, the better the false positive probability achieved. (Note that the requests for the top two prefix lengths, 32 and 30, are always granted under our scheduling strategy.) This arrangement complies with the search order of the last step when multiple matches are found in the DLB-BFs.

D. Prototype and Simulation

We prototype the design using the Altera Stratix III FPGA EP3SL340. The design is aimed at a 40G line card with an IPv4 lookup rate of 60Mpps. Each of the 16 DLB-BFs includes 32 8-Kbit 2-port memory blocks. The design uses 50% of the logic resources and 25% of the block memory resources. The synthesized circuit can run at a 150MHz clock rate. A 36-Mbit 250MHz QDR-II SRAM is used to store the off-chip hash table. Each table bucket has 144 bits, storing up to three prefixes and the corresponding next hops. As discussed before, each prefix is stored in one of two candidate buckets for load balancing. A snapshot of an IPv4 BGP table,
which contains about 180K prefixes and 24 unique prefix lengths, is used to test our design. We combine the prefix value and its length as the key to the off-chip hash table, and find no overflow at all in the hash table. To test the lookup performance, we first use a synthesized packet trace that contains the same number of packets as prefixes and matches each prefix exactly once. Due to our simplified memory port scheduling algorithm, the observed false positive rate is 8.3 × 10^-5, which is slightly higher than the theoretical value, also on the order of 10^-5. However, after the 15 prefixes that cause the false positives are stored in the off-chip hash table as expanded prefixes, no more false positives occur for the same set of flows. In the second experiment, we look up about 80 million unique IP addresses and find fewer than 5K false positive occurrences. The expanded prefixes cause only a small prefix set inflation of 2.6%. Finally, we test the algorithm with a real Internet packet trace. In this case, we observed 38 false positive occurrences and a false positive rate of 1.4 × 10^-7. This means that for every seven

million flows, only one can cause a false positive with its first packet. In all of our tests, we observed at most one false positive for a prefix. We conducted each of the above experiments multiple times, each with a different hash function configuration (i.e., varying the random number set used to implement the seed H3 hash functions). The results were all consistent.

TABLE I
BLOOM FILTER FALSE POSITIVE RATE WITH FAST HASH FUNCTIONS
(each cell: simulation / theory)

m/n   k = 2               k = 4               k = 8               k = 16
2     4.01e-1 / 4.00e-1   5.60e-1 / 5.59e-1   8.58e-1 / 8.63e-1   9.95e-1 / 9.95e-1
4     1.53e-1 / 1.55e-1   1.60e-1 / 1.60e-1   3.16e-1 / 3.12e-1   7.49e-1 / 7.44e-1
8     4.91e-2 / 4.89e-2   2.44e-2 / 2.40e-2   2.53e-2 / 2.55e-2   9.62e-2 / 9.76e-2
16    1.37e-2 / 1.38e-2   2.33e-3 / 2.38e-3   5.40e-4 / 5.74e-4   6.41e-4 / 6.50e-4
32    3.66e-3 / 3.67e-3   1.88e-4 / 1.91e-4   6.00e-6 / 5.73e-6   3.40e-7 / 3.30e-7

VI. CONCLUSIONS

With the continued growth in Internet traffic, driven by the surge in media-rich traffic, deployment of 100Gbps line cards in core routers is very likely as soon as standardization efforts are completed. Moreover, Internet routing tables continue to grow, and IPv6 use is growing as well. It has also become increasingly important to minimize the power consumption of line cards to the maximum extent possible. These facts lead us to re-examine algorithmic approaches to IP lookup. Algorithmic approaches can exploit the increasing amounts of on-chip memory using new data structures and can considerably improve upon the high power consumption of TCAMs. However, it is still a challenge to devise lookup schemes usable for 100Gbps line cards supporting IPv6 and large routing tables in an economical, power-efficient manner. We make the following four major contributions to address this challenge: (1) We design a novel data structure, DLB-BF, based on Bloom filters, which is suitable for on-chip storage and makes possible fast, prefix-length-independent IPv4 and IPv6 lookups at 100Gbps line speed. The proposed scheme addresses several major drawbacks of previously proposed schemes, including the need to use large multi-port memories. (2) We avoid the large a priori prefix expansions that are needed to reduce the
number of distinct prefix lengths and instead use an ad-hoc prefix expansion method that efficiently copes with the Bloom filter false positive problem without any changes to the on-chip Bloom filter itself. (3) We design an area-efficient algorithm for computing a large number of fast, high-performance hash functions in hardware; these are ideal for implementing Bloom filters. (4) We design a simple and efficient memory-port scheduling scheme that allows us to use prevalent 2-port memory blocks for implementing DLB-BFs without much degradation in attainable false positive probabilities. The comparison shows that our algorithm significantly outperforms TCAM-based solutions in power consumption, footprint, and cost.

REFERENCES

[1] B. Bloom. Space/Time Trade-offs in Hash Coding With Allowable Errors. Communications of the ACM, July 1970.
[2] A. Broder and M. Mitzenmacher. Using Multiple Hash Functions to Improve IP Lookups. In IEEE INFOCOM, 2001.
[3] A. Broder and M. Mitzenmacher. Network Applications of Bloom Filters: A Survey. In Internet Mathematics, 2005.
[4] J. Bruck, J. Gao, and A. Jiang. Weighted Bloom Filter. In IEEE ISIT, 2006.
[5] B. Chazelle, J. Kilian, R. Rubinfeld, and A. Tal. The Bloomier Filter: An Efficient Data Structure for Static Support Lookup Tables. In The Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2004.
[6] I. Chvets and M. MacGregor. Multi-zone Caches for Accelerating IP Routing Table Lookups. In Proceedings of High-Performance Switching and Routing, 2002.
[7] M. Degermark, A. Brodnik, S. Carlsson, and S. Pink. Small Forwarding Tables for Fast Routing Lookups. In ACM SIGCOMM, 1997.
[8] S. Dharmapurikar, P. Krishnamurthy, and D. Taylor. Longest Prefix Matching using Bloom Filters. In ACM SIGCOMM, 2003.
[9] S. Dharmapurikar, H. Song, J. Turner, and J. Lockwood. Fast Packet Classification Using Bloom Filters. In ACM ANCS, 2006.
[10] W. Eatherton, G. Varghese, and Z. Dittia. Tree Bitmap: Hardware/Software IP Lookups with Incremental Updates. ACM SIGCOMM Computer Communication Review, 2004.
[11] L. Fan, P. Cao, J. Almeida, and A. Broder. Summary Cache: A Scalable Wide-area Web Cache Sharing Protocol. IEEE/ACM Transactions on Networking, Mar. 2000.
[12] J. Hasan and T. N. Vijaykumar. Dynamic Pipelining: Making IP-Lookup Truly Scalable. In ACM SIGCOMM, 2005.
[13] http://bgp.potaroo.net. BGP Routing Table Analysis Reports. 2008.
[14] http://www.cisco.com/en/us/products. Cisco CRS-1. 2007.
[15] http://www.juniper.net/products/tseries. Juniper Networks T-series Routing Platforms. 2007.
[16] http://www.necel.com. NEC Electronics.
[17] http://www.netlogicmicro.com. NetLogic.
[18] W. Jiang and V. K. Prasanna. Beyond TCAMs: An SRAM-based Multi-Pipeline Architecture for Terabit IP Lookup. In IEEE INFOCOM, 2008.
[19] S. Kumar, M. Becchi, P. Crowley, and J. S. Turner. CAMP: Fast and Efficient IP Lookup Architecture. In ACM/IEEE ANCS, 2006.
[20] S. Kumar, J. Turner, and P. Crowley. Peacock Hash: Fast and Updatable Hashing for High Performance Packet Processing Algorithms. In IEEE INFOCOM, 2008.
[21] M. Waldvogel, G. Varghese, J. Turner, and B. Plattner. Scalable High Speed IP Routing Lookups. In ACM SIGCOMM, 1997.
[22] S. Nilsson and G. Karlsson. IP-Address Lookup using LC-Tries. IEEE Journal on Selected Areas in Communications, June 1999.
[23] M. Ramakrishna, E. Fu, and E. Bahcekapili. A Performance Study of Hashing Functions for Hardware Applications. In Proc. 6th Int'l Conf. Computing and Information, 1994.
[24] R. Sangireddy, N. Futamura, S. Aluru, and A. K. Somani. Scalable, Memory Efficient, High-Speed IP Lookup Algorithms. IEEE/ACM Transactions on Networking, Aug. 2005.
[25] H. Song, S. Dharmapurikar, J. S. Turner, and J. W. Lockwood. Fast Hash Table Lookup Using Extended Bloom Filter: an Aid to Network Processing. In ACM SIGCOMM, 2005.
[26] H. Song, J. Turner, and J. Lockwood. Shape Shifting Tries for Faster IP Lookup. In IEEE ICNP, 2005.
[27] V. Srinivasan and G. Varghese. Faster IP Lookups Using Controlled Prefix Expansion. In ACM SIGMETRICS, 1998.