High-Performance Hardware Monitors to Protect Network Processors from Data Plane Attacks

High-Perormnce Hrdwre Monitors to Protect Network Processors rom Dt Plne Attcks Hrikrishnn Chndrikkutty, Deepk Unnikrishnn, Russell Tessier nd Tilmn Wol Deprtment o Electricl nd Computer Engineering University o Msschusetts, Amherst, MA, USA {chndrikkut,unnikrishnn,tessier,wol}@ecs.umss.edu ABSTRACT The Internet represents n essentil communiction inrstructure tht needs to be protected rom mlicious ttcks. Modern network routers re typiclly implemented using embedded multi-core network processors tht re inherently vulnerble to ttck. Hrdwre monitor subsystems, which cn veriy the behvior o router s pcket processing system t runtime, cn be used to identiy nd respond to n ever-chnging rnge o ttcks. While hrdwre monitors hve primrily been described in the context o generlpurpose computing, our work ocuses on two importnt spects tht re relevnt to the embedded networking domin: We present the design nd prototype implementtion o high-perormnce monitor tht cn trck ech processor instruction with low memory overhed. Additionlly, our monitor is cpble o deending ginst ttcks on processors with Hrvrd rchitecture, the dominnt contemporry network processor orgniztion. We demonstrte tht our monitor rchitecture provides no network slowdown in the bsence o n ttck nd provides the cpbility to drop ttck pckets without otherwise ecting regulr network tric when n ttck occurs. 1. INTRODUCTION The Internet is criticl inrstructure component in tody s society. Mny spects o personl communiction, business trnsctions, entertinment, digitl government, etc. rely on the vilbility nd correct opertion o the Internet. In this work, we ocus on the security o pcket processing unctions tht re necessry to hndle pcket orwrding on network router. The pcket processing component o modern router is typiclly implemented using network processor (NP) system. A network processor hs multiple simple embedded processor cores tht cn be progrmmed to hndle network tric. When chnges re necessry, the sotwre on this network processor cn be chnged to dpt the opertion o the router. Unlike trditionl routers tht hve been bsed on ppliction-speciic Permission to mke digitl or hrd copies o ll or prt o this work or personl or clssroom use is grnted without ee provided tht copies re not mde or distributed or proit or commercil dvntge nd tht copies ber this notice nd the ull cittion on the irst pge. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior speciic permission nd/or ee. DAC 13, My 9 June 0, 013, Austin, Texs, USA. Copyright 013 ACM 98-1-503-01-9/13/05 $15.00. integrted circuit (ASIC) technology, network-processorbsed routers cn be dpted in their unctionlity. However, this lexibility rises n interesting security problem: routers tht use sotwre-progrmmble pcket processors re vulnerble to ttcks [5]. Vulnerble pcket processing code cn be exploited using mlormed dt pckets to lunch n in-network denil-o-service ttck. This type o ttck on contemporry NPs is ddressed by this pper. To reduce the vulnerbility o pcket processors in routers (s well s embedded processors in generl), digitl circuits tht monitor run-time processor opertion hve been proposed [3, 5, 13]. These hrdwre monitors use inormtion bout correct processor sotwre execution to trck the instructions executed by processor core. I n ttck on the processor occurs, devition rom expected, progrmmed behvior is detected nd recovery process is initited. Compred to sotwre-bsed protection mechnisms (e.g., virus scnners), hrdwre-bsed monitors require less perormnce overhed nd rect ster. In the networking domin, low overhed nd st detection speed re prticulrly importnt. Thereore, there hve been ongoing eorts to urther improve NP protection mechnisms to better support the pplictions supported by NPs nd the processor orgniztion typiclly exhibited by NPs. In this pper, we ddress two key problems tht hve not been previously ddressed in protecting network processors rom sotwre ttcks. An eective NP monitoring system must veriy every instruction tht is executed by the processor nd thus needs to operte t very high speeds. This instruction-bsed monitoring opertion cn be viewed s inite utomton with ixed number o cceptble pths. Prior work in hrdwre monitoring [5, 13] hs been bsed on non-deterministic inite utomton (NFA) implementtions, which potentilly require high memory bndwidth when trcking multiple prllel sttes, or corse veriiction t the level o bsic blocks [3], which my not detect ttcks beore they re executed. For our irst contribution, we present n instruction-level monitoring solution or NPs bsed on deterministic inite utomton (DFA) implementtion, which overcomes the shortcomings o both prior techniques. Prior work in embedded NP security hs ssumed von Neumnn processor rchitecture with combined instruction nd dt memory, where n ttck cn execute code rom the processor stck [5]. Most network processors, however, re bsed on Hrvrd rchitectures tht seprte instruction nd dt memory mking stck-bsed code execution impossible. Thereore, it is questionble whether dt

plne ttcks re even possible in networks. For our second contribution, we present n ttck exmple tht demonstrtes the existence o Hrvrd rchitecture ttcks nd we show tht our monitoring system is eective in deending NPs ginst them. The speciic contributions o our pper re: 1. Design o high-perormnce hrdwre monitoring system or NPs: Our pipelined design cn perorm instruction veriiction with single memory red per instruction nd thus cn operte t speeds suicient to mintin line rte networking dt trnser.. Algorithm or construction o deterministic monitoring grph: We present method to convert the monitoring grph o NP instructions, which initilly is non-deterministic due to control-low chnges (e.g. brnches), into deterministic utomton. The representtion o the DFA is compcted to llow or highly eicient implementtion in the hrdwre monitor. 3. Demonstrtion o n ttck on nd deense o Hrvrd rchitecture network processor: We demonstrte n in-network ttck through the dt plne o the network tht exploits n integer overlow vulnerbility to smsh the processor stck nd lunch returnto-librry ttck. This ttck propgtes the ttck pcket nd crshes the processor system. We lso show tht our hrdwre monitor is eective in deending ginst this ttck nd llowing or continued NPbsed router opertion ter ttck identiiction nd recovery. The reminder o the pper presents these contributions in detil.. RELATED WORK Progrmmbility in the pcket processing systems o routers hs been used incresingly widely over the pst decde. Most mjor router vendors employ network processors in their products (e.g., Cisco QuntumFlow [6], Cvium Octeon []). While the progrmmbility o these devices is hidden rom network users, it is used by vendors to extend system unctionlity. It cn be expected tht routers will continue to hve progrmmble pcket processing components, especilly with network virtuliztion [] emerging s promising technology or the uture Internet. While network security s whole hs received much recent ttention (e.g., end-system vulnerbilities leding to botnets [9], worm propgtion [1], etc.), there hs been little ocus on vulnerbilities in the networking inrstructure itsel. Cui et l. [] hve surveyed vulnerbilities in the control plne o networks, where n ttcker cn potentilly gin ccess to the router system. In the dt plne, Chski et l. [5] hve shown n exmple o how simple integer overlow vulnerbility cn be exploited to lunch denilo-service rom within the network. In this cse, single mlormed User Dtgrm Protocol (UDP) pcket triggers the vulnerbility, chnges the network processor s opertion, nd cuses lood o ttck pckets to be sent by the system. We dpt this ttck exmple to processor system bsed on Hrvrd rchitecture in our work to demonstrte the eectiveness o our monitoring system in detecting nd stopping such dt plne ttcks in prcticl networking environment. Tble 1: Comprison o Monitoring Approches. Monitor grnulrity implemen- underlying ttion rchitecture Chski et l. [5] instruction NFA von Neumnn Aror et l. [3] bsic block DFA von Neumnn This pper instruction DFA Hrvrd Protection mechnisms or embedded processors hve been proposed bsed on hrdwre monitors in generl [1,3,5,10,13,15,16]. These monitors dier by the level o monitoring grnulrity (unction clls, bsic blocks, individul instructions) nd i they require chnges to source code or i they re bsed on progrm binries. We only ocus on pproches tht do not require chnges to the processor binries. The min novelty o the monitoring system we present in this pper is highlighted in Tble 1. In contrst to relted work, we cn monitor t the level o individul instructionsnddosousingdfa,whichcnbeimplemented with high perormnce. 3. HARDWARE MONITOR SYSTEM ARCHITECTURE The system rchitecture o the network processor system with security monitor is shown in Figure 1. The network processorshownontheletotheigureisbsedonconventionl Hrvrd rchitecture with seprte dt memory or network pckets nd processing stte nd instruction memory or pcket processing code. For simplicity, only single processor core is shown; the system cn esily be extended or multiple processor cores. The processing monitor on the right side o the igure veriies the opertion o the processor instruction-by-instruction. For every instruction tht is executed on the processor core, hsh vlue o the executed opertion is reported to the monitor. The monitor uses the comprison logic to compre the reported hsh vlue to the inormtion tht is stored in the monitoring grph. The monitoring grph is derived by oline nlysis o the pcket processing code binry. Any ttck on the system necessrily needs to chnge the opertion o the processor core (otherwise the ttck is not eective). This devition leds to the processor reporting hsh vlues tht do not mtch with the monitoring grph. The comprison logic cn detect this devition nd reset the processor in response. In networking, such reset nd recovery opertion is very simple: The current pcket is dropped (i.e., the pcket buer is clered), the processing stte is reset (i.e., the stck is reset), nd processing continues with the next pcket. Since most pcket processing opertions re not stteul nd there is no gurntee tht pckets re relibly delivered, no urther recovery ctions re necessry. The monitoring grph used by the hrdwre monitor is stte mchine, where ech stte represents speciic processor instruction. The stte mchine is derived rom the pcket processing code s illustrted in Figure. Ech processor instruction corresponds to stte. The edges between sttes re lbeled with inormtion relting to next vlid instruction tht cn be executed ter the current instruction. In cse o control low opertions, there my be multiple outgoing edges rom ech stte (ech being vlid trnsition). In our system, we use 3-bit processor (i.e., open source embedded Plsm processor bsed on the MIPS instruction

oline nlysis runtime opertion network processor processing code binry processing code instruction memory network processor core dt memory pcket buer network interce hsh o processing instruction reset/ recovery NFA monitoring grph DFA monitoring grph mon. grph mon. memory comprison logic NFA-to-DFA trnsormtion Figure 1: System rchitecture o network processor with security monitor. [ ] 9c: 9c0010 lhu v0,16(s8) 0: 00000000 nop : c0033 sltiu v0,v0,51 8: 10000 bnez v0,d c: 00000000 nop b0: 3c06666 lui v0,0x6666 b: 330191 ori v1,v0,0x191 b8: 9c0010 lhu v0,16(s8) [ ] hrdwre monitor 0 9c 0 11 10 8 10 c 6 b0 3 b d b8 Figure : Stte mchine genertion rom processing binry. set). The monitoring system uses -bit hsh o the next instruction to lbel edges in the monitoring grph (s hs been recommended in [13]). A hsh (insted o the ull 3-bit instruction) is used to reduce the size o the monitoring grph nd thus to reduce the implementtion overhed o the hrdwre monitor while still llowing instruction-by-instruction monitoring. The use o hsh (or ny other method tht uses mny-to-one mpping), however, leds to two undmentl problems: Attck detection mbiguity: The mny-to-one mpping tht occurs in hsh unction o the monitor my mke it possible or n ttcker to remin undetected. This would require tht the ttck perorms opertions tht led to sequence o hsh vlues tht mtches the monitoring inormtion o vlid code. Mo et l. hve shown tht this probbility decreses geometriclly with the length o the ttck code nd thus is unlikely to led to prcticl ttcks [13] (in prticulr when the hsh unction is not known to the ttcker). We do not consider this issue urther in this pper. Nondeterminism during monitoring: The mny-to-one mpping lso leds to nondeterminism in the monitoring grph. There my be control low instruction where ech o the next instructions hs the sme hsh vlue. As result, the corresponding node in the monitoring grph hs two outgoing edges with the sme hsh vlue (s illustrted in Figure 3). Since this nondeterminism cn continue or multiple such control low opertions, it cn led to complex implementtions [5], potentilly slowing monitor perormnce. In the ollowing section, we show how we cn ddress the ltter problem by converting the nondeterministic monitoring grph into deterministic monitoring grph, which is esier to use in high-perormnce implementtions.. DETERMINISTIC PROCESSOR MONI- TORING To relize deterministic instruction-level monitor, we irst convert the NFA monitoring grph described in the previous section to DFA monitoring grph. We then describe how to implement monitoring system tht uses this DFA grph..1 Construction o Deterministic Monitoring Grph Trcking nondeterministic inite utomt is diicult to implement in prctice since the utomton cn hve multiple ctive sttes. This leds to high bndwidth requirements between the monitoring logic nd the memory tht mintins the NFA since next-stte inormtion or ll ctive sttes hs to be etched in ech itertion. When using DFA, in contrst, only one stte is ctive nd implementtion becomes much esier. To convert n NFA to DFA, stndrd powerset construction lgorithm cn be used [11]. This lgorithm computes ll possible stte sets in which the utomton cn be situted (i.e., the powerset). Bsed on the powerset, DFA is then constructed. Figure shows the DFA tht corresponds to the NFA shown in Figure 3. Note tht stte{3,5} representsthesetsosttestowheresttecnbrnchwhen hsh vlue c is observed. One potentil problem with NFA-to-DFA conversions is tht the number o sttes in the DFA cn grow exponentilly over the number o sttes in the NFA. However, the monitoring NFAs constructed rom binry code do not exhibit this pthologicl behvior. Our experiments indicte tht this increse is smll nd does not led to drsticlly lrger stte mchines (see Section ). Thus, this pproch is eective or creting deterministic hrdwre monitors.. Implementtion o Monitoring System A key chllenge in the implementtion o our hrdwre monitoring system is how to represent the monitoring DFA in memory. The comprison logic needs to be ble to retrieve the inormtion bout next stte trnsitions or every instruction tht it trcks. Thus, stte trnsitions need to be implemented with no more thn one memory ccess per instruction (to keep up with the network processor core) nd be s compct s possible (to minimize the implementtion overhed o the monitor). The inormtion tht needs to be stored in the monitoring memory is illustrted on the let side o Figure 5. Ech stte represents n instruction nd n outgoing trnsition edge rom this stte represents the hsh vlue o the next expected instruction in the execution sequence. For exmple, stte c hs two next sttes, d nd e, with hsh vlues 11 nd 3, respectively.

1 3 5 b c d e c Figure 3: Nondeterministic monitoring grph. 6 9 9 grouping b d 0 g b d 0 g group 1 11 1 11 5 group c 3 e h c 3 e h group 3 Figure 5: Grouping o DFA sttes. 1 b c {3,5} d Figure : Deterministic monitoring grph ter NFA-to-DFA conversion. A nïve wy to store the stte mchine in RAM would be to store ech stte nd ll its possible edge trnsitions. This would require h entries per stte or n h-bit hsh. Since most sttes hve only one or two outgoing edges, lrge number o edge trnsitions would never be used, leding to ineicient memory use. Assuming tht only two outgoing trnsitions exist or ech stte is lso not esible due to the cses where powerset construction cretes sttes with up to h outgoing edges. Finlly, or perormnce resons we should only use one memory ccess per stte trnsition, which precludes design where sttes with more thn two outgoing edges re hndled s specil cses. Our min ide to compctly represent DFA sttes with vrying numbers o outgoing edges is to encode ll the necessry inormtion in single tble entry nd to group sttes by the number o outgoing edges. The min chllenge in chieving compctness is to llocte exctly the mount o memory tht is needed or ech stte to store next stte inormtion while still being ble to index this memory without degrding to liner serch. In our representtion, we group sttes together i they hve the sme previous stte. A stte belongs to group g i the previous stte hs g outgoing edges. For monitor with -bit hsh vlue, there re 16 possible groups. For exmple, in Figure 5 on the right side, groups re shown with dierent colors. Note tht stte cn belong to multiple groups (e.g., stte belongs to group (becuse hs two outgoing edges, one to b nd one to ) nd to group 3 (becuse e hs three outgoing edges)). The memory lyout nd bsic opertion o our DFA monitor system is shown in Figure 6. The memory contins tuples o {number o next sttes, oset in stte group, vlid hsh vlues on outgoing edges} nd is logiclly divided into groups. The bse ddresses or ech group re stored in registerilewith16entries. Withingroup, thesetsosttes tht shre the previous stte re grouped together (e.g., b nd re together nd d nd e re together). Within set, sttes re ordered by the hsh vlue on their incoming edge (e.g., e beore d becuse hsh vlue 3 is smller thn hsh vlue 11). The hsh comprison block perorms two unctions: it determines i the one-hot coded hsh bit is set in the 16-bit vlue red rom memory nd it determines k, which is the position o the mtching hsh vlue mong the vlid hsh vlues red rom memory. To illustrte the opertion o the monitor, we describe n exmpletrnsition. Assumethemonitorisinsttendthe processor reports n instruction tht leds to hsh vlue o. To perorm the trnsition, the memory row lbeled is red. The tuple in this row indictes tht there re two outgoing edges. The vlid hsh vlues o these two edges e 5 6 reset/ recovery processor instruction 3 1 hsh comprison one-hot encoding 16 16 -bit hsh unction 3 group 1 group group 3 group 16 k (position o mtching hsh mong vlid hsh vlues) 0x0000 0x000 0x0006 group bse ddress dd -1 c b e d h g mult number o next sttes oset in stte group 0 vlid hsh vlues on outgoing edges 0000 0000 1000 0100 1 0000 1000 0000 1000 1 1 0000 0000 1000 0000 1 0 0000 0010 0000 0000 3 0 0000 0000 0010 0101 1 0 0000 0010 0000 0000 stte mchine memory 16 group 1 group group 3 Figure 6: Memory representtion o DFA monitoring grph. re stored in the 16-bit vector. To veriy tht the trnsition is vlid, the hsh comprison unit checks i bit is set in the bit vector (which it is). I this bit is not set, then n invlid trnsition tkes plce, indicting n ttck, nd the processor is reset. Ater the check, the next stte (i.e., stte ) in the DFA needs to be ound in memory. To determine theddressothtstte, thebseddressothegroupothe next stte is looked up in the register ile (i.e., 0x000 since the next stte belongs to group ). To this bse ddress, the product o the set size (i.e., group number) nd the oset in the stte group is dded (to index the correct set within the group). Finlly, k is dded, which is the position o the mtching hsh in the bit vector (in our cse 1 since is the irst mtching hsh (i.e., k=0) nd is the second mtching hsh (i.e., k=1)). Thus the memory loction o stte is 0x000 + 0 + 1 = 0x003. Note tht ny stte trnsition tkes only one memory red rom stte mchine memory nd lookup into ixed-size register ile. The DFA is represented compctly without wsting ny memory slots (sttes shown with dots in Figure 6 point to other sttes not shown in our exmple). Thus, this representtion lends itsel to high-perormnce implementtion. 5. HARVARD ARCHITECTURE ATTACKS Even though generl memory error techniques (integer overlow, hep overlow etc.) cnnot be used to generte code injection ttcks, Frncillion et l. [8] demonstrted tht code injection ttcks re still esible on Hrvrd rchitecture processor using return-oriented progrmming technique. Here, n ttcker tkes control o return instructions in the stck to chin ttck code rom n existing librry unction. Since the code is lredy present in executble memory, the ttck will not be prevented rom running. In this section, we describe how such n ttck cn be constructed or the networking environment nd how our monitor cn detect it. Figure shows portions o congestion mngement protocol (CM) nd IPV pcket orwrding ppliction used to build n ttck on the network processor system. The

Tble : Evlution o monitoring pproches or our new DFA pproch nd previous NFA-only pproch. The mximum number o memory ccesses or our pproch is 1 or ll benchmrks. CM protocol IPV ppliction Figure : Vulnerble ppliction code. congestion mngement protocol inserts custom protocol hederinthepckethederspcebetweentheiphedernd the UDP heder. During this opertion, the code needs to mke sure the new pcket size does not exceed the mximum dtgrm length (the boxed instruction in the CM code). Exploiting n integer overlow vulnerbility, the boundry check in the CM code cn be circumvented nd the stck cn be smshed. To do so, n ttcker sends mlormed UDP pcket with size 0xe (deciml vlue 6553), which willpssthemximumpcketsizecheck(since6533+1= 10, due to integer overlow). As result, the pcket pylod is copied over the stck. The pcket pylod o the ttck pcket is crted in such wy tht the return ddress is overwritten to direct the control low to the IPv pcket orwrding ppliction (which is librry code on the processor core) nd the vlue o the ip dst low ield is 0x. The port inormtion gets updted with this vlue (the boxed instruction in the IPv code), orwrding the ttck pcket to ll the outgoing ports nd then crshing the processor system. As result, the ttck pcket gets orwrded to ll outgoing interces beore the system crshes, thus propgting the ttck through the network. Since our hrdwre monitor hs no vlid edge between the sttes in the middle o the CM ppliction nd the IPv ppliction, this ttck is detected. As soon s the control low chnges, the hsh vlues reported by the processor no longer mtch the monitoring inormtion nd the system is reset, dropping the mlicious pcket. 6. PROTOTYPE SYSTEM IMPLEMENTA- TION Although n end-system would likely be implemented in ixed logic, we hve prototyped the described network processor nd hrdwre monitoring system on Strtix IV GX30 FPGA locted on n Alter DE bord. The router inrstructure surrounding the NP core is tken rom the NetFPGA reerence router, which hs been migrted to the Strtix IV mily. The DE bord hs our 1 Gbps Ethernet interces or pcket input/output. In our prototype implementtion, the single-core network processor is implemented s sot core nd the monitor is implemented in FPGA logic (using Qurtus or synthesis, plce nd route). Only the memory initiliztion iles need to be reconigured on per-ppliction bsis. To run networking code on the processor plus monitor system, the code is irst pssed through stndrd MIPS-GCC compiler low to generte ssembly-level instructions. The output o the compiler llows or the identiiction o brnch instructions nd their trget ddresses. In our current im- Chski [5] Ours Netw. No. NFA Mx. DFA Mem. Mem. ppli- o sttes mem. sttes entries overction instr. ccess hed rg 53 53 3 59 6 9.% mtc 3 60 58 6.% red 80 80 808 85 6.8% wq 905 905 91 98 8.0% plementtion, ll possible brnch trgets nd return instructions re nlyzed t compile time. The monitor cn hndle n rbitrry number o indirect brnches to stticlly known trgets (e.g., return ddresses) since the NFA representtion llows ny number o outgoing brnches. (Our monitor cnnot hndle indirect brnches to stticlly unknown trgets tht re resolved t run time, but such progrmming constructs did not pper in ny o 11 benchmrk pplictions tht we looked t.) The NFA-to-DFA conversion strts with nondeterministic NFA representtion obtined rom the compiler inormtion. Through powerset construction, DFA is constructed. This DFA is then converted into memory initiliztion ile using the process described in Section nd is loded into the monitor when the processing binry is instlled in the processor. To evlute our system, our benchmrks rom the NPbench suite[1] were processed with this low.. EVALUATION RESULTS.1 Monitoring Grphs The results o generting instruction-level monitoring grphs or both our pproch nd previous pproch [5] re illustrted in Tble. The number o entries in the stte mchine memory (Figure 6) or ech benchmrk re shown in the Mem. entries column. A cler beneit o the new pproch is speed. In ll cses, only one ccess to the monitor memory is required or ny benchmrk (including the our shown here). The previous NFA-bsed pproch requires up to three memory ccesses or the benchmrks tested nd potentilly up to 16 or other benchmrks. The conversion rom n NFA to DFA does incur memory overhed o.% on verge or the benchmrks. Tble 3: Resource Utiliztion Resources Secure Network DE Avilble monitor proc. interce in FPGA LUTs 10 3,9 3,803 18,00 FFs 6,10 38, 18,00 Mem. bits 131,0 01,16,550,800 1,65,9

Attck Pcket Norml Pcket Norml Pcket orwrded This mteril is bsed upon work supported by the Ntionl Science Foundtion under Grnt No. 1115999. We grteully cknowledge Alter Corportion s dontion o the DE bords nd Qurtus sotwre. Attck detected & pcket dropped Figure 8: Simultion wveorms showing the identiiction o n ttck pcket nd the successul orwrding o the subsequent pcket. This behvior ws conirmed using hrdwre.. Monitoring Speed nd Eectiveness Our network processor nd monitoring system were successully implemented on the DE pltorm. The lookup tble (LUT), lip lop (FF), nd memory resources required or the network processor core, monitor, nd other interce circuitry or the router (e.g. buers, input rbiter, queuing control, etc) re shown in Tble 3. The NP memory includes spce or up to 096 monitor memory entries. All circuitry operted t 15 MHz, the sme clock speed or the system without the monitor. Experiments in simultion nd in the lb on FPGA hrdwre showed tht the processor is ble to orwrd pckets rnging in size rom 6 to 1500 bytes per pcket t the sme rte under monitoring s without monitoring (e.g. no slowdown or monitoring). For hrdwre experiments, pckets were generted nd trnsmitted to the DE with the NP nd monitor t 1 Gbps line rte by seprte DE crd serving s pcket genertor. This sme crd ws used to receive the processed pckets rom the crd with the NP. In inl experiment, we tested the bility o the monitorbsed system to detect nd recover rom n ttck. The vulnerble ppliction code shown in Figure ws implemented nd used with the NP to send copies o pcket to ll ports o the router nd then crsh the router. We conirmed this behvior or system without monitor both in simultion nd in hrdwre. As shown in Figure 8, ter the monitor ws dded to the system, the ttck pcket ws successully identiied, the NP ws reset, nd subsequent regulr pckets were routed successully. This behvior ws veriied using our DE hrdwre setup. 8. SUMMARY AND FUTURE WORK The eective use o the Internet depends on relible network routers tht re impervious to ttck. In this pper, we hve described high-perormnce monitor or network processor tht requires only single memory lookup per network processor instruction. This single memory lookup is mintined regrdless o the complexity o the NP progrm using n NFA-to-DFA trnsltion o the monitoring grph. Our monitor, which trcks individul NP instructions, hs been veriied in hrdwre using n NP with Hrvrd rchitecture. The presence o monitoring does not slow down NP opertion since it is perormed outside o the opertionl pths o the NP. In the uture, we pln to evlute our monitoring pproch using multi-core network processor. Acknowledgements 9. REFERENCES [1] Abdi, M., Budiu, M., Erlingsson, Ú., nd Ligtti, J. Control-low integrity principles, implementtions, nd pplictions. In ACM Conerence on Computer nd Communiction Security (CCS) (Alexndri, VA, Nov. 005), pp. 30 353. [] Anderson, T., Peterson, L., Shenker, S., nd Turner, J. Overcoming the Internet impsse through virtuliztion. Computer 38, (Apr. 005), 3 1. [3] Aror, D., Rvi, S., Rghunthn, A., nd Jh, N. K. Secure embedded processing through hrdwre-ssisted run-time monitoring. In Proc. o the Design, Automtion nd Test in Europe Conerence nd Exhibition (DATE 05) (Munich, Germny, Mr. 005), pp. 18 183. [] Cvium Networks. OCTEON Plus CN58XX to 16-Core MIPS6-Bsed SoCs. Mountin View, CA, 008. [5] Chski, D., nd Wol, T. Attcks nd deenses in the dt plne o networks. IEEE Trnsctions on Dependble nd Secure Computing 9, 6 (Nov. 01), 98 810. [6] Cisco Systems, Inc. The Cisco QuntumFlow Processor: Cisco s Next Genertion Network Processor. Sn Jose, CA, Feb. 008. [] Cui, A., Song, Y., Prbhu, P. V., nd Stolo, S. J. Brve new world: Pervsive insecurity o embedded network devices. In Proc. o 1th Interntionl Symposium on Recent Advnces in Intrusion Detection (RAID) (Sint-Mlo, Frnce, Sept. 009), vol. 558 o Lecture Notes in Computer Science, pp. 38 380. [8] Frncillon, A., nd Cstellucci, C. Code injection ttcks on Hrvrd-rchitecture devices. In Proc. o the 15th ACM Conerence on Computer nd Communictions Security (CSS) (Alexndri, VA, Oct. 008), pp. 15 6. [9] Geer, D. Mlicious bots threten network security. Computer 38, 1 (005), 18 0. [10] Gognit, G., Wol, T., Burleson, W., Diguet, J.-P., Bossuet, L., nd Vslin, R. Reconigurble hrdwre or high-security/high-perormnce embedded systems: the SAFES perspective. IEEE Trnsctions on Very Lrge Scle Integrtion (VLSI) Systems 16, (Feb. 008), 1 155. [11] Hopcrot, J. E., nd Ullmn, J. D. Introduction to Automt Theory, Lnguges, nd Computtion. Addison-Wesley, 199. [1] Lee, B. K., nd John, L. K. NpBench: A benchmrk suite or control plne nd dt plne pplictions or network processors. In Proc. o IEEE Interntionl Conerence on Computer Design (ICCD) (Sn Jose, CA, Oct. 003), pp. 6 33. [13] Mo, S., nd Wol, T. Hrdwre support or secure processing in embedded systems. IEEE Trnsctions on Computers 59, 6 (June 010), 8 85. [1] Moore, D., Shnnon, C., nd Brown, J. Code-Red: cse study on the spred nd victims o n Internet worm. In IMW 0: Proceedings o the nd ACM SIGCOMM Workshop on Internet mesurement (Mrseille, Frnce, Nov. 00), pp. 3 8. [15] Rgel, R. G., nd Prmeswrn, S. IMPRES: integrted monitoring or processor relibility nd security. In Proc. o the 3rd Annul Conerence on Design Automtion (DAC) (Sn Frncisco, CA, USA, July 006), pp. 50 505. [16] Zmbreno, J., Choudhry, A., Simh, R., Nrhri, B., nd Memon, N. SAFE-OPS: An pproch to embedded sotwre security. Trnsctions on Embedded Computing Sys., 1 (Feb. 005), 189 10.