Curing Regular Expressions Matching Algorithms from Insomnia, Amnesia, and Acalculia


Sailesh Kumar, Balakrishnan Chandrasekaran, Jonathan Turner (Washington University)
George Varghese (University of California, San Diego)

ANCS'07, December 3-4, 2007, Orlando, Florida, USA. Copyright 2007 ACM 978-1-59593-945-6/07/0012...$5.00.

ABSTRACT
The importance of network security has grown tremendously, and a collection of devices has been introduced that can improve the security of a network. Network intrusion detection systems (NIDS) are among the most widely deployed such systems; popular NIDS use a collection of signatures of known security threats and viruses, which are used to scan each packet's payload. Today, signatures are often specified as regular expressions; thus the core of a NIDS is a regular expression parser, and such parsers are traditionally implemented as finite automata. Deterministic Finite Automata (DFA) are fast, and are therefore often desirable at high network link rates. However, the DFA for the signatures used in current security devices require prohibitive amounts of memory, which limits their practical use.

In this paper, we argue that the traditional DFA based NIDS has three main limitations: first, it fails to exploit the fact that normal data streams rarely match any virus signature; second, DFAs are extremely inefficient at following multiple partially matching signatures and explode in size; and third, finite automata are incapable of efficiently keeping track of counts. We propose mechanisms to solve each of these drawbacks and demonstrate that our solutions can implement a NIDS much more securely and economically, while at the same time substantially improving the packet throughput.

Categories and Subject Descriptors
C.2.0 [Computer Communication Networks]: General - Security and protection (e.g., firewalls)

General Terms
Algorithms, Design, Security.

Keywords
DFA, regular expressions, deep packet inspection.

1. INTRODUCTION
Network security has recently received enormous attention due to the mounting security concerns in today's networks. A wide variety of algorithms have been proposed to detect and combat these security threats. Among all these proposals, signature based Network Intrusion Detection Systems (NIDS) have become a commercial success and have seen widespread adoption. A signature based NIDS maintains signatures, which characterize the profile of known security threats (e.g. a virus, or a DoS attack). These signatures are used to parse the data streams of the various flows traversing the network link; when a flow matches a signature, appropriate action is taken (e.g. the flow is blocked or rate limited). Traditionally, security signatures have been specified as string based exact matches; however, regular expressions are now replacing them due to their superior expressive power and flexibility.

When regular expressions are used to specify the signatures in a NIDS, finite automata are typically employed to implement them. There are two types of finite automata: Nondeterministic Finite Automata (NFA) and Deterministic Finite Automata (DFA) [2]. Unlike an NFA, a DFA requires only one state traversal per character, thereby yielding higher parsing rates. Additionally, a DFA maintains a single state of execution, which reduces the per flow parse state that must be maintained because packets of different flows are multiplexed on network links. Consequently, the DFA is the preferred method. DFAs are fast; however, for the current sets of regular expressions, they require prohibitive amounts of memory. Current solutions often divide the signature set into multiple subsets and construct a DFA for each of them. However, multiple DFAs require multiple state traversals, which reduce the throughput and increase the per flow parse state. A large per flow parse state may also create a performance bottleneck, because it may have to be loaded and stored for every packet due to the packet multiplexing.

The problems associated with traditional DFA based regular expression matching stem from three prime factors. First, they make no attempt to exploit the fact that normal data streams rarely match more than the first few symbols of any signature. In such situations, if one constructs a DFA for the entire signatures, most portions of the DFA will remain unvisited, so keeping the entire automaton active appears wasteful; we call this deficiency insomnia. Second, a DFA usually maintains a single state of execution, due to which it is unable to efficiently follow the progress of multiple partial matches.
It employs a separate state for each combination of partial matches, so the number of states can explode combinatorially. It appears that if one equips an automaton with a small auxiliary memory, which it uses to register the events of partial matches, then the combinatorial explosion can be avoided; we refer to this deficiency as amnesia. [...] implement a 32-bit counter. We call this deficiency acalculia.

In this paper, we propose solutions to tackle each of these three drawbacks. We propose mechanisms to split signatures such that only one portion needs to remain active, while the remaining portions can be put to sleep under normal conditions. We also propose a cure to amnesia by introducing a new machine, which is as fast as a DFA but requires far fewer states. Our final cure, for acalculia, extends this machine so that it can handle counting events much more efficiently.

The remainder of the paper is organized as follows. Due to space limitations, background is presented in the technical report [35]. Section 2 explains the drawbacks of traditional implementations. Our cure to insomnia is presented in Section 3. Section 4 presents the cure to amnesia, and Section 5 presents the cure to acalculia. Section 6 reports the results, and the paper concludes in Section 7.

2. Regular Expressions in Networking
Any implementation of regular expressions in networking has to deal with several complications. The first complication arises due to the multiplexing of packets on network links. Since packets belonging to different flows can arrive interspersed with each other, any pattern matcher has to de-multiplex these packets and reassemble the data streams of the various flows before parsing them. As a consequence, the architecture must maintain the parse state after parsing any packet. Upon a switch from flow x to flow y, the machine will first store the parse state of the current flow x and then load the parse state saved after the last packet of flow y. Consequently, it is critical to limit the parse state associated with the pattern matcher, because on high speed backbone links the number of flows can reach a million. NFAs are therefore not desirable in spite of being compact, because they can have a large number of active states. A DFA, on the other hand, requires a single active state; thus the amount of parse state remains small.

The second complication arises due to the high network link rates. In a 10 Gbps network link, a payload byte arrives roughly every nanosecond (0.8 ns). Thus, a parser running at a 1 GHz clock rate will have a single clock cycle to process each input byte. NFAs are unlikely to maintain such parsing speeds because they often require multiple state traversals for an input byte; thus DFAs appear to be the only resort.

Due to these complications, a pattern matching machine for networking applications must satisfy two objectives: i) fast parsing rates, or few transitions per input byte, and ii) small per flow state. Although DFAs appear to meet both of these goals, they often suffer from state explosion, i.e. the total number of states in a DFA can be exponential in the length of the regular expression. The problems with the DFA based approach can be divided into the following three main categories.

2.1 Three Key Problems of Finite Automata
In this section, we introduce the three deficiencies of the traditional finite automata based approach to regular expressions:

1. Traditional regular expression implementations often employ the complete signatures to parse the input data. However, in NIDS applications, the likelihood that a normal data stream completely matches a signature is low. The traditional approach therefore appears wasteful; rather, the tail portions of the signatures can be isolated from the automaton, put to sleep during normal traffic, and woken up only when they are needed. We call this inability of the traditional approach insomnia. The number of states in a machine suffering from insomnia may bloat up unnecessarily; the problem becomes more severe when the tail portion is relatively complex and long. We present an effective cure to insomnia in Section 3.

2. The second deficiency, which is specific to DFAs, can be classified as amnesia. A DFA suffering from amnesia has limited memory: it remembers only a single state of parsing and ignores everything about the earlier parse and the associated partial matches. Due to this tendency, DFAs may require a large number of states to track the progress of both the current match and any previous partial match. Although amnesia keeps the per flow state required during the parse small, it often causes an explosion in the number of states, because a separate state is required to indicate every possible combination of partial matches. Intuitively, a machine which has a few flags in addition to its current state of execution can use these flags to track multiple matches more efficiently and avoid state explosion. We propose such a machine in Section 4, which efficiently cures DFAs of amnesia.

3. The third deficiency of finite automata can be tagged with the label acalculia: both NFAs and DFAs are unable to efficiently count the occurrences of certain sub-expressions in the input stream. Thus, whenever a regular expression contains a length restriction of k on a sub-expression, the number of states required by the sub-expression gets multiplied by k. With length restrictions, the number of states in an NFA increases linearly, while in a DFA it may increase exponentially.
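
The linear-versus-exponential contrast in item 3 can be reproduced with the textbook expression .*a.{k}, an illustrative choice rather than one of the paper's signatures. Its NFA has k+2 states, while subset construction yields 2^(k+1) reachable DFA states, because the DFA must remember which of the last k+1 bytes were an a. The sketch assumes a two-letter alphabet {a, b}; the helper names are hypothetical.

```python
def dot_a_dot_k_step(k):
    """Transition function of an NFA for .*a.{k} over the alphabet {'a', 'b'}.

    State 0 is the start state with a self-loop on every symbol (the leading .*),
    0 --a--> 1, and i --any--> i+1 for 1 <= i <= k; state k+1 accepts.
    """
    def step(state, sym):
        if state == 0:
            return {0, 1} if sym == "a" else {0}
        if 1 <= state <= k:
            return {state + 1}
        return set()  # the accepting state k+1 has no outgoing transitions
    return step


def count_dfa_states(step, alphabet, start=frozenset({0})):
    """Count the DFA states reachable from `start` by subset construction."""
    seen = {start}
    worklist = [start]
    while worklist:
        current = worklist.pop()
        for sym in alphabet:
            nxt = frozenset(t for s in current for t in step(s, sym))
            if nxt not in seen:
                seen.add(nxt)
                worklist.append(nxt)
    return len(seen)


for k in range(6):
    nfa_states = k + 2
    dfa_states = count_dfa_states(dot_a_dot_k_step(k), "ab")
    print(f"k={k}: NFA {nfa_states} states, DFA {dfa_states} states")
```

Raising k by one adds a single NFA state but doubles the DFA, which is the blow-up that motivates a machine able to count.
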
It is desirable to construct a machine which is capable of counting certain events and uses this capability to avoid the state explosion. We propose such machines in Section 5. We now proceed with our cures for these three deficiencies, beginning with the cure for insomnia.

3. Curing DFA from Insomnia
The traditional approach to pattern matching constructs an automaton for the entire regular expression (reg-ex) signature, which is used to parse the input data. However, in NIDS applications, normal flows rarely match more than the first few symbols of any signature. Thus, the traditional approach appears wasteful; the automaton unnecessarily bloats up in size as it attempts to represent the entire signature, even though the tail portions are rarely visited. Rather, the tail portions can be isolated from the automaton, put to sleep during normal traffic conditions, and woken up only when they are needed. Since the traditional approach is unable to perform such selective sleeping and keeps the automaton awake for the entire signature, we call this deficiency insomnia. In other words, insomnia can be viewed as the inability of traditional pattern matchers to isolate the frequently visited portions of a signature from the infrequent ones. Insomnia is dangerous for two reasons: i) the infrequently visited tail portions of the reg-exes are generally complex (they contain closures, unions, and length restrictions) and long (more than 80% of the signature), and ii) the size of fast representations of reg-exes (e.g. DFA) is usually exponential in the length and complexity of an expression. Thus, without a cure for insomnia, a DFA of hundreds of reg-exes may become infeasible or will require enormous amounts of memory.

An obvious cure for insomnia essentially requires isolating the frequently visited portions of the signatures from the infrequent ones. Clearly, frequently visited portions must be implemented with a fast representation like a DFA and stored in fast memory in order to maintain high parsing rates. Moreover, since fast memories are less dense and limited in size, and fast representations like DFA usually suffer from state blowup, it is vital to keep such fast representations compact and simple. Fortunately, practical signatures can be cleanly split into simple prefixes and suffixes, such that the prefixes comprise the entire frequently visited portions of the signature.
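
A minimal sketch of such a split, under simplifying assumptions that are ours rather than the paper's: the signature is a plain sequence of byte classes (no closures), input bytes are independent and uniformly distributed, and the prefix is cut at the first position where its match probability drops below a chosen threshold. Section 3.1 explains why a uniform byte distribution is not an acceptable assumption for real traffic; the function name split_signature is hypothetical.

```python
def split_signature(classes, threshold):
    """Split a signature into (prefix, suffix).

    `classes` is a list of sets of byte values, one set per signature position.
    Assumes independent, uniformly distributed input bytes, so a position whose
    class has c members matches with probability c / 256. The prefix is the
    shortest one whose match probability is below `threshold`; if no prefix
    qualifies, the whole signature stays in the fast path.
    """
    probability = 1.0
    for i, cls in enumerate(classes):
        probability *= len(cls) / 256
        if probability < threshold:
            return classes[: i + 1], classes[i + 1 :]
    return classes, []


literal = [{b} for b in b"abcdef"]  # six single-byte classes
prefix, suffix = split_signature(literal, threshold=1e-6)
print(len(prefix), len(suffix))
```

With six literal bytes and a threshold of 1e-6, three bytes (probability 256^-3, about 6e-8) already suffice, so the other three go to the slow path. Wide classes such as a wildcard position contribute a factor of 1 and push the cut further right.
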
With such a clean separation in place, only the automaton representing the prefixes needs to remain active at all times, thereby curing the traditional approach of insomnia by keeping the suffix automaton asleep most of the time.

There is an important tradeoff involved in such a prefix and suffix based pattern matching architecture. The general objective is to keep the prefixes small, so that the automaton which is awake all the time remains compact and fast. At the same time, if the prefixes are too small, normal data streams will match them often, thereby waking up the suffixes more frequently than desired. Note that during abnormal conditions the automaton representing the suffixes will be triggered more often; we discuss such scenarios later. Under normal conditions, the architecture must therefore balance the tradeoff between the simplicity of the fast automaton and the dormancy of the slow automaton.

We refer to the automaton which represents the prefixes as the fast path and to the rest as the slow path. The fast path remains awake for the entire input data stream and activates the slow path once it finds a matching prefix. There are two expectations. First, the slow path should be triggered rarely. Second, it should process only a small fraction of the input data; hence it can use a slow memory and a compact representation like an NFA, even if it is relatively slow. In order to meet these expectations, normal data streams must not match the prefixes of the signatures, or must match them only rarely, and upon a prefix match the slow path processing should not continue for a long time. The likelihood that these two expectations will be met during normal traffic conditions depends directly upon the signatures and the positions where they are split into prefixes and suffixes. Thus, it is critical to choose the split positions, and we describe our procedure to compute them in the next section.

3.1 Splitting the regular expressions
The dual objectives of the splitting procedure are that the prefixes remain as small as possible and, at the same time, that the likelihood that normal data matches these prefixes is low. The probability of matching a prefix depends upon its length and the distribution of the various symbols in the input data. In this context, it may not be acceptable to assume a uniform random distribution of the input symbols (i.e. every symbol appears with a probability of 1/256), because some words appear much more often than others (e.g. HELO in an SMTP packet).
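
To see how far a trace can depart from the uniform 1/256 model, the sketch below counts how often a literal prefix starts at each position of a byte trace and compares that with the uniform estimate 256^-L for a prefix of length L. The trace is a synthetic stand-in (repeated SMTP-style HELO lines), not measured traffic, and the function names are hypothetical.

```python
def empirical_match_probability(trace: bytes, prefix: bytes) -> float:
    """Fraction of start positions in `trace` at which `prefix` occurs."""
    positions = len(trace) - len(prefix) + 1
    if positions <= 0:
        return 0.0
    hits = sum(1 for i in range(positions) if trace.startswith(prefix, i))
    return hits / positions


def uniform_match_probability(prefix: bytes) -> float:
    """Match probability if every byte value were equally likely (1/256)."""
    return 1.0 / 256 ** len(prefix)


trace = b"HELO mail.example.com\r\n" * 100  # synthetic stand-in for a real trace
p_trace = empirical_match_probability(trace, b"HELO")
p_uniform = uniform_match_probability(b"HELO")
print(f"trace: {p_trace:.4f}  uniform: {p_uniform:.2e}")
```

On this toy trace the four-byte prefix HELO starts at about 4% of positions, while the uniform model predicts roughly 2e-10; a split chosen under the uniform model would wake the slow path far more often than intended.
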
Therefore, one needs to consider a trace driven probability distribution of the various input symbols [6]. With these traces, one can compute the matching probability of prefixes of different lengths under normal and anomalous traffic. This determines the rate at which the slow path will be triggered. In addition to the matching probabilities, it is important to consider the probabilities of making transitions between any two states of the automaton. These probabilities determine how long the slow path will continue processing once it is triggered. The transition probabilities are likely to depend upon the previous stream of input symbols, because there is a strong correlation between the occurrences of various symbols, i.e. when and where they occur with respect to each other. The transition probabilities as well as the matching probabilities can be assigned by constructing an NFA of the regular expression signatures and parsing normal and anomalous traffic with it.

More systematically, given the NFA of each regular expression, we determine the probability with which each state of the NFA becomes active and the probability that the NFA takes each of its transitions. Once these probabilities are computed, we determine a cut in the NFA graph, so that i) there are as few nodes as possible on the left hand side of the cut, and ii) the probability that states on the right hand side of the cut are active is sufficiently small. This ensures that the fast path remains compact and the slow path is triggered only occasionally. While determining the cut, we also need to ensure that the probability of transitions which leave some NFA node on the right hand side and enter another node on the same side of the cut remains small. This ensures that, once the slow path is triggered, it will stop after processing a few input symbols. Clearly, the cuts computed from the normal traffic traces and from the attack traffic are likely to be different, so the corresponding prefixes will also be different. We adopt the policy of taking the longer prefix. More details of the cutting algorithm are presented in the technical report [35].

Figure 1: Fast path and slow path processing in a bifurcated packet processing architecture.

3.2 The bifurcated pattern matching
We now present the bifurcated pattern matching architecture. The architecture (shown in Figure 1) consists of two components: a fast path and a slow path.
The fast path parses every byte of each flow and matches it against the prefixes of all reg-exes. The slow path parses only those flows which have found a match in the fast path, and matches them only against the corresponding suffixes. Notice that the parsing of input data is performed on a per flow basis. In order to keep the parsing of each flow separate, the per flow parse state has to be stored. With millions of active flows, parse states have to be stored in an off-chip memory, which may create a performance bottleneck, because upon any flow switch we have to store and load this information. With the minimum IP packet size being 40 bytes, we may have to perform this load and store operation roughly every 32 ns at 10 Gbps rates (40 bytes × 8 bits / 10 Gbps). Thus, it is important to minimize the per flow parse state. This minimization is critical in the fast path, because all flows are processed by the fast path. It does not pose a similar threat to the slow path, because the slow path processes a fraction of the payload of a small number of flows. Consequently, the fast path automaton has two objectives: 1) it must require a small per flow parse state, and 2) it must be able to parse at high speed in order to keep up with the link rates. One obvious solution which satisfies both objectives is to construct a single composite DFA of all prefixes. A composite DFA has only one active state per flow and requires only one state traversal per input character. Thus, if there are C flows in total, we need C × state memory, where state is the number of bits needed to represent a DFA state. At this point in the discussion we proceed with a composite DFA in the fast path; later, in Section 4, we propose an alternative to the composite DFA which is more space efficient and yet satisfies both objectives.

The slow path, on the other hand, handles, say, an ε fraction of the total number of bytes processed by the fast path. Therefore, it will need to store the parse state of εC flows on average. If we keep ε small, then, unlike in the fast path, we neither have to worry about minimizing the per flow parse state nor have to use a fast representation to keep up with the link rates. Thus, an NFA may suffice to represent the slow path. The slow path offers another key advantage as well: a composite automaton for all suffixes is not required, because we need to parse the flows only against those suffixes whose prefixes have been matched.

However, there is a complication in the slow path. The slow path can be triggered multiple times for the same flow, so there can be multiple instances of per flow active parse states even if we use a DFA. Consider a simple example of an expression ababcd, which is split into the prefix ab and the suffix abcd, and a packet payload xyababcdpq. The slow path will be triggered twice by this packet, and there will be two instances of active parse states in the slow path. In general it is possible that i) a single packet triggers the slow path several times, in which case the signaling between the fast and slow path may become a bottleneck, and ii) there are multiple active states in the slow path, which requires complicated data structures to store the parse states. These problems are exacerbated when the slow path processes packets much more slowly than the fast path and handles its triggers sequentially. With the above packet, the slow path will be triggered first after the fast path parses xyab, and second after it parses xyabab. Upon the first trigger, the slow path parses the payload abcdpq and stops after it sees p. Upon the second trigger, it parses the payload cdpq, thus repeating part of the previous parse. Due to these complications, we propose a packetized version of the bifurcated packet processing architecture.

3.3 Packetized bifurcated pattern matching
The objective of packetized bifurcated packet processing is to minimize the signaling between the fast path and the slow path. More specifically, if we ensure that the fast path triggers the slow path at most once for every packet, then the slow path will not repeat the parsing of the same packet payload. This objective can be satisfied by slightly modifying the slow path automaton, so that it parses the packets against the entire signature and not just the suffixes. With the slow path representing the entire signature, the subsequent triggers for this signature are captured within the slow path; hence, they can be ignored.

To better understand how the slow path is constructed and triggered, let us consider a simple example of three signatures:
  r1 = .*[gh]d[^g]*ge
  r2 = .*fag[^i]*i[^j]*j
  r3 = .*a[gh]i[^l]*[ae]c
The NFAs for these signatures are shown in Figure 2 (a composite DFA for these signatures will contain 92 states). The figure also highlights the probabilities with which the various NFA states are activated, and shows a cut between the fast and slow path which divides the states so that the cumulative probability of the slow path states is less than 5%.

Figure 2: NFA and the cut between prefix and suffix.

With this cut, the prefixes are p1 = [gh]d[^g]*g, p2 = fa, and p3 = a[gh], and the corresponding suffixes are s1 = e, s2 = g[^i]*i[^j]*j, and s3 = i[^l]*[ae]c. As highlighted in the same figure, the fast path consists of a composite DFA of the three prefixes p1, p2, and p3, which has only 14 states, while the slow path comprises three separate DFAs, one for each signature r1, r2, and r3, rather than just for the suffixes s1, s2, and s3.

Whenever the fast path finds a matching prefix, say pi, in a packet, it triggers the corresponding slow path automaton representing the signature ri. Once this automaton is triggered, all subsequent triggers corresponding to the prefix pi of the signature ri can be ignored, because such triggers will also be detected while ri is being matched in the slow path. Thus, for any given packet processed in the fast path, the state of the slow path (active or asleep) associated with each signature is maintained, so that subsequent triggers for any given signature can be masked out.

However, we have to be careful when triggering the slow path automaton representing any signature ri. Specifically, we have to ensure that the slow path automaton begins at a state which indicates that the prefix pi of the signature ri has already been detected. Consider the DFA for the first signature (r1) of the above slow path, shown in Figure 3. Instead of beginning at the usual start state, 0, of this DFA, we begin its parsing at the state (0,1,3), which indicates that the prefix p1 has just been detected; the parsing continues from this point onwards in the slow path. In general, the start state of the slow path automaton depends upon the fast path DFA state which triggers the slow path. More specifically, the slow path start state is the minimal one which encompasses all partial matches in the fast path.
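
The start-state rule can be checked on r1 = .*[gh]d[^g]*ge with a few lines of NFA simulation. The state numbering is our assumption, since Figure 2 did not survive extraction: 0 is the start state with the .* self-loop, 1 follows [gh], 2 follows d and carries the [^g] self-loop, 3 follows g (so reaching 3 means p1 has just been detected), and 4 follows e and accepts. With this numbering, the sketch reproduces the fast path and slow path state sets quoted for the payload gdgdgh in the walk-through of this example; r1_step and run are hypothetical names.

```python
def r1_step(state, ch):
    """Next NFA states of r1 = .*[gh]d[^g]*ge from `state` on character `ch`."""
    if state == 0:
        return {0, 1} if ch in "gh" else {0}  # .* self-loop, plus [gh]
    if state == 1:
        return {2} if ch == "d" else set()
    if state == 2:
        return {3} if ch == "g" else {2}      # g completes p1; [^g] loops
    if state == 3:
        return {4} if ch == "e" else set()    # suffix s1 = e
    return set()                              # accepting state 4


def run(states, payload):
    """Simulate the NFA on `payload`; return the sorted state set after each byte."""
    trace = []
    for ch in payload:
        states = {t for s in states for t in r1_step(s, ch)}
        trace.append(tuple(sorted(states)))
    return trace


fast = run({0}, "gdgdgh")
triggers = [i for i, st in enumerate(fast) if 3 in st]  # offsets where p1 is detected
print(fast)
print(triggers)
slow = run({0, 1, 3}, "dgh")  # slow path starts at (0,1,3) after the first trigger
print(slow[-1])
```

The fast path detects p1 at byte offsets 2 and 4. In the packetized scheme only the first detection starts the slow path; the second is masked, because the slow path, which parses the whole signature, passes through (0,1,3) again on its own before finishing in (0,1).
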
The above procedure describes how we initiate the slow path automaton for a prefix match in any given packet. Whether the slow path should remain active for the subsequent packets of the flow depends on the state of the slow path automaton at which the packet leaves it. If this final DFA state comprises any of the states of the slow path NFA, the slow path processing continues; otherwise the slow path is put to sleep. For example, in Figure 3, unless the final state upon parsing the packet is either (0,1,3) or (0,5), the subsequent packets of the flow will not be parsed by this automaton; in other words, this automaton will no longer remain active.

Let us now consider the parsing of the packet payload gdgdgh. The fast path state traversal is shown below; the slow path is triggered twice, but the second trigger is ignored.

  0 -g-> (0,1) -d-> (0,2) -g-> (0,1,3) [trigger r1] -d-> (0,2) -g-> (0,1,3) [trigger r1, ignored] -h-> (0,1)

Upon the first trigger, the slow path DFA for the signature r1 (shown in Figure 3) begins its execution at the state (0,1,3) and parses the remaining packet payload dgh. The parsing finishes at the DFA state (0,1). Since this state does not contain any of the states of the slow path NFA, this slow path automaton is put to sleep. On the other hand, had the remaining payload left the slow path automaton in the state (0,5), the slow path processing would remain active for the subsequent packets of the flow.

Figure 3: DFA and start state for r1 in the slow path.

In contrast with the byte based pattern matching architecture described earlier, the proposed packetized architecture has the drawback that it keeps the slow path automaton active until the packet is completely parsed in the slow path. Thus, the slow path may end up processing many more bytes than in the byte level architecture. This drawback arises from the difference in processing granularity: the byte based pattern matcher halts the slow path as soon as the next input character leads to a suffix mismatch, whereas the packetized pattern matcher keeps the slow path active until the last byte of the packet is parsed. Nevertheless, the packetized architecture keeps the triggering probability at a much lower value, since the recurrent signaling of prefixes belonging to the same signature is suppressed.

Let us experimentally evaluate the performance of the packetized pattern matching architecture against the byte level architecture. Both architectures are likely to operate well when the input traffic is benign and the slow path is triggered with very low probability, say 0.1%. Therefore, we consider an extreme situation where 1% of the contents of the input data stream consists of entire signatures. Thus, the triggering probability of the slow path will be around 1%. We use 36 Cisco signatures whose average length is 33 characters, and assume that packets are 200 bytes long. In Figure 4, we plot a snapshot of the timeline of the triggering events and of the time intervals during which the slow path is active. It is apparent that the slow path in the packetized architecture remains active for relatively longer durations. Consequently, in the packetized architecture the signatures have to be split accordingly, so that the slow path can handle such loads.

Figure 4: Fast path and slow path processing in bifurcated packet and byte based processing architectures (panels: packetized architecture and byte-based architecture; marks show slow path triggering and the intervals during which the slow path is active).

3.4 Protection against DoS attacks
In the bifurcated packet processing architecture, a small fraction of packets from normal flows might be diverted to the slow path, even though a normal data stream is not likely to match any signature.
The slow path processing is provisioned so that it can sustain the rate at which such false packet diversions from normal flows occur. Therefore, it is unlikely that these packets from normal flows will overload the slow path. However, a flow which frequently matches prefixes may overload the slow path by triggering it more often than desired. This opens up the possibility of a denial of service attack. A denial of service attack is, in fact, much more threatening to the end-to-end data transfer. Consider a packet from a normal flow getting diverted to the slow path. If the slow path is overloaded, this packet will either be discarded or encounter enormous processing delays. If the sending application retransmits this packet, it will further exacerbate the overload condition in the slow path. The implication for the end-to-end data transfer is that it may never be able to deliver this packet and complete the data transmission. This clearly signals a need to protect normal flows from such repeated packet discards.

To accomplish this objective, we need a mechanism in the slow path to distinguish the packets of normal flows from the packets of the anomalous or attack flows which are overloading the slow path. We now propose a lightweight algorithm which performs such classification at very high speed and with high accuracy. Our algorithm is based upon statistical sampling of packets from each flow. For each flow, we compute an anomaly index, which is a moving average of the number of its packets that match one of the prefixes in the fast path. The moving average can be either a simple moving average (SMA) or an exponential moving average (EMA). For simplicity we consider only the SMA, wherein we compute the average number of packets which match some prefix over a window of the n previous packets. We call a flow well-behaved if less than an ε fraction of its packets find a match, simply because such a flow will not overload the slow path. Flows which find more matches are referred to as anomalous. If the sampling window n is sufficiently large, the anomaly indices of the well-behaved flows are expected to be much smaller than those of the anomalous/attack flows. However, longer sampling windows require more bits per flow to compute the anomaly index. Consequently, there is a trade-off between the accuracy of the anomaly indices and the per flow memory needed to maintain them. We attempt to strike a balance between this accuracy and the cost of implementation.
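
A minimal per-flow sketch of the SMA anomaly index, assuming one boolean per packet that records whether it matched a prefix in the fast path. It keeps the last n outcomes explicitly, so it needs n bits per flow, which is exactly the memory cost behind the trade-off above. The class name SmaAnomalyIndex is hypothetical, and treating an index of exactly ε as anomalous follows the rule that well-behaved flows stay below ε.

```python
from collections import deque


class SmaAnomalyIndex:
    """Simple moving average of prefix matches over a flow's last n packets."""

    def __init__(self, n, eps):
        self.window = deque(maxlen=n)  # 1 if the packet matched a prefix, else 0
        self.matches = 0               # running count of 1s in the window
        self.eps = eps

    def update(self, matched_prefix):
        if len(self.window) == self.window.maxlen:
            self.matches -= self.window[0]  # the oldest outcome is about to be evicted
        self.window.append(int(matched_prefix))
        self.matches += int(matched_prefix)

    def index(self):
        return self.matches / len(self.window) if self.window else 0.0

    def is_anomalous(self):
        # Well-behaved flows have fewer than an eps fraction of matching packets.
        return self.index() >= self.eps


flows = {}
for flow_id, matched in [("A", False), ("B", True), ("A", False), ("B", True)]:
    flows.setdefault(flow_id, SmaAnomalyIndex(n=10, eps=0.1)).update(matched)
print({f: (s.index(), s.is_anomalous()) for f, s in flows.items()})
```

Keeping a running count avoids re-summing the window on every packet; the k-bit counter described next removes the need to store the window at all.
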
Let us say that we are given at most k bits for every flow to represent its anomaly index. Since a flow is declared anomalous as soon as its anomaly index exceeds ε, we set ε as the upper bound of the anomaly index. Thus, when all k bits are set, the counter represents an anomaly index of ε. Consequently, the per flow sampling window n comprises $2^k/\varepsilon$ packets; for every packet which matches a prefix, the k-bit counter is incremented by 1/ε, and for other packets it is decremented by 1 (note that a flow is a threat only if more than an ε fraction of its packets are diverted to the slow path, or, equivalently, if the mean distance between diverted packets is smaller than 1/ε packets). Thus, the probability that a flow which is indeed anomalous is not detected will be $O(e^{-n})$. If ε is 0.1, then an 8-bit anomaly counter will result in a false detection probability well below $10^{-6}$. This analysis assumes that the packet diversions to the slow path are uniformly distributed. For any other distribution, the accuracy of the detection of anomalous flows is likely to improve, while the probability that a normal flow is falsely detected as anomalous may also increase. The anomaly counters, in fact, indicate the degree to which a flow loads the slow path. Consequently, they can be used to classify not just the anomalous flows but also the well behaving flows. The flows can be prioritized in the slow path according to the degree of their anomaly, the implication being that the slow path will first process the flows with smaller anomaly indices. The slow path thus

consists of multiple queues which store the requests from various flows according to their anomaly indices. Queues associated with smaller anomaly indices are serviced with higher priority. Hence, even if a well behaving flow accidentally diverts its packets to the slow path, they will be serviced quickly in spite of the presence of large volumes of anomalous packets.

3.5 Binding things together
Having described the procedure to split the reg-ex signatures into simple prefixes and relatively complex suffixes, as well as the mechanisms needed to put the suffix portions to sleep, we are now ready to discuss some further issues. In these pattern matching architectures, the first issue is that it often becomes critical to prevent a receiver from receiving a complete signature. This has an interesting implication. Whenever a packet is diverted to the slow path, no subsequent packets of the same flow can be forwarded in the fast path until the slow path packet is completely processed. If this policy is not adhered to, then signatures that span multiple packets might not be detected. This implies that in any flow, if a packet is accidentally diverted to the slow path, subsequent packets of the flow can create head of line (HoL) blocking in the fast path. Thus, in order to avoid such HoL blocking, a HoL buffer is maintained (shown in Figure 5), which stores the packets that cannot currently be processed. The above discussion again bolsters the premise that the normal flows must be guarded against anomalous/attack flows which may overload the slow path. Without such protection, whenever a diverted packet of a normal flow gets either delayed or discarded in the heavily loaded slow path, subsequent packets of the flow cannot be forwarded; thus the flow will essentially become dead. In the case of TCP, the discarded packet will get retransmitted after the time-out; nevertheless, it will again get diverted to the slow path, and congestion will ensue. Since DoS protection is crucial, we have performed a thorough evaluation of it, and the results are summarized in the technical report [35].

4. H-FA: Curing DFAs from Amnesia
DFA state explosion occurs primarily due to amnesia, or the inability of the DFA to follow multiple partial matches with a single state of execution. Before proceeding with the cure to amnesia, we re-examine the connection between amnesia and state explosion.
Figure 5: Fast path and slow path processing in bifurcated packet processing architecture.

As suggested previously, DFA state explosions usually occur due to signatures which comprise simple patterns followed by closures over character classes (e.g. .* or [a-z]*). The simple pattern in these signatures can be matched by a stream of suitable characters, and the subsequent characters can be consumed without moving away from the closure. These characters can begin to match either the same or some other reg-ex, and such situations of multiple partial matches have to be followed. In fact, every permutation of multiple partial matches has to be followed. A DFA represents each such permutation with a separate state due to its inability to remember anything other than its current state (amnesia). With multiple closures, the number of permutations of the partial matches can be exponential; thus the number of DFA states can also explode exponentially.

An intuitive solution to avoid such exponential explosions is to construct a machine which can remember more information than just a single state of execution. NFAs fall in this genre; they are able to remember multiple execution states, and thus they avoid state explosion. NFAs, however, are slow; they may require $O(n^2)$ state traversals to consume a character. In order to keep execution fast, we would like to ensure that the machine maintains a single active state. One way to enable a single execution state and yet avoid state explosion is to equip the machine with a small and fast cache to register key events during the parse, such as encountering a closure. Recall that state explosion occurs because the parsing gets stuck at a single or multiple closures; thus if the history buffer registers these events, then one may avoid several states. We call this class of machine a History based Finite Automaton (H-FA). The execution of the H-FA is augmented with the history buffer. Its automaton is similar to a traditional DFA and consists of a set of states and transitions. However, multiple transitions on a single character may leave a state (as in an NFA). Nevertheless, only one of these transitions is taken during the execution, which is determined after examining the contents of the history buffer; thus certain transitions have an associated condition. The contents of the history buffer are updated during the machine execution.
The size of the H-FA automaton (number of states and transitions) depends upon the partial matches which are registered in the history buffer; if we judiciously choose these partial matches, then the H-FA can be kept extremely compact. The next obvious questions are: i) How to determine the partial matches? ii) Having determined them, how to construct the automaton? iii) How to execute the automaton and update the history buffer? We now describe the H-FA, which attempts to answer these questions.

4.1 Motivating example
We introduce the construction and execution of an H-FA with a simple example. Consider the two reg-ex patterns listed below:
r1 = .*ab[^a]*c; r2 = .*def;
These patterns create an NFA with 7 states, which is shown below:
[Figure: NFA with states 0 to 6; state 0 loops on every character; 0 -a-> 1 -b-> 2, state 2 loops on [^a], 2 -c-> 3; 0 -d-> 4 -e-> 5 -f-> 6.]
Let us examine the corresponding DFA, which is shown below (some transitions are omitted to keep the figure readable):
[Figure: DFA with states (0), (0,1), (0,2), (0,3), (0,4), (0,5), (0,6), (0,2,4), (0,2,5) and (0,2,6).]
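The 10-state count can be reproduced with the textbook subset construction. The Python sketch below is an illustration, not the authors' tool: it hard-codes the NFA from the figure, uses a reduced alphabet in which the hypothetical symbol x stands for every character other than a to f (all such characters behave identically for these two patterns), and enumerates the reachable DFA states. In this construction the accepting state of r1 appears as {0,2,3}, because c also matches [^a]; the total is still 10.

```python
from collections import deque

ALPHABET = "abcdefx"  # 'x' stands for every character outside a-f

# NFA for r1 = .*ab[^a]*c (states 0, 1, 2, 3) and r2 = .*def (states 0, 4, 5, 6)
NFA = {
    0: [(lambda ch: True, 0), (lambda ch: ch == "a", 1), (lambda ch: ch == "d", 4)],
    1: [(lambda ch: ch == "b", 2)],
    2: [(lambda ch: ch != "a", 2), (lambda ch: ch == "c", 3)],  # closure [^a]*
    3: [],
    4: [(lambda ch: ch == "e", 5)],
    5: [(lambda ch: ch == "f", 6)],
    6: [],
}


def subset_construction(nfa, start=0):
    """Return every DFA state (a frozenset of NFA states) reachable from {start}."""
    first = frozenset({start})
    seen, queue = {first}, deque([first])
    while queue:
        current = queue.popleft()
        for ch in ALPHABET:
            nxt = frozenset(dst for src in current for accepts, dst in nfa[src] if accepts(ch))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


dfa_states = subset_construction(NFA)
print(len(dfa_states))                           # 10
print(sum(1 for s in dfa_states if 2 in s))      # 5 DFA states contain NFA state 2
print(len({s - {2} for s in dfa_states}))        # 6 distinct states once 2 is removed
```

Dropping NFA state 2 from every DFA state leaves 6 distinct states, which matches the state count of the single-flag H-FA derived in Section 4.2.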

The DFA has 10 states; each DFA state corresponds to a subset of NFA states, as shown above. There is a small blowup in the number of states, which occurs due to the presence of the Kleene closure [^a]* in the expression r1. Once the parsing reaches the Kleene closure (NFA state 2), subsequent input characters can begin to match the expression r2; hence the DFA requires three additional states (0,2,4), (0,2,5) and (0,2,6) to follow this multiple match. There is a subtle difference between these states and the states (0,4), (0,5) and (0,6), which correspond to the matching of the reg-ex r2 alone: DFA states (0,2,4), (0,2,5) and (0,2,6) comprise the same subset of NFA states as the DFA states (0,4), (0,5) and (0,6), plus they also contain NFA state 2. In general, those NFA states which represent a Kleene closure appear in several DFA states. The situation becomes more serious when there are multiple reg-exes containing closures. If an NFA consists of n states, of which k states represent closures, then during the parsing of the NFA several permutations of these closure states can become active; $2^k$ permutations are possible in the worst case. Thus the corresponding DFA, each of whose states is a set of active NFA states, may require a total of $n \cdot 2^k$ states. These DFA state sets will comprise one of the n NFA states plus one of the $2^k$ possible permutations of the k closure states. Such an exponential explosion clearly occurs due to amnesia, as the DFA is unable to remember that it has reached these closure NFA states during the parsing. Intuitively, the simplest way to avoid the explosion is to enable the DFA to remember all closures which have been reached during the parsing. In the above example, if the machine can maintain an additional flag which indicates whether or not NFA state 2 has been reached, then the total number of DFA states can be reduced. One such machine is shown below:
[Figure: H-FA with a single flag and states (0), (0,1), (0,3), (0,4), (0,5), (0,6); a, flag<=0; b, flag<=1; c, if flag=0, to (0); c, if flag=1, to (0,3).]
This machine makes transitions like a DFA; besides, it maintains a flag, which is either set or reset (indicated by <=1 and <=0 in the figure) when certain transitions are taken. For instance, a transition on character a from state (0) to state (0,1) resets the flag, while a transition on character b from state (0,1) to state (0) sets the flag. Some transitions also have an associated condition (flag is set or reset); these transitions are taken only when the condition is met.
For instance, the transition on character c from state (0) leads to state (0,3) if the flag is set; otherwise it leads to state (0). This machine accepts the same language as our original NFA; however, unlike the NFA, it makes only one state traversal per input character. Consider the parse of the string cdabc starting at state (0) with the flag reset:
[Figure: parse trace (0) -c-> (0) -d-> (0,4) -a-> (0,1) -b-> (0) -c-> (0,3); the first c leads to (0) because the flag is reset, b sets the flag, and the last c leads to (0,3) because the flag is set.]
In the beginning the flag is reset; consequently the machine moves from state (0) to state (0) on the input character c. On the other hand, when the last character c arrives, the machine moves from state (0) to state (0,3) because the flag is set this time. Since state (0,3) is an accepting state, the string is accepted. Such a machine can easily be extended to maintain multiple flags, each indicating a closure. The transitions then depend upon the state of all flags, and the flags are updated during certain transitions. As illustrated by the above example, augmenting an automaton with these flags can avoid state explosion. However, we need a more systematic way to construct these H-FAs, which we propose now.

4.2 Formal Description of H-FA
A History based Finite Automaton (H-FA) comprises an automaton and a set called the history. The transitions have i) an accompanying condition which depends upon the state of the history, and ii) an associated action which inserts into or removes from the history set, or both. An H-FA can thus be represented as a 6-tuple $M = (Q, q_0, \Sigma, A, \delta, H)$, where Q is the set of states, $q_0$ is the start state, Σ is the alphabet, A is the set of accepting states, δ is the transition function, and H the history. The transition function δ takes in a character, a state and a history state as its input, and returns a new state and a new history state:
$$\delta : Q \times \Sigma \times H \rightarrow Q \times H$$
H-FAs can be synthesized either directly from an NFA or from a DFA. For clarity, we explain the construction from a combination of NFA and DFA. To illustrate the construction, we consider our previous example of the two reg-exes. First, we determine those NFA states of the reg-exes which are registered in the history buffer (generally these are the closure NFA states). The first reg-ex, r1, contains a closure represented by NFA state 2; hence we keep a single flag in the history for this state. Afterwards, we identify those DFA states which comprise these closure NFA states, in this instance NFA state 2.
We call these DFA states (which are also highlighted below) fading states:
[Figure: the DFA above with the fading states (0,2), (0,2,4), (0,2,5) and (0,2,6) highlighted.]
In the next step, we attempt to remove NFA state 2 from the fading DFA states. Notice that if we record that NFA state 2 has been reached by setting the history flag, then we can remove NFA state 2 from the subsets of the fading states. As a consequence, these fading states may overlap with some DFA states in the non-fading region, and thus they can be removed. Transitions which originate from a non-fading state and lead to a fading state, and vice-versa, will now set and reset the history flag, respectively. Furthermore, all transitions that remain in the fading region will have an associated condition that the flag is set. Let us illustrate the removal of NFA state 2 from the fading state (0,2). After removal, this state will overlap with the DFA state (0); the resulting conditional transitions are shown below:
[Figure: transitions of state (0) after the fading state (0,2) is merged into it, labeled with the condition on flag 2 and the actions +2 and -2.]

Here a transition labeled |s is taken only when the history flag for state s is set; +s implies that, when this transition is taken, the flag for s is set, and -s implies that, with this transition, the flag for s is reset. Notice that all outgoing transitions of the fading state (0,2) now originate from state (0) and carry the condition that the flag is set. Also, those transitions which lead to a non-fading state reset the flag, and incoming transitions into state (0,2) originating from a non-fading state now have an action to set the flag. Once we remove all states in the fading region, we obtain the following H-FA:
[Figure: H-FA with states (0), (0,1), (0,3), (0,4), (0,5), (0,6) before pruning, with conditions |2 and actions +2 and -2 on its transitions.]
Notice that several transitions in this machine can be pruned. For example, the transitions on character d from state (0) to state (0,4) can be reduced to a single unconditional transition (the pruning process is described in greater detail later). Once we completely prune the transitions, the H-FA will have a total of 4 conditional transitions; the remaining transitions will be unconditional. When there are multiple closures, multiple flags are used and the procedure is applied repeatedly to synthesize the H-FA.

The above example demonstrates a general method of H-FA construction from a DFA. In order to achieve the maximum space reduction for a given number of history flags, the algorithm should register in the history buffer only those NFA states which appear most frequently in the DFA states. Afterwards, the above procedure can be applied repeatedly. With multiple flags in the history buffer, some transitions may have conditions that multiple history flags are set. Moreover, some transitions may set or reset multiple flags. If there are k flags in the history buffer and h represents this k-bit vector, then a condition C will be a k-bit vector, which becomes true whenever all those bits of h are set whose corresponding bits in C are also set. Representing conditions as vectors eases the pruning process, which is carried out immediately after the construction. The pruning process eliminates any transition with condition $C_1$ if another transition with condition $C_2$ exists between the same pair of states, over the same character, such that the condition $C_1$ is a subset of the condition $C_2$ (i.e. $C_2$ is true whenever $C_1$ is true), and the actions associated with both transitions are the same.
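A small Python sketch of the bit-vector conditions and the pruning rule above, assuming a transition is identified by its source, character, destination, condition vector and set/reset actions. The `Transition` record, the string state names, and the states p, q with a second flag bit are hypothetical, used only for illustration. Note that "C1 is a subset of C2" refers to the histories on which each condition holds; in terms of required bits it means that the bits of C2 are a subset of the bits of C1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    src: str
    char: str
    dst: str
    cond: int = 0        # k-bit condition vector: flags that must be set in h
    set_bits: int = 0    # action: flags set when the transition is taken
    reset_bits: int = 0  # action: flags reset when the transition is taken


def holds(cond: int, h: int) -> bool:
    """A condition C is true when every bit set in C is also set in h."""
    return (h & cond) == cond


def prune(transitions):
    """Drop a transition t1 when another transition t2 with the same source,
    character, destination and actions holds whenever t1 holds."""
    unique = list(dict.fromkeys(transitions))  # drop exact duplicates, keep order
    kept = []
    for t1 in unique:
        key1 = (t1.src, t1.char, t1.dst, t1.set_bits, t1.reset_bits)
        redundant = any(
            t2 != t1
            and (t2.src, t2.char, t2.dst, t2.set_bits, t2.reset_bits) == key1
            # t2 requires a subset of t1's bits, so t2 is true whenever t1 is
            and (t2.cond & t1.cond) == t2.cond
            for t2 in unique
        )
        if not redundant:
            kept.append(t1)
    return kept


FLAG2 = 0b01  # flag for NFA state 2 in the running example
transitions = [
    Transition("0", "d", "0,4"),                # from the non-fading state (0)
    Transition("0", "d", "0,4", cond=FLAG2),    # inherited from the fading state (0,2)
    Transition("0", "c", "0,3", cond=FLAG2),
    Transition("0,1", "b", "0", set_bits=FLAG2),
    Transition("p", "x", "q", cond=0b11),       # hypothetical two-flag condition
    Transition("p", "x", "q", cond=0b01),
]
pruned = prune(transitions)
print(len(pruned))  # 4
```

The conditional d transition collapses into the unconditional one, as in the text, while the conditional c transition survives because no weaker transition shares its destination.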
In general, the pruning process eliminates a large number of transitions, and it is essential in reducing the memory requirements of H-FAs. However, even after pruning, there can be a blowup in the number of transitions. In the worst case, if we eliminate k NFA states from the DFA by employing k history flags, then there can be up to $2^k$ additional conditional transitions in the resulting H-FA, and thus there will be little memory reduction. However, such worst cases are rare; normally there is only a small blowup in the number of transitions. An analysis of the blowup and the implementation of the history buffer are presented in greater detail in the technical report [35].

5. H-cFA: Curing DFAs from Acalculia
We now propose History based counting Finite Automata, or H-cFA, which efficiently cure traditional FA of acalculia, the inability of an FA to efficiently count the occurrences of certain sub-expressions. We begin with an example; we consider the same set of two reg-exes, with the closure in the first reg-ex replaced by a length restriction of 4, as shown below:
r1 = .*ab[^a]{4}c; r2 = .*def;
A DFA for these two reg-exes will require 20 states. The blowup in the number of states in the presence of the length restriction occurs due to acalculia, or the inability of the DFA to keep track of the length restriction. Let us now construct an H-cFA for these reg-exes. The first step in this construction replaces the length restriction with a closure and constructs the H-FA, with the closure represented by a flag in the history buffer. Subsequently, a counter is appended to every flag in the history buffer. The counter is set to the length restriction value by those conditional transitions which set the flag, while it is reset by those transitions which reset the flag. Furthermore, those transitions whose condition is a set flag are given the additional condition that the counter value is 0. During the execution of the machine, all positive counters are decremented for every input character. The resulting H-cFA is shown below:
[Figure: H-cFA with states (0), (0,1), (0,3), (0,4), (0,5), (0,6); a; flag<=0. b; flag<=1, ctr<=4. c; if flag=0 or ctr≠0, to (0). c; if flag=1 & ctr=0, to (0,3). ctr: if (ctr > 0) decrement.]
Consider the parse by this machine of a string consisting of ab, followed by four characters other than a, followed by c, starting at state (0) with the flag and counter reset.
[Figure: parse trace (0) -a-> (0,1) -b-> (0) with flag<=1 and ctr<=4; the next four characters decrement the counter to 3, 2, 1 and 0; the final c leads from (0,5) to (0,3) because flag = 1 and ctr = 0.]
As the parsing reaches state (0,1) and makes a transition to state (0), the flag is set and the counter is set to 4. Subsequent transitions decrement the counter. Once the last character c of the input string arrives, the machine makes a transition from state (0,5) to state (0,3), because the flag is set and the counter is 0; thus the string is accepted. This example illustrates the straightforward method to construct H-cFAs from H-FAs. Several kinds of length restrictions, including greater than i, less than i, and between i and j, can be implemented. Each of these will require an appropriate condition on the transition. For example, a less than i length restriction will require that the conditional transition becomes true when the history counter is greater than 0. From the hardware implementation perspective, a greater than or less than condition requires approximately the same number of gates as an equality condition; hence different kinds of length restrictions are likely to have identical implementation cost. In fact, reprogrammable logic which can check each of these conditions can be devised equally efficiently. Thus, the architecture will remain flexible in the face of frequent signature updates. This simple cure to acalculia is extremely effective in reducing the number of states, specifically in the presence of long length restrictions. Snort signatures comprise several long length restrictions; hence H-cFA is extremely valuable in implementing these signatures. We now present our detailed experimental results, where we highlight the effectiveness of our cures to the three reg-ex problems.
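The flag-and-counter behaviour can be exercised with a short Python sketch. It hand-codes the pruned single-flag H-FA for r1 and r2 (states (0), (0,1), (0,3), (0,4), (0,5), (0,6)) and, when a length is given, adds one counter following the rules stated above: the counter is loaded when the flag is set, positive counters are decremented on every character, and the transition on c to (0,3) additionally requires the counter to be 0. The function names, the tuple encoding of states and the use of x as a stand-in for any other character are assumptions of this sketch; so is the choice to evaluate the c condition before decrementing, which makes exactly four characters between b and c necessary in the example.

```python
ACCEPTING = {(0, 3): "r1", (0, 6): "r2"}


def step(state, flag, ctr, ch, length=None):
    """One transition of the example H-FA (length=None) or H-cFA (length=4).

    Returns (next_state, next_flag, next_ctr)."""
    # Condition of the c transition to (0,3): flag set and, for the H-cFA, ctr == 0.
    c_condition = flag and (length is None or ctr == 0)
    if ctr > 0:                  # all positive counters are decremented per character
        ctr -= 1
    if ch == "a":                # a leaves the closure [^a]*: reset flag and counter
        return (0, 1), False, 0
    if ch == "b":
        if state == (0, 1):      # "ab" reaches NFA state 2: set the flag, load the counter
            return (0,), True, (length or 0)
        return (0,), flag, ctr
    if ch == "c":
        return ((0, 3) if c_condition else (0,)), flag, ctr
    if ch == "d":
        return (0, 4), flag, ctr
    if ch == "e":
        return ((0, 5) if state == (0, 4) else (0,)), flag, ctr
    if ch == "f":
        return ((0, 6) if state == (0, 5) else (0,)), flag, ctr
    return (0,), flag, ctr       # any other character


def matches(text, length=None):
    """Return the names of the reg-exes whose accepting state is reached."""
    state, flag, ctr = (0,), False, 0
    found = set()
    for ch in text:
        state, flag, ctr = step(state, flag, ctr, ch, length)
        if state in ACCEPTING:
            found.add(ACCEPTING[state])
    return found


print(matches("cdabc"))              # H-FA: {'r1'}
print(matches("xdefx"))              # {'r2'}
print(matches("abxxxxc", length=4))  # H-cFA, four characters between b and c: {'r1'}
print(matches("abxxxc", length=4))   # only three characters: set()
```

As written, the sketch follows only the rules stated above; it does not model what should happen when further characters arrive after the counter has already reached 0 and before c.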

Table 1. Splitting results: left columns show the properties of the complete reg-exes, while right columns show the properties of the prefixes.
Columns, regular expressions implementation before split: Source | # of rules | Avg. ASCII length | # of closures | # of length restrictions | Number of DFAs | Total memory
Columns, regular expression prefix features after split: Avg. ASCII length | # of closures | # of length restrictions | Number of DFAs | Total memory
Cisco: 68 44.1 7 15 6 973 MB | 19.8 19 1 1 152 MB
Linux: 7 67.2 31 4 3.7 MB | 21.4 11 2 15.8 MB
Bro: 648 23.64 1 3.77 MB | 16.1 1 1.23 MB
Snort rule 1: 22 59.4 9 11 5 114.6 MB | 36.9 6 6 3 32.1 MB
Snort rule 2: 1 43.72 11 1 2 64.2 MB | 16 1 2 1 6.5 MB
Snort rule 3: 19 3.72 8 6 N/A N/A | 13.8 5 1 2 2.42 MB

6. Experimental Evaluation
We have carried out a comprehensive set of experiments in order to evaluate the effectiveness of our proposed cures to the three problems: insomnia, amnesia, and acalculia. Our primary signature sets are the regular expressions used in the security appliances from Cisco Systems [33]. These rule sets comprise more than 75 moderately complex regular expressions. Cisco often uses DFAs to implement these rules; consequently, due to state explosion, they employ more than a gigabyte of memory, and still the parsing rates remain below a gigabit per second. We also consider the reg-ex signatures used in the open source Snort and Bro NIDS, and in the Linux layer-7 application protocol classifier. The Linux layer-7 protocol classifier comprises 7 rules, while the Snort rules consist of more than a thousand and a half reg-exes. In Snort, these reg-exes need not be matched simultaneously, because before a packet is parsed it is classified, and based upon the classification only a subset of the reg-exes is considered. Therefore, we only group those Snort signatures which correspond to overlapping header rules, i.e. those header rules which a single packet can match (we present results for three such groups). For the Bro NIDS, we present results for the HTTP signatures, which contain 648 reg-exes. Since the Cisco rules comprise a large number of patterns, our first step in implementing them involves grouping these rules into two sets: one consisting of all those signatures which do not contain a closure, and a second containing all signatures with at least one closure. Clearly, the first set can be compiled into a composite DFA without any difficulty.
It is the second set of reg-exes which is problematic and requires our cure mechanisms; therefore all our results are over these signatures. First we present the results of our splitting algorithm, which leads to the cure for insomnia.

6.1 Reg-ex splitting results
For reg-ex splitting, our representative experiment sets the slow path packet diversion probability at 1% and computes the cut in the reg-exes. Our normal traffic traces were derived from the MIT DARPA Intrusion Detection Data Sets [29], while the anomalous traffic traces were provided to us by Cisco Systems. We have also created synthetic anomalous traces by inserting some signatures into the normal traffic trace. With these traces, we have split the reg-exes into prefixes and suffixes. Afterwards the prefixes are extended by one or two more characters to ensure that the slow path remains substantially less loaded. We summarize the results of the splitting process in Table 1. In this table, we first list the properties of the original reg-exes and the memory needed to implement them. Notice that most of these reg-ex sets are sub-divided into multiple sets. Each set is compiled into a separate DFA, because it is difficult to compile all reg-exes into a single composite DFA (due to state explosion). The implication of this sub-division is that, since every DFA must be executed simultaneously, the parsing rate for a given memory bandwidth will be reduced. In the same table, on the right hand side, we list the properties of the prefixes after the splitting. Notice that these prefixes can be compiled into fewer DFAs, which yields higher parsing rates and less per flow state. Additionally, these DFAs are relatively compact; however, their memory requirements are still much higher than current embedded memory densities. The prime reason is that the prefixes still contain a small number of closures, which lead to moderate state explosion. We now present the results of our cure to amnesia, which avoids such state explosion in the prefix automaton.

Table 2.
Results of the H-FA and H-cFA construction; these results are for the prefix portions of the reg-exes.
Columns: Source | # of closures, # of length restrictions | DFA: # of automata, total # of states | Composite H-FA / H-cFA: # of flags in history, # of counters in history, total # of states, max # of transitions / character, total # of transitions | % space reduction with H-FA | H-FA parsing rate speedup
Cisco64: 14, 1 1 132784 6 3597 2 121545 94.69 -
Cisco64: 14, 1 1 132784 13 1861 8 682718 96.77 -
Cisco68: 19, 1 1 328664 17 2956 8 1337293 97.3 -
Snort rule 1: 6, 6 3 62589 5 6 583 8 23817 97.4 3x
Snort rule 2: 1, 2 1 1273 1 2 71 2 27498 98.58 -
Snort rule 3: 5, 1 2 4737 5 1 116 4 46124 93.48 2x
Linux7: 11, 2 2662 9 134 8 546378 81.63 2x

6.2 H-FA and H-cFA construction results
For the prefixes, we construct H-FAs, which dramatically reduce the total memory. Snort prefixes comprise several long length restrictions; therefore we construct H-cFAs for these. We find that the H-cFA is extremely effective in reducing the memory; without using the counting capability of the H-cFA, the composite automaton for the Snort prefixes explodes in size. In Table 2, we report the results from our representative experiments. We highlight the number of flags and counters that we employ in the history buffer. For the Cisco rules, we also show how varying the number of flags affects the H-FA size. In general, with more history flags, the H-FA is more compact. Notice that traditional DFA compression techniques, including the D2FA [34], can be applied to the H-FA, thereby further reducing the memory. The table also highlights an important result: the blowup in the number of conditional transitions in the H-FA generally remains very small. In a DFA there are 256 outgoing transitions per state, while in most of the H-FAs there are fewer than 500. Thus, there is less than a 2-fold blowup in the number of transitions; on the other hand, the reduction in the number of states is generally a few orders of magnitude, so the net effect is a significant memory reduction. Due to space restrictions, we are currently unable to present further details of the H-FA and H-cFA construction.

7. CONCLUDING REMARKS
In this paper, we propose several mechanisms to enhance the performance of regular expression based parsers, which are widely used to implement network intrusion detection systems. We begin by identifying three key limitations of the traditional approach, and categorize them as insomnia, amnesia and acalculia. We propose a solution for each of these limitations, and show that our solutions are orthogonal with respect to each other; hence they can be employed in unison. Based upon experiments carried out on real signature sets drawn from a collection of widely used networking systems, we show that our solutions are indeed effective. They can reduce the memory requirements of state-of-the-art regular expression implementations by up to 100 times, while also enabling a two- to three-fold increase in packet throughput. We also pay adequate attention to several complications that appear in real networks, e.g. DoS protection, multiple simultaneous flows, and packet multiplexing.
Therefore, we believe that the proposed solutions can aid in implementing network intrusion detection and prevention systems much more securely and economically.

REFERENCES
[1] R. Sommer, V. Paxson, Enhancing Byte-Level Network Intrusion Detection Signatures with Context, ACM conf. on Computer and Communication Security, 2003, pp. 262-271.
[2] J. E. Hopcroft and J. D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison Wesley, 1979.
[3] J. Hopcroft, An n log n algorithm for minimizing states in a finite automaton, in Theory of Machines and Computation, J. Kohavi, Ed. New York: Academic, 1971, pp. 189-196.
[4] Bro: A System for Detecting Network Intruders in Real-Time. http://www.icir.org/vern/bro-info.html
[5] M. Roesch, Snort: Lightweight intrusion detection for networks, in Proc. 13th Systems Administration Conference (LISA), USENIX Association, November 1999, pp. 229-238.
[6] S. Antonatos, et al., Generating realistic workloads for network intrusion detection systems, in ACM Workshop on S & P, 2004.
[7] A. V. Aho and M. J. Corasick, Efficient string matching: An aid to bibliographic search, Comm. of the ACM, 18(6):333-340, 1975.
[8] B. Commentz-Walter, A string matching algorithm fast on the average, Proc. of ICALP, pages 118-132, July 1979.
[9] S. Wu, U. Manber, A fast algorithm for multi-pattern searching, Tech. R. TR-94-17, Dept. of Comp. Science, Univ. of Arizona, 1994.
[10] Fang Yu, et al., Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection, UCB tech. report, 2005.
[11] N. Tuck, T. Sherwood, B. Calder, and G. Varghese, Deterministic memory-efficient string matching algorithms for intrusion detection, IEEE Infocom 2004, pp. 333-340.
[12] L. Tan and T. Sherwood, A High Throughput String Matching Architecture for Intrusion Detection and Prevention, ISCA 2005.
[13] I. Sourdis and D. Pnevmatikatos, Pre-decoded CAMs for Efficient and High-Speed NIDS Pattern Matching, Proc. IEEE Symp. on Field-Prog. Custom Computing Machines, Apr. 2004, pp. 258-267.
[14] S. Yusuf and W. Luk, Bitwise Optimised CAM for Network Intrusion Detection Systems, IEEE FPL 2005.
[15] R. Sidhu and V. K. Prasanna, Fast regular expression matching using FPGAs, in IEEE Symposium on Field-Programmable Custom Computing Machines, Rohnert Park, CA, USA, April 2001.
[16] C. R. Clark and D. E. Schimmel, Efficient reconfigurable logic circuit for matching complex network intrusion detection patterns, in Proceedings of 13th International Conference on Field Program.
[17] J. Moscola, et al., Implementation of a content-scanning module for an internet firewall, IEEE Workshop on FPGAs for Custom Comp. Machines, Napa, USA, April 2003.
[18] R. W. Floyd and J. D. Ullman, The Compilation of Regular Expressions into Integrated Circuits, Journal of ACM, vol. 29, no. 3, pp. 603-622, July 1982.
[19] Scott Tyler Shafer, Mark Jones, Network edge courts apps, http://infoworld.com/article/02/05/27/020527newebdev_1.html
[20] TippingPoint X505, www.tippingpoint.com/products_ips.html
[21] Cisco IOS IPS Deployment Guide, www.cisco.com
[22] Tarari RegEx, www.tarari.com/PDF/RegEx_FACT_SHEET.pdf
[23] N. J. Larsson, Structures of string matching and data compression, PhD thesis, Dept. of Computer Science, Lund University, 1999.
[24] S. Dharmapurikar, P. Krishnamurthy, T. Sproull, and J. Lockwood, Deep Packet Inspection using Parallel Bloom Filters, IEEE Hot Interconnects 12, August 2003. IEEE Computer Society Press.
[25] Z. K. Baker, V. K. Prasanna, Automatic Synthesis of Efficient Intrusion Detection Systems on FPGAs, in Field Prog. Logic and Applications, Aug. 2004, pp. 311-321.
[26] Y. H. Cho, W. H. Mangione-Smith, Deep Packet Filter with Dedicated Logic and Read Only Memories, Field Prog. Logic and Applications, Aug. 2004, pp. 125-134.
[27] M. Gokhale, et al., Granidt: Towards Gigabit Rate Network Intrusion Detection Technology, in FPL, Sept. 2002, pp. 404-413.
[28] J. Levandoski, E. Sommer, and M. Strait, Application Layer Packet Classifier for Linux. http://l7-filter.sourceforge.net/.
[29] MIT DARPA Intrusion Detection Data Sets, http://www.ll.mit.edu/IST/ideval/data/2000/2000_data_index.html.
[30] Vern Paxson et al., Flex: A fast scanner generator.
[31] SafeXcel Content Inspection Engine, hardware regex acceleration IP.
[32] Network Services Processor, OCTEON CN31XX, CN30XX Family.
[33] Will Eatherton, John Williams, An encoded version of reg-ex database from cisco systems provided for research purposes.
[34] S. Kumar et al., Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection, in ACM SIGCOMM'06, Pisa, Italy, September 12-15, 2006.
[35] S. Kumar, B. Chandrasekaran, J. Turner, and G. Varghese, Curing Regular Expressions Matching Algorithms from Insomnia, Amnesia, and Acalculia, Washington University technical report, 2006.