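This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE INFOCOM 2009 proceedings.

Accelerating Multi-Patterns Matching on Compressed HTTP Traffic

Anat Bremler-Barr
Computer Science Dept., Interdisciplinary Center, Herzliya, Israel
Email: bremler@idc.ac.il

Yaron Koral
Computer Science Dept., Interdisciplinary Center, Herzliya, Israel
Email: koral.yaron@idc.ac.il

Abstract: One of the fundamental techniques used today by network security tools to detect malicious activities is signature-based detection. Today, the performance of the security tools is dominated by the speed of the string-matching algorithms that detect these signatures. Currently these security tools do not deal with compressed traffic, which is becoming more and more common in HTTP. The HTTP protocol uses GZIP compression, which requires some kind of decompression phase before the multi-patterns matching task can be performed. Thus, there is a high performance penalty in pattern matching on compressed data. In this paper we present a novel algorithm, the Aho-Corasick based algorithm for Compressed HTTP (ACCH), that takes advantage of information gathered by the decompression phase in order to accelerate the commonly used Aho-Corasick pattern matching algorithm. By analyzing real HTTP traffic and real WAF signature patterns, we show that we can skip scanning up to 75% of the data. Surprisingly, we show that in some situations it is faster to do pattern matching on the compressed data, with the penalty of decompression, than to do pattern matching on regular traffic. As far as we know, we are the first paper that analyzes the problem of on-the-fly multi-patterns matching algorithms on compressed HTTP traffic and suggests a solution.

I. INTRODUCTION

One of the fundamental techniques used today by security tools such as a network intrusion detection system (NIDS) or a web application firewall (WAF) to detect malicious activities is signature-based detection. In this technique, the security tools alert when signatures, defined as a set of patterns of malicious activities, appear in the traffic. Today, the performance of the security tools is dominated by the speed of the string-matching algorithms that detect the signatures [1].

HTTP compression, also known as content encoding, is a publicly defined way to compress textual content transferred from web servers to browsers. Most popular sites and applications use HTTP compression, such as Yahoo!, Google, MSN, YouTube and Facebook. This standards-based method of delivering compressed content is built into HTTP 1.1, and most modern browsers that support HTTP 1.1 support GZIP compression [2]. On average, content encoding saves around 75% of text files (HTML, CSS, and JavaScript) and 37% overall [3]. Compressed HTTP is usually used on the response side, from server to client, because responses generally contain most of the data while requests usually contain a short URL string.

Multi-patterns matching on compressed traffic is a difficult problem, since it requires two time-consuming phases: traffic decompression and pattern matching.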
Currently most security tools do not deal with compressed traffic. In some cases they simply do not scan compressed traffic, which may cause malicious activity to go undetected. In other cases, the security tools ensure that there will be no compressed traffic by rewriting the HTTP header between the original client and server. In this technique, the security tools change the client-side header in such a way that it indicates that compression is not supported by the client's browser. This method harms the performance and bandwidth of both client and server for the specific connection. The few security tools that handle compressed HTTP traffic usually work in a proxy mode, meaning that they construct the full page on the proxy by decompressing it, then perform the signature scan, and only then forward it to the client. This last option is not applicable to security tools that operate at high speeds or in cases where inserting delay is not a real option.

In this paper we present a novel algorithm, the Aho-Corasick based algorithm on Compressed HTTP (ACCH). The algorithm is based on the following surprising observation: we can take advantage of the fact that we deal with compressed traffic to accelerate the multi-patterns matching phase. Specifically, we use the fact that the GZIP compression algorithm works by eliminating repetitions of strings using back-references (pointers) to the repeated strings. Our key idea is to store information produced by the pattern matching algorithm for the already scanned decompressed traffic, and then, in the case of pointers, to use this data in order to determine whether there is a possibility of finding a match or whether we can skip scanning this area. We show that by using this information we can skip up to 75% of the data and gain up to 70% improvement in the performance of the multi-patterns matching algorithm. As far as we know, we are the first paper that analyzes the problem of on-the-fly multi-patterns matching on compressed HTTP traffic and suggests a solution.

II. BACKGROUND

The GZIP algorithm: One of the compression algorithms recommended by HTTP 1.1, and the most commonly used, is the GZIP algorithm [4], [2]. GZIP is based on the DEFLATE algorithm, which compresses the file using a combination of the
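following two compression techniques: first the text is compressed by the LZ77 algorithm, then the output is compressed by Huffman coding. We elaborate on the two stages:

LZ77 Compression [5] - The basic idea of the LZ77 compression technique is that we can compress a series of bytes (characters) if we spot that this series of bytes has already appeared in the past (specifically, in the sliding window of the last 32KB of decompressed data). In such a case we encode this series of bytes (denoted as a repeated string) by the pair (distance, length), where distance is a number between 1 and 32768 (32KB) that indicates the distance in bytes of the repeated string, and length is a number between 3 and 258 that indicates the length of the string in bytes. For example, the text "abcdefabcd" will be compressed to "abcdef(6,4)", i.e., go back 6 bytes and copy 4 bytes from that point.

Huffman Coding [6] - In HTTP, Huffman codes compress both bytes and pointers (i.e., as numbers). The Huffman dictionary is usually added at the beginning of the compressed file (otherwise a predefined dictionary is selected).

Multi-patterns matching: Pattern matching has been a topic of intensive research that has resulted in several approaches. The two fundamental approaches are Aho-Corasick [7] and Boyer-Moore [8]. The Aho-Corasick (AC) algorithm constructs a finite state machine (FSM) for detecting all occurrences of a given set of patterns by processing the input in a single pass, performing one state transition for each input byte. In this paper we illustrate our technique using the Aho-Corasick algorithm; however, as part of our future work, we investigate the usage of our suggested technique with some other pattern matching algorithms.

III. THE CHALLENGES IN PERFORMING MULTI-PATTERNS MATCHING ON COMPRESSED HTTP TRAFFIC

In this section we give an overview of the obstacles in performing multi-patterns matching on compressed HTTP traffic. Note that dealing with live web traffic is much more challenging than the task of offline multi-patterns matching on compressed data. First, the compression method cannot be chosen or modified. Second, pattern matching should be performed on-the-fly on the ongoing web traffic.

We note that there is no easy way to perform multi-patterns matching over compressed traffic without decompressing the data in some way. The main reason for this is that LZ77 is an adaptive compression algorithm, due to the use of back-reference pointers. In adaptive compression, the text represented by each compression symbol is determined dynamically by the data. As a result, the same substring will be encoded differently depending on its location in the text. Thus, encoding the pattern is futile, since it will not appear in the compressed text in some specific form. For example, consider the case where we search for the pattern "abcd". The pattern can be expressed in the compressed data by the literal bytes "abc", followed by j arbitrary bytes, a pointer (j + 3, 3) and the literal byte "d", for all possible j < 32765. On the other hand, Huffman encoding is non-adaptive within a given text, and the same pattern will always be encoded to the same bit string.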
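The LZ77 decoding step and its adaptive nature can be illustrated with a minimal Python sketch. The token representation used here (an integer for a literal byte, a `(distance, length)` tuple for a pointer) and the name `lz77_decode` are assumptions made for illustration; they are not the DEFLATE bit format.

```python
def lz77_decode(tokens):
    """Expand LZ77 tokens into bytes.

    A token is either an int (one literal byte) or a (distance, length)
    tuple meaning: go back `distance` bytes and copy `length` bytes.
    This token format is a simplification chosen for the example.
    """
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            distance, length = tok
            start = len(out) - distance
            # Copy byte by byte, because the copied range may overlap
            # the bytes being written when length > distance.
            for k in range(length):
                out.append(out[start + k])
        else:
            out.append(tok)
    return bytes(out)


# The example from the text: "abcdef(6,4)" expands to "abcdefabcd".
first = lz77_decode(list(b"abcdef") + [(6, 4)])

# The same pattern "abcd" hidden behind a different encoding:
# literals "abc", two other bytes, pointer (5, 3), literal "d".
second = lz77_decode(list(b"abcxx") + [(5, 3)] + list(b"d"))

print(first, second)
```

Both streams contain "abcd" once decoded, although the compressed forms share no common representation of it, which is why searching the compressed stream directly does not work.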
Since the LZ77 part is adaptive, the combination of the two algorithms is adaptive as well.

The naive (direct) way of performing multi-patterns matching on the traffic in real time, as required by security tools, is to combine the following steps (see Algorithm 1):
1) Remove the HTTP header and store the Huffman dictionary of the specific session in memory. Note that different HTTP sessions have different Huffman dictionaries.
2) Decode the Huffman mapping of each symbol to the original byte or pointer representation, using the specific Huffman dictionary table.
3) Decode the LZ77 part.
4) Perform multi-patterns matching on the decompressed traffic.

The challenges in multi-patterns matching on compressed traffic come from both the space and the time aspects:

Space - One of the problems of decompression is its memory requirement; the straightforward approach requires a 32KB sliding window for each HTTP session. Note that this requirement of storing 32KB of decompressed data is difficult to avoid, since a back-reference pointer can refer to any point in the 32KB sliding window, and pointers may be nested without limit (i.e., a pointer may point to an area that was itself produced by a pointer). Figure 1(a) shows that the distribution of pointers on a real-life data set (see Section VI for details on the data set) is indeed spread all over the 32KB sliding window. On the other hand, pattern matching on non-compressed traffic requires storing only one or two packets (to handle cross-packet data), where the average TCP packet is around 1.5KB. Hence, dealing with compressed traffic poses a higher memory requirement, by a factor of 10. Therefore, in order to handle compressed traffic, a mid-range firewall that handles 30K concurrent sessions needs 1GB of memory, while a high-range firewall that handles 300K concurrent sessions needs 10GB of memory. This memory requirement has implications not only for the price and feasibility of the architecture but also for the capability to perform caching.

[Fig. 1. Distribution of the following pointer characteristics on the real-life data set of Section VI: (a) distance of pointer, in bytes; (b) length of pointer, in bytes.]

Time - Recall that the AC scan time is a dominant factor in the performance of security tools [1]. We introduce here a simple model that will help us compare the time requirement of the decompression with the time requirement of the AC algorithm. A key influence on the time is the ability to perform fast memory references to the limited memory of the cache, as opposed to the slower main memory. We assume in our model that we can choose that some of the data structures will be in the cache memory.¹ Let M be the cost of one memory reference, C the cost of one cache reference, $P_l$ the average length of a pointer in the compressed traffic, and $P_r$ the fraction of bytes represented by pointers in the compressed traffic. Let B be the cache block size, which is typically 32 bytes in SDRAM memory (i.e., each memory lookup brings to the cache 32 bytes from the surrounding area of the memory address). We show that B has a dramatic influence on the performance of the decompression algorithm, since the pointers refer to consecutive addresses in memory.

We start by analyzing the AC algorithm. The FSM can be represented by a two-dimensional matrix (referring to the Snort implementation [9]), where a row represents a state and a column represents an input byte. Each state also has a match pointer, which points to a list of matched patterns if this is a match state (a.k.a. output state [7]) and is NULL otherwise. For every byte, the AC algorithm performs two memory references, one for extracting the next state and the other to check whether a match was found.² If the FSM is not in the cache memory due to its large size (for example, 53.1MB for 1533 Snort patterns in 2003 [10]), the memory cost for each byte is 2M. Otherwise, if the FSM fits into the cache, due to a small number of patterns or the usage of FSM compression techniques [11], [12], [13], [10], [14], the memory cost is around 2C.

We now analyze the GZIP algorithm. GZIP maintains two data structures per HTTP session for the decompression phase: a small one for the Huffman dictionaries (less than 700 bytes, which is small enough and is therefore assumed to be in the cache) and a larger one for the 32KB sliding window. Note that naive Huffman decoding is done one bit at a time.³ A worst-case analysis approximates the number of memory lookups required by Huffman decoding as the number of bits in the compressed file divided by the total number of bytes in the decompressed file. Considering an average compression ratio of 75%, this indicates that there are 2 memory lookups per byte of the decompressed file. Since the Huffman dictionary is small enough to fit into the cache, we approximate the Huffman overhead by 2C.

We now analyze the LZ77 decompression part. For a small number of concurrent sessions, where all of the 32KB sliding windows fit into the cache, the cost for each byte that is not part of a pointer is one reference of cost C, for updating the 32KB sliding window.

¹ Using hardware solutions or special assembly commands that give a recommendation to the loader.
² If there is more than one matching pattern for a state, then more memory accesses are required; for simplicity this case is ignored. Note that each state (a row) is much bigger than the general cache block size, since usually each state holds data on transitions for each of the 256 possible inputs.
³ We believe the Huffman decoding algorithm can be improved, and it is part of our future work.
TABLE I
Summary of the analysis of time complexity of AC and GZIP decompression, where C is the cache lookup time and M is the main memory lookup time

                      Cache   Memory
AC                    2C      2M
GZIP decompression    4C      5C

A byte represented by a pointer requires 2C, for both reading from and writing to the 32KB sliding window. Adding the 2 additional Huffman decoding references, we arrive at a total time of 4C. However, if the number of concurrent sessions is high, then in the case of a pointer we first need to bring the blocks into the cache. The average number of cache blocks retrieved per pointer, denoted by $B_P$, is $B_P = \lceil P_l / B \rceil + (P_l \bmod B)/B$. In order to understand the cost per byte, we need to divide the result by the average pointer length. Hence we obtain a time of $P_r (B_P / P_l \cdot M + 4C) + (1 - P_r) \cdot 2C$.

Using the real-life data set, we derive the average value of $P_l$ as 17.8 (Figure 1(b)); see Section VI for details on the data set. By examining the same data set, we also derive the ratio of bytes represented by pointers, $P_r$, as 0.92. Since today the factor M/C between a regular memory lookup and a cache memory lookup is usually between 10 and 20, and B is 32 bytes, the term $P_r B_P / P_l \cdot M$ amounts to 0.08M, which is around C. Hence a high number of concurrent sessions has only a minor impact on the decompression time, since a pointer refers to consecutive bytes. Since most of the decompressed file is represented by pointers (i.e., $P_r = 0.92$), the memory access time is dominated by the pointer analysis part, and hence it is around 5C. Table I summarizes the findings.

One outcome of this analysis is the observation that when the FSM is in the cache, GZIP decompression has a higher time requirement than the AC algorithm itself. In the case where the FSM is in regular memory, GZIP decompression takes only a small fraction of the time of AC (5C versus 2M in Table I).

In this paper we focus on AC performance. We show that we can reduce the AC time by skipping more than 70% of the FSM scans of the bytes, and hence reduce the total time constraint for handling pattern matching in compressed traffic.
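The arithmetic behind Table I can be re-run with a short Python sketch. It assumes the ceiling reading of the block formula, $B_P = \lceil P_l / B \rceil + (P_l \bmod B)/B$, and plugs in the data-set values quoted in the text ($P_l = 17.8$, $P_r = 0.92$, B = 32 bytes); the function names and the choice to measure everything in units of C are illustrative.

```python
import math


def blocks_per_pointer(p_len, block):
    """Average number of cache blocks fetched per pointer (B_P).

    Assumes B_P = ceil(P_l / B) + (P_l mod B) / B.
    """
    return math.ceil(p_len / block) + (p_len % block) / block


def gzip_cost_per_byte(p_ratio, p_len, block, mem_cost, cache_cost):
    """Per-byte decompression cost when sliding windows live in main memory."""
    b_p = blocks_per_pointer(p_len, block)
    pointer_byte = b_p / p_len * mem_cost + 4 * cache_cost
    literal_byte = 2 * cache_cost
    return p_ratio * pointer_byte + (1 - p_ratio) * literal_byte


P_L, P_R, B = 17.8, 0.92, 32   # values quoted in the text
C = 1.0                        # measure everything in cache lookups

for ratio in (10, 20):         # assumed range of M / C
    M = ratio * C
    gzip = gzip_cost_per_byte(P_R, P_L, B, M, C)
    ac_in_memory = 2 * M
    print(f"M/C={ratio}: GZIP ~ {gzip:.2f}C, AC in memory = {ac_in_memory:.0f}C")
```

Under these assumptions the memory term $P_r B_P / P_l$ evaluates to about 0.08, and the per-byte decompression cost comes out between roughly 4.6C and 5.4C, in line with the 5C entry of Table I.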
IV. RELATED WORK

The problem of pattern matching on compressed data has received attention in the context of the Lempel-Ziv compression family [15], [16], [17], [18]. However, LZW/LZ78 are more attractive and simpler for pattern matching than the LZ77 compression algorithm. HTTP uses LZ77 compression, which has a simpler decompression algorithm, but performing pattern matching on it is a more complex task that requires some kind of decompression (see Section II). Hence all the above works are not applicable to our case. The paper [19] suggests a modification to the LZ77 compression algorithm in order to make the task of matching in files easier. However, the suggestion is not currently implemented in today's HTTP.

The paper [20] is the only paper we are aware of that deals with pattern matching over LZ77. However, in that paper the algorithm is capable of matching only one pattern and it requires two passes over the compressed text (file), which is not applicable to the network domain, where on-the-fly processing is required.

Hence, as far as we are aware, we are the first paper that deals with pattern matching on compressed HTTP traffic, i.e., on the LZ77 family, in the context of networking.

One outcome of this paper is the surprising conclusion that in some cases pattern matching on compressed HTTP traffic, with the overhead of decompression, is faster than performing pattern matching on regular traffic. We note that for other compression algorithms (not LZ77, which is our case) a similar conclusion was shown in another context. The papers [21], [22], [23] show that compressing a file once and then performing pattern matching on the compressed file accelerates the scanning process.

V. AHO-CORASICK BASED ALGORITHM FOR COMPRESSED HTTP (ACCH)

In this section, we present our Aho-Corasick based algorithm for Compressed HTTP (ACCH). We start by giving a general description and the intuition of the algorithm. As recalled, HTTP uses GZIP compression, which in its LZ77 part compresses data by using pointers to past occurrences of byte (character) sequences. Thus, the bytes that the pointer refers to in the sliding window (denoted by us as referred bytes) were already scanned for pattern matching, and we can use this knowledge to save unnecessary scans.

Note that even if no pattern was matched during the scan of the referred bytes, we still need to rescan some bytes of the pointer. This is due to the fact that a pattern may occur at the boundary of the pointer. A prefix of the referred bytes may be a suffix of a pattern that started before the pointer, and a suffix of the referred bytes may be a prefix of a pattern that continues after it (see Example 1 in Fig. 2). Moreover, in the case of a match in the referred bytes, we still need to check whether the pattern occurs, since it might be the case that only the pattern's suffix is referred to by the pointer (see Example 2 in Fig. 2).
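The boundary cases described above can be made concrete with a small Python sketch. It decodes a token stream (literal bytes as integers, pointers as `(distance, length)` tuples, a representation assumed here for illustration), records where each pointer lands in the output, and labels every pattern occurrence that touches a pointer area. The helper names `expand` and `classify` are hypothetical, and the search uses plain `bytes.find` rather than Aho-Corasick, since only the positions matter for this illustration.

```python
def expand(tokens):
    """Decode LZ77-style tokens; return the text and each pointer's (start, end) span."""
    out, spans = bytearray(), []
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            start = len(out)
            for k in range(length):
                out.append(out[start - dist + k])
            spans.append((start, start + length))
        else:
            out.append(tok)
    return bytes(out), spans


def classify(text, pattern, span):
    """List (offset, kind) for occurrences of `pattern` that overlap the pointer span."""
    s, e = span
    found = []
    i = text.find(pattern)
    while i != -1:
        j = i + len(pattern)
        if i < e and j > s:                 # occurrence touches the pointer area
            if i < s:
                kind = "starts before pointer"
            elif j > e:
                kind = "ends after pointer"
            else:
                kind = "inside pointer"
            found.append((i, kind))
        i = text.find(pattern, i + 1)
    return found


# Literals "abcde__ab", a pointer copying "cde", then the literal "f".
tokens = list(b"abcde__ab") + [(7, 3)] + list(b"f")
text, spans = expand(tokens)
for pat in (b"abcde", b"cde", b"def"):
    print(pat, classify(text, pat, spans[0]))
```

Here "abcde" begins in the literals and ends inside the pointer, "cde" lies fully inside the pointer (and was therefore already seen in the referred bytes), and "def" starts inside the pointer and finishes after it. Only the middle case can be decided from information about the referred bytes alone.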
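Algorithm 1 Naive Decompression with Aho-Corasick pattern matching
Trf - the input, compressed traffic (after Huffman decompression)
SWin[1..32KB] - the sliding window of LZ77, where SWin[j] is the information about the uncompressed byte which is located j bytes before the current byte
FSM(state, byte) - the AC FSM receives a state and a byte and returns the next state, where startStateFSM is the initial FSM state
Match(state) - if state is a match state it stores information about the matched pattern, otherwise NULL
 1: state = function scanAC(state, byte)
 2:   state = FSM(state, byte)
 3:   if Match(state) != NULL then
 4:     act according to Match(state)
 5:   end if
 6:   return state
 7: procedure GZIPDecompressPlusAC(Trf[1] ... Trf[n])
 8:   state = startStateFSM
 9:   for i = 1 to n do
10:     if Trf[i] is pointer (dist, len) then
11:       for j = 0 to len-1 do
12:         state = scanAC(state, SWin[dist-j])
13:       end for
14:       update SWin with bytes SWin[dist] ... SWin[dist-len]
15:     else
16:       state = scanAC(state, Trf[i])
17:       update SWin with the byte Trf[i]
18:     end if
19:   end for

[Fig. 2. Example of an ACCH run for CDepth = 2 (Example 1 and Example 2, each split into left, internal and right parts). The number inside every state indicates its depth. Question marks indicate that the corresponding bytes were skipped and therefore their depth is not known to the algorithm. The pointer area is underlined by a dashed line. The referred bytes are underlined by a solid line.]

Intuitively, detecting a pattern at the left boundary can be done by continuing the scan up to a certain point of the pointer (discussed later). Detecting a pattern at the right boundary, or detecting a whole referred matched pattern, requires storing information about the previous scans. One natural option is to store the FSM state of each scanned byte. However, this is not feasible, since our main aim is to skip scanning some of the bytes, and then we cannot determine the exact state of those skipped bytes (which might be required later on, in the case of a pointer to a pointer). Luckily, we show that it is enough to store only the information of whether the state of a scanned byte is a Match and the information about the depth of the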
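scanned byte's state in the FSM. The depth of a state s is defined as the number of edges in the shortest simple route between the start state and state s in the FSM.

In order to understand how we use the depth information, let us look at the case of a matched referred byte. If we had the exact depth information, we could check whether the pattern occurs in the pointer area by comparing the length of the matched pattern, which is equal to the depth of the matched state, to the location of the matched byte in the pointer area. Once again, storing the exact depth information is not feasible, since we cannot tell the depth of skipped bytes. Fortunately, we show that it is enough to store information which is a relaxation of the depth. We can do that by storing information that estimates whether the depth is below some threshold, denoted by CDepth, a constant parameter of our algorithm.

Specifically, we store a status for each byte in the sliding window. The status is information about the state we reached in the FSM after scanning the byte. The status of a byte is encoded by 2 bits and is one of the following types: Match, Check and Uncheck. Match indicates reaching a match state in the FSM. The two other statuses indicate an estimation of the depth of the scanned byte's state in the FSM, in the case where the status is not Match. As recalled, the estimation is required, since we cannot tell the state (and hence the depth) of the skipped bytes. Note that we also aim for space-efficient status information.

Our estimation is inexact in two senses. First, we do not store the exact depth. We define CDepth as a constant parameter of the algorithm that represents a low depth. Its optimal value is determined using the experiments described in Section VI. Our status indicates only whether the depth is above or below CDepth. Second, we allow a mistake in the estimation. The estimation can be wrong in only one direction, where the estimated depth is deeper than it should be.

Specifically, we define the status of a scanned byte as Uncheck if the depth of the state is below CDepth. We mark the status as Check if the depth may be CDepth or above. The "may be" is due to the fact that, in the case of skipped bytes, we might be mistaken in estimating the depth of a byte by giving it a higher value than its actual depth. As noted in [11], [12], most of the time the FSM uses states of low depth; hence in most cases the status of the bytes would be Uncheck, and therefore we can skip most of the bytes.

We now give the details of the algorithm (see the detailed pseudocode in Algorithm 2). We define scanAC as a function that receives as input the current state (of the FSM) and the current byte, and returns the next state and the byte status by performing AC FSM transitions (see Lines 1-11).

In general, we need to handle three cases where there are possible occurrences of patterns: the left boundary of the pointer area, the right boundary of the pointer area, and the internal area, where patterns (there can be multiple occurrences or none) are fully contained within the pointer area (see the examples in Fig. 2).

Algorithm 2 Compressed HTTP based Aho-Corasick
Parameters as in the Naive Algorithm, with the addition of:
SWin[1..32KB] - here SWin[j], the information about the j-th byte, is a record of two fields:
  SWin[j].data - the byte
  SWin[j].status - the status
Depth(state) - returns the depth of the state in the FSM
CDepth - the constant parameter of the ACCH algorithm
 1: (status, state) = function scanAC(state, byte)
 2:   state = FSM(state, byte)
 3:   if Match(state) != NULL then
 4:     act according to Match(state)
 5:     status = Match
 6:   else
 7:     if Depth(state) >= CDepth then status = Check
 8:     else status = Uncheck
 9:     end if
10:   end if
11:   return (status, state)
12: procedure ACCH(Trf[1] ... Trf[n])
13:   state = startStateFSM
14:   for i = 1 to n do
15:     if Trf[i] is pointer (dist, len) then
16:       j = 0
17:       while (Depth(state) > j) and (j < len) do      // Left Boundary case
18:         (status, state) = scanAC(state, SWin[dist-j].data)
19:         update SWin with SWin[dist-j].data, status
20:         j++
21:       end while
22:       k = j-1
23:       while k < len do      // Check Matches inside the pointer area
24:         Find the minimal k, j <= k < len, such that SWin[dist-k].status = Match
25:         If no such k exists then k = len      // Case of Right Boundary
26:         Find the maximal p, j <= p <= k, such that SWin[dist-p].status = Uncheck
27:         If no such p exists then p = j
28:         if j < (p - CDepth + 1) then      // Skip bytes and update window
29:           update SWin with SWin[dist-j] ... SWin[dist-(p - CDepth + 1)]
30:           state = startStateFSM
31:           for j = (p - CDepth + 1) to (p - 1) do      // Scan bytes and copy status from SWin
32:             (status, state) = scanAC(state, SWin[dist-j].data)
33:             update SWin with SWin[dist-j] (data and status as stored in the window)
34:           end for
35:         end if
36:         for l = j to k do      // Internal or Right Boundary case
37:           (status, state) = scanAC(state, SWin[dist-l].data)
38:           update SWin with SWin[dist-l].data, status
39:         end for
40:         j = k+1
41:       end while
42:     else      // Byte Scan (not pointer)
43:       (status, state) = scanAC(state, Trf[i])
44:       update SWin with Trf[i], status
45:     end if
46:   end for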
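The status rule of scanAC (Lines 1-11 of Algorithm 2) can be sketched in Python as follows. The automaton below is a plain textbook Aho-Corasick construction with goto and failure links, not the matrix layout of the Snort implementation, and the names `build_ac`, `step` and `scan_ac` are chosen for this sketch only.

```python
from collections import deque

MATCH, CHECK, UNCHECK = "Match", "Check", "Uncheck"


def build_ac(patterns):
    """Build goto/fail/output tables plus the depth of every state."""
    goto, fail, out, depth = [{}], [0], [[]], [0]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append([])
                depth.append(depth[s] + 1)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    queue = deque(goto[0].values())          # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]   # inherit matches ending here
            queue.append(t)
    return goto, fail, out, depth


def step(ac, state, ch):
    """One FSM transition, following failure links when needed."""
    goto, fail, _, _ = ac
    while state and ch not in goto[state]:
        state = fail[state]
    return goto[state].get(ch, 0)


def scan_ac(ac, state, byte, cdepth):
    """Return (status, state): Match, or Check/Uncheck by comparing depth to CDepth."""
    state = step(ac, state, byte)
    if ac[2][state]:
        return MATCH, state
    if ac[3][state] >= cdepth:
        return CHECK, state
    return UNCHECK, state


ac = build_ac([b"abcd", b"bcde"])
state, statuses = 0, []
for byte in b"xabcdex":
    status, state = scan_ac(ac, state, byte, cdepth=2)
    statuses.append(status)
print(statuses)
```

With CDepth = 2 the bytes of "xabcdex" receive the statuses Uncheck, Uncheck, Check, Check, Match, Match, Uncheck. Two bits per byte are enough to store these three values in the sliding window.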
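Left Boundary: In order to detect a pattern that lies on the left boundary of the pointer area (see Lines 17-21), it is enough to continue scanning with scanAC until we reach the j-th byte in the pointer area for which the depth of the state we reach in the FSM after scanning that byte is less than or equal to j (the number of bytes in the pointer area we have already scanned). There is no need to continue the scan, since from this point on, if the pointer contains a pattern, it is fully contained within the pointer area, and this is dealt with in the next case.

Internal area: In order to detect patterns that are fully contained within the pointer area (Lines 23-41), we need to check whether there is any byte with Match status in the referred bytes. Let k be the first index where the k-th byte has Match status. Note that a Match in the referred bytes indicates that there is a possibility of a pattern within the pointer area. Specifically, if there is a pattern within the pointer area, then there must be a Match within the referred bytes, since we are now matching a pattern that is fully contained in the pointer area and hence also fully contained in the referred bytes. However, we still need to check, since we might have a Match in the referred bytes for a pattern of which only the suffix is covered by the pointer area.

We now use the Check/Uncheck status to determine how many bytes before k we need to scan. Let p be the maximal index such that the p-th byte has Uncheck status and p < k. It is easy to see that if there is a pattern within this area, it can start only after the position p - CDepth + 1 in the referred bytes, since otherwise we would have a contradiction to the definition of p. Hence we can skip scanning the bytes from j up to position p - CDepth + 1. This is the saving made by our algorithm and the way we achieve the performance improvement.

The challenging part is to maintain a correct status for the bytes we skip scanning (for use in the case of future pointers to this area) without calling scanAC on these bytes. The key idea is to take the statuses of these bytes from the corresponding bytes in the sliding window (see line 29). If the status of a byte in the pointer area is Check, then the corresponding byte in the referred bytes will be Check. This is due to the fact that in this part, the state that the FSM would have, had we run scanAC on all the bytes, is correlated to a pattern that is internal to the pointer area and hence was also internal to the referred bytes.

Note that the opposite is not true (i.e., a byte that has status Check in the referred bytes may have status Uncheck in the corresponding byte of the pointer area, had we run scanAC). A status of a byte in the referred bytes may be due to a pattern that started before the referred bytes. Hence, when a later pointer refers to this pointer area, we might call scanAC because of bytes that were marked as Check while their true status should have been Uncheck; however, these redundant calls do not harm the correctness of the algorithm.

In order to continue scanning, we set the FSM to the start state and continue scanning from p - CDepth + 1. Note that the statuses for the area from p - CDepth + 1 to p are taken from SWin (Lines 30-34).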
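As a rule, scanAC gives more accurate information about the real status of the bytes; however, for these CDepth bytes before p the status returned by scanAC may be misleading: since we start from the start state of the FSM, it will always return Uncheck status even though this may not be the case. After the first CDepth bytes we continue scanning up to the k-th position, and this time we update the statuses that are returned from scanAC (lines 36-39).

We then repeat the scan for patterns that are fully contained in the remaining pointer area, until there are none (line 23). In that case we check whether there is a pattern at the right boundary of the pointer in a similar way to the previous case: that is, we find the maximal index p such that the p-th byte has Uncheck status, and continue calling scanAC from position p - CDepth + 1.

In the next theorem we prove the validity (correctness) of the algorithm. Let P be a finite set of patterns, and Trf the compressed traffic.

Theorem 1: ACCH detects all patterns in P in the decompressed traffic form of Trf.

Sketch of Proof: The full detailed proof is given in the appendix. The proof relies on the validity of the AC algorithm. We perform pattern matching on the compressed traffic twice, once with the naive algorithm (decompression + AC that scans all bytes), denoted as the Naive algorithm, and another time with ACCH. The two algorithms use the same FSM; the only difference is that ACCH skips scanning some of the bytes. In Lemma 4 in the appendix, which is the heart of the proof, we compare, for every byte in the decompressed traffic, the state and status that each of the algorithms reaches. We show that the following three invariant claims hold:
1) If a byte has the status Check in the Naive algorithm, then it will also have the status Check in the ACCH algorithm. Note that the opposite direction does not hold.
2) A byte has Match status in the Naive algorithm iff it has Match status in the ACCH algorithm. Note that this is an iff, i.e., the two directions hold. From this claim we obtain that ACCH detects all the patterns that Naive detects, and the theorem follows.
3) Both algorithms, ACCH and Naive, reach exactly the same FSM state after scanning any single byte or after scanning a compressed pointer entirely. This is not true within a pointer, since ACCH may skip scanning some of the bytes.
The proof relies heavily on the characteristics of the AC FSM.

VI. EXPERIMENTAL RESULT

In this section, we evaluate the performance benefit of the ACCH algorithm and find the optimal CDepth, a key parameter of our algorithm, using real-life traffic.

A. Data Set

In order to evaluate ACCH performance we need two data sets, one of the traffic and the other of the patterns. We use traffic that was captured by a corporate firewall for 5 minutes. The traffic contains 23,698 compressed HTTP responses that take 9MB in decompressed form. $P_r$, the ratio of bytes represented by pointers in these compressed files, is equal to 0.92, and $P_l$, the average length of the back-reference pointer, is equal to 17.8.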
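[Fig. 3. (a) Scanned character ratio ($R_s$) as a function of CDepth for ModSecurity, Snort and the Reduced Snort data-set. (b) Performance benefit as opposed to the scanned character ratio ($R_s$) as a function of CDepth for the Reduced Snort data-set, compared to the naive algorithm performance.]

As the set of signatures, we use two data sets: one of ModSecurity [24], an open source web application firewall, and one of Snort, an open source network intrusion prevention and detection system [9]. In ModSecurity we choose the signature group that applies to HTTP responses (only the response is compressed). Patterns are normalized in such a way that they do not contain regular expressions.⁴ The total number of patterns is 124.

The Snort data-set, taken from the published rules of June 08 [9], contains more than 8K patterns. Most of them are in binary mode and are less suitable for HTML searches. The binary patterns have no effect on the textual HTML files; therefore we removed them and were left with a set of 1,202 textual patterns. We note that the patterns of Snort are not designed to be applied to HTTP responses, and hence are less applicable to the problem domain of compressed traffic. However, we decided to use this data, since most pattern-matching papers refer to it, and since it gives us the chance to evaluate the effect of the number of patterns and the number of matches on the performance of our algorithm.

We note that Snort patterns occur significantly more often in the traffic, since Snort contains very short and common patterns. The data contains 476 occurrences of ModSecurity patterns and over 6M occurrences of Snort patterns.

B. ACCH performance

In this subsection, we examine the performance benefit of using the ACCH algorithm. We define $R_s$ as the scanned character ratio. $R_s$ is a dominant factor in the performance improvement of ACCH over the naive algorithm, as shown later.

An important factor for ACCH is the CDepth parameter. Figure 3(a) summarizes $R_s$ as a function of CDepth, for Snort and ModSecurity patterns. CDepth equal to 0 represents the Naive algorithm. As shown in Figure 3, CDepth = 2 gives the best performance for both pattern sets: $R_s$ is 0.35 for Snort patterns and 0.24 for ModSecurity. Note that increasing the value of CDepth has two effects: on the one hand it may reduce the number of bytes that are marked as Check; on the other hand it increases the number of bytes we need to scan for any byte whose status is Match or Check.

Snort patterns cause a significant amount of matches (around 2% of the total number of bytes scanned). This leads to many more scanned areas and therefore a much higher $R_s$. In order to check the influence of the number of matches on $R_s$, we synthesized a reduced Snort data-set by removing the 88 most frequent patterns. The data then contains 249,886 matches (instead of 6M).

⁴ Regular expressions were expanded into several plain patterns.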
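As can be seen from Figure 3(a), removing only 88 patterns had a significant effect on the $R_s$ value ($R_s$ = 0.29 for CDepth = 2). Thus, the match ratio has a much more significant influence on the $R_s$ value than the number of patterns.

Figure 3(b) shows the correlation between $R_s$ and the performance benefit. We used a Dual Core Intel 1.8GHz machine with 1GB of RAM as the platform. There is a slight overhead in implementing the algorithm, between 6% and about 10%, which can be explained by the additional memory references to the status of bytes and by the use of a larger GZIP structure. For CDepth = 2, ACCH running over ModSecurity achieves a 69% performance improvement and ACCH running over Snort achieves a performance improvement of slightly more than 60%.

It is easy to see from the algorithm that two factors influence the scanned character ratio: the ratio of matches, which is usually low for security tools, and the ratio of Uncheck statuses, which is governed by the number of bytes in the traffic that reach a high depth in the FSM. As noted before and analyzed by [11], [12], most of the time the FSM uses states of low depth.

Combining the fact that ACCH has a 60% to 70% performance improvement with the analysis of the cost of GZIP and AC (see Table I), the surprising finding is that we can work faster on the compressed files than on uncompressed ones, especially when the AC FSM is in memory. ACCH also improves overall performance when the FSM is in the cache, by about 20%.

VII. CONCLUSION AND FUTURE WORK

At the heart of almost every modern security tool is a pattern matching algorithm. HTTP compression has become very common in today's web traffic. In some cases compression causes a significant performance overhead for the overall pattern matching process. Our algorithm eliminates up to 75% of the data scans, based on information stored in the compressed data. Surprisingly, in some situations it is faster to do pattern matching on compressed data, with the penalty of decompression, than to do pattern matching on regular traffic. Note that ACCH is not intrusive for the AC algorithm,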
therefore all the methods that improve the AC FSM [11], [12], [13], [10], [14] are orthogonal to ACCH and are applicable. As far as we know, we are the first paper that analyzes the problem of on-the-fly multi-patterns matching algorithms on compressed HTTP traffic and suggests a solution. One interesting open question is how to refine the algorithm to maximize the gain from the ACCH approach.⁵

⁵ One direction is to use one more status type. Recall that we have 2 bits per status and we use only 3 statuses.

VIII. ACKNOWLEDGMENT

We would like to thank David Movshovitz, former VP Security Technologies at F5 Networks, and Ivan Ristic, VP of Security Research at Breach Security and the creator of ModSecurity, for helpful suggestions.

REFERENCES

[1] M. Fisk and G. Varghese, "An analysis of fast string matching applied to content-based forwarding and intrusion detection," Technical Report CS2001-0670 (updated version), 2002.
[2] P. Deutsch, "Gzip file format specification," May 1996. RFC 1952, http://www.ietf.org/rfc/rfc1952.txt.
[3] "Website optimization, llc." http://www.websiteoptimization.com.
[4] P. Deutsch, "Deflate compressed data format specification," May 1996. RFC 1951, http://www.ietf.org/rfc/rfc1951.txt.
[5] J. Ziv and A. Lempel, "A universal algorithm for sequential data compression," IEEE Transactions on Information Theory, pp. 337-343, May 1977.
[6] D. Huffman, "A method for the construction of minimum-redundancy codes," Proceedings of IRE, p. 1098, 1952.
[7] A. Aho and M. Corasick, "Efficient string matching: an aid to bibliographic search," Communications of the ACM, pp. 333-340, 1975.
[8] R. Boyer and J. Moore, "A fast string searching algorithm," Communications of the ACM, pp. 762-772, October 1977.
[9] "Snort." http://www.snort.org (accessed on July 2008).
[10] N. Tuck, T. Sherwood, B. Calder, and G. Varghese, "Deterministic memory-efficient string matching algorithms for intrusion detection," in INFOCOM 2004, 2004.
[11] T. Song, W. Zhang, D. Wang, and Y. Xue, "A memory efficient multiple pattern matching architecture for network security," in INFOCOM 2008, pp. 166-170, April 2008.
[12] J. van Lunteren, "High-performance pattern-matching for intrusion detection," in INFOCOM 2006. 25th IEEE International Conference on Computer Communications, pp. 1-13, April 2006.
[13] V. Dimopoulos, I. Papaefstathiou, and D. Pnevmatikatos, "A memory-efficient reconfigurable aho-corasick fsm implementation for intrusion detection systems," in Embedded Computer Systems: Architectures, Modeling and Simulation. IC-SAMOS, pp. 186-193, July 2007.
[14] M. Alicherry, M. Muthuprasanna, and V. Kumar, "High speed pattern matching for network ids/ips," in ICNP, pp. 187-196, 2006.
[15] A. Amir, G. Benson, and M. Farach, "Let sleeping files lie: Pattern matching in z-compressed files," Journal of Computer and System Sciences, pp. 299-307, 1996.
[16] T. Kida, M. Takeda, A. Shinohara, and S. Arikawa, "Shift-and approach to pattern matching in lzw compressed text," in 10th Annual Symposium on Combinatorial Pattern Matching (CPM 99), 1999.
[17] G. Navarro and M. Raffinot, "A general practical approach to pattern matching over ziv-lempel compressed text," in 10th Annual Symposium on Combinatorial Pattern Matching (CPM 99), 1999.
[18] G. Navarro and J. Tarhio, "Boyer-moore string matching over ziv-lempel compressed text," in Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching, pp. 166-180, 2000.
[19] S. Klein and D. Shapira, "A new compression method for compressed matching," in Proceedings of data compression conference DCC-2000, Snowbird, Utah, pp. 400-409, 2000.
[20] M. Farach and M. Thorup, "String matching in lempel-ziv compressed strings," in 27th annual ACM symposium on the theory of computing, pp. 703-712, 1995.
[21] U. Manber, "A text compression scheme that allows fast searching directly in the compressed file," ACM Transactions on Information Systems (TOIS), pp. 124-136, 1997.
[22] M. Takeda, Y. Shibata, T. Matsumoto, T. Kida, A. Shinohara, S. Fukamachi, T. Shinohara, and S. Arikawa, "Speeding up string pattern matching by text compression: The dawn of a new era," Transactions of Information Processing Society of Japan, pp. 370-384, 2001.
[23] N. Ziviani, E. de Moura, G. Navarro, and R. Baeza-Yates, "Compression: A key for next-generation text retrieval systems," Computer, 2000.
[24] "Modsecurity." http://www.modsecurity.org (accessed on July 2008).
[25] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction To Algorithms, The MIT Press and McGraw-Hill Book Company, 2001.

APPENDIX

Correctness of ACCH algorithm

In this section we prove Theorem 1, the correctness of the ACCH algorithm. The proof relies heavily on the following characteristic of the AC FSM: the state of the AC FSM after reading a series of bytes as input is correlated to the pattern that has the longest prefix which is a suffix of the input (see Lemma 2). An important outcome of this lemma is that if the depth of a state s in the AC FSM is equal to k, then only the last k bytes of the uncompressed traffic are relevant to the state (see Corollary 3).

Formally, let $U_1 \dots U_j$ be the uncompressed bytes of the input traffic after scanning j bytes, let P be a set of patterns, and let $P_i$ be a pattern such that $P_i \in P$, where $P_i = P_{i_1} \dots P_{i_n}$. Let s be the state of the AC FSM after scanning all bytes up to $U_j$.

Lemma 2: For any $P_i \in P$, the length of the longest prefix of any pattern in P which is a suffix of $U_1 \dots U_j$ is equal to depth(s).
Proof: See [25].

Let the depth of the AC FSM state after scanning byte $U_j$ be k. Then:

Corollary 3: The state of the AC FSM after scanning $U_1 \dots U_j$ is equal to the state s reached if the input traffic to the AC FSM is $U_{j-k+1} \dots U_j$.

The following Lemma 4 is the heart of the proof of the Theorem.
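Lemma 2 and Corollary 3 can be checked empirically on a small pattern set. The Python sketch below uses a textbook Aho-Corasick automaton with failure links (names such as `build`, `run` and `longest_prefix_suffix` are chosen for illustration) and verifies both statements on every prefix of a sample text; it is a sanity check under these assumptions, not a proof.

```python
from collections import deque


def build(patterns):
    """Textbook Aho-Corasick: goto table, failure links and state depths."""
    goto, fail, depth = [{}], [0], [0]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); depth.append(depth[s] + 1)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            queue.append(t)
    return goto, fail, depth


def run(ac, text, state=0):
    """Feed `text` to the automaton and return the final state."""
    goto, fail, _ = ac
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
    return state


def longest_prefix_suffix(patterns, text):
    """Brute force: longest pattern prefix that is also a suffix of `text`."""
    best = 0
    for p in patterns:
        for k in range(1, len(p) + 1):
            if text.endswith(p[:k]):
                best = max(best, k)
    return best


patterns = ["abcd", "bcde", "cab"]
ac = build(patterns)
text = "xxabcabcdab"
for j in range(1, len(text) + 1):
    s = run(ac, text[:j])
    k = ac[2][s]
    assert k == longest_prefix_suffix(patterns, text[:j])   # Lemma 2
    assert run(ac, text[j - k:j]) == s                       # Corollary 3
print("Lemma 2 and Corollary 3 hold on all prefixes of", text)
```

The second assertion is the property ACCH depends on: once the depth of the current state is at most CDepth, restarting the FSM from the start state CDepth bytes earlier leads to the same state as a full scan.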
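Let $Trf = Trf_1 \dots Trf_N$ be the input, the compressed traffic (after Huffman decompression), and let $U_{j_1} \dots U_{j_n}$ be the uncompressed string of $Trf_j$. If $Trf_j$ is a byte then $n = 1$. Let $U_{j_i}$ be a byte in the uncompressed traffic. We define $ACCH\_status_{U_{j_i}}$ (and $Naive\_status_{U_{j_i}}$) to be the status of $U_{j_i}$ according to the ACCH algorithm (and the Naive algorithm, respectively) after scanning all input bytes before it. Similarly, we define $ACCH\_state_{U_{j_i}}$ ($Naive\_state_{U_{j_i}}$) to be the state we reach in the AC FSM.

Lemma 4: For every $Trf_m$, where $m \le N$, the following three claims hold:
1) For every $U_{m_i}$ ($m_1 \le m_i \le m_n$), if $Naive\_status_{U_{m_i}}$ = Check then $ACCH\_status_{U_{m_i}}$ = Check.
2) For every $U_{m_i}$ ($m_1 \le m_i \le m_n$), $Naive\_status_{U_{m_i}}$ = Match iff $ACCH\_status_{U_{m_i}}$ = Match (note that the two directions of the statement hold).
3) $Naive\_state_{U_{m_n}} = ACCH\_state_{U_{m_n}}$, i.e., the state after decompressing the m-th compressed input (pointer or byte) is the same.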
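Proof: Note that the theorem is a direct outcome of claims 2 and 3. The proof is by induction. For m = 1, $Trf_1$ is the first byte (not a pointer) in the traffic and decompresses to $U_1$. Thus the induction basis follows, since the two algorithms have started from the same start state and had the same input. For the induction step, we assume that the claims hold up to m - 1, and we prove them for m.

In the case where $Trf_m$ is a byte (and not a pointer), it is easy to see that all three claims hold: after scanning $Trf_{m-1}$ the two algorithms are in the same state of the AC FSM (from Claim 3) and they receive the same input, hence they will reach the same state and will have the same status.

The delicate case is when $Trf_m$ is a pointer. We proceed by induction on the uncompressed form of the pointer, i.e., on $U_{m_1} \dots U_{m_n}$. For readability, let $P_1 \dots P_{len}$ be equal to $U_{m_1} \dots U_{m_n}$, where len is the length of the pointer. We prove the claims on the parts of the pointer separately: left boundary, internal area and right boundary.

Left Boundary: Let us look at the first part, where we search for a pattern at the left boundary of the pointer (lines 17-21). The claim follows since, due to Claim 3 of the induction assumption, the state of the two algorithms is the same at the point before we start scanning the pointer. The two algorithms start at the same state and receive the same input, and hence will have the same state and status for all the input bytes up to the byte $P_j$ where $Depth(ACCH\_state_{P_j}) = Depth(Naive\_state_{P_j}) \le j$. From Corollary 3, at this point the pattern prefix that the Naive and ACCH algorithms try to match is completely contained within the referred bytes. From this point on, we can assume that any pattern prefix that either Naive or ACCH locates is fully contained within the referred bytes (i.e., the internal area case).

Internal Area: The proof here is divided into two parts, a proof for the skipped bytes (lines 28-35) and a proof for the scanned bytes (lines 36-39).

Skipped Bytes: In lines 28-29 we skip scanning some bytes, and update the status of those bytes according to the status of the referred bytes. Since we are not at the end of the pointer, we only need to prove the first two claims.

Claim 1: We need to prove that for every $P_i$ ($j < i < p - CDepth$), if $Naive\_status_{P_i}$ = Check then $ACCH\_status_{P_i}$ = Check. Our proof proceeds by reductio ad absurdum. Let us look at the first index i where the claim does not hold. Hence $Naive\_status_{P_i}$ = Check and $ACCH\_status_{P_i}$ = Uncheck (we do not need to prove here that it is not Match, since this is a direct outcome of Claim 2). Since $Naive\_status_{P_i}$ = Check, the depth of the state after scanning the byte $P_i$ is at least CDepth; however, the same pattern also exists fully within the referred bytes (by the left boundary proof). The status of the Naive algorithm at the corresponding referred byte in the sliding window is correlated to the longest possible prefix of a pattern that is a suffix of the text scanned so far (see Lemma 2), and hence the status of the referred byte is also Check (it can only be in a state with higher or equal depth).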
From h iiv ssmpio of Clim h ss of h rfrr y is lso Chk for h ACCH lgorihm. Si h ACCH ss Pi is s orig o h ss of rfrr y i h sliig wiow (li 29) (i.., Chk) w riv oriio. Clim 2: Hr w o prov h for vry P i (j < i < p CDph), iff Niv ss Pi = Mh h ACCH ss Pi = Mh. W firs show h hr is o Mh ss i h skipp y of ACCH, h h irio h if ACCH ss Pi = Mh h Niv ss Pi = Mh, is prov srigh forwr from his f. From h fiiio of p hr is o Mh i h orrspoig rfrr ys. Si h ACCH ss i h rfrr ys i ix j p CDph is h sm s h ss i h sliig wiow (li 29) hr is o Mh i ACCH ss Pi. Th irio h if Niv ss Pi = Mh h ACCH ss Pi = Mh is prov i similr wy o h proof of Clim. S Bys: Hr w will prov h lims for h ys i h s r (i.., lis 36-39). Th sss of h CDph ys for p r mii from h sliig wiow, h sm wy s i h skipp ys r. Thrfor Clims 2 hol for hos ys oo. W oi h prov from posiio p. Clims 2: For l whr p l k w wol prov h Niv s Pl ACCH s Pl r ql h lims,2 follow. Si w r fr h lf ory r, w r sr h ll prs r wihi poir oris wr opi omplly from h rfrr ys h hv h sm ss s i h rfrr ys. L s look poi p, (rll p is h mximl ix for k whr h ss of h orrspoig rfrr y ACCH is Uhk): i is sy o prov h Niv ss Pp is ql o h ss of is rfrr y i h sliig wiow (proof y rio srm). From h iio ssmpio Clim, h ss of his rfrr y is h sm s of h ss orig o ACCH lgorihm. From fiiio of p, h ss is Uhk h h s of h rfrr y orig o oh lgorihms is wih is smllr h CDph. H o h AC s rlv oly h ls CDph ys (P p CDph+...P p ) (from Corollry 3). Si w rs h s of FSM i h ACCH (s lgorihm li 3), w prov i similr wy h oly h ls CDph ys r rlv o h ACCH s. H oh lgorihms r i h sm s poi p. Thrfor from his poi Niv s Pl = ACCH s Pl, for y l whr p l k. Righ Bory: No h i h prvios proof o h s r, w s oly h f h il k (o ilig k) hr is o rfrr y wih Mh ss. H h lim lso follows for h s whr k = l. I his s w lso o prov Clim 3, whih w prov y showig h for l, p l k (i.., ilig k) h ss of h l ys r h sm i oh Niv, ACCH lgorihm. 45 Ahoriz lis s limi o: Niol Chg Kg Uivrsiy. Dowlo o Jly 7, 29 :54 from IEEE Xplor. Rsriios pply.