CPS 296.3: Algorithms in the Real World
Data Compression III

Compression Outline
  Introduction: Lossy vs. Lossless, Benchmarks
  Information Theory: Entropy, etc.
  Probability Coding: Huffman + Arithmetic Coding
  Applications of Probability Coding: PPM + others
  Lempel-Ziv Algorithms: LZ77, gzip, LZ78, compress (not covered in class)
  Other Lossless Algorithms: Burrows-Wheeler
  Lossy algorithms for images: JPEG, MPEG, ...
  Compressing graphs and meshes: BBK

Lempel-Ziv Algorithms
  LZ77 (Sliding Window)
    Variants: LZSS (Lempel-Ziv-Storer-Szymanski)
    Applications: gzip, Squeeze, LHA, PKZIP, ZOO
  LZ78 (Dictionary Based)
    Variants: LZW (Lempel-Ziv-Welch), LZC
    Applications: compress, GIF, CCITT (modems), ARC, PAK
  Traditionally LZ77 was better but slower, but the gzip version is almost as fast as any LZ78.

LZ77: Sliding Window Lempel-Ziv
  [Figure: a string split at the cursor into the dictionary (previously coded text) on the left and the lookahead buffer on the right]
  Dictionary and buffer "windows" are fixed length and slide with the cursor.
  Repeat:
    Output (p, l, c) where
      p = position of the longest match that starts in the dictionary (relative to the cursor)
      l = length of the longest match
      c = next char in the buffer beyond the longest match
    Advance the window by l + 1
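The Repeat loop above can be sketched in Python as follows. This is a minimal illustration, assuming a brute-force scan of the dictionary window; the name `lz77_encode`, the parameters `dict_size` and `buf_size`, and the use of p = 0 to mean "no match" are assumptions made for the sketch, not something the slides prescribe.

```python
def lz77_encode(text, dict_size=6, buf_size=4):
    """Encode text as a list of (p, l, c) triples with a sliding window.

    p is the distance back from the cursor to the start of the match,
    l is the match length, c is the character following the match.
    p = 0 with l = 0 stands for "no match" (an assumption of this sketch).
    """
    out = []
    cursor = 0
    while cursor < len(text):
        best_p, best_l = 0, 0
        # Leave at least one character after the match to serve as c.
        max_l = min(buf_size, len(text) - cursor - 1)
        # Try every start position inside the dictionary window.
        for p in range(1, min(dict_size, cursor) + 1):
            l = 0
            # The match may run past the cursor into the lookahead buffer.
            while l < max_l and text[cursor - p + l] == text[cursor + l]:
                l += 1
            if l > best_l:
                best_p, best_l = p, l
        out.append((best_p, best_l, text[cursor + best_l]))
        cursor += best_l + 1  # advance the window by l + 1
    return out
```

With the default window sizes, `lz77_encode("aacaacabcabaaac")` returns `[(0, 0, 'a'), (1, 1, 'c'), (3, 4, 'b'), (3, 3, 'a'), (1, 2, 'c')]`. A real encoder would not scan the window linearly; this sketch favors clarity over speed.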
LZ77: Example
  Dictionary size = 6, buffer size = 4. "|" marks the cursor.
  | a a c a a c a b c a b a a a c      (_,0,a)
  a | a c a a c a b c a b a a a c      (1,1,c)
  a a c | a a c a b c a b a a a c      (3,4,b)
  a a c a a c a b | c a b a a a c      (3,3,a)
  a a c a a c a b c a b a | a a c      (1,2,c)
  Each output names the longest match in the dictionary and the next character after it.

LZ77 Decoding
  The decoder keeps the same dictionary window as the encoder.
  For each message it looks the match up in the dictionary and inserts a copy at the end of the string.
  What if l > p? (Only part of the message is in the dictionary.)
    E.g. dict = abcd, codeword = (2,9,e)
    Simply copy from left to right:
      for (i = 0; i < length; i++)
        out[cursor+i] = out[cursor-offset+i]
    Out = abcdcdcdcdcdce

LZ77 Optimizations used by gzip
  LZSS: Output one of the following two formats
    (0, position, length) or (1, char)
  Uses the second format if length < 3.
  | a a c a a c a b c a b a a a c      (1,a)
  a | a c a a c a b c a b a a a c      (1,a)
  a a | c a a c a b c a b a a a c      (1,c)
  a a c | a a c a b c a b a a a c      (0,3,4)

Optimizations used by gzip (cont.)
  1. Huffman code the positions, lengths and chars.
  2. Non greedy: possibly use a shorter match so that the next match is better.
  3. Use a hash table to store the dictionary.
     Hash keys are all strings of length 3 in the dictionary window.
     Find the longest match within the correct hash bucket.
     Puts a limit on the length of the search within a bucket.
     Within each bucket, store in order of position.
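The left-to-right copy rule from the LZ77 decoding slide can be sketched in Python as follows. This is a minimal illustration, assuming codewords arrive as (p, l, c) triples with p measured back from the current end of the output; the name `lz77_decode` and the convention that p = 0, l = 0 means "no match" are assumptions of the sketch.

```python
def lz77_decode(triples):
    """Rebuild the text from (p, l, c) triples.

    Characters are copied one at a time, left to right, so a match with
    l > p can read characters that were written earlier in the same copy.
    """
    out = []
    for p, l, c in triples:
        start = len(out) - p  # where the match begins in the output so far
        for i in range(l):
            out.append(out[start + i])  # may read text appended in this loop
        out.append(c)  # the literal character that follows the match
    return "".join(out)
```

For instance, seeding the output with a, b, c, d as literals and then applying (2, 9, 'e') gives `abcdcdcdcdcdce`: the two characters `cd` are repeated because the copy overlaps its own output.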
The Hash Table
  Position:  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
  String:    a  a  c  a  a  c  a  b  c  a  b  a  a  a  c
  Buckets (key: positions, most recent first):
    aac: 19, 10, 7
    cab: 15, 12
    aca: 11, 8
    caa: 9

Theory behind LZ77
  "The Sliding Window Lempel-Ziv Algorithm is Asymptotically Optimal", A. D. Wyner and J. Ziv, Proceedings of the IEEE, Vol. 82, No. 6, June 1994.
  Will compress long enough strings to the source entropy as the window size goes to infinity.
  Source entropy for a substring of length n is given by:
    $H_n = \frac{1}{n} \sum_{X \in A^n} p(X) \log \frac{1}{p(X)}$
  Uses a logarithmic code (e.g. gamma) for the position.
  Problem: "long enough" is really really long.

Comparison to Lempel-Ziv 78
  Both LZ77 and LZ78 and their variants keep a "dictionary" of recent strings that have been seen.
  The differences are:
    How the dictionary is stored (LZ78 is a trie)
    How it is extended (LZ78 only extends an existing entry by one character)
    How it is indexed (LZ78 indexes the nodes of the trie)
    How elements are removed

Lempel-Ziv Algorithms Summary
  Adapts well to changes in the file (e.g. a Tar file with many file types within it).
  Initial algorithms did not use probability coding and performed poorly in terms of compression.
  More modern versions (e.g. gzip) do use probability coding as a "second pass" and compress much better.
  The algorithms are becoming outdated, but the ideas are used in many of the newer algorithms.
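The hash table idea (keys are the length-3 strings of the window, each bucket ordered by position, search length capped) can be sketched in Python as follows. This is only an illustration using a plain dictionary of lists, not gzip's actual data structure; the names `build_trigram_table` and `find_longest_match` and the parameters `end`, `first_pos`, `max_chain` and `max_len` are hypothetical.

```python
from collections import defaultdict

def build_trigram_table(text, end=None, first_pos=0):
    """Map each 3-character string to the positions where it starts.

    Only trigrams starting before index `end` are indexed. Positions are
    reported as first_pos + index, with the most recent position first.
    """
    last_start = len(text) - 2
    if end is None:
        end = last_start
    table = defaultdict(list)
    for i in range(min(end, last_start)):
        table[text[i:i + 3]].insert(0, first_pos + i)
    return dict(table)

def find_longest_match(text, cursor, table, max_chain=4, max_len=258):
    """Return (distance, length) of the best match for text[cursor:].

    `table` must hold 0-based positions of trigrams that start before
    the cursor. Only the first max_chain bucket entries are examined.
    """
    best = (0, 0)
    for pos in table.get(text[cursor:cursor + 3], [])[:max_chain]:
        length = 0
        while (length < max_len and cursor + length < len(text)
               and text[pos + length] == text[cursor + length]):
            length += 1
        if length > best[1]:
            best = (cursor - pos, length)
    return best
```

Calling `build_trigram_table("aacaacabcabaaac", first_pos=7)` puts positions 19, 10, 7 in the `aac` bucket and 15, 12 in the `cab` bucket. Capping the chain length trades a little compression for bounded search time.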
Compression Outline
  Introduction: Lossy vs. Lossless, Benchmarks
  Information Theory: Entropy, etc.
  Probability Coding: Huffman + Arithmetic Coding
  Applications of Probability Coding: PPM + others
  Lempel-Ziv Algorithms: LZ77, gzip, compress, ...
  Other Lossless Algorithms: Burrows-Wheeler, ACB
  Lossy algorithms for images: JPEG, MPEG, ...
  Compressing graphs and meshes: BBK

Burrows-Wheeler
  Currently near best "balanced" algorithm for text.
  Breaks the file into fixed-size blocks and encodes each block separately.
  For each block:
    Sort each character by its full context. This is called the block sorting transform.
    Use the move-to-front transform to encode the sorted characters.
  The ingenious observation is that the decoder only needs the sorted characters and a pointer to the first character of the original sequence.

Burrows Wheeler: Example
  Let's encode: d1 e2 c3 o4 d5 e6 (subscripts give each character's position in the input)
  Context "wraps" around. Last char is most significant.

  All rotations of input:
    Context   Char
    ecode6    d1
    coded1    e2
    odede2    c3
    dedec3    o4
    edeco4    d5
    decod5    e6

  Sorted by context:
    Context   Output
    dedec3    o4
    coded1    e2
    decod5    e6
    odede2    c3
    ecode6    d1
    edeco4    d5

Burrows Wheeler Decoding
  Key Idea: Can construct the entire sorted table from the sorted column alone!
  First: sorting the output gives the last column of the context:
    Context   Output
    c         o
    d         e
    d         e
    e         c
    e         d
    o         d
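The two encoding steps (block sorting, then move-to-front) can be sketched in Python as follows. This is a minimal illustration that sorts explicit context keys, which is quadratic in the block size and not how a production encoder would do it; the names `bw_encode` and `move_to_front_encode`, the 0-based `start` pointer, and the explicit `alphabet` argument are assumptions of the sketch.

```python
def bw_encode(block):
    """Block sorting transform: order characters by their preceding context.

    The context of block[i] is the text before it, wrapping around, with
    the nearest preceding character as the most significant one.
    Returns the output column and the 0-based row of the first character.
    """
    n = len(block)

    def context_key(i):
        # Nearest preceding character first, then further back, wrapping.
        return [block[(i - k) % n] for k in range(1, n)]

    order = sorted(range(n), key=context_key)
    output = "".join(block[i] for i in order)
    start = order.index(0)  # row whose output is the first input character
    return output, start

def move_to_front_encode(s, alphabet):
    """Replace each character by its index in a list, then move it to the front."""
    table = list(alphabet)
    codes = []
    for ch in s:
        idx = table.index(ch)
        codes.append(idx)
        table.insert(0, table.pop(idx))
    return codes
```

For the input `decode`, `bw_encode` returns `("oeecdd", 4)`, and `move_to_front_encode("oeecdd", "cdeo")` returns `[3, 3, 0, 2, 3, 0]`. Runs of equal characters in the sorted column turn into zeros, which the later probability coder can exploit.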
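Burrows Wheeler Decoding
  Now sort the pairs in the last column of the context and the output column to form the last two columns of the context:
    Context   Output
    ec        o
    ed        e
    od        e
    de        c
    de        d
    co        d

Burrows Wheeler Decoding
  Repeat until the entire table is complete. The pointer to the first character provides a unique decoding.
    Context   Output
    dedec3    o4
    coded1    e2
    decod5    e6
    odede2    c3
    ecode6    d1    <= first character
    edeco4    d5
  The message was d in the first position, preceded in wrapped fashion by ecode: decode.

Burrows Wheeler Decoding
  Optimization: Don't really have to rebuild the whole context table.
  What character comes after the first character, d1?
  (Here subscripts number the occurrences of each character in the output column.)
    Context   Output
    c         o
    d1        e1
    d2        e2
    e1        c
    e2        d1
    o         d2
  Just have to find d1 in the last column of the context and see what follows it: e1.
  Observation: instances of the same character of the output appear in the same order in the last column of the context. (Proof is an exercise.)

Burrows-Wheeler: Decoding
  The "rank" is the position of a character if it were sorted using a stable sort.
    Context   Output   Rank
    c         o        6
    d         e        4
    d         e        5
    e         c        1
    e         d        2
    o         d        3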
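The observation above means the decoder can follow ranks instead of rebuilding the context table. A minimal Python sketch, assuming the output column is given as a string and the pointer to the first character as a 0-based row index; the names `stable_rank` and `bw_decode` are illustrative.

```python
def stable_rank(s):
    """Position each character of s would take under a stable sort (0-based)."""
    # Python's sorted() is stable, so equal characters keep their order.
    order = sorted(range(len(s)), key=lambda j: s[j])
    rank = [0] * len(s)
    for position, j in enumerate(order):
        rank[j] = position
    return rank

def bw_decode(s, start):
    """Recover the original block from the output column s.

    The rank of s[j] is the row whose context ends with that same
    character occurrence, so the output of that row is the next
    character of the message.
    """
    rank = stable_rank(s)
    out = []
    j = start
    for _ in range(len(s)):
        out.append(s[j])
        j = rank[j]
    return "".join(out)
```

For the output column `oeecdd`, `stable_rank` gives `[5, 3, 4, 0, 1, 2]`, which is the rank column 6, 4, 5, 1, 2, 3 shifted to 0-based indexing, and `bw_decode("oeecdd", 4)` returns `decode`.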
Burrows-Wheeler Decode
  Function BW_Decode(In, Start, n)
    S = MoveToFrontDecode(In, n)
    R = Rank(S)
    j = Start
    for i = 1 to n do
      Out[i] = S[j]
      j = R[j]
  Rank gives the position of each char in sorted order.

Decode Example
    S     Rank(S)
    o4    6
    e2    4
    e6    5
    c3    1
    d1    2    <= Start
    d5    3
  Out = d1 e2 c3 o4 d5 e6

Overview of Text Compression
  PPM and Burrows-Wheeler both encode a single character based on the immediately preceding context.
  LZ77 and LZ78 encode multiple characters based on matches found in a block of preceding text.
  Can you mix these ideas, i.e., code multiple characters based on the immediately preceding context?
    BZ does this, but they don't give details on how it works (current best compressor).
    ACB also does this (close to best).

ACB (Associate Coder of Buyanovsky)
  Keep the dictionary sorted by context (the last character is the most significant).
  Find the longest match for the context.
  Find the longest match for the contents.
  Code:
    Distance between the matches in the sorted order
    Length of the contents match
  Has aspects of Burrows-Wheeler and LZ77.
  [Figure: dictionary table with columns Context and Contents]
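BW_Decode above begins by calling MoveToFrontDecode, which the slides do not spell out. One possible Python sketch, assuming the encoder emitted list indices and that both sides start from the same initial symbol list; the name `move_to_front_decode` and the `alphabet` argument are hypothetical and differ from the `(In, n)` signature used in the pseudocode.

```python
def move_to_front_decode(codes, alphabet):
    """Invert move-to-front coding.

    Each code is an index into the current symbol list. The symbol found
    there is emitted and then moved to the front of the list, mirroring
    what the encoder did.
    """
    table = list(alphabet)
    out = []
    for idx in codes:
        ch = table.pop(idx)
        out.append(ch)
        table.insert(0, ch)
    return "".join(out)
```

With the initial list `cdeo`, `move_to_front_decode([3, 3, 0, 2, 3, 0], "cdeo")` returns `oeecdd`, the sorted column S used in the decode example.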