Basic Image Compression Algorithm and Introduction to JPEG Standard

Pao-Yen Lin
E-mail: r97942117@ntu.edu.tw
Graduate Institute of Communication Engineering
National Taiwan University, Taipei, Taiwan, ROC

Abstract

Because the amount of image and video information on storage devices and the Internet is increasing explosively, image and video compression techniques are becoming more and more important. This paper introduces the basic concepts of data compression that are applied in modern image and video compression techniques such as JPEG, MPEG, MPEG-4 and so on.

The basic idea of data compression is to reduce data correlation. By applying the Discrete Cosine Transform (DCT), data in the time (spatial) domain can be transformed into the frequency domain. Because human vision is less sensitive to higher frequencies, we can compress image or video data by suppressing its high-frequency components without any change visible to the eye.

Moving pictures such as video are data in a three-dimensional space consisting of the spatial plane and the time axis. Therefore, in addition to reducing spatial correlation, we need to reduce temporal correlation. We introduce a method called Motion Estimation (ME). In this method, we find similar parts of the image in previous or future frames, and then replace the image with a Motion Vector (MV) in order to reduce temporal correlation.

In this paper, we also introduce the JPEG standard and the MPEG standard, which are the well-known image and video compression standards, respectively.

1 Introduction

Nowadays, the size of storage media increases day by day. Although the largest capacity of a hard disk is about two terabytes, it is not large enough if we store a video file without compressing it. For example, consider a color video stream with three 720x480 layers, 30 frames per second and 8 bits per pixel. Then we need $720 \times 480 \times 3 \times 8 \times 30 \approx 249$ Mbit/s. This equals about 31.1 MB per second. A 650 MB CD-ROM can store only about 20 seconds of such video. That is why we want to do image and video compression even though the capacity of storage media is quite large now.

In chapter 2, we will introduce the basic concept of data compression. The main idea of data compression is to reduce the data correlation and replace the data with a simpler data form. Then we will discuss the methods that are commonly used in image/video compression in chapter 3. In chapter 4 and chapter 5, we will introduce quantization and entropy coding. After reducing data correlation, the amount of data is not really reduced. We use quantization and entropy coding to compress the data. In chapter 6, we give an example of image compression: the JPEG standard. The JPEG standard has been widely used in image and photo compression recently. In chapter 7, we discuss how to reduce temporal correlation with a method called Motion Estimation (ME). And then we give an example of video compression, the MPEG standard, in chapter 8.

Fig. 1 Encoder and decoder of images, from Ref. [3]

2 Basic Concept of Data Compression

The motivation of data compression is to use a smaller quantity of data to represent the original data without distorting them. Consider the system in Fig. 1. When the encoder receives the target image, it converts the image into a bit stream b. The decoder, in turn, receives the bit stream and converts it back to the image I. If the quantity of the bit stream b is less than that of the original image, then we call this process Image Compression Coding.

There is an example in Fig. 2 using the JPEG image compression standard. The compression ratio is 15138 / 83261, about 0.1818, around one fifth of the original size. Besides, we can see that the decoded image and the original image are only slightly different. In fact, the two images are not completely the same; that is, parts of the information are lost during the image compression process. For this reason, the decoder cannot rebuild the image perfectly. This kind of image compression is called non-reversible coding or lossy coding. On the contrary, there is another form called reversible coding that can perfectly rebuild the original image without any distortion. But reversible coding achieves much less compression.

(a) Original image, 83261 bytes (b) Decoded image, 15138 bytes
Fig. 2 Example of image compression using the JPEG standard

For lossy coding, there is a distortion between the original image and the decoded image. In order to evaluate the coding efficiency, we need a method to evaluate the degree of distortion. There are two common evaluation tools, which are Mean Square Error (MSE) and Peak Signal to Noise Ratio (PSNR). They are defined as follows:

$$\mathrm{MSE} = \frac{1}{WH}\sum_{x=0}^{W-1}\sum_{y=0}^{H-1}\bigl[f(x,y) - f'(x,y)\bigr]^2 \tag{1}$$
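Eq. (1) can be evaluated directly. The following is a minimal sketch, assuming images are stored as lists of rows of integer pixel values; the function name `mse` and the tiny 2x3 test images are illustrative and do not come from the paper.

```python
def mse(original, decoded):
    """Mean square error of Eq. (1) for two images of identical size.

    Each image is a list of H rows, each row a list of W pixel values.
    """
    h = len(original)
    w = len(original[0])
    total = 0
    for row_f, row_g in zip(original, decoded):
        for f, g in zip(row_f, row_g):
            total += (f - g) ** 2      # squared pixel difference
    return total / (w * h)

# Hypothetical 2x3 "images" used only to exercise the formula.
original = [[52, 55, 61], [59, 79, 61]]
decoded = [[54, 55, 60], [59, 76, 62]]
error = mse(original, decoded)
```

Here the squared differences are 4, 0, 1, 0, 9 and 1, so the sum is 15 over 6 pixels.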

$$\mathrm{PSNR} = 20\log_{10}\frac{255}{\sqrt{\mathrm{MSE}}} \tag{2}$$

In Eq. (1), $f(x,y)$ and $f'(x,y)$ denote the original image and the decoded image, respectively. The image size is $W \times H$. In Eq. (2), the PSNR formula is the one commonly used for 8-bit images. Note that the larger the PSNR, the smaller the degree of distortion.

Now, we want to know how and why an image can be compressed. Generally speaking, neighboring pixels in an image are highly correlated with each other. That is why images can be highly compressed.

Today's image coding algorithms consist of reducing the correlation between pixels, quantization and entropy coding. We will discuss these parts one by one in the following chapters. The system model of the coding algorithm is shown in Fig. 3.

Input image -> Reduce correlation between pixels -> Quantization -> Entropy coding -> Bit stream
Fig. 3 General constitution of an image coding algorithm

3 Orthogonal Transform and Discrete Cosine Transform

3.1 Linear transformation

We have studied linear transformation in Linear Algebra. It is very useful to represent signals in basis form. For simplicity, we discuss the case of three-dimensional space; the case of N-dimensional space can be derived easily in the same way. We can express any three-dimensional vector $\mathbf{x}$ as a column vector $[x_1, x_2, x_3]^t$, where $x_1$, $x_2$ and $x_3$ are the values along the three corresponding axes. For a proper transformation matrix $\mathbf{A}$, we can transform the vector $\mathbf{x}$ into another vector $\mathbf{y}$; we call this a linear transformation process. It can be written as:

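$$\mathbf{y} = \mathbf{A}\mathbf{x} \tag{3}$$

where $\mathbf{x}$ and $\mathbf{y}$ are vectors in $\mathbb{R}^3$ and $\mathbf{A}$ is called a transformation matrix. Moreover, consider three linearly independent vectors with different directions:

$$\mathbf{v}_1 = [1\ 1\ 0]^t,\quad \mathbf{v}_2 = [1\ 0\ 1]^t,\quad \mathbf{v}_3 = [0\ 1\ 1]^t \tag{4}$$

Then any vector in $\mathbb{R}^3$ can be expressed as a combination of these three independent vectors, that is,

$$\mathbf{x} = a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + a_3\mathbf{v}_3 \tag{5}$$

where $a_1$, $a_2$ and $a_3$ are constants.

3.2 Orthogonal Transformation

According to Section 3.1, any vector in $\mathbb{R}^3$ can be expressed as a combination of three independent vectors. If we choose these three independent vectors such that they are mutually orthogonal and of unit length, we will have many useful properties and the numerical computation will become easier. As in Eq. (5), we then have $\mathbf{v}_1$, $\mathbf{v}_2$ and $\mathbf{v}_3$ that satisfy

$$\mathbf{v}_1\cdot\mathbf{v}_2 = \mathbf{v}_2\cdot\mathbf{v}_3 = \mathbf{v}_1\cdot\mathbf{v}_3 = 0,\qquad \mathbf{v}_1\cdot\mathbf{v}_1 = \|\mathbf{v}_1\|^2 = 1,\quad \mathbf{v}_2\cdot\mathbf{v}_2 = \|\mathbf{v}_2\|^2 = 1,\quad \mathbf{v}_3\cdot\mathbf{v}_3 = \|\mathbf{v}_3\|^2 = 1 \tag{6}$$

From Eq. (5) and Eq. (6), $a_1$, $a_2$ and $a_3$ can be found by

$$a_1 = \mathbf{x}\cdot\mathbf{v}_1,\quad a_2 = \mathbf{x}\cdot\mathbf{v}_2,\quad a_3 = \mathbf{x}\cdot\mathbf{v}_3 \tag{7}$$

We find that it is easy to obtain $a_1$, $a_2$ and $a_3$ just by taking the inner product of the vector $\mathbf{x}$ with the corresponding basis vectors.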
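Eq. (7) and Eq. (5) can be checked with a few lines of code. The sketch below is only an illustration: the orthonormal basis (the standard axes rotated by 45 degrees about the third axis) and the test vector are assumptions chosen for the example, not values from the paper.

```python
import math

def dot(u, v):
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

# An assumed orthonormal basis of R^3 satisfying Eq. (6).
s = 1 / math.sqrt(2)
basis = [(s, s, 0.0), (-s, s, 0.0), (0.0, 0.0, 1.0)]

x = (3.0, 1.0, 2.0)                      # arbitrary test vector

# Eq. (7): each coefficient is one inner product.
coeffs = [dot(x, v) for v in basis]

# Eq. (5): rebuild x as the weighted sum of the basis vectors.
rebuilt = [sum(a * v[k] for a, v in zip(coeffs, basis)) for k in range(3)]
```

With a basis that is merely independent, such as the vectors of Eq. (4), the coefficients would instead require solving a 3x3 linear system, which is the extra work the orthonormal choice avoids.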

3.2 Karhunen-Loeve Transformation

Because images have high correlation within a small area, for an image of size $K_1 \times K_2$ we usually divide it into several small blocks of size $N_1 \times N_2$, and we process each block separately with a transformation that can reduce its pixel correlation. This can be seen in Fig. 4. Moreover, if we choose a bigger block size we may obtain a higher compression ratio. However, an oversized block may have lower pixel correlation. There is a tradeoff.

In order to apply a linear transformation to each block in the image, we scan the pixels in each transformation block and arrange them into an N-dimensional vector. This can be seen in Fig. 5. The total number of transformation blocks equals $M = K_1K_2 / (N_1N_2)$ and the number of pixels in a transformation block is $N = N_1N_2$. After horizontal scanning, we have M vectors (the superscript is the block index, not a power):

$$\mathbf{x}^{1} = [x_1^{1}\ x_2^{1}\ \cdots\ x_N^{1}]^t,\ \ \ldots,\ \ \mathbf{x}^{m} = [x_1^{m}\ x_2^{m}\ \cdots\ x_N^{m}]^t,\ \ \ldots,\ \ \mathbf{x}^{M} = [x_1^{M}\ x_2^{M}\ \cdots\ x_N^{M}]^t$$

What we want to do is to achieve the optimal orthogonal transform for these vectors in order to reduce the pixel correlation in each transformation block. That is, find a transformation matrix $\mathbf{V}$ such that

$$\mathbf{V}^t\mathbf{V} = \mathbf{V}\mathbf{V}^t = \mathbf{I} \tag{8}$$

$$\mathbf{y}^{m} = \mathbf{V}^t\mathbf{x}^{m} \tag{9}$$

Fig. 4 Image partition and transformation block, from Ref. [3]
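The partition and horizontal scanning described above can be sketched as follows. This is a minimal illustration under stated assumptions: the image is a list of rows, $K_1$ is taken to be the number of rows, and the function name `blocks_to_vectors` and the 4x4 toy image are hypothetical.

```python
def blocks_to_vectors(image, n1, n2):
    """Split a K1 x K2 image into n1 x n2 blocks and scan each block
    row by row into a vector of length N = n1 * n2."""
    k1, k2 = len(image), len(image[0])
    if k1 % n1 or k2 % n2:
        raise ValueError("image size must be a multiple of the block size")
    vectors = []
    for top in range(0, k1, n1):            # block rows
        for left in range(0, k2, n2):       # block columns
            vectors.append([image[top + r][left + c]
                            for r in range(n1)
                            for c in range(n2)])
    return vectors

# 4x4 toy image holding the values 0..15, split into 2x2 blocks.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
vectors = blocks_to_vectors(image, 2, 2)
```

For this toy image, M = (4 x 4) / (2 x 2) = 4 vectors are produced, each with N = 4 elements.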

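Fig. 5 Transforming a transformation block into an N-dimensional vector, from Ref. [3]

Now, consider the covariance of these vectors:

$$C_{x_i x_j} = E\left[\left(x_i^{m} - \bar{x}_i^{m}\right)\left(x_j^{m} - \bar{x}_j^{m}\right)\right] \tag{10}$$

$$C_{y_i y_j} = E\left[\left(y_i^{m} - \bar{y}_i^{m}\right)\left(y_j^{m} - \bar{y}_j^{m}\right)\right] \tag{11}$$

The i-th element of $\mathbf{y}^{m}$ can be written as

$$y_i^{m} = \sum_{n=1}^{N} v_{ni}\, x_n^{m} \tag{12}$$

Then the mean of $y_i^{m}$ is

$$\bar{y}_i^{m} = E\left[\sum_{n=1}^{N} v_{ni}\, x_n^{m}\right] = \sum_{n=1}^{N} v_{ni}\, E\left[x_n^{m}\right] = \sum_{n=1}^{N} v_{ni}\, \bar{x}_n^{m} \tag{13}$$

For simplicity, we assume that each pixel value $x_i^{m}$ has its mean value subtracted, that is, we substitute $x_i^{m}$ with $x_i^{m} - \bar{x}_i^{m}$. Then the means of the new pixel values $x_i^{m}$ and of $y_i^{m}$ become zero. Thus we can rewrite Eq. (10) and Eq. (11):

$$C_{x_i x_j} = E\left[x_i^{m} x_j^{m}\right] \tag{14}$$

$$C_{y_i y_j} = E\left[y_i^{m} y_j^{m}\right] \tag{15}$$

Eq. (14) and Eq. (15) can be written in matrix form:

$$C_{xx} = \begin{bmatrix} E\left[x_1^{m} x_1^{m}\right] & \cdots & E\left[x_1^{m} x_N^{m}\right] \\ \vdots & \ddots & \vdots \\ E\left[x_N^{m} x_1^{m}\right] & \cdots & E\left[x_N^{m} x_N^{m}\right] \end{bmatrix} \tag{16}$$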

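$$C_{yy} = \begin{bmatrix} E\left[y_1^{m} y_1^{m}\right] & \cdots & E\left[y_1^{m} y_N^{m}\right] \\ \vdots & \ddots & \vdots \\ E\left[y_N^{m} y_1^{m}\right] & \cdots & E\left[y_N^{m} y_N^{m}\right] \end{bmatrix} \tag{17}$$

These are called covariance matrices. We can easily see that they must be symmetric matrices. They can be rewritten in vector form:

$$C_{xx} = E\left[\mathbf{x}^{m}\left(\mathbf{x}^{m}\right)^t\right] \tag{18}$$

$$C_{yy} = E\left[\mathbf{y}^{m}\left(\mathbf{y}^{m}\right)^t\right] \tag{19}$$

Moreover, $\mathbf{y}^{m}$ is obtained by applying the linear transformation matrix $\mathbf{V}$ to $\mathbf{x}^{m}$:

$$\mathbf{y}^{m} = \mathbf{V}^t\mathbf{x}^{m} \tag{20}$$

By Eq. (19) and Eq. (20), we find that

$$C_{yy} = E\left[\left(\mathbf{V}^t\mathbf{x}^{m}\right)\left(\mathbf{V}^t\mathbf{x}^{m}\right)^t\right] = \mathbf{V}^t E\left[\mathbf{x}^{m}\left(\mathbf{x}^{m}\right)^t\right]\mathbf{V} = \mathbf{V}^t C_{xx} \mathbf{V} \tag{21}$$

The purpose is to obtain an uncorrelated $\mathbf{y}^{m}$. Note that for an uncorrelated $\mathbf{y}^{m}$, the covariance matrix $C_{yy}$ is a diagonal matrix. From Eq. (21), we find that if $C_{yy}$ is a diagonal matrix, then we can regard the right-hand side of this equation as the eigenvalue-eigenvector decomposition of $C_{xx}$. The matrix $\mathbf{V}$ is composed of the eigenvectors of $C_{xx}$. Usually, we order the eigenvectors from the largest corresponding eigenvalue to the smallest. This is called the Karhunen-Loeve Transform (KLT).

3.3 Discrete Cosine Transform

For common applications such as the JPEG standard or the MPEG standard, we do not use the KLT.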
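To see what the KLT derived above involves in practice, the following sketch applies it to the smallest possible case, N = 2, where each vector is a pair of neighboring pixels. It is an illustration under assumptions, not the paper's implementation: the pixel pairs are made up, the names `covariance_2d` and `klt_2d` are hypothetical, and the closed-form rotation angle is used only because a symmetric 2x2 matrix can be diagonalized without an eigenvalue solver.

```python
import math

def covariance_2d(vectors):
    """Return (a, b, c) with C = [[a, c], [c, b]] for zero-mean 2-D vectors."""
    m = len(vectors)
    a = sum(x0 * x0 for x0, _ in vectors) / m
    b = sum(x1 * x1 for _, x1 in vectors) / m
    c = sum(x0 * x1 for x0, x1 in vectors) / m
    return a, b, c

def klt_2d(vectors):
    """Return (V, y): the columns of V are the eigenvectors of C_xx, largest
    eigenvalue first, and y holds the transformed vectors V^t x (Eq. (20))."""
    m = len(vectors)
    mean0 = sum(x0 for x0, _ in vectors) / m
    mean1 = sum(x1 for _, x1 in vectors) / m
    centered = [(x0 - mean0, x1 - mean1) for x0, x1 in vectors]
    a, b, c = covariance_2d(centered)
    # Rotation angle that diagonalizes the symmetric matrix [[a, c], [c, b]].
    theta = 0.5 * math.atan2(2 * c, a - b)
    cs, sn = math.cos(theta), math.sin(theta)
    V = [[cs, -sn], [sn, cs]]
    y = [(cs * x0 + sn * x1, -sn * x0 + cs * x1) for x0, x1 in centered]
    return V, y

# Made-up pairs of horizontally adjacent pixel values (highly correlated).
pairs = [(100, 102), (120, 118), (90, 93), (150, 149), (130, 133)]
V, y = klt_2d(pairs)
var0, var1, cross = covariance_2d(y)   # diagonal and off-diagonal of C_yy
```

After the transform the off-diagonal term of $C_{yy}$ vanishes up to rounding error, and almost all of the variance is concentrated in the first component. Note that `V` depends on the data, so it has to be recomputed for every image.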

Although we can obtain the optimal orthogonal transformation by applying the KLT, it still has the following drawbacks:

1 The KLT has to be computed for each image separately. This makes the computational complexity large.
2 In order to decode the encoded image, we have to transmit the KLT transformation matrix to the decoder. This costs additional processing time and memory space.

Therefore, if we can derive an orthogonal transform that approximately preserves the optimal property of the KLT for all images, then we can deal with the problems mentioned above. This leads to the Discrete Cosine Transform (DCT). The forward DCT is defined as

$$F(u,v) = \frac{1}{4}C(u)C(v)\sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\cos\frac{(2x+1)u\pi}{16}\cos\frac{(2y+1)v\pi}{16} \quad \text{for } u = 0,\ldots,7 \text{ and } v = 0,\ldots,7 \tag{22}$$

$$\text{where } C(k) = \begin{cases} 1/\sqrt{2} & \text{for } k = 0 \\ 1 & \text{otherwise} \end{cases}$$

Fig. 6 The 8x8 DCT basis $\omega_{x,y}(u,v)$, with u and v each running from 0 to 7
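Eq. (22) can be transcribed almost literally into code. The sketch below is a direct, unoptimized evaluation of the double sum, assuming the block is stored as `f[x][y]`; the function names and the flat test block are illustrative only, and practical codecs use fast factorizations instead.

```python
import math

def c(k):
    """Normalization factor C(k) of Eq. (22)."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct_8x8(f):
    """Forward 8x8 DCT of Eq. (22); f[x][y] holds the pixel values."""
    F = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (f[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            F[u][v] = 0.25 * c(u) * c(v) * s
    return F

# A perfectly flat block: every pixel has the value 100.
flat = [[100] * 8 for _ in range(8)]
F = dct_8x8(flat)
```

For the flat block all of the energy ends up in `F[0][0]`, which equals 6400 / 8 = 800, while the remaining 63 coefficients are zero up to rounding error. This is the decorrelation effect that makes the later quantization step effective.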

And the inverse DCT is defined by the following equation:

$$f(x,y) = \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7} C(u)C(v)F(u,v)\cos\frac{(2x+1)u\pi}{16}\cos\frac{(2y+1)v\pi}{16} \quad \text{for } x = 0,\ldots,7 \text{ and } y = 0,\ldots,7 \tag{23}$$

$F(u,v)$ is called the DCT coefficient, and the basis of the DCT is:

$$\omega_{x,y}(u,v) = \frac{C(u)C(v)}{4}\cos\frac{(2x+1)u\pi}{16}\cos\frac{(2y+1)v\pi}{16} \tag{24}$$

Then we can rewrite the IDCT using Eq. (24):

$$f(x,y) = \sum_{u=0}^{7}\sum_{v=0}^{7} F(u,v)\,\omega_{x,y}(u,v) \quad \text{for } x = 0,\ldots,7 \text{ and } y = 0,\ldots,7 \tag{25}$$

The 8x8 two-dimensional DCT basis is depicted in Fig. 6.

4 Quantization

The transformed 8x8 block (see Fig. 6) now consists of 64 DCT coefficients. The first coefficient $F(0,0)$ is the DC component and the other 63 coefficients are the AC components. The DC component $F(0,0)$ is essentially the sum of the 64 pixels in the input 8x8 pixel block multiplied by the scaling factor $\frac{1}{4}C(0)C(0) = \frac{1}{8}$, as shown in Eq. (22) for $F(u,v)$.

(a) Luminance
 16  11  10  16  24  40  51  61
 12  12  14  19  26  58  60  55
 14  13  16  24  40  57  69  56
 14  17  22  29  51  87  80  62
 18  22  37  56  68 109 103  77
 24  35  55  64  81 104 113  92
 49  64  78  87 103 121 120 101
 72  92  95  98 112 100 103  99

(b) Chrominance
 17  18  24  47  99  99  99  99
 18  21  26  66  99  99  99  99
 24  26  56  99  99  99  99  99
 47  66  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
 99  99  99  99  99  99  99  99

Fig. 7 Quantization matrices
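Looking back at Eq. (24), Eq. (25) and the 1/8 scaling of the DC component, a short sketch can make the relationship concrete. It assumes coefficients are stored as `F[u][v]`; the names `basis` and `idct_8x8` are hypothetical, and the single non-zero coefficient is chosen only for illustration.

```python
import math

def c(k):
    """Normalization factor C(k) of Eq. (22)."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def basis(x, y, u, v):
    """DCT basis function of Eq. (24), evaluated at pixel (x, y)."""
    return (c(u) * c(v) / 4
            * math.cos((2 * x + 1) * u * math.pi / 16)
            * math.cos((2 * y + 1) * v * math.pi / 16))

def idct_8x8(F):
    """Inverse 8x8 DCT of Eq. (25): a weighted sum of basis functions."""
    return [[sum(F[u][v] * basis(x, y, u, v)
                 for u in range(8) for v in range(8))
             for y in range(8)]
            for x in range(8)]

# Only the DC coefficient is non-zero.
F = [[0.0] * 8 for _ in range(8)]
F[0][0] = 800.0
f = idct_8x8(F)
pixel_sum = sum(sum(row) for row in f)
```

Because the DC basis function is the constant 1/8, a lone DC coefficient of 800 decodes to a flat block of value 100, and the pixel sum of 6400 multiplied by 1/8 gives back the DC value.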

The next step in the compression process is to quantize the transformed coefficients. Each of the 64 DCT coefficients is uniformly quantized. The 64 quantization step-size parameters for uniform quantization of the 64 DCT coefficients form an 8x8 quantization matrix. Each element in the quantization matrix is an integer between 1 and 255. Each DCT coefficient $F(u,v)$ is divided by the corresponding quantizer step-size parameter $Q(u,v)$ in the quantization matrix and rounded to the nearest integer:

$$F_q(u,v) = \mathrm{Round}\left(\frac{F(u,v)}{Q(u,v)}\right) \tag{26}$$

The JPEG standard does not define any fixed quantization matrix. It is the prerogative of the user to select a quantization matrix. Two quantization matrices are provided in Annex K of the JPEG standard for reference, not as a requirement. These two quantization matrices are shown in Fig. 7.

The quantization process plays the key role in JPEG compression. It is the process that removes the high frequencies present in the original image. We do this because the eye is much more sensitive to lower spatial frequencies than to higher ones. It is done by dividing the amplitudes of the higher frequencies by larger values than those by which the amplitudes of the lower frequencies are divided. The bigger the values in the quantization table, the bigger the error introduced by this lossy process and the lower the visual quality.

Another important fact is that in most images the color varies slowly from one pixel to another. So most images have only a small amount of fine detail, and hence little energy at high spatial frequencies, while most of the image information is contained in the low spatial frequencies.
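Eq. (26) and its inverse at the decoder can be sketched as below, using the luminance matrix of Fig. 7. Several details are assumptions made for the example: ties are rounded away from zero (the text only says "nearest integer"), coefficients are indexed as `F[u][v]`, and the four non-zero test coefficients are invented.

```python
import math

# Luminance quantization matrix from Fig. 7.
LUMINANCE_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def round_nearest(x):
    """Round to the nearest integer; ties go away from zero (an assumption)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def quantize(F, Q):
    """Eq. (26): divide each coefficient by its step size and round."""
    return [[round_nearest(F[u][v] / Q[u][v]) for v in range(8)]
            for u in range(8)]

def dequantize(Fq, Q):
    """Decoder side: multiply back by the step size (the rounding error stays)."""
    return [[Fq[u][v] * Q[u][v] for v in range(8)] for u in range(8)]

# Invented coefficient block: mostly zero, with a few non-zero entries.
F = [[0.0] * 8 for _ in range(8)]
F[0][0], F[0][1], F[1][0], F[7][7] = 800.0, -35.5, 20.0, 30.0

Fq = quantize(F, LUMINANCE_Q)
F_rec = dequantize(Fq, LUMINANCE_Q)
```

The high-frequency coefficient `F[7][7] = 30` is divided by 99 and disappears entirely, whereas the low-frequency values survive with a small error (for example -35.5 is reconstructed as -33). This is where the loss in lossy coding is introduced.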

5 An Example of Image Compression: JPEG Standard

JPEG (Joint Photographic Experts Group) is an international compression standard for continuous-tone still images, both grayscale and color. This standard is designed to support a wide variety of applications for continuous-tone images. Because of the distinct requirements of each application, the JPEG standard has two basic compression methods. The DCT-based method is specified for lossy compression, and the predictive method is specified for lossless compression. In this article, we introduce the lossy compression of the JPEG standard. Fig. 8 shows the block diagram of the Baseline JPEG encoder.

5.1 Zig-zag Reordering

After performing the 8x8 DCT and quantization on a block, we have a new 8x8 block that holds the frequency-domain values of the original block. Then we have to reorder the values into one-dimensional form in order to encode them. The DC coefficient is encoded by difference coding, which is discussed later. The AC terms, however, are scanned in a zig-zag manner. The reason for this zig-zag traversal is that it visits the 8x8 DCT coefficients in order of increasing spatial frequency. So we get a vector sorted by spatial frequency. As a consequence, the quantized vector has a lot of consecutive zeroes at high spatial frequencies. The zig-zag reordering process is shown in Fig. 9.

Color components (Y, Cb, or Cr) -> 8x8 DCT -> Quantizer (Quantization Table)
  AC: -> Zig-zag reordering -> Huffman coding (Huffman Table) -> JPEG bit-stream
  DC: -> Difference encoding -> Huffman coding (Huffman Table) -> JPEG bit-stream
Fig. 8 Baseline JPEG encoder
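The zig-zag traversal of Section 5.1 can be generated rather than tabulated. The sketch below walks the anti-diagonals of the block and alternates direction; the function names are hypothetical and the block is assumed to be indexed as `block[row][col]`.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col: one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)              # even diagonals run upwards
        order.extend((r, s - r) for r in rows)
    return order

def zigzag_scan(block):
    """Flatten an 8x8 block into a 64-element vector in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

# Build the index matrix: entry (r, c) is the scan position of that cell.
index = [[0] * 8 for _ in range(8)]
for position, (r, c) in enumerate(zigzag_order()):
    index[r][c] = position
```

The resulting `index` matrix reproduces the reordering matrix of Fig. 9, with the DC term at position 0 and the highest-frequency term at position 63.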

  0   1   5   6  14  15  27  28
  2   4   7  13  16  26  29  42
  3   8  12  17  25  30  41  43
  9  11  18  24  31  40  44  53
 10  19  23  32  39  45  52  54
 20  22  33  38  46  51  55  60
 21  34  37  47  50  56  59  61
 35  36  48  49  57  58  62  63

Fig. 9 Zig-zag reordering matrix

5.2 Zero Run Length Coding of AC Coefficients

Now we have a one-dimensional quantized vector with a lot of consecutive zeroes. We can process this by run-length coding of the consecutive zeroes. Let's consider the 63 AC coefficients of the 64-element quantized vector first. For example, we have:

57, 45, 0, 0, 0, 0, 23, 0, -30, -16, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, ..., 0

We encode each value that is not 0, and put the number of consecutive zeroes preceding that value in front of it. The RLC (run length coding) is:

(0,57) ; (0,45) ; (4,23) ; (1,-30) ; (0,-16) ; (2,1) ; EOB

EOB (End of Block) is a special coded value. If we reach a position in the vector from which only zeroes remain until the end of the vector, we mark that position with EOB and finish the RLC of the quantized vector. Note that if the quantized vector does not finish with zeroes (the last element is not 0), we do not add the EOB marker. Actually, EOB is equivalent to (0,0), so we have:

(0,57) ; (0,45) ; (4,23) ; (1,-30) ; (0,-16) ; (2,1) ; (0,0)

JPEG Huffman coding imposes the restriction that the number of preceding zeroes be coded as a 4-bit value, so it cannot exceed 15 (0xF). The example above contains no run longer than 15 and is therefore unchanged. A vector such as 57, eighteen zeroes, 3, four zeroes, 2, thirty-three zeroes, 895, followed only by zeroes, however, would be coded as:

(0,57) ; (15,0) ; (2,3) ; (4,2) ; (15,0) ; (15,0) ; (1,895) ; (0,0)

(15,0) is a special coded value which indicates that there are 16 consecutive zeroes.
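The run-length rules above, including the 4-bit limit on the run, can be sketched as follows. The function name `run_length_encode` and the constant names are hypothetical; the two test vectors are the ones discussed in the text, padded with zeroes to 63 AC coefficients.

```python
EOB = (0, 0)              # end of block: only zeroes remain
SIXTEEN_ZEROES = (15, 0)  # special pair standing for 16 consecutive zeroes

def run_length_encode(ac):
    """Encode AC coefficients as (zero run, value) pairs."""
    pairs = []
    run = 0
    for value in ac:
        if value == 0:
            run += 1
            continue
        while run > 15:               # the run must fit into 4 bits
            pairs.append(SIXTEEN_ZEROES)
            run -= 16
        pairs.append((run, value))
        run = 0
    if run > 0:                       # trailing zeroes collapse into one EOB
        pairs.append(EOB)
    return pairs

short_runs = [57, 45, 0, 0, 0, 0, 23, 0, -30, -16, 0, 0, 1] + [0] * 50
long_runs = [57] + [0] * 18 + [3] + [0] * 4 + [2] + [0] * 33 + [895] + [0] * 4

pairs_short = run_length_encode(short_runs)
pairs_long = run_length_encode(long_runs)
```

In the second vector the run of 18 zeroes becomes (15,0) followed by (2,3), and the run of 33 zeroes becomes two (15,0) pairs followed by (1,895). If the vector ends with a non-zero value, no EOB pair is emitted, matching the note above.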

5.3 Difference Coding of DC Coefficient

The DC coefficients of adjacent blocks are highly correlated with each other. Moreover, DC coefficients contain a lot of energy, so they usually have much larger values than AC coefficients. That is why we have to reduce the correlation before encoding. The JPEG standard encodes the difference between the DC coefficients. We compute the difference between adjacent DC values by the following equation:

$$\mathrm{Diff}_i = DC_i - DC_{i-1} \tag{27}$$

Note that the initial DC value is set to zero. Then the difference is Huffman encoded together with the encoding of the AC coefficients. The difference coding process is shown in Fig. 10.

block 1: Diff_1 = DC_1 (with DC_0 = 0) ... block i-1: Diff_{i-1} = DC_{i-1} - DC_{i-2} ; block i: Diff_i = DC_i - DC_{i-1} ...
Fig. 10 Difference coding of DC coefficients

Because Huffman coding is beyond the scope of this research, it is not discussed in this paper.

6 Conclusions

We have introduced the basic concepts of image compression and given an overview of the JPEG standard. Although there are many more details that we did not mention, the important parts are discussed in this paper. Although the JPEG standard has become the most popular image format, it still has some properties that can be improved. The compression ratio can be made higher without block effects by using the wavelet-based JPEG 2000 standard.

7 References

[1] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2/E. Upper Saddle River, NJ: Prentice-Hall, 2002.
[2] J. J. Ding and J. D. Huang, Image Compression by Segmentation and Boundary Description, June 2008.
[3] 酒井善則 and 吉田俊之 (authors), 白執善 (translator and editor), 影像壓縮技術 [Image Compression Techniques]. 全華科技圖書, October 2004.
[4] G. K. Wallace, "The JPEG Still Picture Compression Standard", Communications of the ACM, Vol. 34, Issue 4, pp. 30-44.