Performance Analysis and Comparison of H.264 and VP6 Siddhartha Mukkamala (1000571314) (Siddhartha.mukkamala@mavs.uta.edu) Under the guidance of Dr. K. R. Rao
What is VP6? VP6 was first introduced in May 2003 and officially released in October 2003. It is a proprietary video codec developed by On2 Technologies as a successor to earlier efforts such as VP3 and VP5. It can provide higher visual quality at lower bitrates. Macromedia has licensed it for its Flash family of products, and VP6 is the preferred video compression format for Flash Player 8 and higher.
Frames in VP6 VP6 uses only two frame types. Intra frames, or I-frames, can be reconstructed from their compressed representation without reference to any other frame in the sequence. I-frames therefore provide entry points into the bitstream that do not require preceding frames to be decoded, which enables fast random access. Inter (prediction) frames, or P-frames, are encoded differentially with respect to a previously encoded reference frame in the sequence. This reference frame is either the reconstruction of the immediately previous frame or a stored earlier frame known as the Golden Frame. The Golden Frame buffer holds, by default, the last decoded I-frame, but it may be updated at any time: a flag in the frame header tells the decoder whether or not to update the Golden Frame buffer.
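The reference handling described above can be modelled as two frame buffers. The Python sketch below is a hypothetical illustration, not On2's decoder: the class `ReferenceBuffers`, its methods and the `refresh_golden` parameter (standing in for the header flag) are invented names, and it assumes that an I-frame always refreshes the Golden Frame.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    label: str      # hypothetical stand-in for decoded picture data
    is_intra: bool  # True for an I-frame, False for a P-frame


class ReferenceBuffers:
    """Hypothetical model of VP6's two prediction references (not On2 code)."""

    def __init__(self) -> None:
        self.last: Optional[Frame] = None    # reconstruction of the previous frame
        self.golden: Optional[Frame] = None  # the Golden Frame buffer

    def reference_for(self, use_golden: bool) -> Frame:
        # A P-frame predicts either from the previous frame or from the Golden Frame.
        ref = self.golden if use_golden else self.last
        if ref is None:
            raise ValueError("no reference yet: decoding must start at an I-frame")
        return ref

    def after_decode(self, frame: Frame, refresh_golden: bool) -> None:
        # Assumption: an I-frame always refreshes the Golden Frame (the default
        # behaviour); a P-frame refreshes it only when the header flag,
        # modelled here as refresh_golden, is set.
        if frame.is_intra or refresh_golden:
            self.golden = frame
        self.last = frame
```

Keeping the Golden Frame separate from the previous frame lets an encoder keep predicting from an older, high-quality picture (for example a static background) while the previous-frame buffer changes every frame.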
Coding Profiles in VP6 VP6 has two profiles: Simple and Advanced. Each frame header contains a flag, vp6profile, which indicates the profile used to code that frame. Both profiles use the BoolCoder for encoding. The BoolCoder is a simplified binary arithmetic coder that allows tokens to be encoded with fractions of a bit. It compresses much more efficiently than a Huffman coder, but at a significantly higher computational cost.
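To make "fractions of a bit" concrete, here is a minimal Python sketch of a boolean (binary) arithmetic coder. It is not On2's VP6 source: it is a port of the closely related bool coder that On2 later used in VP8 and that is published in RFC 6386, so VP6's exact bitstream details may differ. `prob` is the probability, out of 256, that the coded bit is 0; the class and method names are illustrative.

```python
class BoolEncoder:
    """Binary arithmetic encoder, ported from the VP8 bool coder in RFC 6386."""

    def __init__(self) -> None:
        self.out = bytearray()
        self.range = 255     # interval width, kept in [128, 255] after each write
        self.bottom = 0      # low end of the interval (emulates a 32-bit register)
        self.bit_count = 24  # shifts left before the next byte can be emitted

    def _propagate_carry(self) -> None:
        # A carry ripples back through bytes that were already written.
        i = len(self.out) - 1
        while self.out[i] == 255:
            self.out[i] = 0
            i -= 1
        self.out[i] += 1

    def write_bool(self, prob: int, bit: int) -> None:
        # Split the interval in proportion to prob (1..255, chance of a 0 out of 256).
        split = 1 + (((self.range - 1) * prob) >> 8)
        if bit:
            self.bottom += split
            self.range -= split
        else:
            self.range = split
        while self.range < 128:  # renormalize one bit at a time
            self.range <<= 1
            if self.bottom & (1 << 31):
                self._propagate_carry()
            self.bottom = (self.bottom << 1) & 0xFFFFFFFF
            self.bit_count -= 1
            if self.bit_count == 0:
                self.out.append(self.bottom >> 24)
                self.bottom &= (1 << 24) - 1
                self.bit_count = 8

    def flush(self) -> bytes:
        c, v = self.bit_count, self.bottom
        if v & (1 << (32 - c)):
            self._propagate_carry()
        v = (v << c) & 0xFFFFFFFF  # move the pending bits to the top of the register
        for _ in range(4):
            self.out.append(v >> 24)
            v = (v << 8) & 0xFFFFFFFF
        return bytes(self.out)


class BoolDecoder:
    """Matching decoder; must be given the same prob sequence as the encoder."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 2
        self.value = (data[0] << 8) | data[1]  # two bytes of lookahead
        self.range = 255
        self.bit_count = 0

    def _next_byte(self) -> int:
        b = self.data[self.pos] if self.pos < len(self.data) else 0
        self.pos += 1
        return b

    def read_bool(self, prob: int) -> int:
        split = 1 + (((self.range - 1) * prob) >> 8)
        big_split = split << 8
        if self.value >= big_split:
            bit = 1
            self.range -= split
            self.value -= big_split
        else:
            bit = 0
            self.range = split
        while self.range < 128:
            self.value <<= 1
            self.range <<= 1
            self.bit_count += 1
            if self.bit_count == 8:
                self.bit_count = 0
                self.value |= self._next_byte()
        return bit
```

Encoding a long run of zeros with `prob=250` takes well under one bit per symbol, whereas a Huffman code over a two-symbol alphabet must always spend at least one whole bit per symbol.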
H.264 H.264 was developed jointly by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) and was completed in 2003. It is also published as MPEG-4 Part 10, a key video compression (codec) scheme for digital media exchange. When pushed to its limits, H.264 degrades more gracefully than previous standards: instead of breaking into blocks, the image becomes softer.
H.264 [2] H.264 achieves MPEG-2 quality at a lower bit rate, at the cost of increased computational effort, which modern processors can generally absorb. It provides the same quality as MPEG-2 at half the data rate, as shown in the figure below. Video quality: MPEG-2 vs. H.264 [20]
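To put "half the data rate" in concrete terms, consider a one-hour program with a hypothetical MPEG-2 rate of 4 Mbit/s (an illustrative assumption, not a figure from [20]). Storage is bit rate times duration divided by 8:

```latex
S = \frac{R\,T}{8}, \qquad
S_{\text{MPEG-2}} = \frac{4\ \text{Mbit/s} \times 3600\ \text{s}}{8} = 1800\ \text{MB}, \qquad
S_{\text{H.264}} = \frac{2\ \text{Mbit/s} \times 3600\ \text{s}}{8} = 900\ \text{MB}
```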
AVC Encoder H.264 Encoder [1]
AVC Decoder H.264 Decoder [1]
How does an H.264 codec work? Encoder: prediction (motion estimation and compensation). Intra prediction [2]; inter prediction [2]
How does an H.264 codec work? [2] Intra prediction: intra prediction uses neighboring, previously coded macroblocks from the same picture for prediction. Two prediction schemes are used for the luma component, referred to as INTRA_4x4 and INTRA_16x16 [16]. In INTRA_4x4, a 16x16 macroblock is divided into sixteen 4x4 subblocks, and intra prediction is applied to each 4x4 subblock individually. Nine prediction modes are supported, as shown in the figure below. 4x4 luma intra-prediction modes in H.264 [10]
How does an H.264 codec work? [3] In mode 0 (vertical), the samples of the 4x4 block are predicted from the neighboring samples above it. In mode 1 (horizontal), they are predicted from the neighboring samples to the left. In mode 2 (DC), the mean of the neighboring samples above (A to D) and to the left (I to L) is used for prediction. Modes 3 to 8 predict along a direction: mode 3 diagonal down-left, mode 4 diagonal down-right, mode 5 vertical-right, mode 6 horizontal-down, mode 7 vertical-left and mode 8 horizontal-up. In these directional modes, the predicted samples are calculated from a weighted average of the prediction samples A to M.
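A minimal Python sketch of the first four 4x4 luma modes, assuming `above` holds the eight samples A to H (above and above-right) and `left` holds I to L; modes 4 to 8 are omitted. The helper names are hypothetical, and picking the mode with the lowest SAD is a simplification: real encoders usually use a rate-distortion or SATD-based cost.

```python
from typing import List, Sequence, Tuple

Block = List[List[int]]


def intra4x4_predict(mode: int, above: Sequence[int], left: Sequence[int]) -> Block:
    """Predict a 4x4 luma block from A..H (above) and I..L (left)."""
    if mode == 0:  # vertical: copy A..D down each column
        return [[above[x] for x in range(4)] for _ in range(4)]
    if mode == 1:  # horizontal: copy I..L across each row
        return [[left[y]] * 4 for y in range(4)]
    if mode == 2:  # DC: rounded mean of A..D and I..L
        dc = (sum(above[:4]) + sum(left[:4]) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    if mode == 3:  # diagonal down-left: 3-tap weighted average along A..H
        pred = [[0] * 4 for _ in range(4)]
        for y in range(4):
            for x in range(4):
                if x == 3 and y == 3:
                    pred[y][x] = (above[6] + 3 * above[7] + 2) >> 2
                else:
                    k = x + y
                    pred[y][x] = (above[k] + 2 * above[k + 1] + above[k + 2] + 2) >> 2
        return pred
    raise NotImplementedError("modes 4-8 are omitted from this sketch")


def sad(a: Block, b: Block) -> int:
    return sum(abs(a[y][x] - b[y][x]) for y in range(4) for x in range(4))


def best_mode(block: Block, above: Sequence[int], left: Sequence[int],
              modes: Tuple[int, ...] = (0, 1, 2, 3)) -> int:
    # Simplified mode decision: lowest sum of absolute differences.
    return min(modes, key=lambda m: sad(block, intra4x4_predict(m, above, left)))
```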
How does an H.264 codec work? [4] As shown in the figure below, four modes are used for 16x16 intra prediction of the luma component. Three of them, mode 0 (vertical), mode 1 (horizontal) and mode 2 (DC), are similar to the corresponding 4x4 prediction modes. In the fourth mode (plane), a linear plane function is fitted to the neighboring samples above and to the left.
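The plane mode can be written out explicitly. The sketch below follows the plane-prediction equations of the H.264 standard for 8-bit luma, assuming `above` holds the 16 samples above the macroblock, `left` the 16 samples to its left and `corner` the top-left neighbour; the function name is illustrative.

```python
from typing import List, Sequence


def intra16x16_plane(above: Sequence[int], left: Sequence[int], corner: int) -> List[List[int]]:
    """16x16 luma plane prediction (8-bit samples)."""

    def top(i: int) -> int:  # p[i, -1], with p[-1, -1] = corner
        return corner if i < 0 else above[i]

    def side(i: int) -> int:  # p[-1, i], with p[-1, -1] = corner
        return corner if i < 0 else left[i]

    # Horizontal and vertical gradients estimated from the border samples.
    h = sum((k + 1) * (top(8 + k) - top(6 - k)) for k in range(8))
    v = sum((k + 1) * (side(8 + k) - side(6 - k)) for k in range(8))
    a = 16 * (left[15] + above[15])
    b = (5 * h + 32) >> 6
    c = (5 * v + 32) >> 6
    # Evaluate the fitted plane at every sample and clip to the 8-bit range.
    return [[min(255, max(0, (a + b * (x - 7) + c * (y - 7) + 16) >> 5))
             for x in range(16)] for y in range(16)]
```

For a border that ramps smoothly from left to right, the predicted block continues the ramp across all 16 columns, which DC or vertical prediction cannot do.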
How does an H.264 codec work? [5] Inter prediction: inter prediction exploits the temporal correlation between pictures through motion estimation and compensation. Each 16x16 macroblock can be partitioned into blocks of 16x16, 16x8, 8x16 or 8x8 samples, and each 8x8 sub-macroblock can be further partitioned into 8x4, 4x8 or 4x4 blocks. The figure on the next slide illustrates the partitioning of a macroblock and a sub-macroblock. The encoder chooses the block sizes according to the characteristics of the input video. Smaller blocks generally leave less residual data, but they also mean more motion vectors and hence more bits spent encoding those motion vectors.
How does an H.264 codec work? [6] Macroblock partitions for motion estimation/motion compensation: 16x16, 16x8, 8x16 and 8x8 [9]. Macroblock sub-partitions for motion estimation/motion compensation: 8x8, 8x4, 4x8 and 4x4 [9]
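To illustrate the block-size trade-off, the sketch below runs an exhaustive (full-search) block-matching motion search with a sum-of-absolute-differences (SAD) cost, once per partition. Full search is a textbook method chosen for clarity, not necessarily what the encoders in this comparison use; frames are plain lists of 8-bit luma samples, and all names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # rows of 8-bit luma samples


def block_sad(cur: Frame, ref: Frame, bx: int, by: int, rx: int, ry: int, w: int, h: int) -> int:
    """Sum of absolute differences between a block in cur and a candidate in ref."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(h) for i in range(w))


def full_search(cur: Frame, ref: Frame, bx: int, by: int, w: int, h: int,
                search_range: int) -> Tuple[int, int, int]:
    """Exhaustive block matching; returns (dx, dy, sad) of the best candidate."""
    height, width = len(ref), len(ref[0])
    best: Optional[Tuple[int, int, int]] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + w > width or ry + h > height:
                continue  # candidate would fall outside the reference frame
            cost = block_sad(cur, ref, bx, by, rx, ry, w, h)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    if best is None:
        raise ValueError("search window contains no valid candidate")
    return best


def estimate_partition(cur: Frame, ref: Frame, mb_x: int, mb_y: int, pw: int, ph: int,
                       search_range: int = 4) -> Tuple[List[Tuple[int, int]], int]:
    """Tile the 16x16 macroblock at (mb_x, mb_y) with pw x ph blocks, one MV each.

    Returns the motion vectors and the total SAD. Mixed partitions (different
    sub-partitions per 8x8 quadrant) are not modelled here.
    """
    mvs: List[Tuple[int, int]] = []
    total = 0
    for oy in range(0, 16, ph):
        for ox in range(0, 16, pw):
            dx, dy, cost = full_search(cur, ref, mb_x + ox, mb_y + oy, pw, ph, search_range)
            mvs.append((dx, dy))
            total += cost
    return mvs, total
```

If a macroblock moves uniformly, sixteen 4x4 partitions give no lower SAD than one 16x16 partition while needing 16 motion vectors instead of one; smaller partitions pay off only when the motion inside the macroblock is not uniform.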
How does an H.264 codec work? [7] Deblocking filter: H.264 suffers from blocking artifacts caused by the block-based transform in intra and inter prediction coding and by the quantization of transform coefficients. The deblocking filter reduces the artifacts at block boundaries and prevents the propagation of accumulated noise, although its presence adds to the complexity of the system. Filtering is applied to the horizontal and vertical edges of the 4x4 blocks in a macroblock, as shown in the figure below. Boundaries in a macroblock to be filtered (luma boundaries shown with solid lines and chroma boundaries shown with dotted lines) [1]
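The core filtering decision can be sketched for one line of samples across an edge with boundary strength bS < 4, following the standard's normal-mode luma filter. The thresholds `alpha`, `beta` and `tc0` normally come from tables indexed by QP and bS; here they are plain parameters, and the function names are illustrative.

```python
from typing import Tuple

Side = Tuple[int, int, int]  # (s0, s1, s2), with s0 nearest the edge


def clip3(lo: int, hi: int, v: int) -> int:
    return max(lo, min(hi, v))


def filter_luma_edge(p: Side, q: Side, alpha: int, beta: int, tc0: int) -> Tuple[Side, Side]:
    """Normal (bS < 4, bS > 0) luma deblocking of one line across an edge."""
    p0, p1, p2 = p
    q0, q1, q2 = q
    # A large step is treated as a real image edge and left untouched.
    if not (abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta):
        return p, q
    ap, aq = abs(p2 - p0), abs(q2 - q0)
    tc = tc0 + (1 if ap < beta else 0) + (1 if aq < beta else 0)
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    new_p0 = clip3(0, 255, p0 + delta)
    new_q0 = clip3(0, 255, q0 - delta)
    avg = (p0 + q0 + 1) >> 1
    # The second samples are only adjusted on sides that are smooth enough.
    new_p1 = p1 + clip3(-tc0, tc0, (p2 + avg - (p1 << 1)) >> 1) if ap < beta else p1
    new_q1 = q1 + clip3(-tc0, tc0, (q2 + avg - (q1 << 1)) >> 1) if aq < beta else q1
    return (new_p0, new_p1, p2), (new_q0, new_q1, q2)
```

With alpha=20, beta=6 and tc0=2, a small step such as 60 | 70 is smoothed to 60, 62, 64 | 66, 68, 70, while a step of 60 | 120 is left unchanged because it exceeds alpha.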
How does an H.264 codec work? [8] The encoder subtracts the prediction from the current block to form the residual. The residual samples are transformed using a 4x4 or 8x8 integer transform (an approximation of the DCT). The output of the transform, a block of transform coefficients, is quantized. Entropy coding: variable-length coding (CAVLC) or arithmetic coding (CABAC).
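The 4x4 forward transform and quantization can be sketched in integer arithmetic. The sketch assumes the core transform matrix and the multiplication-factor (MF) quantization described in Richardson's white paper [3] and used in the JM reference software, with rounding offset f = 2^qbits/3 for intra and 2^qbits/6 for inter blocks; the function names are illustrative, and the 8x8 transform is not shown.

```python
from typing import List

Matrix = List[List[int]]

# Core 4x4 forward transform matrix Cf; its scaling is folded into quantization.
CF: Matrix = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]

# MF for QP % 6: (both indices even, both indices odd, mixed positions).
MF = [(13107, 5243, 8066), (11916, 4660, 7490), (10082, 4194, 6554),
      (9362, 3647, 5825), (8192, 3355, 5243), (7282, 2893, 4559)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def forward_core_transform(x: Matrix) -> Matrix:
    """Y = Cf . X . Cf^T on a 4x4 residual block."""
    return matmul(matmul(CF, x), transpose(CF))


def quantize(y: Matrix, qp: int, intra: bool = True) -> Matrix:
    """Z = sign(Y) * ((|Y| * MF + f) >> qbits), with qbits = 15 + QP // 6."""
    qbits = 15 + qp // 6
    f = (1 << qbits) // (3 if intra else 6)
    even, odd, mixed = MF[qp % 6]
    out = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i % 2 == 0 and j % 2 == 0:
                mf = even
            elif i % 2 == 1 and j % 2 == 1:
                mf = odd
            else:
                mf = mixed
            level = (abs(y[i][j]) * mf + f) >> qbits
            out[i][j] = level if y[i][j] >= 0 else -level
    return out
```

A flat 4x4 residual of 10 gives a single DC coefficient of 160, which quantizes to 64 at QP 0 and to 32 at QP 6: every increase of 6 in QP doubles the quantizer step size.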
H.264 Profiles and levels Profile: a set of coding tools or algorithms that can be used in generating a bit stream (specific encoding techniques). Level: places constraints on certain key parameters of the bit stream. H.264 Profiles [1]
H.264 Profiles and levels [2] Baseline Profile (BP): Primarily for lower-cost applications with limited computing resources. In BP, macroblocks need not be in raster scan order. This profile is used for real-time conversational services such as video conferencing and videophone. Main Profile (MP): Originally intended as the mainstream consumer profile for broadcast and storage applications; its importance faded when the High Profile was developed for those applications. Extended Profile (XP): Intended as the streaming video profile, this profile has relatively high compression capability and some extra tools for robustness to data losses and for server stream switching.
H.264 Profiles and levels [3] High Profile (HiP): This is the primary profile for broadcast and disc storage applications, particularly for high-definition television applications. High 10 Profile (Hi10P): Going beyond today's mainstream consumer product capabilities, this profile builds on top of the High Profile, adding support for up to 10 bits per sample of decoded picture precision. High 4:2:2 Profile (Hi422P): Primarily targeting professional applications that use interlaced video, this profile builds on top of the High 10 Profile, adding support for the 4:2:2 chroma subsampling format while using up to 10 bits per sample of decoded picture precision.
H.264 Profiles and levels [4] High 4:4:4 Predictive Profile (Hi444PP): This profile builds on top of the High 4:2:2 Profile, supporting up to 4:4:4 chroma sampling and up to 14 bits per sample, and additionally supports efficient lossless region coding and the coding of each picture as three separate color planes.
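The profiles above are signalled in the bit stream by a profile_idc value in the sequence parameter set. The lookup table below is a sketch: the profile_idc numbers come from the H.264 standard, the bit-depth and chroma columns summarize the profile descriptions above, the B-slice, CABAC and 8x8-transform columns add a few well-known tool restrictions, and the class, field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ProfileInfo:
    name: str
    b_slices: bool       # B-frames allowed
    cabac: bool          # arithmetic entropy coding allowed
    transform_8x8: bool  # 8x8 integer transform allowed
    max_bit_depth: int   # bits per sample
    max_chroma: str      # highest chroma format


# Keyed by profile_idc.
PROFILES: Dict[int, ProfileInfo] = {
    66: ProfileInfo("Baseline", False, False, False, 8, "4:2:0"),
    77: ProfileInfo("Main", True, True, False, 8, "4:2:0"),
    88: ProfileInfo("Extended", True, False, False, 8, "4:2:0"),
    100: ProfileInfo("High", True, True, True, 8, "4:2:0"),
    110: ProfileInfo("High 10", True, True, True, 10, "4:2:0"),
    122: ProfileInfo("High 4:2:2", True, True, True, 10, "4:2:2"),
    244: ProfileInfo("High 4:4:4 Predictive", True, True, True, 14, "4:4:4"),
}


def profile_allows_b_frames(profile_idc: int) -> bool:
    # Raises KeyError for profile_idc values not listed above.
    return PROFILES[profile_idc].b_slices
```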
H.264 Frames Depending on the H.264 profile, an encoder can use I-frames, P-frames and B-frames. The first image in a video sequence is always an I-frame; it can be decoded independently, without reference to any other image. A P-frame (predictive inter frame) references parts of earlier I- and/or P-frames to code the frame. A B-frame (bi-predictive inter frame) references both an earlier reference frame and a future frame, as shown in the figure below. Characteristics of I-, B- and P-frames [10]
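Because a B-frame references a future frame, the encoder must transmit frames out of display order. A minimal sketch for a classic IBBP pattern follows; H.264 itself is more flexible (for example, it allows B-frames to serve as references), and the function name is hypothetical.

```python
from typing import List


def decode_order(display_types: str) -> List[int]:
    """Map display-order frame types such as 'IBBPBBP' to decode (transmission) order."""
    order: List[int] = []
    waiting_b: List[int] = []
    for idx, ftype in enumerate(display_types):
        if ftype == "B":
            waiting_b.append(idx)    # needs its future reference decoded first
        elif ftype in ("I", "P"):
            order.append(idx)        # send the anchor (I or P) first,
            order.extend(waiting_b)  # then the B-frames displayed before it
            waiting_b.clear()
        else:
            raise ValueError(f"unknown frame type {ftype!r}")
    if waiting_b:
        raise ValueError("trailing B-frames have no future reference frame")
    return order
```

For "IBBPBBP" this yields 0, 3, 1, 2, 6, 4, 5: each P-frame is decoded before the two B-frames displayed ahead of it.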
Flash Flash is commonly used to create animations and supports bidirectional streaming of audio and video. Macromedia Flash Player is the most widely distributed software in the history of the Internet: over 414 million web users can view Macromedia Flash content without having to download a player.
What is Flash MX? It combines the Macromedia Flash Player client technology with the Macromedia Flash MX authoring environment, allowing rich Internet applications to be developed quickly.
VP6 in Flash 8 or H.263 in MX The images in the next slides are excerpts from a 12-minute 30-second video of coral reef exploration. The original source was shot on DVCAM and stored using Photo-JPEG compression. The only tool used to compress this video was Flix Professional with default settings. Since the source came directly from a camera, some overscan had to be cropped out of the 720x486 DV source; the video was also deinterlaced and resized to 320x240. Flash MX uses a video format based on H.263.
VP6 in Flash 8 or H.263 in MX [2] (a) VP6: true to the original; (b) Flash MX (H.263): oversaturated colors [4]
VP6 in Flash 8 or H.263 in MX [3] (a) VP6: better picture quality; (b) Flash MX (H.263): blockiness in the subject and background [4]
VP6 in Flash 8 or H.263 in MX [4] (a) VP6: better picture quality; (b) Flash MX (H.263): loss of fine detail in the background [4]
VP6 in Flash 8 or H.263 in MX [5] Blocking artifacts: (a) VP6: better picture quality; (b) Flash MX (H.263): blocky artifacts visible in the subject and background [4]
Test Sequences CIF: CIF stands for Common Intermediate Format, a video format used in video conferencing systems. As illustrated in the figure below, it specifies a frame rate of 30 frames per second (fps), with each frame containing 288 lines of 352 pixels, i.e. a resolution of 352x288. YCbCr 4:2:0 chroma sampling for CIF [1] [21]
Test Sequences [2] QCIF: QCIF stands for Quarter CIF. As illustrated in the figure below, it specifies frames of 144 lines with 176 pixels per line, i.e. a resolution of 176x144. The term "Quarter" indicates that a QCIF frame contains one quarter as many pixels as a CIF frame and therefore needs less bandwidth. 4:2:0 chroma sampling for QCIF [1] [21]
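Test sequences such as those at [18] are commonly distributed as raw planar 8-bit YUV 4:2:0 files, stored one frame after another. Assuming that layout, the frame size follows directly from the resolutions above, and a single frame can be read with one seek; the helper names below are hypothetical.

```python
import io
from typing import BinaryIO, Tuple


def frame_bytes_420(width: int, height: int) -> int:
    """Bytes per 8-bit 4:2:0 frame: full-size Y plus two quarter-size chroma planes."""
    return width * height + 2 * (width // 2) * (height // 2)


def read_yuv420_frame(f: BinaryIO, width: int, height: int, index: int) -> Tuple[bytes, bytes, bytes]:
    """Return the (Y, Cb, Cr) planes of frame `index` from a raw planar .yuv file."""
    size = frame_bytes_420(width, height)
    f.seek(index * size)
    data = f.read(size)
    if len(data) != size:
        raise EOFError(f"frame {index} is missing or truncated")
    y_size = width * height
    c_size = (width // 2) * (height // 2)
    return data[:y_size], data[y_size:y_size + c_size], data[y_size + c_size:]


if __name__ == "__main__":
    # CIF: 152064 bytes per frame, about 36.5 Mbit/s uncompressed at 30 fps.
    cif = frame_bytes_420(352, 288)
    print(cif, cif * 8 * 30)
    # Two synthetic QCIF frames in memory stand in for a real sequence file.
    qcif = frame_bytes_420(176, 144)
    fake = io.BytesIO(bytes([0]) * qcif + bytes([1]) * qcif)
    y, cb, cr = read_yuv420_frame(fake, 176, 144, 1)
    print(len(y), len(cb), len(cr))
```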
Results
Examples of VP6 Compression: Bus_cif_sequence:
Hall_cif_sequence:
Foreman_qcif_sequence:
H.264 Vs VP6 for different sequences: Foreman_qcif Sequence: Encoding 5 Frames
Hall_cif Sequence: Encoding 5 Frames
Akiyo_qcif Sequence: Encoding 5 Frames
Bus_cif Sequence: Encoding 5 Frames
Akiyo_qcif Sequence: Encoding 15 Frames
Container_qcif Sequence: Encoding 15 Frames
References
1. S. Kwon, A. Tamhankar and K. R. Rao, "Overview of H.264/MPEG-4 Part 10", J. VCIR, vol. 17, pp. 186-216, April 2006.
2. I. Richardson, V-Codex White Paper, "An overview of H.264 Advanced Video Coding", www.vcodex.com, 2007.
3. I. Richardson, V-Codex White Paper, "H.264/MPEG-4 Part 10: Transform & Quantization", www.vcodex.com, 2007.
4. On2 Technologies, Inc., White Paper, "On2 VP6 for Flash 8 Video", http://www.on2.com, September 12, 2005.
5. On2 Technologies, Inc., White Paper, "TrueMotion VP7 video codec", http://www.on2.com, January 10, 2005.
6. A. Beach, Real World Video Compression, realworldvideocompression.com
7. AXIS Communications, White Paper, "H.264 video compression standard: New possibilities within video surveillance", www.axis.com, 2008.
8. Apple Inc., Technology Brief, "QuickTime and MPEG-4", http://www.apple.com, 2008.
9. I. Richardson, V-Codex White Paper, "H.264/MPEG-4 Part 10: Inter Prediction", www.vcodex.com, 2007.
10. I. Richardson, V-Codex White Paper, "H.264/MPEG-4 Part 10: Intra Prediction", www.vcodex.com, 2007.
11. I. Richardson, V-Codex White Paper, "H.264/MPEG-4 Part 10: Intra Prediction Loop Filter", www.vcodex.com, 2007.
12. H.264/AVC JM software: http://iphome.hhi.de/suehring/tml/
13. T. Wiegand, et al., "Overview of the H.264/AVC video coding standard", IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, pp. 560-576, July 2003.
14. A. Puri, et al., "Video coding using the H.264/MPEG-4 AVC compression standard", Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.
15. G. J. Sullivan and T. Wiegand, "Video Compression: From Concepts to the H.264/AVC Standard", Proceedings of the IEEE, vol. 93, pp. 18-31, Jan. 2005.
16. A. Luthra, et al., "Special issue on the H.264/AVC video coding standard", IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, pp. 557-559, July 2003.
17. G. J. Sullivan, "The H.264/MPEG-4 AVC video coding standard and its deployment status", SPIE/VCIP 2005, vol. 5960, pp. 709-719, Beijing, China, July 2005.
18. Video test sequences: http://trace.eas.asu.edu/yuv/
19. "The SSIM Index for Image Quality Assessment", http://www.ece.uwaterloo.ca/~z70wang/research/ssim/
20. Advanced Micro Devices, White Paper, "H.264 video compression standard", www.ati.amd.com.
21. Information about the CIF and QCIF formats: http://www.birds-eye.net/definition/c/cif-common_intermediate_format.shtml
22. J. Padia, "H.264 to VP6 Transcoder", http://ee.uta.edu/dip/courses/ee5359/index.html
23. J. Padia, "Complexity reduction in VP6 to H.264 transcoder using motion vector (MV) reuse", Master's Thesis, EE Dept., UTA, May 2010.
24. B. Erol, et al., "The H.263+ Video Coding Standard: Complexity and Performance", Proceedings of the IEEE Data Compression Conference, pp. 259-268, Dec. 1998.
25. R. Neelamani, et al., "JPEG compression history estimation for color images", IEEE Trans. on Image Processing, vol. 15, pp. 1365-1378, June 2006.