Video compression: Performance of available codec software

1 Introduction

1.1 Digital Video

A digital video is a collection of images presented sequentially to produce the effect of continuous motion. It takes advantage of the spatio-temporal properties of the human eye to simulate continuity in motion. The persistence of the human eye is such that a very brief exposure to an image leaves an impression on the retina that lasts for milliseconds. Hence, images shown in sufficiently quick succession appear to be continuous. In general, the eye cannot differentiate between individual images when they are played at a rate of 25 per second or higher. Several television standards exist, such as NTSC and PAL, which define the frame rate of the video being displayed. The frame rate varies from 25 fps to 60 fps depending on the standard. The video file consists of the individual images (also known as frames) and the sequencing information.

1.2 The Size Barrier

Consider a video that is played at the rate of 30 images per second. For a 640x480 grayscale video in the raw lossless format, at one byte per pixel, that would be 640x480x30 bytes per second. For a 30 minute video, this would be approximately 16GB. For a colour video, using three bytes per pixel, that would be about 48GB, not even including the audio and the sequencing information. This is almost the size of 2 Blu-ray discs for a small SD video. For some modern HD transmissions, the frame sizes are as high as 1920x1080, which would work out to video sizes greater than 300GB when uncompressed.

1.3 Video Compression

Transmitting videos of such huge sizes is impractical. To reduce the size of the video to manageable proportions, videos are almost never stored or transmitted in the raw format. Even in situations where compression is not required, the video is still compressed.
This is because the human eye is insensitive to higher frequencies and minute variations in colour, and transmitting this information would be a waste of resources. Every video is subjected to some kind of compression. The compression method depends upon the application and the bandwidth constraints under which the video is used. Compression may be classified:

I. Based upon the reproducibility, into

a. Lossless compression
As the name indicates, videos compressed using this method can be reproduced exactly, without any change in data. Methods that perform lossless compression include Huffman coding and run-length coding. The amount of compression achieved using these methods is much lower than that achieved using lossy methods (which are discussed next). Further, the amount of compression depends greatly upon the content of the video.

b. Lossy compression
This compression is performed by dropping information which does not significantly affect the visual appearance of the video. For example, the human eye is insensitive to high frequencies and does not recognize minor variations in colour. Hence this information can be dropped while encoding the video. Methods such as JPEG perform lossy compression.

II. Based upon where the compression is performed, into

a. Intraframe compression
This method takes advantage of the spatial redundancy present in each frame of the video and compresses each frame using one of the compression methods above. In general, indoor videos have a uniform, unchanging background and therefore a very high spatial redundancy, which allows them to be compressed greatly.

b. Interframe compression
This method identifies the temporal redundancies between consecutive frames in a video and attempts to remove them. Videos usually do not have many scene changes and hence have a lot of temporal redundancy.

Good video formats usually implement both inter and intra frame compression techniques.

1.4 Video encoding formats

A video encoding format is a representation for compressed video. Such a format specifies the representation of each frame, the sequencing information between frames, and the compression and decompression methods for inter and intra frame redundancy. Although maximum compression is targeted, all formats usually retain a certain amount of redundancy.
This is to maintain performance in environments where frames are dropped and data is lost. Mechanisms that resist error propagation are part of the specifications of all video encoding formats. They also assist in seeking within the data. Without them, every time we watched a movie we would have to start from the beginning, without being able to cue forward.
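The interplay between interframe compression (Section 1.3) and the redundancy kept for seeking can be illustrated with a toy encoder. The sketch below is only an illustration and not the scheme of any particular format: it assumes frames are flat lists of pixel values, stores a complete "key" frame at a fixed interval and only the changed pixels in between. The names `KEY_INTERVAL`, `encode` and `decode_frame` are hypothetical.

```python
# Toy illustration of key frames and delta frames (not a real codec).
# Assumption: a frame is a flat list of 8-bit pixel values.

KEY_INTERVAL = 4  # hypothetical: store a complete frame every 4 frames


def encode(frames, key_interval=KEY_INTERVAL):
    """Return a list of ("key", pixels) or ("delta", {index: value}) records."""
    encoded = []
    previous = None
    for number, frame in enumerate(frames):
        if number % key_interval == 0:
            # Key frame: stored whole, so decoding can start here.
            encoded.append(("key", list(frame)))
        else:
            # Delta frame: keep only the pixels that differ from the previous frame.
            changes = {
                i: v for i, (v, p) in enumerate(zip(frame, previous)) if v != p
            }
            encoded.append(("delta", changes))
        previous = frame
    return encoded


def decode_frame(encoded, target):
    """Rebuild frame `target`, starting from the nearest earlier key frame."""
    start = target
    while encoded[start][0] != "key":
        start -= 1
    pixels = list(encoded[start][1])
    # Every record between the key frame and the target is a delta frame.
    for _kind, changes in encoded[start + 1 : target + 1]:
        for index, value in changes.items():
            pixels[index] = value
    return pixels, target - start  # second value: delta frames replayed


if __name__ == "__main__":
    # Six 8-pixel frames: a static background with one bright pixel moving right.
    frames = []
    for t in range(6):
        frame = [10] * 8
        frame[t] = 200
        frames.append(frame)

    encoded = encode(frames)
    stored = sum(len(payload) for _kind, payload in encoded)
    print("entries stored:", stored, "out of", 6 * 8, "raw pixel values")
    pixels, replayed = decode_frame(encoded, 5)
    print("frame 5 recovered:", pixels == frames[5], "| delta frames replayed:", replayed)
```

The key frames are the deliberate redundancy: they cost more space than delta frames, but a player can jump to any of them without decoding from the beginning, and a corrupted delta frame only affects the frames up to the next key frame.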
Some popular formats for video encoding are:
wmv
MPEG 1
MPEG 4
Asf

1.5 Container formats

Container formats are different from encoding formats. They hold combinations of encoded video and audio. They specify the bit rates of the audio and video and help maintain the synchronization between them. Some containers are designed to hold only a specific combination of audio and video formats, while others are capable of holding several combinations (but only one combination at a time). Two popular container formats are .avi and .wmv. The .avi container can hold several video formats, including MPEG 4 and MPEG 1. Some video encoding formats are containers in themselves and are capable of holding both audio and video, for example MPEG 1 and MPEG 4.

1.6 Codec

Codec is a contraction of coder-decoder. A codec is capable of encoding a set of images into a video and decoding a video into a set of images. Each image usually constitutes a frame in the video. However, several additional frames are added for the reasons discussed earlier. Each codec is capable of working with only a specific video format. However, several codecs can exist for a single format. Usually, each multimedia company has its own codec for a format, built for its own player. For example, mov is one of several codecs for the MPEG 4 format. Codecs can be implemented in either software or hardware. Software codecs are slower but inexpensive compared to hardware codecs, which are much faster. The specifications for a format are not rigid and allow for some variation. Although codecs implement a specified format, they may vary in their method of operation, resulting in variations in quality and performance.

2 Codec Evaluation

With the ever increasing need for bandwidth, codec designers tend to be over greedy and design algorithms which might badly affect the aesthetics of the video content. Hence, evaluation criteria for codec performance are required to verify the quality of the compressed videos.

2.1 Criteria for Comparison

The codecs are compared based on the following criteria:
1. Quality of video
2. Performance of the codec

2.2 Quality of video

Quality of video corresponds to the look and feel of the video: the resolution, the artifacts, the blurring and other visual aesthetic components. The quality of video depends on both the format of the video and the codec used to encode to that format. Usually, several codecs implement a single format, but each one differs from the others. Quality also depends on the amount of information in the video being encoded, and it will not be constant throughout the video. Clips with more information have more artifacts than scenes with little movement and few scene changes. Quality can be measured objectively or subjectively.

2.2.1 Objective Quality

Objective quality measures the quality in mathematical terms, which makes it very easy to compare and evaluate. Some of the metrics available to measure objective quality are:

a. Mean Square Error (MSE): It is the second moment of the difference and describes the variance between the original frame and the encoded frame.

b. Peak Signal to Noise Ratio (PSNR): The ratio between the maximum signal level and the noise. Mathematically, it is given by:

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$$

where MAX is the maximum possible sample value (255 for 8-bit samples) and the result is in decibels.

c. Colour Difference: This is the absolute difference of the individual colour components between the input frame and the output frame.

d. Structural Similarity (SSIM) [2]: This is used to measure the similarity between two images. It is a number between 0 and 1. It is a function of luminance, contrast and structural similarity. It is independent of the colour components.

2.2.2 Subjective Quality

Subjective quality is measured by visually inspecting the encoded video for artifacts, blurring, blocking and overall quality.
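The first three objective metrics of Section 2.2.1 can be sketched in a few lines. This is a minimal sketch under stated assumptions: samples are 8-bit (peak value 255), a frame is a flat list of values, and the colour difference is taken as the mean absolute difference over all colour components, which is one plausible reading of the definition rather than the exact formula used in this study. SSIM is left out because it requires windowed local statistics [2]. All function names are illustrative.

```python
import math


def mse(original, encoded):
    """Mean squared error between two equally sized sequences of samples."""
    if len(original) != len(encoded) or not original:
        raise ValueError("frames must be non-empty and of the same size")
    return sum((a - b) ** 2 for a, b in zip(original, encoded)) / len(original)


def psnr(original, encoded, peak=255):
    """Peak signal to noise ratio in dB; infinite when the frames are identical."""
    error = mse(original, encoded)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)


def colour_difference(original, encoded):
    """Mean absolute difference over (r, g, b) pixels (assumed definition)."""
    total = sum(
        abs(a - b)
        for pixel_in, pixel_out in zip(original, encoded)
        for a, b in zip(pixel_in, pixel_out)
    )
    return total / (3 * len(original))


if __name__ == "__main__":
    # Four luminance samples before and after a lossy round trip.
    luma_in = [52, 55, 61, 59]
    luma_out = [54, 55, 60, 57]
    print("MSE :", mse(luma_in, luma_out))
    print("PSNR:", round(psnr(luma_in, luma_out), 2), "dB")

    # Two RGB pixels before and after encoding.
    rgb_in = [(10, 20, 30), (40, 50, 60)]
    rgb_out = [(12, 20, 27), (40, 49, 60)]
    print("Colour difference:", colour_difference(rgb_in, rgb_out))
```

For the luminance samples above the MSE is 2.25, which with a peak of 255 corresponds to a PSNR of about 44.6 dB. Because PSNR is a logarithmic function of MSE, halving the error raises the PSNR by roughly 3 dB.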
2.3 Performance of the Codec

The performance of the codec is measured as a function of three quantities:
1. Compression ratio of the codec (file size)
2. Speed of encoding (compression)
3. Speed of decoding (decompression)

2.3.1 Compression ratio of the codec

The compression ratio of the codec is measured by encoding a repetitive set of frames using the codecs to yield videos of different formats. The ratio of the file size of the encoded video to that of the uncompressed video acts as a measure of the capacity for compression. By selecting appropriate frames to compress, we can measure both the best and worst case scenarios.

2.3.2 Encoding and Decoding speed

The encoding and decoding speeds vary from codec to codec, and within the same codec for different frames. The higher the redundancy, the slower the encoding and the smaller the file. By selecting appropriate frames to compress, we can measure both the best and worst case scenarios.

2.4 Bit Rates

Bit rate is measured in kilobits per second (kbps) and represents the amount of data flow per unit time. It is an important factor that decides the quality of the video. For example, consider a video which has a bit rate of 1000 kbps. For a standard definition video, this would mean that there are about 29.8 frames in the 1000 kilobits, i.e. about 33 kilobits per frame or 4 kilobytes per frame. This restricts the amount of data that can be used to represent a frame. Lower bit rates mean higher compression and lower quality of video: more noise, blocking, discolouration etc.

     Application        Bit Rates
a    Video streaming    100 to 500 kbps
b    SD video           500 to 2000 kbps
c    HD video           >2000 kbps

By measuring each of the quantities discussed in 2.3 and 2.4, we will be able to identify the appropriate codec for a specific application.

3 Implementation

3.1 Codecs

The following codecs are being evaluated in this study:
Sl. No   Codec       Designer/Developer   Format   Container
1        WMV2        Microsoft            wmv      wmv
2        Theora      Xiph.org             MPEG 4   avi
3        Asf         Microsoft
4        MPEG4       MPEG                 MPEG 4   mp4
5        Quicktime   Apple                MPEG 4   mov
6        MPEG1       MPEG                 MPEG 1   mpeg

All codecs are part of the ffmpeg library.

3.2 Dataset

The following videos are used for evaluation of the codecs. The reason for selecting each video is also described. All videos are 352x288 pixels in size, but may appear stretched in this document.

3.2.1 Quality Measurement

3.2.1.1 Akiyo

Figure 1 A frame from the Akiyo video sequence

This is a 300 frame video in the uncompressed YUV format. This video shows a news reader. It has no background changes and almost negligible foreground changes.
3.2.1.2 Foreman

Figure 2 A frame from the Foreman video sequence

This is a 300 frame video in the uncompressed YUV format. This video has a sudden scene change at the end. Other than that, there is no background change. Only the face changes, showing rich emotions which can be hard to compress.

3.2.1.3 Football

Figure 3 A frame from the Football video sequence

This is a 25 frame video in the uncompressed YUV format. This video has a constant background and a very rapid and large change in the foreground as players keep coming into and going out of the frame.
3.2.1.4 Stephan

Figure 4 A frame from the Stephan video sequence

This is a recording of Stefan Edberg's tennis match. It is 300 frames in length and is also in the uncompressed YUV format. This video has a fast foreground change as the player runs about, and a background change as the camera follows him. This would be the hardest kind of natural video to encode.

The table below summarises the descriptions above.

Video      Foreground Change          Background Change
Akiyo      Almost negligible          None
Foreman    Facial expressions only    None until the scene change at the end
Football   Very rapid and large       None (constant background)
Stephan    Fast                       Yes (camera follows the player)

3.2.2 Performance Measurement

In order to measure the performance in terms of compression ratio and speed of encoding, I have proposed a set of frames as shown below. Together, these frames allow us to measure the best and worst case scenarios.
Alternate frames   Spatial Redundancy   Temporal Redundancy
3.2.2.1            100%                 100%
3.2.2.2            100%                 0%
3.2.2.3            ≈0%                  100%
3.2.2.4            ≈0%                  ≈0%

These pairs of alternating frames incorporate the best and worst case scenarios for compression.

3.2.3 Bit Rates

In order to cover the entire range of applications, the videos will be encoded at the following bit rates:
a. 600 kbps: This is the range at which YouTube plays its videos.
b. 1,000 kbps: This bit rate is generally used in video conferencing.
c. 3,000 kbps: This bit rate is generally used in optical disc playback.

4 Results and Discussion

As part of the exercise, I was able to measure most of the evaluation parameters. However, due to issues with the ffmpeg library, I did not get an accurate measure of the encoding and decoding times.

4.1 Akiyo
Figures 5 to 20 plot each quality metric against the three bit rates (600, 1000 and 3000 kbps).

Figure 5 Mean Squared Error for Akiyo
Figure 6 PSNR for Akiyo
Figure 7 Absolute Colour Distance for Akiyo
Figure 8 Structural Similarity for Akiyo

Figure 9 Mean Squared Error for Football
Figure 10 PSNR for Football
Figure 11 Absolute Colour Distance for Football
Figure 12 Structural Similarity for Football

Figure 13 Mean Squared Error for Foreman
Figure 14 PSNR for Foreman
Figure 15 Absolute Colour Distance for Foreman
Figure 16 Structural Similarity for Foreman

Figure 17 Mean Squared Error for Stephan
Figure 18 PSNR for Stephan
Figure 19 Absolute Colour Distance for Stephan
Figure 20 Structural Similarity for Stephan
Figure 21 File sizes with high spatial and low temporal redundancy (B-W frames; size in KB, mpeg-1 and mpeg-4)
Figure 22 File sizes with high spatial and temporal redundancy (W-W frames; size in KB, mpeg-1 and mpeg-4)
Figure 23 File sizes with low spatial and high temporal redundancy (C-C frames; size in KB, mpeg-1 and mpeg-4)
Figure 24 File sizes with low spatial and low temporal redundancy (C-N frames; size in KB, mpeg-1 and mpeg-4)
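File sizes such as these become compression ratios (Section 2.3.1) once the raw size is known. The following is a minimal sketch under assumptions that the text does not confirm: the 352x288 frames are taken to be stored as YUV 4:2:0 (12 bits per pixel), the playback rate is taken to be 30 frames per second, a kilobit is taken as 1000 bits, and the encoded size is a made-up value used purely for illustration, not a measured result. It also prints the average per-frame budget implied by each bit rate of Section 3.2.3.

```python
WIDTH, HEIGHT = 352, 288   # frame size used for every test video
BYTES_PER_PIXEL = 1.5      # assumption: YUV 4:2:0, i.e. 12 bits per pixel


def raw_size_bytes(frames, width=WIDTH, height=HEIGHT, bpp=BYTES_PER_PIXEL):
    """Size of an uncompressed sequence in bytes."""
    return int(frames * width * height * bpp)


def compression_ratio(encoded_bytes, raw_bytes):
    """Encoded size divided by raw size, as in Section 2.3.1 (smaller is better)."""
    return encoded_bytes / raw_bytes


def bits_per_frame(bit_rate_kbps, fps):
    """Average number of bits available to each frame at a given bit rate."""
    return bit_rate_kbps * 1000 / fps


if __name__ == "__main__":
    raw = raw_size_bytes(frames=300)
    hypothetical_encoded = 750_000  # made-up file size in bytes, not a measurement
    print("raw size (bytes):", raw)
    print("compression ratio:", round(compression_ratio(hypothetical_encoded, raw), 4))
    for rate in (600, 1000, 3000):  # bit rates from Section 3.2.3
        budget = bits_per_frame(rate, fps=30) / 8
        print(rate, "kbps ->", round(budget), "bytes per frame on average")
```

The per-frame figure is only an average: a codec that uses interframe compression spends far more bits on some frames than on others, which is why sequences with little motion look clean even at the lowest bit rate.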
Following are some sample frames from the encoded videos.

Figure 25 Counter clockwise from the top, a frame from the Stephan video: original frame, wmv encoded at 600kbps and wmv encoded at 3000kbps

Figure 26 Counter clockwise from the top, a frame from the Akiyo video: original frame, wmv encoded at 600kbps and wmv encoded at 1000kbps
In Figure 25, the distortion is clearly visible when encoded at 600kbps, but at 3000kbps it is almost negligible. However, in Figure 26, there is no visible distortion even at 600kbps. This implies that the encoding process is also sensitive to the content of the video.

5 Conclusion

Selection of a format for encoding or representation depends upon the application which uses the video. The criteria to be considered before selecting a format are:

Application
o Transmission: Videos used for transmission and viewing over the internet require a high compression ratio. They can compromise on quality, as such videos are rarely used for important applications.
o Video Conferencing: Video conferencing applications have specific criteria when it comes to quality. They need the videos to be clear, but the frame rate can be compromised. Surveillance videos also fall into this category. The encoding and decoding speeds are significant here.
o Archiving: Videos used for this purpose do not place significant demands on encoding or decoding speed. They require higher resolution and quality with lower file sizes.

Performance requirements
o Real time video processing for UAVs etc.: The requirement here is for faster encoding speed and very little blurring.
o Video Viewing: Video viewing, in general, does not have heavy processing requirements. This is because sufficient processing capability is available and the application is not real time.

Quality requirements
o Entertainment
o Conferencing
o Surgical procedures

6 Future Work

Possible future work includes:
a. Measuring blurring effects of the codecs
b. Measuring blocking effects and their impact on edge detection algorithms
c. Evaluating encoding and decoding times
d. Identifying the impact of frame size on encoding speed and compression ratio

7 References

[1] Madhuri Khambete and Madhuri Joshi, Blur and Ringing Artifact Measurement in Image Compression using Wavelet Transform, Proceedings of World Academy of Science, Engineering and Technology, Volume 20, April 2007, ISSN 1307-6884
[2] Zhou Wang, Alan Conrad Bovik, Hamid Rahim Sheikh, and Eero P. Simoncelli, Image Quality Assessment: From Error Visibility to Structural Similarity, IEEE Transactions on Image Processing, Vol. 13, No. 4, April 2004