COMPARISON OF VIDEO CODECS AND QUALITY MEASUREMENT ZACHOS KONSTANTINOS



Thesis Title: COMPARISON OF VIDEO CODECS AND QUALITY MEASUREMENT
Dissertation submitted for the Degree of Master of Science in Networking and Data Communications
By ZACHOS KONSTANTINOS
SUPERVISOR: DR. PAPADAKIS ANDREAS
KINGSTON UNIVERSITY, SCHOOL OF COMPUTING AND INFORMATION SYSTEMS
TEI OF PIRAEUS, DEPARTMENTS OF ELECTRONICS AND AUTOMATION
JULY 2010

Table of Contents
List of Figures and Tables
Abstract
1 Introduction
2 Video codecs overview
  2.1 MPEG-1
  2.2 MPEG-2
  2.3 MPEG-4 Visual
  2.4 MPEG-4 AVC
  2.5 VC-1
3 Quality metrics
  3.1 General
  3.2 Objective
    Peak Signal to Noise Ratio (PSNR)
    Mean Square Error (MSE)
    Structural Similarity (SSIM)
    Video Quality Metric (VQM)
  3.3 Subjective
    Double Stimulus Continuous Quality Scale (DSCQS)
    Double Stimulus Impairment Scale (DSIS)
4 Comparison
  4.1 Video sequences
  4.2 Quality measurement tool
  4.3 Comparison procedure
5 Results
  5.1 Objective metrics
  5.2 Subjective metrics
  5.3 Comparison
6 Conclusion
References
Appendix
  A Figures and Tables
    A.1 Objective metrics
    A.2 Subjective metrics
  B Encoding times and File sizes
    B.1 Encoding times
    B.2 File sizes

List of Figures and Tables

Figures
Figure 2.1 A simplified MPEG-1 video encoder (Ghanbari, 2003)
Figure 3.1 Double Stimulus Continuous Quality Scale (DSCQS) presentation procedure (ITU-R, 2002)
Figure 3.2 Double Stimulus Continuous Quality Scale (DSCQS) score sheet (ITU-R, 2002)
Figure 3.3 Double Stimulus Impairment Scale (DSIS) variant I (ITU-R, 2002)
Figure 3.4 Double Stimulus Impairment Scale (DSIS) variant II (ITU-R, 2002)
Figure 4.1 Test sequences screen shots (frame 0)
Figure 4.2 MSU Video Quality Measurement Tool main screen
Figure 4.3 MSU Perceptual Video Quality Tool main screen
Figure 4.4 YUV to AVI Converter main screen
Figure 4.5 CREW CIF 768 Kbps SSIM graph
Figure 4.6 Comparison procedure
Figure 5.1 CITY CIF PSNR-Y
Figure 5.2 CITY CIF PSNR-V
Figure 5.3 CITY CIF MSE-Y
Figure 5.4 CITY CIF SSIM
Figure 5.5 CITY CIF VQM
Figure 5.6 CITY 4CIF PSNR-U
Figure 5.7 CITY 4CIF MSE-V
Figure 5.8 CITY 4CIF SSIM
Figure 5.9 CITY 4CIF VQM
Figure 5.10 CREW CIF PSNR-Y
Figure 5.11 CREW CIF MSE-Y
Figure 5.12 CREW CIF SSIM
Figure 5.13 CREW CIF VQM
Figure 5.14 CREW 4CIF PSNR-Y
Figure 5.15 CREW 4CIF MSE-Y
Figure 5.16 CREW 4CIF SSIM
Figure 5.17 CREW 4CIF VQM
Figure 5.18 SOCCER CIF PSNR-Y
Figure 5.19 SOCCER CIF MSE-Y
Figure 5.20 SOCCER CIF SSIM
Figure 5.21 SOCCER CIF VQM
Figure 5.22 SOCCER 4CIF PSNR-Y
Figure 5.23 SOCCER 4CIF MSE-Y
Figure 5.24 SOCCER 4CIF SSIM
Figure 5.25 SOCCER 4CIF VQM
Figure 5.26 CITY CIF DSCQS
Figure 5.27 CITY CIF DSIS
Figure 5.28 CITY 4CIF DSCQS
Figure 5.29 CITY 4CIF DSIS
Figure 5.30 CREW CIF DSCQS
Figure 5.31 CREW CIF DSIS
Figure 5.32 CREW 4CIF DSCQS
Figure 5.33 CREW 4CIF DSIS
Figure 5.34 SOCCER CIF DSCQS
Figure 5.35 SOCCER CIF DSIS
Figure 5.36 SOCCER 4CIF DSCQS
Figure 5.37 SOCCER 4CIF DSIS
Figure 5.38 CREW CIF 192 Kbps frame 124 original
Figure 5.39 CREW CIF 192 Kbps frame
Figure 5.40 CREW CIF 192 Kbps frame
Figure 5.41 CREW CIF 192 Kbps frame
Figure 5.42 CREW CIF 192 Kbps frame

Tables
Table 2.1 MPEG-2 profiles and levels (Bock, 2009)
Table 2.2 MPEG-4 Visual profiles (Richardson, 2003)
Table 2.3 MPEG-4 Visual levels of the simple based profiles (Richardson, 2003)
Table 2.4 MPEG-4 AVC profiles and tools (Richardson, 2003)
Table 2.5 MPEG-4 AVC levels (Jack, 2007)
Table 2.6 VC-1 profiles and levels (SMPTE 421M, 2006)
Table 4.1 Video sequences format (Richardson, 2003)
Table 4.2 Video sequences format and bitrate
Table 4.3 Objective metrics values interpretation
Table 4.4 Monitor specifications
Table 4.5 CREW CIF 768 Kbps score sample (observer f2)
Table 5.1 CREW files encoding times (sec)
Table 5.2 CREW file sizes (KB)

Abstract

Digital video has the primary role in home entertainment. A variety of home video devices are capable of reproducing video in various resolutions and codecs. DVB and IPTV are going to be the main media for delivering digital video to our homes. Moreover, advances in technology allow the construction of smaller and more effective mobile devices that are able to receive and reproduce various kinds of digital video. One of the main advances is the design and creation of more efficient video codecs. The purpose of this research is to compare the performance of the most commonly used codecs by evaluating their quality. It also presents the most common objective and subjective video quality metrics. Furthermore, it presents the evolution of video codecs by examining the features and functions of the selected codecs.

1 Introduction

Nowadays, digital technology has a dominant role in our houses. All forms of home entertainment are now digital. Various home devices are capable of reproducing digital video, such as home media players, video game consoles and of course Blu-ray players. All of them can reproduce video in various resolutions and codecs, and they can also receive video streams over the internet. With the end of analogue television broadcasting, DVB-T will be the main medium for delivering digital video to our homes, while IPTV is also taking its own place in digital television. Furthermore, modern mobile devices are capable of receiving and reproducing digital video at a quality that was unimaginable a few years ago. Mobile phones, portable video players and DVB-H capable devices flood the market.

The purpose of this research is to compare the performance of the most commonly used codecs by evaluating their quality. The study has several aims. The first is the comparison and evaluation of the selected codecs. The second is the comparison of the results of the objective and the subjective metrics. The third is the validation of the results of other studies, and the last is the evaluation of the tools used. Furthermore, the results of this research could be used by other researchers in their own studies.

The first step of the process is the selection of the codecs. The DVB television standard supports the following codecs: MPEG-2, MPEG-4 AVC and VC-1. Only the handheld version of digital television (DVB-H) does not support MPEG-2 (Jack, 2007). Blu-ray also supports all three codecs (Blu-ray Disk Association, 2005). MPEG-4 AVC and VC-1 are therefore the encoding standards for modern video reproduction systems. The MPEG-2 codec is used mostly for compatibility reasons. It was the standard video codec during the last decade, so a vast archive of already encoded material exists.
Furthermore, many of the digital video devices that we already own are able to reproduce MPEG-2.

So, the selection of the codecs was simple: MPEG-4 AVC and VC-1 for the modern media players, and MPEG-2 as a standard for the evaluation of the other codecs. It is worth noting that the average viewer is already familiar with the quality of a DVD video. The last selected codec is MPEG-4 Visual. This codec is implemented by various well-known encoders, among them Xvid and DivX. The majority of the videos that can be found on the internet have been encoded with these encoders.

The quality of a video sequence can be assessed using two major kinds of method: subjective and objective. Subjective methods are based on human observation. A group of observers watch the video sequence under specific conditions and evaluate it according to the specifications of each metric. There are many methods, but the most commonly used are the double stimulus metrics: the Double Stimulus Continuous Quality Scale (DSCQS) and the Double Stimulus Impairment Scale (DSIS). These metrics require specific viewing conditions. The ITU-R BT.500 and the ITU-T P.901 recommendations describe the whole procedure in detail, and this research is based on those recommendations.

Objective methods are usually algorithms based on mathematical models. Their aim is to assess the quality of a video under specific conditions, and their results should be easily reproducible under the same conditions. The most commonly used objective metrics are Peak Signal to Noise Ratio (PSNR), Mean Square Error (MSE), Structural Similarity (SSIM) and Video Quality Metric (VQM). The first two are based on the error between the two sequences. SSIM measures structural distortion; it is considered more reliable because it is closer to the Human Vision System (HVS). The last, VQM, is based on subjective observations. It is more complex, but it is reliable and very close to the results of the subjective metrics.

Next is the selection of the video sequences.
The performance of the codecs is influenced by the content of the video sequence, so the sequences are selected primarily for their content. High motion has been shown to be very important for the performance of a codec. Furthermore, frames with human faces strongly influence the opinion of the observers. So, the selected sequences must include those characteristics. The duration of the sequences is also important because subjective metrics have strict specifications.

The selected resolutions are CIF and 4CIF. CIF is selected for its use on mobile devices, such as mobile phones and DVB-H devices. 4CIF is selected because it is the standard of SDTV, which DVB-T and IPTV follow. Three bitrates are selected for each resolution. For the CIF format: 192 kbps, 384 kbps and 768 kbps. For the 4CIF format: 2 Mbps, 4 Mbps and 8 Mbps. The selection of these bitrates is based primarily on each codec's profiles and levels and then on the needs of modern multimedia devices.

The tools used are the MSU Video Quality Measurement Tool and the MSU Perceptual Video Quality Tool. The first is used for the objective metrics and the second for the subjective metrics. Both tools support the selected metrics and are fully compliant with the ITU-R BT.500 recommendation.

The initial intention was to use more video sequences and more than three resolutions for each format. But after the beginning of the testing procedure it became obvious that the required time was enormous. Relevant studies use more video sequences and more resolutions, but they use fewer codecs and only one or two metrics, so their overall procedure is less time consuming. This research uses 8 objective metrics (PSNR for Y, U and V, MSE for Y, U and V, SSIM and VQM) and 2 subjective metrics. As a result, the objective metrics alone require 576 measurements (3 files x 2 formats x 3 bitrates x 4 codecs x 8 metrics). For the subjective metrics, the evaluation of all the sequences by one observer required nearly two hours. So the overall process took more than one month, excluding the data processing.
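The measurement count given above can be checked by enumerating the test matrix. The following is only an illustrative sketch: the sequence names are taken from the List of Figures, the metric labels are shorthand, and the codec labels and the function name are assumptions introduced here.

```python
# Illustrative enumeration of the objective test matrix (labels are assumptions).
SEQUENCES = ["CITY", "CREW", "SOCCER"]
BITRATES = {
    "CIF": ["192 kbps", "384 kbps", "768 kbps"],
    "4CIF": ["2 Mbps", "4 Mbps", "8 Mbps"],
}
CODECS = ["MPEG-2", "MPEG-4 Visual", "MPEG-4 AVC", "VC-1"]
METRICS = ["PSNR-Y", "PSNR-U", "PSNR-V", "MSE-Y", "MSE-U", "MSE-V", "SSIM", "VQM"]


def measurement_plan():
    """Return one tuple per objective measurement."""
    plan = []
    for seq in SEQUENCES:
        for fmt, rates in BITRATES.items():
            for rate in rates:
                for codec in CODECS:
                    for metric in METRICS:
                        plan.append((seq, fmt, rate, codec, metric))
    return plan


plan = measurement_plan()
# Each distinct (sequence, format, bitrate, codec) combination is one encoded file.
encoded_files = {entry[:4] for entry in plan}
print(len(plan), len(encoded_files))
```

Under these assumptions the plan contains 576 objective measurements taken on 72 encoded files.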

2 Video codecs overview

2.1 MPEG-1

MPEG-1 is a video compression algorithm developed by the International Organization for Standardization (ISO). The main goal of the algorithm is to encode a video sequence with audio (such as a movie) and compress it to a size that fits on a CD. The resolution used to achieve this is 352x288 at 25 fps or 352x240 at approximately 30 fps (SIF resolution). The bit rate was set at about 1.15 Mbps (Golston and Rao, 2006).

I frame coding is based on the JPEG standard. It uses an 8 x 8 DCT (Discrete Cosine Transform) on both chrominance and luminance. In the 4:2:0 format there are four luminance blocks and one block of each chrominance component (4Y, 1Cb and 1Cr). Together these blocks form a macroblock (Bock, 2009).

Motion compensation in MPEG-1 is based on macroblocks. It uses one motion vector for each macroblock, which means that the same vector is used for all six luminance and chrominance blocks. The accuracy of the motion vector is 0.5 pixels, which determines how smoothly natural movement in the video sequence can be represented (Bock, 2009).

Compared to previous compression algorithms, MPEG-1 adds B frames and adaptive perceptual quantization. B frames are used for bi-directional prediction: they depend on both previous and following frames. Although the video quality increases, so does the required computational power. The computations are more complex, and this adds latency to the result. Each application has different latency requirements, which is why some applications skip the decoding of B frames in order to perform better. This is common in applications with low bit rate requirements. Adaptive perceptual quantization is a method for improving video quality in terms of human visual perception. This is achieved by applying a quantization factor to each frequency (Golston, 2004). A simplified MPEG-1 encoder is shown in the next figure.
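As an illustration of the 8 x 8 DCT mentioned above, the following minimal sketch computes the textbook two-dimensional DCT-II of one block in plain Python. It is not an MPEG-1 encoder and is not taken from the cited sources; the function name and the flat test block are assumptions made for the example.

```python
import math

N = 8  # block size used for I frame coding


def dct_8x8(block):
    """Textbook 2-D DCT-II of an 8 x 8 block (orthonormal scaling)."""
    def scale(k):
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = 0.25 * scale(u) * scale(v) * total
    return coeffs


# A flat block has no spatial detail, so only the DC coefficient is non-zero.
flat_block = [[128] * N for _ in range(N)]
coeffs = dct_8x8(flat_block)
print(round(coeffs[0][0], 6))
```

A flat block concentrates all of its energy in the single DC coefficient, which is the property that makes the transform useful for compression: smooth image areas need very few coefficients.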

Figure 2.1: A simplified MPEG-1 video encoder (Ghanbari, 2003)

Apart from I, P and B frames, D frames are also introduced. These frames carry only low frequency information and are used only for searching. Using these frames, a user is able to find a specific part of a video. MPEG-2 supports D frames for compatibility reasons, although they are no longer used (Bock, 2009).

Originally MPEG-1 was used only on video CD and supported only progressive video signals. Later a new version of MPEG-1 was introduced for use on standard television (SDTV). That version supports interlaced video and bit rates up to 10 Mbps. It is referred to as MPEG-1.5, but it is not widely used due to the arrival of the MPEG-2 encoder (Bock, 2009).

2.2 MPEG-2

MPEG-2 is probably the most popular video compression codec. Initially it was designed for use in digital television, but its use soon spread to almost any application that uses digital video and compression. It supports all the standards of SDTV, such as interlaced and progressive video and the proper resolutions: 720 x 576 at 50 fields per second for PAL television systems and 720 x 480 at 60 fields per second for NTSC television systems (Golston, 2004).

MPEG-2 was improved significantly in many areas in comparison to MPEG-1. Firstly, it supports interlaced video, and secondly, it supports motion compensation with search ranges that are much wider than in previous codecs. This is necessary in order to support much higher resolutions. Consequently the complexity and the computational power needs of an encoder are much higher (Golston and Rao, 2006).

The usual compression ratio for adequate performance of the codec is 30:1, and the bit rate has to be 4 to 8 Mbps in order to sustain good picture quality. The most common consumer applications that use MPEG-2 are standard and high-definition television, DVD video and satellite video (Golston, 2004).

The codec is able to support many new features with the use of the right tools. The new features are multiple layer coding, data partitioning, SNR scalability, spatial scalability and temporal scalability (Golston and Rao, 2006).

MPEG-2 has six profiles: simple, main, 4:2:2, SNR, spatial and high. The main features of each profile are the following (Richardson, 2002):
- Simple: supports only I and P frame coding, 4:2:0 subsampling, low complexity
- Main: supports interlaced video and B frames, with 4:2:0 subsampling
- 4:2:2: use of 4:2:2 subsampling (four luminance and two of each Cr and Cb)
- SNR: same as Main with the addition of an enhancement layer for better quality
- Spatial: same as SNR with the use of spatial scalability for better quality
- High: same as Spatial with support of 4:2:2 subsampling

The combination of a profile with one of the three basic levels defines the use of the codec in each case. The most commonly used profile is the Main profile. The use of each level with the Main profile is the following:
- Main profile / Low level: basically the same as MPEG-1
- Main profile / Main level: for digital television
- Main profile / High level: for high-definition television

The use of MPEG-2 for high definition practically cancelled the plans for implementation of the MPEG-3 codec, which was originally intended to be the standard for high definition video (Richardson, 2002).
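The 30:1 compression ratio and the 4 to 8 Mbps bit rate quoted above can be related with a short calculation. This is a rough sketch under stated assumptions that are not taken from the cited sources: a PAL picture of 720 x 576 delivered as 25 full frames per second (50 fields), 8 bits per sample and 4:2:0 subsampling (1.5 samples per pixel). The function name is hypothetical.

```python
def raw_bitrate_bps(width, height, frames_per_second,
                    bits_per_sample=8, samples_per_pixel=1.5):
    """Uncompressed bit rate; 1.5 samples per pixel corresponds to 4:2:0."""
    return width * height * samples_per_pixel * bits_per_sample * frames_per_second


# Assumed SDTV PAL source: 720 x 576, 25 full frames per second.
raw_pal = raw_bitrate_bps(720, 576, 25)
compressed = raw_pal / 30  # apply the 30:1 compression ratio

print(f"raw: {raw_pal / 1e6:.1f} Mbps, at 30:1: {compressed / 1e6:.2f} Mbps")
```

Under these assumptions the uncompressed source is about 124 Mbps, and a 30:1 ratio brings it to roughly 4.1 Mbps, which sits at the lower end of the quoted 4 to 8 Mbps range.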

The following table shows the MPEG-2 profiles and levels.

Profile       | Simple | Main    | 4:2:2   | SNR     | Spatial | High
Picture type  | I, P   | I, B, P | I, B, P | I, B, P | I, B, P | I, B, P
Chroma format | 4:2:0  | 4:2:0   | 4:2:2   | 4:2:0   | 4:2:2   | 4:2:2

Levels: High, High 1440, Main and Low. For each level the table gives samples/line, lines/frame, frames/s and bit rate (Mbps) per profile. Apart from a bit rate of 4 Mbps at the Low level, these values are missing from this transcription.

Table 2.1: MPEG-2 profiles and levels (Bock, 2009)

Initially MPEG-2 decoders were too expensive and their computational power requirements were very high. Eventually these decoders became cheaper and simpler, because new techniques were applied to the construction of the decoder. The rapid expansion of the use of the codec also led to the mass production of compatible devices (Golston and Rao, 2006).

2.3 MPEG-4 Visual

MPEG-4 Visual (Part 2) is one of the two MPEG-4 codecs that have been standardized. The other is MPEG-4 AVC (Part 10) (Bock, 2009). Originally MPEG-4 focused on supporting video for applications that require a low bit rate, because MPEG-1 and MPEG-2 are not efficient enough. The rapid expansion of the internet and the use of video streaming have increased the need for producing quality video at low bit rates. Therefore MPEG-4 Visual is designed to compress video efficiently at low bit rates. It is now widely used in various applications with a wide range of bit rates.

Another innovation is the introduction of object based coding. In contrast to previous codecs, a video sequence can now be managed as a set of single objects. This new technique opens a whole new way of processing the video sequence. The main concepts are the video object (VO) and the video scene (VS). A video scene is a group of video frames that comprise a scene. A video object is a single object that can be defined in a video scene, and a video scene may have multiple video objects. MPEG-4 also introduces the concept of the toolkit: new tools can be added to the MPEG-4 standard to create new versions (Richardson, 2002).

The new tools introduced in MPEG-4 Visual are the following (Golston and Rao, 2006):
- Unrestricted Motion Vectors: predicts the movement of objects that move outside of the frame.
- Variable Block Size Motion Compensation: supports motion compensation for 8 x 8 and 16 x 16 blocks.
- Intra DCT DC/AC Prediction: predicts DC/AC coefficients by using the blocks above or to the left of a specific block.
- Quantized AC coefficients with extended dynamic range: supports AC coefficients with extended dynamic range in order to improve video quality.

Furthermore, in order to support packet loss recovery it introduces the following features (Golston and Rao, 2006):
- Slice Resynchronization: creates slices inside the images, so the decoder is able to resynchronize much more quickly after the occurrence of an error.
- Data Partitioning: divides the data of a video packet into the DCT part and the motion part. The checks on the motion vectors are stricter and more accurate, so when an error occurs not all the data of the specific packet are discarded.
- Reversible Variable Length Codes: allows backward decoding with the use of the VLC tables, so decoding can resume much faster.
- New prediction: used in real time applications to request additional data when a packet is lost.

MPEG-4 Visual has many different profiles.
The following table shows the profiles and their main features.

Profile                    | Main Features
Simple                     | Low-complexity coding of rectangular video frames
Advanced Simple            | Coding rectangular frames with improved efficiency and support for interlaced video
Advanced Real-Time Simple  | Coding rectangular frames for real-time streaming
Core                       | Basic coding of arbitrary-shaped video objects
Main                       | Feature-rich coding of video objects
Advanced Coding Efficiency | Highly efficient coding of video objects
N-Bit                      | Coding of video objects with sample resolutions other than 8 bits
Simple Scalable            | Scalable coding of rectangular video frames
Fine Granular Scalability  | Advanced scalable coding of rectangular frames
Core Scalable              | Scalable coding of video objects
Scalable Texture           | Scalable coding of still texture
Advanced Scalable Texture  | Scalable still texture with improved efficiency and object-based features
Advanced Core              | Combines the features of Simple, Core and Advanced Scalable Texture profiles
Simple Studio              | Object-based coding of high-quality video sequences
Core Studio                | Object-based coding of high-quality video with improved compression efficiency

Table 2.2: MPEG-4 Visual profiles (Richardson, 2003)

Many codecs are based on the MPEG-4 Visual algorithm. The most popular are DivX, Xvid and QuickTime. Initially Xvid used the simple profile, but later it introduced the advanced simple profile. DivX also implements the advanced simple profile (Golston and Rao, 2006), (Ma and Tucker, 2008). The next table shows the levels of the simple based profiles.
[Table 2.3 lists, for the Simple, Advanced Simple (AS) and Advanced Real-Time Simple (ARTS) profiles, the columns Profile, Level, Typical resolution, Max bitrate and Max objects. The cell values are not recoverable from this transcription.]

Table 2.3: MPEG-4 Visual levels of the simple based profiles (Richardson, 2003)

2.4 MPEG-4 AVC

MPEG-4 AVC (Part 10) is the second MPEG-4 compression algorithm that has been standardized. The International Organization for Standardization (ISO) approved the standard, and the International Telecommunication Union (ITU) approved it under the name H.264 (Bock, 2009), (Golston and Rao, 2006). AVC stands for Advanced Video Coding.

The codec's range of application is broad. It can be used for anything from video for mobile devices to high definition video, so the bit rate capabilities of the codec range from very low to very high (Ali, 2008). The efficiency of the codec is significantly improved. It reduces the bit rate by up to 2x in comparison to MPEG-2 and earlier MPEG-4 codecs. Therefore MPEG-4 AVC can be used to provide new services, such as providing video over ADSL lines, producing video equivalent to VHS quality at 600 Kbps, and storing and distributing high definition video on common DVD disks (Golston, 2004).

MPEG-4 AVC is based on the same principles as the previous compression algorithms. However, it presents many new features that make the codec more efficient. The new features introduced in MPEG-4 AVC are the following (Golston and Rao, 2006):
- Adaptive Loop De-blocking Filter: removes artifacts caused by block prediction errors.
- Context-Adaptive Binary Arithmetic Coder (CABAC): a probability model is used to encode and decode syntax elements such as motion vectors and transform coefficients.
- Entropy Coding: uses a single Universal VLC for all the symbols and a Context-Adaptive VLC for the transform coefficients.
- Integer Transform: uses an integer 4x4 spatial transform (an approximation of the DCT) in order to reduce the quality loss caused by IDCT mismatches.
- Intra and Inter Prediction and Coding: uses spatial domain intra prediction and inter frame coding.
- Multiple Reference Frame Prediction: uses up to 16 different reference frames.
- Quantization and Transform Coefficient Scanning: uses scalar quantization for the transform coefficients.
- Quarter-Pel Motion Estimation: allows quarter-pel and half-pel motion vector resolution.
- Variable Vector Block Sizes: uses different block sizes for motion compensation.

- Weighted Prediction: uses the weighted sum of backward and forward predictions.

The MPEG-4 AVC codec initially had three profiles: baseline, main and extended. The most important are the baseline and the main profile. The baseline profile is appropriate for mobile devices and generally for applications with low bit rate demands. The main profile can provide high quality video using high compression, at the cost of very high computational needs. Finally, the extended profile is used in video streaming (Richardson, 2003). The next table shows the three profiles and their main tools.

Tool                 | Baseline | Main | Extended
SP and SI slices     |          |      | X
Data Partitioning    |          |      | X
B slices             |          | X    | X
Weighted Prediction  |          | X    | X
I, P slices          | X        | X    | X
CAVLC                | X        | X    | X
Slice Groups and ASO | X        |      | X
Redundant Slices     | X        |      | X
CABAC                |          | X    |
Interlace            |          | X    |

Table 2.4: MPEG-4 AVC profiles and tools (Richardson, 2003)

Later a new set of profiles was added. These are called high profiles and there are four of them: high, high 10, high 4:2:2 and high 4:4:4. These profiles add new tools that improve the efficiency of the codec. The main additions of each high profile are the following (Jack, 2007):
- High (HP): supports encoder-specified frequency-dependent scaling matrices and adaptive selection between 4x4 and 8x8 block sizes.
- High 10 (Hi10P): supports 9 or 10 bit 4:2:0 YCbCr.
- High 4:2:2 (Hi422P): supports 4:2:2 YCbCr.
- High 4:4:4 (Hi444P): supports 4:4:4 YCbCr or RGB, 11 or 12 bit samples, predictive lossless coding and residual color transform.
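The integer 4x4 transform listed among the MPEG-4 AVC features can be illustrated with a small sketch. The matrix below is the forward core transform matrix as it is commonly presented for H.264; it is not taken from this thesis, and the scaling and quantization stage that follows it in a real encoder is deliberately omitted. All function names are hypothetical.

```python
# Forward core transform matrix commonly given for H.264 (assumption of this sketch).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Compute CF * block * CF^T using integer arithmetic only."""
    return matmul(matmul(CF, block), transpose(CF))


flat_block = [[10] * 4 for _ in range(4)]
transformed = forward_core_transform(flat_block)
print(transformed[0])
```

Because every operation is an integer addition or multiplication, encoder and decoder can obtain bit-identical results, which is the reason the text gives for the reduced loss from IDCT mismatches.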

The High profile (HP) also introduces the following features (Golston and Rao, 2006):
- 8x8 Luminance Intra Prediction: adds eight more modes for intra prediction.
- Adaptive Residual Block Size and Integer 8 x 8 Transform: adds a new 16 bit integer transform for 8 x 8 blocks.
- Monochrome: supports black and white video coding.
- Quantization Weighting: adds new quantization weighting matrices.

The various levels of the MPEG-4 AVC codec are described in the next table.

[Table 2.5 has the columns Level, Maximum MB per Second, Maximum Frame Size (MB), Typical Frame Resolution, Typical Frames per Second, Maximum MVs per Two Consecutive MBs, Maximum Reference Frames and Maximum Bit-Rate. The cell values are not recoverable from this transcription.]

Table 2.5: MPEG-4 AVC levels (Jack, 2007)

2.5 VC-1

VC-1 is a video compression algorithm that is able to produce high quality video. The range of the produced bit rates is from very low to very high. Although the bit rate of a high definition video can be very high, the need for computational power is kept at a reasonable level (Regunathan and Srinivasan, 2005).

The VC-1 codec contains the knowledge of more than 75 companies. The codec was originally implemented by Microsoft as Windows Media Video 9 (WMV 9). Later, the Society of Motion Picture and Television Engineers (SMPTE) standardized the codec as VC-1. The function of the VC-1 codec is specified in three documents. SMPTE 421M describes the main functions of the codec. The SMPTE RP227 and SMPTE RP228 documents describe the specifications of the bitstream transport and the bitstream conformance (Loomis and Wasson, 2007).

The range of supported bit rates is broad. VC-1 can produce high definition video at 1080p with a bit rate of 6 to 30 Mbps. Furthermore, it can produce video at a resolution of 2048 x 1536 with a bit rate of 135 Mbps. This is the highest possible resolution. The lowest is 160 x 120 with a bit rate of 10 Kbps.

VC-1 has three profiles: simple, main and advanced. There are also various levels which, combined with the proper profile, result in the proper resolution and bit rate for each application. The combination of the two determines the complexity of the encoder and the decoder (Loomis and Wasson, 2007). The following table shows the profiles and the levels of the VC-1 codec.

Profile  | Level  | Max Bit Rate
Simple   | Low    | 96 Kbps
Simple   | Medium | 384 Kbps
Main     | Low    | 2 Mbps
Main     | Medium | 10 Mbps
Main     | High   | 20 Mbps
Advanced | L0     | 2 Mbps
Advanced | L1     | 10 Mbps
Advanced | L2     | 20 Mbps
Advanced | L3     | 45 Mbps
Advanced | L4     | 135 Mbps

(The Resolution and Frame Rate columns of the table are missing from this transcription.)

Table 2.6: VC-1 profiles and levels (SMPTE 421M, 2006)
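The maximum bit rates of Table 2.6 can be used to find the lowest level of a profile that accommodates a given target bit rate. The sketch below is illustrative only: it assumes that 1 Mbps equals 1000 kbps, it ignores the resolution and frame rate limits that also apply to each level, and the function name is hypothetical.

```python
# Maximum bit rates from Table 2.6, in kbps (1 Mbps taken as 1000 kbps).
# Levels are listed in ascending order within each profile.
VC1_MAX_BITRATE_KBPS = [
    ("Simple", "Low", 96),
    ("Simple", "Medium", 384),
    ("Main", "Low", 2000),
    ("Main", "Medium", 10000),
    ("Main", "High", 20000),
    ("Advanced", "L0", 2000),
    ("Advanced", "L1", 10000),
    ("Advanced", "L2", 20000),
    ("Advanced", "L3", 45000),
    ("Advanced", "L4", 135000),
]


def lowest_level(profile, bitrate_kbps):
    """Return the lowest level of `profile` whose bit rate limit fits, or None."""
    for name, level, max_kbps in VC1_MAX_BITRATE_KBPS:
        if name == profile and bitrate_kbps <= max_kbps:
            return level
    return None


# Bit rates of the kind used in this study: 768 kbps (CIF) and 8 Mbps (4CIF).
print(lowest_level("Simple", 768), lowest_level("Main", 8000))
```

For example, a 768 kbps stream exceeds both Simple profile levels, so on bit rate alone it would need the Main profile, while an 8 Mbps stream fits the Main profile at the Medium level.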

Like all the previous codecs, VC-1 is based on schemes such as spatial transformation and motion compensation. The basic principles are the same. However, VC-1 introduces a new set of innovative techniques that make the codec more efficient, and the increased efficiency comes with the ability to produce high definition video of high quality. These innovations are the following (Loomis and Wasson, 2007), (Regunathan and Srinivasan, 2005):
- 16-bit transform implementation: transforms are constrained to 16 bit in order to keep the computational complexity of the decoder low
- adaptive block size transform: uses various combinations of the 8 x 8 transform to better fit the needs of each case
- advanced B-frame coding: B frames are not used as references by other frames, so they can be sent separately or even be omitted
- differential quantization: supports quantization for several levels
- fading compensation: adds fading parameters to the encoding procedure
- interlace coding: adds new characteristics from the interlaced frames
- loop-filtering: uses a filter to eliminate discontinuities at the block boundaries
- motion compensation: provides four modes so that the most suitable can be used in each case

VC-1 performs far better than MPEG-2 and the MPEG-4 Visual simple profile. Various comparisons also show that its video quality is even better than that of MPEG-4 AVC. The compression ratio is similar but the complexity is kept low. So, the computational needs of the decoder are much lower and, as a result, the hardware requirements of the decoding devices are lower too. This is a great advantage over the competing codecs (Golston and Rao, 2006). Furthermore, VC-1 and MPEG-4 AVC are the two codecs used for high definition video in Blu-ray players (Blu-ray Disk Association, 2005).

3 Quality metrics

3.1 General

Video quality assessment is a difficult and complicated task. Digital pictures are distorted during processing, compression, transmission and reproduction. Any of these factors can result in the degradation of the quality of the video. Sometimes degradation is accepted deliberately in order to reduce the overall size of the file for transmission and storage purposes. Thus, it is very important to know how the degradation affects the resulting video sequence and whether the outcome is satisfactory for the viewers. Therefore the assessment of video quality is a process that involves human beings evaluating the picture quality. This is called subjective evaluation. In this case, a group of viewers watch and evaluate the quality of a video sequence. Although this procedure is preferable, it is not convenient: it is expensive and needs a lot of time. So, researchers develop procedures that can assess and evaluate the quality of a video sequence without the need for observers. The results of these procedures are objective and can be reproduced easily using the same parameters. This is called objective evaluation (Ghanbari, 2003), (Bovic et al., 2004).

Quality metrics can be categorized according to the type of reference and the amount of information that they require in order to assess a video sequence. There are three categories:
- Full Reference (FR)
- Reduced Reference (RR)
- No Reference (NR)

Full reference metrics compare the compressed video with the original video sequence, which is used as a reference for the evaluation. The original video has to be in its original uncompressed form. In order to compare the two video sequences, the color and the luminance have to be calibrated and the temporal and spatial alignment has to be precise, so that corresponding pixels can be compared directly.

Reduced reference metrics use only some features of the reference video sequence, and the evaluation is based only on those features. Thus the comparison procedure is faster and easier, because fewer factors are compared. It also avoids the assumptions that no reference metrics have to make.

No reference metrics evaluate the video sequence without the need for the original. The real challenge is to distinguish between the distortion and the real video content. Because the original video sequence is not available, this type of metric makes assumptions about the video type and the distortion.

Each type of metric is used in different situations. Full reference metrics are used for offline video quality assessment, usually for codec evaluation in a lab environment. The other two types are used for online video quality assessment at different stages of the transmission system, where they can monitor and evaluate the video sequence at every stage. However, reduced reference metrics must still have access to information from the original sequence (Winkler, 2009).

As mentioned earlier, subjective metrics involve the participation of a group of observers in order to evaluate a video sequence. Each observer gives an opinion using a specific quality scale. Beforehand, a number of matters have to be settled: the viewing conditions have to be strict, the observers have to match a certain profile, the test material has to follow certain parameters and the data analysis has to adhere to a specific procedure (Winkler, 2005). Although they are expensive and more time consuming, subjective metrics are widely used because they are based on the human vision system. The subjective quality metrics are the following:
- Double Stimulus Continuous Quality Scale (DSCQS): observers evaluate short video sequences of the original and the test sequence.
- Double Stimulus Impairment Scale (DSIS): observers evaluate the test sequence in comparison with the original sequence.
Single Stimulus Continuous Quality Evaluation (SSCQE): observers watch a 20 to 30 minutes video sequence while they continuously rate the sequence. Absolute Category Rating (ACR): observers evaluate the video sequence only one time without the reference of the original sequence.

25 17 Absolute Category Rating with hidden reference (ACR-HR): is the same with ACR with the addition of the original version of each sequence. Degradation Category Rating (DCR): is identical to Double Stimulus Impairment Scale. Pair Comparison (PC): observers evaluate test sequences from the same scene but under dissimilar conditions in various different combinations. The observers evaluate each video sequence and the results are averaged into a single score. This results to the Mean Opinion Score (MOS) that is unique for each sequence. The number of the observers has to be at least fifteen (Winkler, 2005). Generally each metric has a different application. The metrics that are widely used today are the first three and the most popular are the double stimulus metrics (Winkler, 2005), (Bock, 2009). Objective metrics are actually algorithms based on mathematical models that are capable to assess the quality of a video sequence in order to imitate the human vision system (HVS) and match the observers opinion (Ghanbari, 2003). The classification of the objective quality metrics is the following: (Winkler, 2009) Data metrics Picture metrics Packet or bitstream based metrics Hybrid metrics Data metrics evaluate the fidelity of the video signal with no concern about the content of the signal. In this case the video content of the signal is not taken into account. The two main metrics are the Peak Signal to Noise Ratio (PSNR) and the Mean Square Error (MSE). These are similar to the bit error rate and the packet loss rate that are used for transmission errors since none of them are taking into account the content of the signal. Picture metrics evaluate the signal as a video sequence. They take into account the image distortion and the overall quality of the sequence. They are based on the human vision system (HVS) and on the analysis of specific artifacts and features of the video.

Packet or bitstream based metrics are used on packet networks for the evaluation of compressed video sequences. They do not decode the video; they assess it by inspecting the encoded bitstream and the packet headers. Their main advantage is that they require much less processing power, which is why they are able to process multiple video streams. However, they can be used only with specific network protocols and video codecs.

Hybrid metrics use a combination of the previous quality metrics.

Apart from the Peak Signal to Noise Ratio and the Mean Square Error, two other commonly used objective metrics are the Structural Similarity (SSIM) and the Video Quality Metric (VQM). The Video Quality Metric is an objective full reference metric that is based on subjective observations. These two metrics give the most reliable results (Wang, 2006).

The standards for objective quality measurement are based on the following criteria (Winkler, 2009):

- Define the Mean Opinion Score for a specific application: a given Mean Opinion Score has to mean the same level of quality for every similar video.
- Define a reliable Mean Opinion Score prediction: the tool that is used for quality measurement has to produce results similar to the observers' scores.
- Define a reproducible Mean Opinion Score prediction: the tool has to produce the same results for the same video comparison every time.

The existing standards achieve these criteria only partially. So far, no standard fully satisfies all three.

Video quality assessment is performed according to specific standards that have been released by various groups and forums. The most active groups working on standards for video quality measurement are the following:

- Video Quality Experts Group (VQEG)
- ITU-T
- ATIS IIF

These groups have released several recommendations regarding the video quality assessment procedure, covering both subjective and objective assessment. The most common recommendations are the ITU-R BT and the ITU-T P.901 for subjective video assessment, and the VQEG "Final Report From the Video Quality Experts Group on the Validation of Objective Models of Video Quality Assessment" (Phase I and II) for objective video assessment. There are also several other recommendations for various different situations (Winkler, 2009).

The most reliable and commonly used metrics are presented in more detail in the following sections (Hewage, 2009), (Winkler, 2009), (Wang, 2006).

3.2 Objective

Peak Signal to Noise Ratio (PSNR)

Peak Signal to Noise Ratio (PSNR) is one of the most widely used metrics and also one of the simplest. It is calculated on a logarithmic scale and requires the calculation of the Mean Square Error (MSE). The next equation shows the calculation of the PSNR.

\[ PSNR = 10 \log_{10} \frac{(2^n - 1)^2}{MSE} \]

where \( (2^n - 1)^2 \) is the square of the highest possible signal value in the entire image (the highest value is 255 for 8-bit images) (Richardson, 2002), (Winkler, 2005).

PSNR is easy to calculate, which is why it is the most popular quality metric. It is usually used for the comparison between compressed and uncompressed video sequences. However, PSNR has a number of disadvantages. Firstly, it is a full reference metric, so it requires the original video sequence. More importantly, its results do not always agree with the results of the subjective metrics. It has been observed that on some occasions PSNR gives a high score to an image that has a low subjective Mean Opinion Score.
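The PSNR calculation described above can be illustrated with a short Python sketch. It is not the implementation used by any measurement tool in this study; the function names `mse` and `psnr` and the nested-list frame layout are assumptions made for illustration, and the MSE is taken as the mean of the squared pixel differences over all frames.

```python
import math


def mse(ref, test):
    """Mean squared error between two equally sized frame sequences.

    Each sequence is a list of frames and each frame is a list of rows of
    pixel values (for example the 8-bit luminance plane). This layout is
    an assumption made for the example.
    """
    total = 0.0
    count = 0
    for ref_frame, test_frame in zip(ref, test):
        for ref_row, test_row in zip(ref_frame, test_frame):
            for a, b in zip(ref_row, test_row):
                total += (a - b) ** 2
                count += 1
    if count == 0:
        raise ValueError("empty input")
    return total / count


def psnr(ref, test, bits=8):
    """PSNR in dB; returns infinity when the inputs are identical (MSE = 0)."""
    error = mse(ref, test)
    if error == 0:
        return math.inf
    peak_squared = (2 ** bits - 1) ** 2  # 255^2 for 8-bit images
    return 10.0 * math.log10(peak_squared / error)


# One 2x2 frame in which every pixel differs by 5, so the MSE is 25.
reference = [[[100, 110], [120, 130]]]
distorted = [[[105, 115], [125, 135]]]
print(mse(reference, distorted))
print(round(psnr(reference, distorted), 2))
```

Because the error is summed over every pixel with equal weight, a small uniform shift and a localized but visible artifact can yield the same value, which is one way to see why PSNR does not always follow subjective scores.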

An example of such a discrepancy is an image that contains a human face. If the face is clear and bright, most observers will rate the image highly even if the background is blurred. PSNR, on the other hand, takes all the pixels of the image into account, so its rating in this case is going to be lower (Richardson, 2002).

After extended use of the PSNR metric, the following has been observed (Bock, 2009):

- The DC levels of the two video sequences have to be the same. If they differ, the PSNR value is lower even if the distortion of the sequence is not noticeable.
- PSNR should be used only to compare distortions of a similar type. Such a comparison is valuable, whereas comparing different types of distortion is meaningless.
- The two video sequences must have a similar error signal distribution. If not, PSNR cannot provide accurate ratings for the two sequences.
- PSNR gives more reliable results when the error signal is low and the quality of the image is high. It should not be used on low quality images, because it cannot decide whether the loss of quality is acceptable and under which conditions.
- The size of the picture affects the accuracy of PSNR. Smaller image sizes result in PSNR values that agree more closely with the subjective scores, while larger sizes result in values that differ considerably. This happens because observers concentrate on only a portion of a large image, whereas they take in the whole of a small image.

Despite its limitations, PSNR is widely used in video quality measurement because it is very simple to use.

Mean Square Error (MSE)

Along with PSNR, Mean Square Error (MSE) is one of the most popular metrics in video quality assessment. Assuming two sequences I and Ī of size X x Y with T frames, MSE calculates the squared differences between the two sequences (the gray level values) and then their mean value. The next equation shows the calculation of the MSE.

\[ MSE = \frac{1}{T X Y} \sum_{t=1}^{T} \sum_{x=1}^{X} \sum_{y=1}^{Y} \left[ I(t, x, y) - \bar{I}(t, x, y) \right]^2 \]

Mean Square Error is very fast and easy to use. It measures the differences between the two video sequences by comparing them pixel by pixel. MSE suffers from the same problems as PSNR, which is expected because the calculation of the PSNR depends on the calculation of the MSE. The main problem is that MSE can indicate poor quality for an image that has a high subjective Mean Opinion Score. This happens because observers can rate an image highly by concentrating on a specific part of it, whereas MSE and PSNR calculate the distortion over the whole image (Winkler, 2005).

Even though MSE suffers from the same problems as PSNR, both metrics are widely used because they have a low cost and are very simple to use. In addition, the results of these metrics in a controlled environment such as a video testing lab are trustworthy and reliable.

Structural Similarity (SSIM)

Structural Similarity (SSIM) introduced a new approach to the assessment of video quality. All the previous methods are based on detecting the error between the two sequences. SSIM, in contrast, measures the structural distortion. This method is more reliable than the previous ones because its function is closer to the Human Vision System (HVS): human vision is more sensitive to the structural information of an image than to visual errors. As a result, the results of this metric are more accurate and closer to those of the subjective metrics (Wang, 2006).

SSIM is a full reference metric and needs access to the original sequence in order to perform the comparison. It first assumes that the original sequence has perfect quality and then measures how similar the second sequence is to it. The measurement of similarity is divided into three separate factors: luminance, contrast and structure. A comparison is performed for each factor. The three factors are independent, so the structure is not affected even if the contrast or the luminance changes (Bovic et al., 2004).

These three comparisons lead to the following three equations: the first for luminance, the second for contrast and the last for structure.

\[ l(x, y) = \frac{2 \mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \]

\[ c(x, y) = \frac{2 \sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \]

\[ s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3} \]

The SSIM equation is the combination of the previous comparisons, so SSIM(x, y) is the measured similarity between the signals x and y.

\[ SSIM(x, y) = [l(x, y)]^{\alpha} \, [c(x, y)]^{\beta} \, [s(x, y)]^{\gamma} \]

To simplify the previous equation, it is set that \( \alpha = \beta = \gamma = 1 \) and \( C_3 = C_2 / 2 \). The simplified equation is the following.

\[ SSIM(x, y) = \frac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \]

where
\( \mu_x, \mu_y \): averages of x and y,
\( \sigma_x^2, \sigma_y^2 \): variances of x and y,
\( \sigma_{xy} \): covariance of x and y,
\( C_1, C_2 \): constants that avoid instability when a denominator term such as \( \mu_x^2 + \mu_y^2 \) is very close to zero.

The resulting values lie between -1 and 1. The value -1 represents the worst quality and the value 1 the highest quality. The quality of the entire image is expressed by the mean structural similarity (MSSIM).
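The simplified SSIM formula, and its averaging over local windows into a whole-image MSSIM score, can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of any tool used in this study: the constants C1 = (0.01 x 255)^2 and C2 = (0.03 x 255)^2 are commonly used defaults for 8-bit images and are not taken from this text, population (divide by N) statistics are used, and the windows are non-overlapping 8 x 8 blocks chosen only to keep the example short.

```python
def ssim_window(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Simplified SSIM for two equally long lists of pixel values.

    c1 and c2 are assumed default constants for 8-bit data.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("windows must be non-empty and equally sized")
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def mssim(ref, dist, win=8):
    """Mean SSIM over non-overlapping win x win windows of two 2-D images.

    Images are lists of rows. The window layout is an assumption made
    for this sketch.
    """
    height, width = len(ref), len(ref[0])
    scores = []
    for top in range(0, height - win + 1, win):
        for left in range(0, width - win + 1, win):
            x = [ref[r][c] for r in range(top, top + win)
                 for c in range(left, left + win)]
            y = [dist[r][c] for r in range(top, top + win)
                 for c in range(left, left + win)]
            scores.append(ssim_window(x, y))
    if not scores:
        raise ValueError("image is smaller than one window")
    return sum(scores) / len(scores)


# A 16x16 gradient compared with itself and with an inverted copy.
image = [[(r * 16 + c) % 256 for c in range(16)] for r in range(16)]
inverted = [[255 - v for v in row] for row in image]
print(mssim(image, image))     # identical images give a score of about 1
print(mssim(image, inverted))  # inverted structure gives a negative score
```

The example shows the two ends of the scale: identical windows reach the maximum, while windows whose structure is inverted produce a negative covariance and therefore a negative score.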

The mean structural similarity is calculated as

\[ MSSIM(X, Y) = \frac{1}{M} \sum_{j=1}^{M} SSIM(x_j, y_j) \]

where X is the reference image, Y is the distorted image, \( x_j \) and \( y_j \) are the image contents at the jth local window and M is the number of local windows.

In testing, the Structural Similarity (SSIM) metric has proved more reliable than PSNR and MSE. Its results are much closer to the results of the subjective metrics. It does, however, have a major disadvantage: the equations are complex to calculate and require more time and computational power (Bovic et al., 2004), (Wang, 2006).

Video Quality Metric (VQM)

Video Quality Metric was developed by the Institute for Telecommunication Science (ITS) (Wang, 2006). The aim of the metric is to offer an objective assessment based on subjective observations. It is a full reference metric that measures various factors and combines them into a single result. These factors are:

- block distortion
- blurring
- color distortion
- global noise
- unnatural motion

VQM shows a high degree of correlation with the scores of the subjective metrics. The Video Quality Experts Group (VQEG) has evaluated the metric and rated it very highly. ANSI has also approved Video Quality Metric as a standard for objective video quality measurement (Hewage, 2009), (Wang, 2006).

The steps of the Video Quality Metric procedure are the following:

- Calibration: calibrates the video sequence for the feature extraction that follows. It estimates the temporal and spatial shift and makes the appropriate corrections. Then it adjusts the brightness and the contrast according to the original sequence.
- Quality Features Extraction: using a mathematical function, it extracts a set of features regarding picture quality. These features describe changes in the chrominance, spatial and temporal properties of the video sequence.
- Quality Parameters Calibration: comparing the two video sequences, it identifies the quality differences and calculates the appropriate parameters.
- VQM Calculation: combining all the previous parameters, it calculates the VQM result.

Video Quality Metric can be calculated using different models according to the intended use of the video sequence. These models are: Developer, General, PSNR, Television and Videoconference (Wang, 2006).

The General model uses seven parameters. Five parameters are based on the luminance component and the other two on the chrominance components. The parameters are the following (Hewage, 2009):

- chroma_extreme: detects color problems such as impairments created by transmission errors.
- chroma_spread: detects variations in the spread of the color distribution.
- ct_ati_gain: temporal and contrast information.
- hv_gain: detects a shift in the orientation of the edges (diagonal to vertical and horizontal).
- hv_loss: detects a shift in the orientation of the edges (vertical and horizontal to diagonal).
- si_gain: measures the quality improvements that result from enhancements such as edge sharpening.
- si_loss: detects loss of information in the spatial properties.

In general, Video Quality Metric (VQM) is one of the most reliable metrics and its results are very close to the scores of the subjective metrics.

3.3 Subjective

Double Stimulus Continuous Quality Scale (DSCQS)

Double Stimulus Continuous Quality Scale (DSCQS) is one of the most widely used subjective metrics. It was introduced by the International Telecommunication Union. As with every subjective methodology, the correct parameters for the test have to be set first: the room environment and the number and the profile of the observers. The testing material also has to be chosen and prepared accordingly.

In general, double stimulus methods use a repetitive presentation of the material. The original and the testing sequences are shown consecutively with small intervals between them. The observers then vote with both sequences in mind. DSCQS has two ways of presentation, Variant I and Variant II.

In Variant I there is only one observer. This observer is permitted to switch between the two sequences as many times as he wants, until he has a clear opinion. A typical observer watches each set of sequences two or three times. Each sequence lasts approximately 10 seconds.

In Variant II there is more than one observer. The two sequences are shown consecutively one or more times, in order to help the observers form a clear view of their quality. After that, the sequences are shown again and each observer votes. The duration of each video sequence is 10 seconds and there are usually two repetitions. The next figure shows the structure of the presentation.

Figure 3.1: Double Stimulus Continuous Quality Scale (DSCQS) presentation procedure (ITU-R, 2002)

The presentation phases are the following:

- T1 is the test sequence A (10 s)
- T2 is the mid-gray interval (3 s)
- T3 is the test sequence B (10 s)
- T4 is a mid-gray sequence (5-11 s)

One of the video sequences is the testing sequence and the other is the original sequence. The sequences are not shown in a particular order, so the observers do not know which sequence is the original. It is also possible that some sets do not contain a testing sequence but consist of two original sequences. At the end, the observers vote using a five grade quality scale. The grades are the following:

- Excellent
- Good
- Fair
- Poor
- Bad

The voting sheet usually has five double columns, one for each set of sequences. The next figure shows a typical voting sheet.

Figure 3.2: Double Stimulus Continuous Quality Scale (DSCQS) score sheet (ITU-R, 2002)

The scores are converted to a scale from 0 to 100. Next, the differences between the two sequences are calculated and recorded, so the resulting scores are the differences between the original and the testing sequence. Hence, it is important not to associate the scores with the terms that describe the quality of the sequences. This holds for all the subjective assessment methods, even when the descriptions used are the DSCQS descriptions. The variance of the results depends on the type of each sequence used. Therefore, the results have to be processed separately for each sequence and not combined with those of the other sequences in order to create an average score (ITU-R, 2002).
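The scoring step described above (scores on a 0 to 100 scale, a difference per observer between the original and the testing sequence, and an average per sequence) can be sketched as follows. This is a minimal illustration, not the processing prescribed by the recommendation: the function names, the pair layout and the simple arithmetic mean are assumptions, and the minimum of fifteen observers is the figure quoted earlier in this chapter.

```python
from statistics import mean

MIN_OBSERVERS = 15  # minimum panel size quoted earlier in this chapter


def difference_scores(votes):
    """Per-observer difference between original and test scores.

    votes is a list of (original_score, test_score) pairs, one pair per
    observer, both already converted to the 0-100 scale.
    """
    for original, test in votes:
        if not (0 <= original <= 100 and 0 <= test <= 100):
            raise ValueError("scores must lie on the 0-100 scale")
    return [original - test for original, test in votes]


def mean_difference_score(votes, min_observers=MIN_OBSERVERS):
    """Average difference score for ONE test sequence.

    Results are kept per sequence; scores of different sequences are
    not pooled together.
    """
    if len(votes) < min_observers:
        raise ValueError("not enough observers for a valid result")
    return mean(difference_scores(votes))


# Hypothetical votes of 15 observers for a single test sequence.
votes = [(80, 60)] * 10 + [(90, 75)] * 5
print(mean_difference_score(votes))
```

A small mean difference indicates that observers could hardly tell the test sequence from the original, independently of the absolute position of either score on the scale.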

Double Stimulus Impairment Scale (DSIS)

Double Stimulus Impairment Scale (DSIS), along with DSCQS, is one of the most common subjective video quality metrics. Both are widely used in the evaluation of video sequences. DSIS follows all the standards of subjective quality assessment, which means that all the parameters have to be set initially, as in the previous method. This metric uses the impairment scale. Although this scale is more consistent for small impairments, the method is also used for large impairments. There are two different ways of presenting the testing material, called Variant I and Variant II.

In Variant I the original sequence and the testing sequence are shown only once. Between the two sequences there is a small interval of three seconds. The voting period is set right after the beginning of the second sequence. The next figure shows the presentation of Variant I.

Figure 3.3: Double Stimulus Impairment Scale (DSIS) variant I (ITU-R, 2002)

In Variant II the original sequence and the testing sequence are presented twice. The intervals are also three seconds long. The voting period begins right after the start of the second presentation. This variant needs more time and is usually used for sequences with small impairments. The next figure shows the presentation of Variant II.

Figure 3.4: Double Stimulus Impairment Scale (DSIS) variant II (ITU-R, 2002)

The presentation phases are the following:

- T1 is the original sequence (10 s)
- T2 is the mid-gray interval (3 s)
- T3 is the test sequence (10 s)
- T4 is a mid-gray sequence (5-11 s)

The scale has five grades:

- 5 imperceptible
- 4 perceptible, but not annoying
- 3 slightly annoying
- 2 annoying
- 1 very annoying

The voting sheet has to be laid out clearly so that the observers can fill it in correctly. At the beginning of the presentation, the observers have to be informed about the procedure, and the watching and the voting periods have to be explained. It is also important that the observers do not get the impression that the worst sequence has to be given the worst possible grade. Their grade has to be based on the overall impression of the sequences and expressed with the appropriate term of the grading scale.

The sequences have to be presented in a seemingly random order. The same sequence must not be presented twice in a row, even if the levels of impairment are different. The sequences must be chosen in such a way that the observers use all the possible grades. Before the presentation of the testing sequences, it is also possible to present a series of images that are indicative of the grading scale. The whole procedure should last about thirty minutes, including the explanations, the presentation of the sequences and the voting period (ITU-R, 2002).
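The ordering rule above (a seemingly random order in which the same sequence never appears twice in a row) can be illustrated with a small Python sketch. It is only one possible way to build such an order, using repeated shuffling until the constraint holds; the function name, the tuple layout and the sample list of presentations are hypothetical and are not part of the recommendation or of any tool.

```python
import random


def presentation_order(items, rng=None, max_tries=10000):
    """Return a shuffled copy of items with no sequence shown twice in a row.

    items is a list of (sequence_name, condition) pairs. The list is
    reshuffled until no two neighbours share a sequence name.
    """
    if rng is None:
        rng = random.Random()
    order = list(items)
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(a[0] != b[0] for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("no valid order found; check the input list")


# Hypothetical session: three sequences, each at three impairment levels.
presentations = [(name, level)
                 for name in ("City", "Crew", "Soccer")
                 for level in ("low", "medium", "high")]
order = presentation_order(presentations, rng=random.Random(2010))
for name, level in order:
    print(name, level)
```

Repeated shuffling is simple and adequate for short sessions like this one; for long or very unbalanced lists a constructive method would be a better choice.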

4 Comparison

4.1 Video sequences

The selection of the video sequences is the first step in the video quality measurement procedure, and it is very important for the proper progress of the assessment. Many factors have to be considered in order to choose the right sequences. The sequences with the highest quality are not always the most suitable: the test has to be performed with sequences of various levels of video quality. The content of the video is also very important. The sequences have to have varied content in order to test the codecs under different conditions.

As mentioned in the previous chapter, it is important to keep in mind not only the objective metrics but also the subjective ones. The subjective metrics require sequences with specific types of content. One of them is human faces, so one of the sequences has to contain one or more faces. Motion is also very important: the sequences have to contain both normal motion and high motion. Another factor is the duration of the video sequences. Double stimulus metrics require ten second sequences, so the videos have to be ten seconds long.

Apart from the type of the video sequences, their technical specifications are also crucial. These specifications include the resolution and the bit rate. As mentioned earlier, this assessment focuses on the reproduction of standard definition video in a home environment, such as DVB-T and IPTV, and on the use of video material on mobile devices such as mobile phones and other devices capable of DVB-H. For the selection of the resolution it is therefore essential to bear the following in mind. According to Richardson (2003), the following table shows the association between the resolutions and their use.

Format   Resolution   Use
QCIF     176 x 144    Mobile
CIF      352 x 288    Videoconference
4CIF     704 x 576    SDTV, DVD

Table 4.1: Video sequences format (Richardson, 2003)

However, the advances in technology tend to reduce the usage of the QCIF format. As early as 2005, mobile devices were using CIF format videos as a standard (Bistrom, 2005). Modern mobile devices such as mobile phones, the PSP and the iPod Touch can reproduce video even larger than the CIF format. Therefore, the QCIF format is not considered in the present study. The 4CIF format is used for SDTV. DVD video has a resolution of 720x576 in the PAL system (Taylor, 2001), so 4CIF is suitable for the comparison of video sequences in DVD quality. The resolutions used in this assessment were CIF and 4CIF: CIF for mobile devices and 4CIF for SDTV.

Next is the selection of the bit rates. The selection is based primarily on each codec's profiles and levels (see chapter 2). The specifications of the codecs do not match perfectly, but they have a common base for comparison. The other factor is the bitrate actually used in the relevant applications. Various studies, such as Bistrom's (2005) and Nemcic's (2007), select CIF format videos with bitrates from 192 to 768 kbps, which is in accordance with the codecs' specifications. There is also the 96 kbps bitrate, but it is discarded because it is close to 192 kbps and advances in technology make the transmission of a 192 kbps video easier. Higher bitrates are also discarded, because bitrates beyond 1 Mbps are going to be used for the 4CIF format. So the bitrates for the CIF format are the following: 192 kbps, 384 kbps and 768 kbps.

For the 4CIF format the selection of the bitrates is a bit more complex. The profiles and the levels of the codecs cover a variety of bitrates. The most common bitrates are 4 Mbps, 8 Mbps and 10 Mbps. However, it is essential to consider the bitrates used under real circumstances. According to Bock (2009), the common bitrate of SDTV for home usage is 3.5 Mbps with encoding, and 1.5 to 2.5 Mbps with MPEG-4 AVC encoding. SDTV includes both IPTV and DVB-T transmissions. As mentioned earlier, a DVD quality video is also needed for the comparisons. The maximum bitrate for DVD is 9.8 Mbps, but in reality most DVD videos have an average bitrate between 4 and 6 Mbps (Taylor, 2001). So, in order to include the appropriate bitrates for all the codecs used, the chosen bitrates are the following: 2 Mbps, 4 Mbps and 8 Mbps. These bitrates are in accordance with the specifications of the codecs and the needs of modern multimedia devices.

After the selection of the resolutions and the bit rates, the next step is to select the actual video sequences according to their content and their availability. In order to encode the original video into various formats using the selected codecs, the original video has to be in an uncompressed format. Many universities and institutes offer uncompressed video sequences for the purpose of video quality measurement, and the majority of the studies that involve video quality assessment use those videos. The video sequences of this study were downloaded from the web site (ftp) of the Faculty of Electrical Engineering and Computer Science of the University of Hanover, which offers a variety of video sequences.

The selected sequences are City, Crew and Soccer. These sequences fulfill all the requirements of this study. Firstly, each one has a duration of ten seconds. Secondly, they are offered in both CIF and 4CIF format, which avoids the quality loss that would result from processing the video in order to change its resolution. The final issue is the content of the video sequences: each of them must have a unique characteristic.

- The City sequence shows a city landscape filmed from above with normal camera motion.
- The Crew sequence shows the arrival of a shuttle crew walking towards the camera. It contains multiple human faces, some of them close to the camera.
- The Soccer sequence shows a football match, with the camera following the ball. It contains high motion.

Figure 4.1: Test sequences screen shots (frame 0)

Based on the previous selections, the produced video sequences are the following.

Sequence   Format   Bitrates
CITY       CIF      192 kbps, 384 kbps, 768 kbps
CITY       4CIF     2 Mbps, 4 Mbps, 8 Mbps
CREW       CIF      192 kbps, 384 kbps, 768 kbps
CREW       4CIF     2 Mbps, 4 Mbps, 8 Mbps
SOCCER     CIF      192 kbps, 384 kbps, 768 kbps
SOCCER     4CIF     2 Mbps, 4 Mbps, 8 Mbps

Table 4.2: Video sequences format and bitrate
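The combinations in Table 4.2 can be enumerated with a few lines of Python, which also makes the resulting counts easy to verify. The variable and function names are illustrative, and the Mbps values are written in kbps assuming 1 Mbps = 1000 kbps.

```python
SEQUENCES = ["City", "Crew", "Soccer"]

# Bitrates per format in kbps (1 Mbps is taken as 1000 kbps here).
BITRATES_KBPS = {
    "CIF": [192, 384, 768],
    "4CIF": [2000, 4000, 8000],
}

NUM_CODECS = 4  # four codecs are compared in this study


def build_conditions():
    """Return every (sequence, format, bitrate) combination of Table 4.2."""
    return [(sequence, video_format, bitrate)
            for sequence in SEQUENCES
            for video_format, bitrates in BITRATES_KBPS.items()
            for bitrate in bitrates]


conditions = build_conditions()
print(len(conditions))               # number of sequence/format/bitrate rows
print(len(conditions) * NUM_CODECS)  # number of encoded files
```

Keeping the test matrix in one place like this also helps when naming output files consistently during encoding.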

Therefore, there are 18 produced video sequences. Each of them has to be encoded by each one of the four codecs, so the total number of encoded sequences is 72. The frame rate of the sequences is 25 fps in order to comply with the PAL system.

4.2 Quality measurement tool

There are various tools that can perform objective evaluation of video quality, but each one has limitations regarding the supported metrics and the supported video formats. Only the MSU Video Quality Measurement Tool can perform the evaluation using various objective metrics and also supports all the video formats. This tool was developed by the MSU Video Group of the Faculty of Computational Mathematics and Cybernetics of Moscow State University. It is free of charge for the purposes of scientific research. There is also a paid version aimed at professional use, with enhanced functions that are not required for this research.

The usage of this tool is very simple. Firstly, the user chooses the original file and then the test file. The tool can open a second test file at the same time for comparative analysis. In order to open most of the video formats, it uses the AviSynth program. AviSynth has to be installed on the testing computer in order to open the various video file formats. It uses scripts that are created automatically by the MSU tool.

After the selection of the files, the user has to select the appropriate metric. The MSU Video Quality Measurement Tool supports all the objective metrics that are selected for the evaluation: it can measure PSNR, MSE, SSIM, VQM and many more. For the PSNR and the MSE metrics it can measure luminance and chrominance separately, so it measures Y, U and V. The other two metrics do not need separate measurements due to the nature of the metrics (see chapter 3).

The Process button starts the measurement. After the completion of the measurement, the tool creates a graph that shows the performance of the codec.

Figure 4.2: MSU Video Quality Measurement Tool main screen

After that, the user can examine the compared sequences frame by frame and, at the same time, examine the created graph. This is very important in order to examine the performance of the codecs on certain frames. In addition, the tool creates an Excel (csv) file with the measurements of each frame separately. The average value is also calculated and recorded in the file. The program has more features, but they are not necessary for this research.

For the subjective quality evaluation the MSU Video Group has created the MSU Perceptual Video Quality Tool. This tool creates a PC-based environment in which the subjective metrics can take place. It supports the selected double stimulus metrics and it is fully compliant with the ITU-R BT recommendation. The tool is divided into two programs: the MSU perceptual video quality task manager and the MSU perceptual video quality player.

Figure 4.3: MSU Perceptual Video Quality Tool main screen

The manager is used for the creation of the tasks. Each task has a sequence of comparisons based on the same metric. To create a task, the user has to add the files to be compared: the original file and the four encoded files. Then the user has to indicate which file is the original and select the appropriate metric. The user can also select various other options regarding the behavior of the player during the evaluation, including the pause and rewind functions and the number of repetitions. An appropriate name has to be set for each task. This is very important during the result processing phase, due to the large number of files produced. After that the task can be saved. A single task has to be created for each comparison, so the total number of tasks is 36: 18 for the DSCQS metric and 18 for the DSIS metric.

44 36 It is important to mention that unlike the MSU Video Quality Measurement Tool, MSU Perceptual Video Quality Tool is not able to create the appropriate AviSynth scripts. So, the user has to create one script for each video sequence. This means the creation of 72 scripts. After the creation of the tasks the evaluation procedure is taking place. Firstly, the perceptual video quality player asks for the observer s name. This is required for the processing of the results. Then the task is loaded and the procedure starts. After the end of the procedure the observer is called to evaluate each video according to the used metric. Additionally an Excel file is created with the scores of the observer. This file is saved under the task s name folder. So, the results of the various observers are going to be saved on the appropriate folder. This procedure is repeated for all the comparisons of the task. Then, it loads the next task. This is a time consuming procedure. MSU perceptual video quality player has the ability to execute batch files that include multiple tasks. This is very convenient because observers do not have to execute each task file separately. After the completion of all the subjective evaluations that include at least 15 observers, each task s folder will include multiple files with the names of the observers. These files are needed for the calculation of the Mean Opinion Score. This is calculated automatically by the task manager. The user loads the appropriate task file and chooses the Count Results button. This program gives the option to the user to calculate the results by three different ways. These options include a simple average calculation and the ITU-R BT based calculation. Then it creates the result s file to the appropriate folder. 4.3 Comparison procedure Before the beginning of the comparison procedure, it is necessary to create the proper video files for the comparison. The original uncompressed files are in YUV format. 
So, the files have to be converted into uncompressed AVI format. This is required because the encoders of the codecs that are going to be used do not accept YUV files as input. This is not a problem, because these AVI files are also uncompressed. The AVI files are created with YUV to AVI Converter, a freeware program developed by the University of Surrey (UK). Six files result: one in CIF and one in 4CIF format for each of the three sequences.

Figure 4.4: YUV to AVI Converter main screen

Then the test sequences are produced. Four codecs are used: [...], MPEG-4 Visual, MPEG-4 AVC ([...]) and VC-1. For the MPEG-4 Visual codec, the [...] encoder is used; it implements the Simple and the Advanced Simple profiles of the codec. The x264 encoder is used for the MPEG-4 AVC codec. x264 is an implementation of the [...] and studies show that it performs even better than commercial versions of the codec (Ma and Tucker, 2008). The [...] 3 encoder of the [...] 9 codec is used for the VC-1 codec. [...] 9 implements the Simple and the Main Profiles of VC-1 (Kalva and Lee, 2008). All of the encoders used are offered free of charge.

The creation of the files is time-consuming. There are 72 resulting files, since each of the six original files yields 12 encoded files. The user has to be careful to keep the desired properties for all the resulting files: first the format, then the frames per second, and finally the bitrate. The audio also has to be disabled for all the files; since the original files do not contain audio information, the test files must not contain audio either.

During the encoding phase, the encoding time was measured with a digital timer. These times are only indicative, and the procedure was not meant to produce precise results. The purpose was to obtain a clear view of the behavior and the processing time of each codec.

After the test files are created, they have to be checked for the correct number of frames, which is 250 (10 seconds x 25 frames per second). Using the preview option of the MSU Video Quality Measurement Tool, every file is checked against the original file, so that all the files contain the correct frames (0..249). This is crucial because the objective metrics compare the files frame by frame, and a single lost frame invalidates the results.

After the successful check, the testing procedure begins. The process is simple. First, the user sets the original file and then the encoded file. Although the MSU Video Quality Measurement Tool can use the YUV files as originals, the uncompressed AVI files are used because the encoded files were created from them. Then the user chooses the appropriate metric. The metrics used in this research are PSNR (Y, U, V), MSE (Y, U, V), SSIM and VQM. This gives 8 comparisons for each of the 72 test sequences and 576 comparisons in total. This number of comparisons takes a lot of time; completing them took about two weeks.
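The counts quoted above (6 original files, 72 encoded files, 576 comparisons) can be checked with a short enumeration. The following is only a sketch: the codec and bitrate labels are placeholders invented for the illustration, and the only facts taken from the text are three sequences, two formats, four codecs, three bitrates per format and eight metric values per file.

```python
from itertools import product

# Placeholder labels for this sketch; only the counts come from the text.
SEQUENCES = ["CITY", "CREW", "SOCCER"]
FORMATS = ["CIF", "4CIF"]
CODECS = ["codec_1", "codec_2", "codec_3", "codec_4"]  # four codecs
BITRATES = ["low", "mid", "high"]                      # three bitrates per format
METRICS = ["PSNR-Y", "PSNR-U", "PSNR-V",
           "MSE-Y", "MSE-U", "MSE-V", "SSIM", "VQM"]

# One uncompressed original per sequence and format.
originals = list(product(SEQUENCES, FORMATS))
# Every original is encoded with every codec at every bitrate.
encoded = list(product(SEQUENCES, FORMATS, CODECS, BITRATES))
# Every encoded file is measured with every objective metric.
comparisons = list(product(encoded, METRICS))

print(len(originals), len(encoded), len(comparisons))
```

Enumerating the combinations this way also gives a convenient source of consistent file and folder names, which helps when hundreds of result files have to be processed.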
Each comparison creates an Excel (csv) file, which is automatically saved in the appropriate folder. It contains the values of the metric for all 250 frames as well as the average value. The tool also creates a graph based on these values, which is saved as a picture.
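To make the per-frame values and their average concrete, the two simplest metrics can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the MSU tool: frames are taken as flat lists of 8-bit samples of one plane (Y, U or V), the peak value is assumed to be 255, and the cap of 100 for identical frames is taken from Table 4.3. The function names are chosen for this sketch only.

```python
import math


def mse(ref, enc):
    """Mean square error between two equally sized planes (flat lists)."""
    if len(ref) != len(enc) or not ref:
        raise ValueError("planes must be non-empty and of equal size")
    return sum((a - b) ** 2 for a, b in zip(ref, enc)) / len(ref)


def psnr(ref, enc, peak=255.0, cap=100.0):
    """PSNR in dB; identical planes return the cap instead of infinity.

    The cap of 100 follows Table 4.3; the peak of 255 assumes 8-bit samples.
    """
    err = mse(ref, enc)
    if err == 0:
        return cap
    return min(cap, 10.0 * math.log10(peak * peak / err))


def sequence_average(ref_frames, enc_frames, metric):
    """Average a per-frame metric over a sequence, frame by frame."""
    if len(ref_frames) != len(enc_frames):
        # A lost frame shifts every later comparison, so refuse to continue.
        raise ValueError("frame counts differ")
    values = [metric(r, e) for r, e in zip(ref_frames, enc_frames)]
    return sum(values) / len(values)
```

The frame-count check mirrors the point made earlier in this section: because the comparison is strictly frame by frame, a file with a missing frame must be rejected rather than averaged.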

Figure 4.5: CREW CIF 768 Kbps SSIM graph

Hence, the resulting files are 576 Excel files and 576 graphs. The first thought was to include them in the Appendix, but this would have required more than 150 pages. Instead, each Excel file is opened and the average value is copied to a properly formatted table. These values are used for the evaluation of the codecs.

To understand the significance of these values, the meaning of each objective metric has to be explained. The next table shows the interpretation of the values.

METRIC   INTERPRETATION
PSNR     Higher values are better, 100 for equal frames
MSE      Lower values are better, 0 for equal frames
SSIM     Higher values are better, 1 for equal frames
VQM      Lower values are better, 0 for equal frames

Table 4.3: Objective metrics values interpretation

For the subjective evaluation the procedure is more complex. The most important factor is the testing environment: the lighting of the room, the viewing distance and the monitor are essential to the success of the procedure. The whole procedure is compliant with the ITU-R BT and the ITU-T P.910 recommendations. The background room illumination was 20 lux at maximum; a 650 lumens lamp was used in a 33 m² room. The viewing distance was 3 to 4 times the height of the test sequence. The monitor was an LCD monitor (LG Flatron M227WD) with the following specifications.

Screen size (inches)    21.5
Panel Type              TN
Aspect Ratio            16:9
Resolution              1920 x 1080
Brightness (cd/m²)      300
Contrast Ratio          10000:1
Response Time (ms)      5
Viewing Angle           170°/160°
Color Depth             16.7 M
Pixel Pitch (mm)        [...] x [...]
Surface Treatment       non Glare

Table 4.4: Monitor specifications

The specifications of the monitor exceed the requirements of the previous recommendations. The luminance of the screen was set at 200 cd/m². The resolution of the monitor during the evaluation was 1280 x 720. This resolution keeps the proper aspect ratio and does not distort the images. It is also more than adequate for the presentation of the sequences.

The viewers have to be non-experts, with no prior knowledge of the codecs' behavior. They must have normal visual acuity or use corrective glasses. The number of observers has to be at least 15. In this research there were 16 observers, all non-experts with normal visual acuity.

The next step is the creation of the AviSynth scripts. As mentioned in the previous section, this is necessary for the MSU Perceptual Video Quality Tool to open the encoded files. The total number of script files is 72. Then the task files have to be created. There are 36 of them, 18 for DSIS and 18 for DSCQS. Batch files are also created in order to make the whole evaluation process easier. There are three batch files, one for each test sequence.

The evaluation of each sequence takes about 30 minutes. This is the upper limit according to the previous recommendations. So, after the evaluation of a sequence ends, a ten-minute break is mandatory. The observer can continue the process after the

break. Thus each observer needs about 2 hours to complete the whole evaluation. The completion of the subjective evaluation took about two weeks.

DSIS   DSCQS original   DSCQS impaired

Table 4.5: CREW CIF 768 Kbps score sample (observer f2)

Before the beginning of each evaluation, instructions were given to the observers. These covered the whole procedure and the scoring scale of each subjective metric.

After the end of the evaluation phase, the Mean Opinion Score is calculated by the MSU Perceptual Video Quality Tool. Then the resulting Excel files are opened and the average scores are copied to a properly formatted table. After that, the appropriate graphs are created from the average scores.

In order to comply with the ITU-R BT recommendation, the results in this research are calculated according to it. This type of calculation discards the observers whose scores differ from the average. The analysis of this procedure is complex and beyond the scope of this research.

The average values produced by the MSU Perceptual Video Quality Tool use a different scale for each metric: the DSIS scale runs from 0 to 10 and the DSCQS scale from 0 to 5. For both metrics, higher values are better.
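The simplest of the available calculations, the plain average, can be sketched as follows. This is only an illustration: it does not reproduce the observer screening of the ITU-R BT based calculation, which is the one actually used for the results of this research, and the scores in the example are hypothetical rather than measured. The scale limits are the ones stated above.

```python
from statistics import mean

# Scale limits as reported for the MSU Perceptual Video Quality Tool.
SCALE_MAX = {"DSIS": 10.0, "DSCQS": 5.0}


def mean_opinion_score(scores, metric):
    """Plain average of the observers' scores (no observer screening)."""
    top = SCALE_MAX[metric]
    if not scores:
        raise ValueError("no scores given")
    for score in scores:
        if not 0.0 <= score <= top:
            raise ValueError(f"score {score} is outside the {metric} scale")
    return mean(scores)


# Hypothetical DSIS scores from 16 observers (not data from this research).
example_scores = [7.0, 8.0, 8.0, 9.0, 7.0, 8.0, 9.0, 8.0,
                  7.0, 8.0, 8.0, 9.0, 7.0, 8.0, 8.0, 9.0]
print(mean_opinion_score(example_scores, "DSIS"))
```

Checking every score against the scale of its metric catches the most likely bookkeeping error, a DSIS score copied into a DSCQS table or the reverse.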

Figure 4.6: Comparison procedure
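One step of this procedure, the manual creation of the 72 AviSynth scripts, lends itself to automation. The sketch below is an assumption about how that could be done and not the method used in this research: it writes a one-line script per encoded file, using the AviSynth source filters AviSource for .avi files and DirectShowSource for any other container. The helper names, the .avs naming scheme and the sample file names are hypothetical.

```python
from pathlib import Path


def avs_source_line(video_path: Path) -> str:
    """Return a one-line AviSynth script that opens the given file.

    Assumption: .avi files are opened with AviSource and every other
    container with DirectShowSource.
    """
    source = "AviSource" if video_path.suffix.lower() == ".avi" else "DirectShowSource"
    return f'{source}("{video_path}")'


def write_avs_script(video_path: Path, script_dir: Path) -> Path:
    """Write <video name>.avs into script_dir and return the script path."""
    script_dir.mkdir(parents=True, exist_ok=True)
    script_path = script_dir / (video_path.stem + ".avs")
    script_path.write_text(avs_source_line(video_path) + "\n", encoding="utf-8")
    return script_path


# Hypothetical file names, only to show the generated script lines.
for name in ["crew_cif_768.avi", "crew_cif_768.wmv"]:
    print(avs_source_line(Path(name)))
```

Calling write_avs_script once per encoded file would produce the full set of scripts; the task files themselves would still have to be created in the task manager.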

5 Results

5.1 Objective metrics

The first sequence is the CITY sequence. The following results refer to the CIF format sequences. The PSNR-Y results are shown in the following figure.

Figure 5.1: CITY CIF PSNR-Y (plot of PSNR-Y against bitrate in kbps)

The results show a slight advantage of the [...] codec compared to the [...] codec. However, the PSNR U and V values show that the [...] codec has better performance.

Figure 5.2: CITY CIF PSNR-V (plot of PSNR-V against bitrate in kbps)

The same is also observed in the MSE results. [...] has better values on Y and [...] is better on U and V (see Appendix A).

Figure 5.3: CITY CIF MSE-Y (plot of MSE-Y against bitrate in kbps)

The SSIM metric shows similar results. [...] and [...] have much better performance than [...] and [...] at lower bitrates, but the values get closer at higher bitrates.

Figure 5.4: CITY CIF SSIM (plot of SSIM against bitrate in kbps)

The same is shown by the VQM values.

Figure 5.5: CITY CIF VQM (plot of VQM against bitrate in kbps)

The 4CIF format shows that [...] performs better than the other codecs. The U and V values of the PSNR and the MSE clearly show better performance.

Figure 5.6: CITY 4CIF PSNR-U (plot of PSNR-U against bitrate in Mbps)

Figure 5.7: CITY 4CIF MSE-V (plot of MSE-V against bitrate in Mbps)

The SSIM shows that all the codecs have similar quality at the higher bitrates. This is expected because at 8 Mbps the picture quality of the [...] codec is near the maximum of the DVD quality.

Figure 5.8: CITY 4CIF SSIM (plot of SSIM against bitrate in Mbps)

VQM values have the same outcome.

Figure 5.9: CITY 4CIF VQM (plot of VQM against bitrate in Mbps)

The next sequence is the CREW sequence. Here, it is obvious that the [...] codec is superior to the [...] codec. All of the values show clearly that it performs better than the others.

Figure 5.10: CREW CIF PSNR-Y (plot of PSNR-Y against bitrate in kbps)

Figure 5.11: CREW CIF MSE-Y (plot of MSE-Y against bitrate in kbps)

Figure 5.12: CREW CIF SSIM (plot of SSIM against bitrate in kbps)

Figure 5.13: CREW CIF VQM (plot of VQM against bitrate in kbps)

As shown in the previous figures, all the metrics have similar results. The best codec is [...], next is [...], then [...] and last [...]. In the VQM metric, [...] shows better performance than [...]. The same also holds for the 4CIF sequences, whose results follow the CIF format results. The following figures show the CREW 4CIF results.

Figure 5.14: CREW 4CIF PSNR-Y (plot of PSNR-Y against bitrate in Mbps)

Figure 5.15: CREW 4CIF MSE-Y (plot of MSE-Y against bitrate in Mbps)

Figure 5.16: CREW 4CIF SSIM (plot of SSIM against bitrate in Mbps)

Figure 5.17: CREW 4CIF VQM (plot of VQM against bitrate in Mbps)

The last sequence is the SOCCER sequence. This sequence includes high motion, so its results are very important.

The CIF format sequences clearly show similar results to the CREW sequence. The [...] codec performs better than [...].

Figure 5.18: SOCCER CIF PSNR-Y (plot of PSNR-Y against bitrate in kbps)

Figure 5.19: SOCCER CIF MSE-Y (plot of MSE-Y against bitrate in kbps)

Figure 5.20: SOCCER CIF SSIM (plot of SSIM against bitrate in kbps)

Figure 5.21: SOCCER CIF VQM (plot of VQM against bitrate in kbps)

In this sequence the results are clear. The best performance comes from the [...] codec, next is [...], followed by [...] and [...]. The results of the 4CIF format are as expected: they are similar to those of the CIF format.

Figure 5.22: SOCCER 4CIF PSNR-Y (plot of PSNR-Y against bitrate in Mbps)

Figure 5.23: SOCCER 4CIF MSE-Y (plot of MSE-Y against bitrate in Mbps)

Figure 5.24: SOCCER 4CIF SSIM (plot of SSIM against bitrate in Mbps)

Figure 5.25: SOCCER 4CIF VQM (plot of VQM against bitrate in Mbps)

As in the previous sequence, the VQM values for the [...] codec are lower than the expected values.

The complete tables of the metrics' values along with the complete set of figures are shown in the Appendix.

5.2 Subjective metrics

The first sequence is the CITY sequence. The following figures show the DSCQS and the DSIS mean opinion score of the CIF format sequence.

Figure 5.26: CITY CIF DSCQS (plot of MOS against bitrate in kbps)

Figure 5.27: CITY CIF DSIS (plot of MOS against bitrate in kbps)

It is obvious that [...] scored higher than [...]. However, the differences are small. [...] follows and last is [...].

The 4CIF format has the same results. The values tend to get closer at higher bitrates. This is expected because the differences between the sequences at high bitrates are practically unnoticeable.

Figure 5.28: CITY 4CIF DSCQS (plot of MOS against bitrate in Mbps)

Figure 5.29: CITY 4CIF DSIS (plot of MOS against bitrate in Mbps)

The next sequence is the CREW sequence. This video contains multiple human faces, so the scores of the observers are very important. In contrast with the CITY sequence, the [...] codec scored higher than [...]. This happens mostly in the CIF format.

Figure 5.30: CREW CIF DSCQS (plot of MOS against bitrate in kbps)

Figure 5.31: CREW CIF DSIS (plot of MOS against bitrate in kbps)

The 4CIF format has much the same results. The [...] and the [...] codecs continuously swap between the first and the second place.

Figure 5.32: CREW 4CIF DSCQS (plot of MOS against bitrate in Mbps)

Figure 5.33: CREW 4CIF DSIS (plot of MOS against bitrate in Mbps)

The last sequence is the SOCCER sequence. The performance of the codecs on this sequence is important due to its high motion content. In the CIF format the [...] and the [...] codec have similar performance.

Figure 5.34: SOCCER CIF DSCQS (plot of MOS against bitrate in kbps)

Figure 5.35: SOCCER CIF DSIS (plot of MOS against bitrate in kbps)

The main difference here is that in the 4CIF format the [...] codec has very good performance, which is similar to and on some occasions even better than that of [...] and [...].

Figure 5.36: SOCCER 4CIF DSCQS (plot of MOS against bitrate in Mbps)

Figure 5.37: SOCCER 4CIF DSIS (plot of MOS against bitrate in Mbps)

5.3 Comparison

It is clear that the MPEG-4 AVC codec along with the VC-1 codec perform better than [...] and [...]. The performance of these codecs depends on many factors: first the bitrate and second the content of the video sequence. According to the previous results, each one of these two codecs has the highest performance under different conditions.

The objective and subjective metrics produce similar results in most cases. However, a closer look reveals that the VC-1 codec has a slight advantage according to the objective methods, whereas MPEG-4 AVC shows improved performance according to the subjective methods.

In most cases the values at the lower bitrates are more distinct, but they tend to converge at higher bitrates. Thus an easy way to spot the different behavior of each codec is to examine the lower bitrate sequences. The next figures show screen shots of the same frame for all four codecs. The sequence is the CREW sequence at 192 Kbps.

Figure 5.38: CREW CIF 192 Kbps frame 124 original

Figure 5.39: CREW CIF 192 Kbps frame 124 [...]

Figure 5.40: CREW CIF 192 Kbps frame 124 [...]

Figure 5.41: CREW CIF 192 Kbps frame 124 [...]

Figure 5.42: CREW CIF 192 Kbps frame 124 [...]

It is obvious that [...] and [...] perform better. It is also noticeable that [...] tends to slightly blur the edges of the objects in order to provide a better looking image. This is observed in all sequences.

As the bitrates get higher, the differences between the codecs get smaller. Thus, for codecs that have similar values at high bitrates, it is difficult to spot any differences. DSCQS values higher than 4,5 mean that the sequences are indistinguishable (Baroncini et al., 2004). The same can be assumed for DSIS values higher than 9. According to the combined results of the objective and the subjective metrics, the quality of the [...] codec at 6 to 8 Mbps is achieved at a bitrate of 3 to 4 Mbps by the [...] or the [...] codec.

During the creation of the encoded files, the encoding time was recorded. These times are only indicative and reflect only the codecs' behavior. The encoding times also depend on the computational power of the computer. Nevertheless, they give a clear picture of the performance of the codecs in terms of speed. The following table shows the encoding times of the CREW sequence.

CREW       352x288                704x576
[...]      ,3     2,9    3,3      8,0    8,2    9,3
[...]      2,5    3,1    3,9      9,0    9,7    9,9
[...]      2,8    3,4    4,5      11,9   12,9   16,6
[...] 9    15,2   17,9   18,7     71,5   79,8   90,0

Table 5.1: CREW files encoding times (sec)

It is obvious that [...] 9 is by far more time-consuming than the other three codecs, even compared to the [...] codec, which has similar quality performance. However, this behavior is expected, because according to the theory (see chapter 2) the [...] 9 encoded file needs less computational power for decoding than the other codecs (Golston and Rao, 2006). Moreover, according to other studies, [...] 9 requires significantly less decoding computational power even than [...] (Holcomb et al., 2004).

The next table shows the file sizes of the CREW sequence.

CREW       352x288                704x576

Table 5.2: CREW file sizes (KB)

The sizes of the resulting files are similar. This is expected because all the files have the same bitrate. However, it is noticeable that the [...] 9 files are a bit larger, especially at higher bitrates. The complete tables are presented in Appendix B.
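The size of the gap in encoding time can be read directly from Table 5.1. The sketch below divides the times of the last row (the encoder labelled 9) by those of the row above it, column by column. The values are copied from the table with decimal commas converted to points; the variable and function names are chosen for this illustration, and the third row is used because it is the slowest of the other encoders whose values are all complete.

```python
# Encoding times in seconds for the CREW sequence, copied from Table 5.1.
# Columns: three bitrates at 352x288 followed by three bitrates at 704x576.
ROW_3 = [2.8, 3.4, 4.5, 11.9, 12.9, 16.6]      # third row of the table
ROW_9 = [15.2, 17.9, 18.7, 71.5, 79.8, 90.0]   # last row (encoder labelled 9)


def slowdown(slow, fast):
    """Column-by-column ratio of two rows of encoding times."""
    if len(slow) != len(fast):
        raise ValueError("rows must have the same number of columns")
    return [s / f for s, f in zip(slow, fast)]


ratios = slowdown(ROW_9, ROW_3)
print([round(r, 1) for r in ratios])
```

For this sequence the ratios stay between roughly 4 and 6, so the slowest encoder needs several times longer than the next slowest at every bitrate and in both formats.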

6 Conclusion

Video quality evaluation is a complex and time-consuming process. The assessment was divided into two phases: the objective and the subjective evaluation. The objective evaluation was based on metrics that assess the quality of the video. They are usually based on mathematical models and their results can be easily reproduced under the same conditions. The subjective evaluation involves human observers who evaluate the quality of the video sequence.

Three sequences were chosen, selected for the variety of their contents. The resolution of the sequences was 4CIF for the evaluation of standard definition video and CIF for the evaluation of video for mobile devices.

For the 4CIF format the research focuses on digital SDTV. DVB-T is the main medium for delivering video to homes. Most DVB-T channels have a bandwidth of 8 MHz. Depending on the modulation and the parameter configuration, an 8 MHz channel can provide from 4,98 to 31,67 Mbps (Aaltonen et al., 2009). The average bitrate of a DVD quality sequence is 4 to 6 Mbps (Taylor, 2001). Taking into account the results of the evaluation, the MPEG-4 AVC and VC-1 bitrate that corresponds to this quality is 2 to 4 Mbps. This is in accordance with various other studies. Jack (2007) shows that [...] SDTV quality at 3,5 Mbps is equal to 1,5 to 2,5 Mbps of MPEG-4 AVC. Also, Kalva and Lee (2008) show that a 6,3 Mbps [...] is equal to 2,5 Mbps VC-1. As shown in the previous chapter, the resulting quality of a codec depends on the content of the video sequence.

So, if VC-1 or MPEG-4 AVC with a bitrate of 4 Mbps is used on a DVB-T 30 Mbps channel, the channel can carry up to 7 video streams of DVD quality. If the bitrate is reduced to 2 Mbps, the channel doubles its capacity and the quality is kept at a very good level.

For the CIF format the results are pretty much the same.
MPEG-4 AVC and VC-1 perform better than [...] and much better than [...]. There are some variations in the results that are due to the contents of the video and the bitrate used. As in the 4CIF format, the values tend to converge when the bitrates are higher, so a better view of the codecs' performance is given by the lower bitrates.

The comparison between the objective and subjective results shows that VC-1 generally performs better according to the objective metrics, whereas MPEG-4 AVC scores higher according to the subjective metrics. The differences between the two codecs are very small, especially when the bitrates are high. These results are also confirmed by various studies such as Kalva and Lee's (2008), which shows that MPEG-4 AVC performs better at the same bitrates according to the subjective metrics.

Another goal of this research is the evaluation of the tools used. The objective metrics tool is the MSU Video Quality Measurement Tool and the subjective metrics tool is the MSU Perceptual Video Quality Tool. The results from both tools are accurate and in accordance with the majority of the relevant studies. Both tools are based on the standards and the recommendations of the groups and forums that specialize in this area, and they are highly accepted by the research community.

Future Work

During the evaluation process, the objective metrics tool produced large amounts of data that were not processed. These data include PSNR, MSE, SSIM and VQM values for all the frames of all the video sequences. This means that a more detailed study can be performed by examining each codec's performance in relation to specific frames. That would give a more thorough examination of the brightness, the color and the motion behaviour of each codec. It would also be a tremendous endeavour, requiring a frame by frame examination of the encoded files.

HDTV video is also a very promising area for future work. Nowadays the invasion of HDTV is imminent.
High definition content has already entered our homes through Blu-ray video and various internet transmissions. The evaluation of HD content is essential to the development of HDTV. The standards are already set by various recommendations.

Another newly developed technology is 3DTV. Three dimensional video is still at a developing stage. However, 3D video content was broadcast during the recent football World Cup, so this area is very promising and has huge potential for further development.

Last but not least are the testing and the evaluation of the recommendation newly approved by the International Telecommunication Union. This is the ITU-R BT recommendation that was released in September [...]. Like the previous BT.500 recommendations, it is related to the subjective assessment methodology.

References

Aaltonen E., Jolma P., Penttinen J.T.J. and Vare J. (2009), The DVB-H Handbook: The functioning and planning of mobile TV, Chichester, West Sussex, United Kingdom, John Wiley & Sons Ltd.

Ali M. (2008), The latest advances in video compression and the MPEG family, University of Pitesti, Electronics and computers science, Scientific bulletin, No 8, Vol. 1, 2008, pp

Baroncini V., Fenimore C., Oelbaum T. and Tan T.K. (2004), Subjective quality assessment of the emerging AVC/H.264 video coding standard, in Proc. International Broadcasting Convention (IBC), Amsterdam, Netherlands,

Biström J. (2005), Comparing Video Codec Evaluation Methods for Handheld Digital TV, T Research Seminar on Digital Media, Helsinki University of Technology, Spring

Blu-ray Disk Association (2005), Audio visual application format specifications for BD-ROM, White paper, Blu-ray Disk Format, March

Bock M. A. (2009), Video Compression Systems, London, United Kingdom, The Institution of Engineering and Technology.

Bovik A. C., Sheikh H. R., Simoncelli E. P. and Wang Z. (2004), Image Quality Assessment: From Error Visibility to Structural Similarity, IEEE Transactions on Image Processing, vol. 13, no. 4, Apr. 2004, pp

Ghanbari M. (2003), Standard Codecs: Image Compression to Advanced Video Coding, London, United Kingdom, The Institution of Electrical Engineers.

Golston J. (2004), Comparing media codecs for video content, Embedded Systems Conference, San Francisco,

Golston J. and Rao A. (2006), Video codecs tutorial: Trade-offs with H.264, VC-1 and other advanced codecs, Embedded Systems Conference, Silicon Valley,

Hewage C. (2009), Image and video quality assessment, Multimedia Communications lecture notes, Kingston University, London.

Holcomb T., Hsu P., Liang J., Lin B., Lee M., Mukerjee K., Regunathan S.L., Ribas-Corbera J. and Srinivasan S. (2004), Windows Media Video 9: overview and applications, Signal Processing: Image Communication, Volume 19, Issue 9, October 2004, pp

Index of ftp://ftp.tnt.uni-hannover.de/pub/svc/testsequences/, [Internet], Institut fur Informationsverarbeitung, Leibniz Universitat Hannover, <ftp://ftp.tnt.uni-hannover.de/pub/svc/testsequences/>, [Accessed March 2010].

ITU-R Recommendation BT (2002), Methodology for the Subjective Assessment of the Quality of Television Pictures, International Telecommunication Union, Geneva, Switzerland,

ITU-T Recommendation P.910 (2008), Subjective video quality assessment methods for multimedia applications, International Telecommunication Union, Geneva, Switzerland,

Jack K. (2007), Video Demystified, Fifth Edition, Linacre House, Jordan Hill, Oxford, UK, Elsevier Inc.

Kalva H. and Lee J.B. (2008), The VC-1 and H.264 Video Compression Standards for Broadband Video Services, New York, NY, Springer Science+Business Media, LLC.

Loomis J. and Wasson M. (2007), VC-1 Technical Overview, Microsoft Corporation, October

Ma Z. and Tucker D. W. (2008), Adapting x264 to Asynchronous Video Telephony for the Deaf, Proceedings of South African Telecommunications Networks and Applications Conference (SATNAC 2008), Wild Coast Sun, Eastern Cape, South Africa, 2008, pp

Nemčić O., Rimac-Drlje S. and Vranješ M. (2007), Comparison of H.264/AVC and MPEG-4 Part 2 Coded Video, Proceedings ELMAR-2007, Croatian Society Electronics in Marine ELMAR, Zadar, 2007, pp

Regunathan L. S. and Srinivasan S. (2005), An Overview of VC-1, Proceedings of SPIE, Bellingham WA, 2005, Vol. 5960, pp

Richardson E. G. I. (2002), Video Codec Design: Developing Image and Video Compression Systems, Chichester, West Sussex, England, John Wiley & Sons Ltd.

Richardson E. G. I. (2003), H.264 and MPEG-4 Video Compression, Chichester, West Sussex, England, John Wiley & Sons Ltd.

SMPTE 421M (2006), VC-1 Compressed Video Bitstream Format and Decoding Process, Society of Motion Picture and Television Engineers,

Taylor J. (2001), DVD Demystified, Second Edition, New York, NY, McGraw-Hill.

VQEG (2003), Final Report From the Video Quality Experts Group on the Validation of Objective Models of Video Quality Assessment, Phase II, Video Quality Experts Group, August 2003.

Wang Y. (2006), Survey of Objective Video Quality Measurements, Technical Report WPI-CS-TR-06-02, EMC Corporation Hopkinton, MA 01748, USA.

Winkler S. (2005), Digital Video Quality Vision Models and Metrics, Chichester, West Sussex, England, John Wiley & Sons Ltd.

Winkler S. (2009), Video Quality Measurement Standards Current Status and Trends, Proc. 7th International Conference on Information, Communications and Signal Processing (ICICS), Macau, Dec. 7-10,

Software

AviSynth, ver 2.58, Ben Rudiak-Gould et al.,

MSU Perceptual Video Quality Tool, ver 1.0, MSU Graphics & Media Lab, Video Group, MSU filters and codecs,

MSU Video Quality Measurement Tool, ver 2.5, MSU Graphics & Media Lab, Video Group, MSU filters and codecs,

YUV to AVI Converter, ver 2.3, Stewart Worrall, University of Surrey, UK, June 2001.

Appendix

A Figures and Tables

A.1 Objective metrics

Test Sequence A

Tables: PSNR-Y, PSNR-U, PSNR-V, MSE-Y, MSE-U, MSE-V, SSIM, VQM

Figures (SEQUENCE A, CIF against bitrate in kbps and 4CIF against bitrate in Mbps): PSNR-Y, PSNR-U, PSNR-V, MSE-Y, MSE-U, MSE-V, SSIM, VQM

Test Sequence B

Tables: PSNR-Y, PSNR-U, PSNR-V, MSE-Y, MSE-U, MSE-V, SSIM, VQM

Figures (SEQUENCE B, CIF against bitrate in kbps and 4CIF against bitrate in Mbps): PSNR-Y, PSNR-U, PSNR-V, MSE-Y, MSE-U, MSE-V, SSIM, VQM

Test Sequence C

Tables: PSNR-Y, PSNR-U, PSNR-V, MSE-Y, MSE-U, MSE-V, SSIM, VQM

Figures (SEQUENCE C, CIF against bitrate in kbps and 4CIF against bitrate in Mbps): PSNR-Y, PSNR-U, PSNR-V, MSE-Y, MSE-U, MSE-V, SSIM, VQM

A.2 Subjective metrics

CITY DSCQS
,31    1,25   2,19   3,05   3,52   4,37
0,94   1,64   2,81   3,83   4,06   4,77
1,56   2,50   3,98   4,29   4,53   4,92
1,72   2,11   3,53   4,24   4,46   4,77

CITY DSIS
,63    1,09   4,38   5,31   7,50   8,75
1,72   3,44   6,25   7,03   7,97   8,44
3,59   4,69   7,81   8,13   9,53   9,38
3,91   4,21   7,19   8,13   9,09   9,69

CREW DSCQS
,00    0,78   2,42   2,34   3,98   4,22
0,23   1,87   2,50   2,97   4,30   4,69
1,33   2,03   3,05   4,30   4,45   4,61
0,94   2,27   3,75   3,67   4,24   4,44

CREW DSIS
,00    1,56   4,38   4,69   7,34   8,44
0,31   2,66   5,31   6,25   8,28   8,75
3,08   3,28   6,41   6,25   8,59   9,53
2,90   3,91   7,19   6,88   8,13   9,06

SOCCER DSCQS
,08    1,17   3,20   2,97   4,19   4,30
0,55   1,87   3,28   3,98   4,53   5,00
1,80   2,50   3,52   3,67   4,78   5,00
1,80   2,03   3,42   3,52   4,30   4,61

SOCCER DSIS
,31    3,44   5,63   6,25   8,88   9,13
1,09   3,75   6,03   7,34   9,22   9,22
3,59   4,53   6,56   7,50   9,06   9,53
3,46   4,84   6,91   7,66   8,91   9,22

Figures (SEQUENCE A, B and C; MOS against bitrate, CIF in kbps and 4CIF in Mbps): DSCQS and DSIS

B. Encoding times and File sizes

B.1 Encoding times (sec)

CITY       352x288                704x576
[...]      ,2     3,0    3,2      6,2    7,1    9,1
[...]      2,3    3,0    3,5      7,8    8,9    9,9
[...]      2,8    3,3    4,2      11,6   13,2   14,4
[...] 9    13,1   16,7   17,5     65,3   69,1   75,5

CREW       352x288                704x576
[...]      ,3     2,9    3,3      8,0    8,2    9,3
[...]      2,5    3,1    3,9      9,0    9,7    9,9
[...]      2,8    3,4    4,5      11,9   12,9   16,6
[...] 9    15,2   17,9   18,7     71,5   79,8   90,0

SOCCER     352x288                704x576
[...]      ,8     3,3    3,9      9,3    9,9    11,4
[...]      3,2    3,5    4,2      10,2   10,6   10,8
[...]      3,7    4,0    5,5      13,0   13,7   16,3
[...] 9    16,9   17,9   19,2     68,4   69,2   75,6

B.2 File sizes (KB)

CITY       352x288                704x576

CREW       352x288                704x576

SOCCER     352x288                704x576