EE-5359 Multimedia Processing Project Report. Performance analysis of Dirac video, AVS-China video and AAC audio codecs

Under the guidance of Dr. K. R. Rao
Submitted by Ashwini S Urs, M.S.E.E., ID # 1000646070

Acknowledgement
I would like to thank Dr. K. R. Rao for his constant guidance, support and motivation throughout the project, which led to its successful completion. I would also like to thank Dr. Kim for his support and for the software he provided, which helped me clip the frames of the test sequences. Finally, I would like to thank all my friends for their support, without which it would not have been possible for me to complete this project.

Acronyms:
AAC Advanced audio coding.
ADIF Audio data interchange format.
ADTS Audio data transport stream.
AES Audio Engineering Society.
AFC Adaptation field control.
ATSC Advanced Television Systems Committee.
AVC Advanced video coding.
AVS Audio video coding standard.
BBC British Broadcasting Corporation.
CIF Common intermediate format.
HDTV High definition television.
HL High-low component.
HH High-high component.
ICT Integer cosine transform.
IEC International Electrotechnical Commission.
IPTV Internet protocol television.
ISO International Organization for Standardization.
fps Frames per second.
KB Kilobytes.
KBD Kaiser-Bessel derived.
Kbps Kilobits per second.
KBps Kilobytes per second.
LL Low-low component.
LH Low-high component.
MB Macroblock.
Mbps Megabits per second.
MDCT Modified discrete cosine transform.
MPEG Moving Picture Experts Group.
MSE Mean square error.
M/S Mid/side.
MV Motion vector.
PSNR Peak signal-to-noise ratio.
Q Quantization.
QCIF Quarter common intermediate format.
QF Quality factor.
SDTV Standard definition television.
SPIE Society of Photo-Optical Instrumentation Engineers.
SSIM Structural similarity index metric.
SSR Scalable sampling rate.
TNS Temporal noise shaping.
VCIR Visual communication and image representation.

Contents
1. List of figures
2. List of tables
3. Introduction
4. Dirac video codec
   4.1 Encoder
   4.2 Wavelet transform
   4.3 Scaling and quantization
   4.4 Entropy coding
   4.5 Motion estimation
   4.6 Motion compensation
   4.7 Decoder
5. AVS-China video codec
   5.1 Profiles and levels
   5.2 Video coding chain
   5.3 Encoder
   5.4 RD optimization
   5.5 Decoder
6. AAC audio codec
   6.1 AAC audio encoder and decoder
7. Results
   7.1 Performance analysis of Dirac video
   7.2 Performance analysis of AVS-China video
   7.3 Performance analysis of AAC audio
8. Conclusion
9. Future work
10. References

1. List of figures
Figure 1 Dirac encoder block diagram.
Figure 2 Wavelet transform block diagram.
Figure 3 Stages of wavelet transform.
Figure 4 Wavelet transform frequency decomposition.
Figure 5 Dead-zone quantizer.
Figure 6 Entropy coding architecture.
Figure 7 Hierarchical motion estimation.
Figure 8 Search patterns in Dirac.
Figure 9 Frame prediction in Dirac.
Figure 10 Modes of splitting macroblock.
Figure 11 Dirac decoder block diagram.
Figure 12 Video coding chain of AVS-China.
Figure 13 GOP structure of AVS-China.
Figure 14 Macroblock format.
Figure 15 AVS-China encoder block diagram.
Figure 16 AVS-China decoder block diagram.
Figure 17 AAC encoder block diagram.
Figure 18 AAC decoder block diagram.

2. List of tables
Table 1 Application based profiles of AVS.
Table 2 Different parts of AVS-China standard.
Table 3 Performance of Dirac for Akiyo sequence.
Table 4 Performance of Dirac for Tempete sequence.
Table 5 Performance of Dirac for Night move sequence.
Table 6 Performance of Dirac for Harbor sequence.
Table 7 Performance of AVS for Akiyo sequence.
Table 8 Performance of AVS for Tempete sequence.
Table 9 Performance of AVS for Night move sequence.
Table 10 Performance of AVS for Harbor sequence.
Table 11 Performance of AAC audio.

3. Introduction
In today's digital world, with the move to SDTV and HDTV, compression plays a prominent role. Today we can watch events taking place in various parts of the world broadcast live at minimal expense. This has been possible because of the compression applied to the broadcast video and audio. Compression allows the limited storage capacity and transmission bandwidth to be used as efficiently as possible. A plethora of video and audio coding standards compete with each other. Popular video coding standards include H.264 (MPEG-4 Part 10), VC-1, AVS-China and Dirac, and popular audio coding standards include MPEG-1 Layers I, II and III, AAC and HE-AAC. Analyzing the performance of these codecs gives a fuller understanding of them and helps in choosing the best one. Since every codec is designed to perform best for a specific application, it becomes necessary to analyze the codecs and choose the one that suits our application. Hence, in this project, I propose to analyze the performance of the Dirac video codec [1], the audio video coding standard (AVS) video codec [2] and the advanced audio coding (AAC) audio codec [3]. Dirac video was developed by the British Broadcasting Corporation (BBC) [1] and has already been used for broadcasting the 2008 Olympics held in Beijing, China. Dirac is named after the physicist Paul Dirac. It is an open technology, which means that it can be used without any license or royalty fees. The Audio Video coding Standard (AVS) working group of China, which develops audio and video coding standards, was established in 2002. Advanced audio coding (AAC) [2, 3] combines state-of-the-art technologies for high-quality multichannel audio coding from AT&T Corp., Dolby Laboratories, the Fraunhofer Institute for Integrated Circuits (Fraunhofer IIS) and Sony Corporation. AAC is one of the most widely used audio codecs owing to its efficiency and its wide operating range.

4. Dirac video codec
Dirac is a hybrid video codec developed by the British Broadcasting Corporation (BBC). Its key feature is that it is an open technology, which means that it can be used without payment of licensing fees. Dirac is called a hybrid codec because it combines motion compensation, which removes temporal redundancy, with a transform, which removes spatial redundancy [1]. Dirac uses modern techniques such as the wavelet transform and arithmetic coding for entropy coding. The image motion is tracked, and the motion information is used to predict a later frame. A transform is applied to the prediction residual (the difference between the frame and its motion-compensated prediction, or the picture itself for intra frames), and the transform coefficients are quantized and entropy coded. Owing to its flexibility, Dirac's applications range from web streaming to high definition television (HDTV): it compresses pictures from a low resolution of 176 × 144 pixels (QCIF) up to 1920 × 1080 pixels (HDTV). Dirac is also claimed to offer improvements in quality and significant savings in data rate over other codecs such as H.264 and VC-1 [26].
4.1 Dirac encoder
Fig. 1 Dirac encoder block diagram [1].

4.2 Wavelet transform
Dirac applies the wavelet transform to the entire picture at once, which gives it the flexibility to operate over a range of resolutions. At each stage, the wavelet filters split the signal into 4 frequency sub-bands, namely LL (low-low), LH (low-high), HL (high-low) and HH (high-high); for a 2-D picture the filters are applied both horizontally and vertically. Since the LL sub-band contains most of the significant information, only the LL band is decomposed further at the next stage, while the LH, HL and HH sub-bands are kept and coded. This decomposition is carried out for up to 4 stages. The discrete wavelet transform roughly de-correlates the data in a frequency-sensitive manner while retaining the finer details [27]. The wavelet transform block diagram is shown in figure 2.
Fig. 2 Wavelet transform block diagram [28]
It is essential to choose filters with compact impulse responses in order to reduce the ringing artifacts caused by wavelets. Daubechies wavelet filters are therefore used to transform the data and divide it into sub-bands, which are then quantized with the corresponding rate distortion optimization (RDO) parameters and entropy coded. At the decoder these stages are reversed [28]. The 2-stage decomposition of the wavelet transform, in which only the LL sub-band is decomposed further, is shown in figure 3, and the decomposition of the picture into sub-bands is shown in figure 4.
Fig. 3 Stages of wavelet transform [1]
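The stage-by-stage split described above can be sketched as follows. This is a minimal illustration, assuming the simple Haar filter pair in place of the Daubechies filters mentioned above and assuming that the first letter of a band name refers to the horizontal filter; the function names are hypothetical, not part of the Dirac software. Only the LL band is split again at each stage, the detail bands of every stage are kept, and the inverse shows that nothing is lost before quantization.

```python
import numpy as np

def haar_split(img):
    """One 2-D analysis stage: returns (LL, LH, HL, HH), each half size in both directions.

    Haar filters are an illustrative stand-in for Dirac's wavelet filters.
    Band naming assumed here: first letter = horizontal filter, second = vertical.
    """
    x = np.asarray(img, dtype=np.float64)
    lo = (x[:, 0::2] + x[:, 1::2]) / 2.0   # horizontal low-pass (pair averages)
    hi = (x[:, 0::2] - x[:, 1::2]) / 2.0   # horizontal high-pass (pair differences)
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, lh, hl, hh

def haar_merge(ll, lh, hl, hh):
    """Inverse of haar_split (perfect reconstruction)."""
    lo = np.empty((2 * ll.shape[0], ll.shape[1]))
    hi = np.empty_like(lo)
    lo[0::2, :], lo[1::2, :] = ll + lh, ll - lh
    hi[0::2, :], hi[1::2, :] = hl + hh, hl - hh
    x = np.empty((lo.shape[0], 2 * lo.shape[1]))
    x[:, 0::2], x[:, 1::2] = lo + hi, lo - hi
    return x

def decompose(img, stages=4):
    """Split only the LL band at each stage; keep every stage's detail bands."""
    ll, details = np.asarray(img, dtype=np.float64), []
    for _ in range(stages):
        ll, lh, hl, hh = haar_split(ll)
        details.append((lh, hl, hh))
    return ll, details

def reconstruct(ll, details):
    """Undo decompose(), starting from the coarsest stage."""
    for lh, hl, hh in reversed(details):
        ll = haar_merge(ll, lh, hl, hh)
    return ll
```

For a QCIF luma plane (144 rows × 176 columns), `decompose(img, 4)` leaves a 9 × 11 LL band plus three detail bands per stage, and `reconstruct` returns the original plane.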

Fig. 4 Wavelet transform frequency decomposition [25]
4.3 Scaling and quantization
Scaling is the stage after the transform; it scales the coefficients so that they can be quantized. Quantization uses a rate distortion optimization algorithm to strip from the frame data the information whose removal causes as little visual distortion as possible. Dirac uses a dead-zone quantizer, shown in figure 5, in which the interval around zero that maps to zero is wider than the other quantization intervals, so that small coefficients are set to zero.
Fig. 5 Dead-zone quantizer [28]
4.4 Entropy coding
Entropy coding is performed to reduce the number of bits used. Dirac uses arithmetic coding because of its flexibility, its losslessness and its efficiency [30]. Entropy coding consists of three stages, binarization, context modeling and arithmetic coding, as shown in figure 6. The underlying principle is that whether a coefficient is small or not is well predicted by its neighbors and its parent: the non-zero values in the higher frequency sub-bands of the wavelet transform tend to lie in the same parts of the picture as the non-zero values in the lower frequency sub-bands. The purpose of the first stage (binarization) is to provide a bit stream with easily analyzable statistics, which can then be encoded by an arithmetic coder that adapts to those statistics and reflects any local statistical features. Arithmetic coding exploits the correlations captured by the statistical models and thus achieves better compression.
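The dead-zone quantizer of section 4.3 can be illustrated with a minimal sketch. It assumes a uniform step size and a hypothetical reconstruction offset of 0.5 (the mid-point of each interval); Dirac's actual step sizes, reconstruction offsets and RDO-driven step selection are not modeled, and the function names are hypothetical. Coefficients with magnitude below one step fall into the zero bin, which is twice as wide as the other bins.

```python
import numpy as np

def deadzone_quantize(coeffs, step):
    """Map |c| < step to 0; otherwise index = sign(c) * floor(|c| / step).

    The zero bin spans (-step, step), twice the width of every other bin.
    """
    c = np.asarray(coeffs, dtype=np.float64)
    return (np.sign(c) * np.floor(np.abs(c) / step)).astype(np.int64)

def deadzone_dequantize(indices, step, offset=0.5):
    """Reconstruct each non-zero index inside its interval (offset=0.5 is an assumed mid-point)."""
    q = np.asarray(indices)
    return np.sign(q) * (np.abs(q) + offset) * step
```

With `step = 10`, the inputs -9, 0 and 9 all quantize to 0, while 10 and 25 become indices 1 and 2 and are reconstructed as 15 and 25.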

Fig. 6 Entropy coding architecture [29]
The motion information estimated at the encoder is also compressed into the fewest possible bits using statistical modeling and arithmetic coding. This compressed data is put into the bit stream and used by the decoder as part of the compressed video [24].
4.5 Motion estimation
Motion estimation exploits temporal redundancy in video streams by finding similarities between adjacent frames. Dirac implements hierarchical motion estimation, shown in figure 7, in three distinct stages.
Fig. 7 Hierarchical motion estimation [31]

In hierarchical motion estimation, Dirac first down-samples the current and reference frames of all types of inter frames (both P and B) using a low-pass filter; suitable low-pass filters include FIR, IIR and CIC filters. The number of down-conversion levels depends on the frame format [9]. The lowest level (the most heavily down-sampled one) uses a diamond-shaped search pattern with a search range of 5, and all other levels use a square search pattern with a search range of 1. Figure 8 shows both search patterns [33] [24].
Fig. 8 Search patterns in Dirac [33]
First, a list of points to be searched (the candidate list) is generated. These points follow either a diamond or a square pattern and are centered at the coordinates pointed to by a motion vector (MV). At the lowest search level, two candidate lists with a diamond search pattern are generated, centered at the zero motion vector and at the predicted motion vector respectively. The predicted motion vector is a spatial prediction: the median of the MVs of the left, top-left and top neighbors of the current block [33]. Dirac defines three types of frames: intra (I) frames, and L1 and L2 frames, which are both inter frames coded with reference to other previously coded frames. A prediction structure for frame coding using a standard GOP structure is shown in figure 9 [24].
Fig. 9 Frame prediction in Dirac [34]
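The coarse-to-fine search described above can be sketched for a single block. This is a simplified illustration, not Dirac's implementation: a 2 × 2 averaging filter stands in for the low-pass filter, the sum of absolute differences (SAD) is assumed as the matching cost, and all function names are hypothetical. It follows the structure in the text: a diamond search of range 5 around the zero and predicted vectors at the most down-sampled level, then, at each finer level, the vector is doubled and refined with a square search of range 1.

```python
import numpy as np

def downsample(frame):
    """Halve the resolution with a 2x2 averaging low-pass filter (illustrative choice)."""
    f = np.asarray(frame, dtype=np.float64)
    f = f[: f.shape[0] // 2 * 2, : f.shape[1] // 2 * 2]
    return (f[0::2, 0::2] + f[1::2, 0::2] + f[0::2, 1::2] + f[1::2, 1::2]) / 4.0

def sad(cur, ref, y, x, b, mv):
    """SAD between the b x b block of cur at (y, x) and the ref block displaced by mv."""
    ry, rx = y + mv[0], x + mv[1]
    if ry < 0 or rx < 0 or ry + b > ref.shape[0] or rx + b > ref.shape[1]:
        return np.inf  # candidate lies outside the reference frame
    return np.abs(cur[y:y + b, x:x + b] - ref[ry:ry + b, rx:rx + b]).sum()

def pattern(radius, shape):
    """Offsets of a diamond (|dy| + |dx| <= r) or square (max(|dy|, |dx|) <= r) pattern."""
    pts = [(dy, dx) for dy in range(-radius, radius + 1) for dx in range(-radius, radius + 1)]
    return [p for p in pts if shape == "square" or abs(p[0]) + abs(p[1]) <= radius]

def search(cur, ref, y, x, b, centres, offsets):
    """Evaluate the pattern around every candidate centre and keep the lowest SAD."""
    best, best_cost = centres[0], np.inf
    for cy, cx in centres:
        for dy, dx in offsets:
            mv = (cy + dy, cx + dx)
            cost = sad(cur, ref, y, x, b, mv)
            if cost < best_cost:
                best, best_cost = mv, cost
    return best

def median_mv(left, top_left, top):
    """Spatially predicted MV: component-wise median of three neighbouring MVs."""
    return tuple(int(np.median(c)) for c in zip(left, top_left, top))

def hierarchical_me(cur, ref, y, x, b=16, levels=2, pred=(0, 0)):
    """Estimate the MV of the b x b block at (y, x), coarse to fine."""
    curs = [np.asarray(cur, dtype=np.float64)]
    refs = [np.asarray(ref, dtype=np.float64)]
    for _ in range(levels):
        curs.append(downsample(curs[-1]))
        refs.append(downsample(refs[-1]))
    s = 2 ** levels
    # Most down-sampled level: diamond pattern, range 5, around zero and predicted MV.
    centres = [(0, 0), (pred[0] // s, pred[1] // s)]
    mv = search(curs[-1], refs[-1], y // s, x // s, b // s, centres, pattern(5, "diamond"))
    # Finer levels: double the vector and refine with a square pattern, range 1.
    for lvl in range(levels - 1, -1, -1):
        s = 2 ** lvl
        mv = search(curs[lvl], refs[lvl], y // s, x // s, b // s,
                    [(2 * mv[0], 2 * mv[1])], pattern(1, "square"))
    return mv
```

The usual motivation for the hierarchy is cost: a range-5 search at a level down-sampled twice covers displacements of about ±20 pixels at full resolution, while each finer level only checks 9 points around the doubled vector.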

4.6 Motion compensation
Motion compensation is used to predict the current frame. Dirac uses overlapped block-based motion compensation (OBMC) to achieve good compression and to avoid block-edge artifacts, which would be expensive to code using wavelets. In OBMC, the predictions of neighboring blocks overlap and are blended together. OBMC is performed with basic blocks arranged into macroblocks, each consisting of a 4x4 array of blocks [32][24]. The overlapping (weighting) function is an integer approximation to the raised-cosine function. Each macroblock may be split into prediction units in one of three ways, shown in figure 10.
Fig. 10 Modes of splitting macroblock [32]
4.7 Decoder
The decoder performs the inverse operations of the encoder and is shown in figure 11.
Fig. 11 Dirac decoder block diagram [14]
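The role of the raised-cosine overlapping function can be seen in a small sketch. It uses floating-point weights rather than Dirac's integer approximation, and the block separation of 8 and block length of 12 (an overlap of 4) used in the usage note are assumptions chosen only for illustration; the function names are hypothetical. The key property is that the weights of overlapping neighbors add up to one, so blending the block predictions does not change the overall brightness while smoothing the block edges.

```python
import numpy as np

def obmc_window_1d(sep, overlap):
    """Raised-cosine weights for a block of length sep + overlap (floating point).

    The rising and falling edges satisfy ramp[i] + ramp[overlap - 1 - i] == 1,
    so the weights of two overlapping neighbours add up to one.
    """
    if not 0 < overlap <= sep:
        raise ValueError("need 0 < overlap <= sep")
    i = np.arange(overlap)
    ramp = 0.5 * (1.0 - np.cos(np.pi * (i + 0.5) / overlap))
    return np.concatenate([ramp, np.ones(sep - overlap), ramp[::-1]])

def obmc_window_2d(sep, overlap):
    """Separable 2-D window: outer product of the 1-D window with itself."""
    w = obmc_window_1d(sep, overlap)
    return np.outer(w, w)
```

Placing `obmc_window_2d(8, 4)` at steps of 8 pixels in both directions gives a total weight of exactly one everywhere except along the outer edges of the tiled area, which is what lets each pixel's prediction be a weighted blend of its neighboring blocks' motion-compensated predictions.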

5. AVS-China video codec
The Audio Video coding Standard (AVS) working group of China was established in 2002. To serve a wide range of video applications, AVS-China defines several profiles, each combining advanced coding tools with a particular trade-off between coding efficiency, encoder/decoder implementation complexity and functional properties, and each targeting a category of applications [16].
5.1 Profiles and levels
AVS-China video defines four profiles, namely the Jizhun (base), Jiben (basic), Shenzhan (extended) and Jiaqiang (enhanced) profiles, targeting different applications (Table 1) [16]. The purpose of defining profiles and levels is to facilitate interoperability among streams from various applications. AVS Part 2 defines the Jizhun profile, which comprises 4 levels: levels 4.0 and 4.2 for standard definition (SD) video with 4:2:0 and 4:2:2 sampling, and levels 6.0 and 6.2 for high definition (HD) video with 4:2:0 and 4:2:2 sampling, respectively.

| Profile | Key applications |
|---|---|
| Jizhun profile (base) | Television broadcasting, HDTV, etc. |
| Jiben profile (basic) | Mobility applications, etc. |
| Shenzhan profile (extended) | Video surveillance, etc. |
| Jiaqiang profile (enhanced) | Multimedia entertainment, etc. |

Table 1 Application based profiles of AVS [16].
The Jizhun profile is preferable when high coding efficiency is needed for higher-resolution video sequences, at the expense of moderate computational complexity. To fulfill the needs of multimedia entertainment, one of the major concerns of the Jiaqiang profile is movie compression for high-density storage; relatively higher computational complexity can be tolerated at the encoder side to provide higher video quality, while remaining compatible with AVS Part 2. The different parts of the AVS standard are listed in table 2 [16]. The typical video coding chain in AVS video is shown in figure 12.

| Part | Category |
|---|---|
| 1 | System |
| 2 | Video |
| 3 | Audio |
| 4 | Conformance test |
| 5 | Reference software |
| 6 | Digital media rights management |
| 7 | Mobile video |
| 8 | Transmit AVS via IP network |
| 9 | AVS file format |
| 10 | Mobile speech and audio coding |

Table 2 Different parts of AVS-China standard [15].
5.2 Video coding chain
Fig. 12 Video coding chain of AVS-China

Picture format: AVS Part 2 is mainly intended for SD/HDTV applications. Since it is a generic standard, it can code pictures of rectangular format up to 16K x 16K pixels in size [35]. Pixels are coded in the standard YUV format; AVS supports the 4:2:0 and 4:2:2 chroma formats, and 4:2:0 is used in this project. The coded video is organized in layers: a sequence is divided into pictures, pictures into slices, slices into macroblocks and macroblocks into blocks. The sequence, picture and slice layers begin with unique start codes that allow the decoder to find them within a bit stream, as shown in figure 13.
Fig. 13 GOP structure for AVS-China [35]
The sequence layer provides an entry point into the coded video. Sequence headers should be placed in the bit stream to support the appropriate transmission of video. Repeated sequence headers may be inserted to provide random access, and the sequence is terminated with a sequence end code [35]. AVS defines three types of pictures: intra pictures (I); predicted pictures (P), which use at most two reference frames (P or I); and interpolated pictures (B), which use two reference frames (I or P or both). The slice structure provides the lowest-layer mechanism for re-synchronizing the bit stream in case of transmission errors. Slices comprise an arbitrary number of raster-ordered rows of macroblocks. A macroblock contains the luminance and chrominance pels that represent a 16x16 region of the picture, and a block consists of the transform coefficient data for the prediction errors. In the 4:2:0 format, the chrominance pels are subsampled by a factor of 2 both horizontally and vertically, so each chrominance component of a macroblock contains one 8x8 block [35].
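The macroblock and block layering above fixes how much data each picture is divided into. A minimal sketch for the 4:2:0 case, assuming that picture dimensions which are not multiples of 16 are padded up to the next multiple (an assumption made for illustration); the function name is hypothetical.

```python
def macroblock_layout(width, height, mb_size=16, block_size=8):
    """Macroblock and 8x8 block counts for one 4:2:0 picture."""
    mbs_x = -(-width // mb_size)   # ceiling division: pad up to a multiple of 16
    mbs_y = -(-height // mb_size)
    luma_blocks = (mb_size // block_size) ** 2                # four 8x8 Y blocks
    chroma_blocks = 2 * ((mb_size // 2) // block_size) ** 2   # one 8x8 Cb + one 8x8 Cr
    per_mb = luma_blocks + chroma_blocks
    return {
        "mb_cols": mbs_x,
        "mb_rows": mbs_y,
        "macroblocks": mbs_x * mbs_y,
        "blocks_per_mb": per_mb,
        "blocks": mbs_x * mbs_y * per_mb,
    }
```

For the test sequences used in section 7 this gives 99 macroblocks per QCIF frame, 396 per CIF frame, 1584 per 704 × 576 SDTV frame and 3600 per 1280 × 720 HDTV frame, each macroblock carrying six 8x8 blocks.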

Fig. 14 Macroblock format [35]
5.3 Encoder
The block diagram of the AVS Part 2 encoder is shown in figure 15. Each input macroblock (MB) is first predicted, either by intra prediction or by inter prediction. The predicted MB is subtracted from the original MB to obtain the prediction residue. The residue is transformed by the integer cosine transform (ICT) and then quantized. The quantized coefficients, along with the motion vectors if the MB was inter predicted, are entropy coded with 2-D VLC, and the resulting bit stream is transmitted to the decoder. The quantized coefficients are also inverse quantized and inverse transformed to reconstruct the frames that serve as references for motion estimation and compensation. In other words, a decoder is embedded in the encoder, so the encoder obtains the same reconstructed image that the decoder will produce. This ensures that the encoder predicts from exactly the frame that the decoder uses for reconstruction, keeping encoder and decoder synchronized; if this were not maintained, quantization errors would accumulate from frame to frame.
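Why the encoder must contain a decoder can be shown with a toy simulation. This is a sketch under simplifying assumptions: a uniform scalar quantizer stands in for the ICT and quantization, each frame is predicted from the previous one with zero motion, the first frame is assumed to arrive losslessly, and the names are hypothetical. In closed loop the encoder predicts from its own reconstruction, exactly as the decoder does, so the error stays within half a quantizer step; in open loop it predicts from the original frames, and the decoder's error drifts.

```python
import numpy as np

def quantize(residual, step):
    """Uniform scalar quantizer standing in for ICT + quantization and their inverses."""
    return np.round(residual / step) * step

def encode_decode(frames, step, closed_loop=True):
    """Return the decoder's reconstructions of a sequence of frames."""
    enc_ref = np.array(frames[0], dtype=np.float64)  # first frame assumed sent losslessly
    dec_ref = enc_ref.copy()
    recon = [dec_ref.copy()]
    for frame in frames[1:]:
        frame = np.asarray(frame, dtype=np.float64)
        residual = quantize(frame - enc_ref, step)  # what is transmitted
        dec_ref = dec_ref + residual                # decoder: prediction + residual
        recon.append(dec_ref.copy())
        # Closed loop: the embedded decoder gives the encoder the same reference.
        # Open loop: the encoder predicts from the original frame instead.
        enc_ref = enc_ref + residual if closed_loop else frame
    return recon

def max_error(frames, recon):
    """Largest absolute reconstruction error over all frames."""
    return max(np.max(np.abs(np.asarray(f) - r)) for f, r in zip(frames, recon))
```

For frames that brighten by 3.3 per frame and a step of 2, the closed-loop error never exceeds 1, whereas the open-loop error grows by 0.7 per frame and exceeds 13 after 20 frames.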

Fig. 15 AVS-China encoder block diagram.
5.4 RD optimization
For an I-frame, an RD cost is calculated for each intra block mode using equation (1), and the mode with the lowest cost is selected from the available intra modes:

$$\mathrm{RD\,Cost}(\text{mode}) = D(\text{mode}) + \lambda \cdot R(\text{mode}) \qquad (1)$$

where RD Cost(mode) is the rate-distortion cost of coding the block in a particular mode, D(mode) is the distortion if the block is coded in that mode, R(mode) is the bit rate produced if the block is coded in that mode, and λ is a Lagrangian multiplier derived from rate-distortion curve optimization. To decide the mode of one block, the costs of all 5 modes are calculated, and to calculate each cost the encoder must transform, quantize and entropy code the block in every mode. This is because R(mode) is known only after the quantized coefficients have been entropy coded. Likewise, to obtain D(mode) the encoder must go through the whole procedure, because the distortion for a particular mode can be computed only after the block has been reconstructed on the encoder side and compared with the original. After finding the best RD cost for all blocks in the MB, the encoder calculates the RD cost of the MB when all its blocks are coded with the most probable mode (MPM); if this cost is lower, DIP is used to encode the MB.
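Equation (1) turns mode decision into a minimization. A minimal sketch, assuming the distortion and rate of each candidate mode have already been measured by actually coding the block in that mode; the mode names and (D, R) numbers below are hypothetical and chosen only to show how λ shifts the choice.

```python
def rd_cost(distortion, rate, lam):
    """Equation (1): RD Cost(mode) = D(mode) + lambda * R(mode)."""
    return distortion + lam * rate

def best_mode(measurements, lam):
    """Pick the mode with the lowest RD cost.

    measurements maps mode name -> (D, R), e.g. D as a sum of squared
    differences and R in bits, both obtained by coding the block in that mode.
    """
    return min(measurements, key=lambda m: rd_cost(*measurements[m], lam))

# Hypothetical (D, R) values for three intra modes of one block.
modes = {"vertical": (1200, 40), "horizontal": (1500, 20), "dc": (1000, 70)}
for lam in (0, 10, 30):
    print(lam, best_mode(modes, lam))
# prints: 0 dc / 10 vertical / 30 horizontal
```

With λ = 0 only distortion matters and the most accurate but most expensive mode wins; as λ grows, cheaper modes are preferred even if they distort more.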

For P-frames, the encoder also calculates costs for the inter modes. The best intra prediction mode is determined first, then the best inter mode is selected by R-D optimization, and finally the better of these two is chosen on the basis of R-D cost.
5.5 Decoder
Fig. 16 AVS-China decoder block diagram.

6. AAC audio codec
Advanced audio coding (AAC) [2, 3] combines state-of-the-art technologies for high-quality multichannel audio coding from AT&T Corp., Dolby Laboratories, Fraunhofer IIS and Sony Corporation. AAC supports a wide range of sampling rates (8 to 96 kHz), bit rates (16 to 576 kbps) and from one to 48 audio channels [4]. Its improved compression provides higher-quality audio at the same bit rate as previous standards, or the same quality at lower bit rates [10].
6.1 AAC encoder and decoder
AAC defines three profiles, namely the main, low-complexity and scalable sampling rate (SSR) profiles. The key feature of the low-complexity profile is that it omits the prediction tool and limits the complexity of the temporal noise shaping (TNS) tool, which makes it favorable when memory and power constraints must be met.
Fig. 17 AAC encoder block diagram [7]

Filter bank: The audio coder first breaks the audio signal into segments called blocks. A time-domain function called a window modifies the data in each block so that the transitions from block to block are smooth [10], and the windowed blocks are transformed into the frequency domain by the modified discrete cosine transform (MDCT). Selecting the optimal block size for the audio material is a problem faced by all audio coders. AAC handles audio material that alternates between steady-state and transient signals by dynamically switching between two block lengths, 2048 samples and 256 samples, referred to as long blocks and short blocks respectively [10]. AAC also switches between two window shapes, the sine window and the Kaiser-Bessel derived (KBD) window, according to the characteristics of the signal.
Temporal noise shaping (TNS): The TNS technique provides enhanced control of the location, in time, of the quantization noise within a filter bank window, which helps with signals that lie between steady-state and transient in nature. If a transient-like signal lies at one end of a long block, the quantization noise would otherwise spread over the whole length of the block. TNS shapes the noise so that more of it falls in the transient region, where masking renders it inaudible, and less of it falls in the steady-state region of the block. TNS can be applied to the whole frequency spectrum or to only part of it, so that the time-domain quantization noise can be controlled in a frequency-dependent fashion [10].
Intensity stereo: Intensity stereo coding is based on an analysis of high-frequency audio perception, which depends mainly on the energy-time envelope of that region of the audio spectrum.
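The windowed MDCT filter bank can be sketched directly from its definition. This sketch assumes the sine window only (no KBD window and no block switching), uses a direct matrix product rather than a fast algorithm, and scales the inverse by 2/N so that overlap-adding consecutive blocks cancels the time-domain aliasing; the function names are hypothetical. A block of 2N samples produces N coefficients, so a 2048-sample long block corresponds to N = 1024 and a 256-sample short block to N = 128.

```python
import numpy as np

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]**2 + w[n + N]**2 == 1."""
    n = np.arange(2 * N)
    return np.sin(np.pi * (n + 0.5) / (2 * N))

def mdct_basis(N):
    """Cosine kernel of shape (N, 2N): N coefficients from a 2N-sample block."""
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))

def mdct(block, window):
    """Forward MDCT of one windowed 2N-sample block."""
    N = len(block) // 2
    return mdct_basis(N) @ (window * block)

def imdct(coeffs, window):
    """Inverse MDCT scaled by 2/N and windowed again for overlap-add."""
    N = len(coeffs)
    return window * (2.0 / N) * (mdct_basis(N).T @ coeffs)

def analyze_synthesize(x, N):
    """Code x with 50%-overlapping blocks of 2N samples and rebuild it by overlap-add.

    Each block's aliasing is cancelled by its neighbour, so x is recovered
    (up to rounding) although only N coefficients are kept per N new samples.
    """
    x = np.asarray(x, dtype=np.float64)
    if len(x) % N:
        raise ValueError("length of x must be a multiple of N")
    w = sine_window(N)
    padded = np.concatenate([np.zeros(N), x, np.zeros(N)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - N, N):
        block = padded[start:start + 2 * N]
        out[start:start + 2 * N] += imdct(mdct(block, w), w)
    return out[N:N + len(x)]
```

The transform is critically sampled: despite the 50% overlap between blocks, the number of coefficients equals the number of input samples, which is why the overlap that smooths block transitions costs no extra data.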
Intensity stereo coding allows a stereo channel pair to share a single set of spectral values for the high-frequency components with little or no loss in sound quality. This is achieved by preserving each channel's own envelope through a scaling operation, so that each channel is reproduced at its original level after decoding [10].
Prediction: The prediction module is used to represent stationary or semi-stationary parts of an audio signal. Instead of repeating such information, a prediction can be signaled, which reduces redundant information. The prediction process is based on a second-order backward-adaptive model in which each predictor uses the spectral component values of the two preceding blocks. The prediction parameters are adapted on a block-by-block basis [10].
Mid/side (M/S) stereo coding: M/S stereo coding is another data reduction module based on channel pair coding. Channel pair elements are analyzed as left/right and as sum/difference signals on a block-by-block basis. When the M/S representation of the channel pair needs fewer bits, the sum and difference spectral coefficients are coded, and a bit is set to signal that the block uses M/S stereo coding. During decoding, the decoded channel pair is de-matrixed back to its original left/right form [10].
Quantization and coding: The majority of the data reduction generally occurs in the quantization phase, after the data has already been compacted to some degree by the preceding modules. In AAC, the spectral data is quantized under the control of the psychoacoustic model, and the number of bits used must stay below a limit determined by the desired bit rate. Huffman coding is also applied, using twelve codebooks. To increase the coding gain, scale factors of bands whose spectral coefficients are all zero are not transmitted [10].
Noiseless coding: This block is nested inside the quantization and coding module. Noiseless dynamic range compression can be applied prior to Huffman coding: a value of +1/-1 is placed in the quantized coefficient array to carry the sign, while the magnitude and an offset from the base, marking the frequency location, are transmitted as side information. This process is used only when it reduces the number of bits [10].
Fig. 18 AAC decoder block diagram [2]
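The M/S decision described above can be sketched as follows. This is a simplified illustration: M = (L + R)/2 and S = (L - R)/2 is one common normalization, and the bit-cost estimate (about log2 of each quantized magnitude plus one bit) is a hypothetical stand-in for counting the Huffman-coded bits, which a real AAC encoder evaluates per scale-factor band. The function names are hypothetical.

```python
import numpy as np

def estimated_bits(coeffs, step=1.0):
    """Crude bit-cost proxy: about log2(|q| + 1) + 1 bits per quantized value (hypothetical)."""
    q = np.round(np.asarray(coeffs, dtype=np.float64) / step)
    return float(np.sum(np.log2(np.abs(q) + 1.0) + 1.0))

def ms_encode(left, right, step=1.0):
    """Choose L/R or M/S coding for one block; returns (use_ms_flag, ch0, ch1)."""
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    mid, side = (left + right) / 2.0, (left - right) / 2.0
    lr_cost = estimated_bits(left, step) + estimated_bits(right, step)
    ms_cost = estimated_bits(mid, step) + estimated_bits(side, step)
    if ms_cost < lr_cost:
        return True, mid, side   # flag bit set: block uses M/S coding
    return False, left, right

def ms_decode(use_ms, ch0, ch1):
    """De-matrix back to left/right when the M/S flag is set."""
    if use_ms:
        return ch0 + ch1, ch0 - ch1  # L = M + S, R = M - S
    return ch0, ch1
```

For strongly correlated channels the side signal is nearly zero and M/S wins; for unrelated channels (for example, a sound present only on the left in one coefficient and only on the right in another) L/R coding is kept.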

7. Performance analysis
The performance of the AAC audio codec and of the Dirac and AVS-China video codecs was analyzed. The Dirac video codec was analyzed by varying the quality factor (QF) from 0 to 10, and the quality metrics MSE, PSNR and SSIM were calculated. AVS-China video was analyzed at quantization parameter (QP) values ranging from 0 to 63, and the same quality metrics were calculated. The test sequences were in QCIF, CIF, SDTV and HDTV formats. The bit rate was plotted against QF for Dirac and against QP for AVS-China. The audio codec was analyzed at a constant bandwidth of 16 kHz.
7.1 Performance analysis of Dirac video codec
The performance of the Dirac video codec is tabulated for the Akiyo (QCIF), Tempete (CIF), Night move (SDTV) and Harbor (HDTV) sequences. The chroma sub-sampling format is YUV 4:2:0 for all four sequences. Since the luma component carries most of the information, only this component is used for the metric calculations. Frame 45 is displayed for the Akiyo and Tempete sequences, and frame 30 for the Night move and Harbor sequences.
QCIF sequence: Akiyo (YUV 4:2:0). Total number of frames: 300. Frames used: 150. Width: 176. Height: 144. Frame rate: 30 fps.

| QF | Original file size (KB) | Compressed file size (KB) | Compression ratio | Bit rate (KBps) | Y-MSE | Y-PSNR (dB) | Y-SSIM |
|---|---|---|---|---|---|---|---|
| 0 | 5569 | 26 | 214:1 | 5.15 | 113.787 | 27.570 | 0.799 |
| 1 | 5569 | 28 | 199:1 | 5.42 | 83.374 | 28.920 | 0.840 |
| 2 | 5569 | 29 | 192:1 | 5.64 | 67.414 | 29.843 | 0.870 |
| 3 | 5569 | 31 | 180:1 | 6.12 | 51.613 | 31.003 | 0.896 |
| 4 | 5569 | 36 | 155:1 | 7.03 | 32.882 | 32.961 | 0.924 |
| 5 | 5569 | 42 | 133:1 | 8.23 | 19.062 | 35.329 | 0.948 |
| 6 | 5569 | 51 | 109:1 | 10.17 | 10.746 | 37.818 | 0.968 |
| 7 | 5569 | 66 | 84:1 | 13.06 | 6.128 | 40.257 | 0.978 |
| 8 | 5569 | 87 | 64:1 | 17.25 | 3.421 | 42.790 | 0.986 |
| 9 | 5569 | 118 | 47:1 | 23.60 | 2.277 | 44.557 | 0.990 |
| 10 | 5569 | 180 | 31:1 | 35.95 | 1.632 | 46.003 | 0.992 |
| Lossless | 5569 | 1277 | 4:1 | 255.21 | 0.000 | 100.000 | 1.000 |

Table 3 Performance of Dirac for Akiyo test sequence (150 frames).

[Images: original Akiyo frame; reconstructed frames at QF = 0, QF = 5, QF = 10 and in lossless mode]
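The derived columns of Tables 3 to 10 can be recomputed with a short script. It is a sketch, assuming raw 8-bit planar YUV 4:2:0 files, KB = 1024 bytes, MSE averaged over all luma samples of all frames (tools that average per-frame values can give slightly different numbers), and a cap of 100 dB for identical frames to match the lossless rows; the function names are hypothetical. For Akiyo, 150 QCIF frames at 30 fps last 5 s, so a 26 KB stream corresponds to about 5.2 KBps and a compression ratio of about 214:1.

```python
import numpy as np

def yuv420_frame_bytes(width, height):
    """Bytes in one 8-bit 4:2:0 frame: full-size Y plus quarter-size U and V."""
    return width * height * 3 // 2

def read_luma(path, width, height, n_frames):
    """Return the Y planes of the first n_frames of a raw planar YUV 4:2:0 file."""
    size = yuv420_frame_bytes(width, height)
    data = np.fromfile(path, dtype=np.uint8, count=size * n_frames)
    frames = data.reshape(n_frames, size)
    return frames[:, : width * height].reshape(n_frames, height, width)

def luma_mse_psnr(orig, recon, cap=100.0):
    """Y-MSE and Y-PSNR (dB) over all frames; identical inputs are reported as `cap`."""
    err = np.asarray(orig, dtype=np.float64) - np.asarray(recon, dtype=np.float64)
    mse = float(np.mean(err ** 2))
    psnr = cap if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
    return mse, psnr

def compression_stats(original_kb, compressed_kb, n_frames, fps):
    """Compression ratio and bit rate in KBps (kilobytes per second of video)."""
    seconds = n_frames / fps
    return original_kb / compressed_kb, compressed_kb / seconds
```

As a cross-check on the file sizes, 150 QCIF frames occupy 150 × 38016 bytes, about 5569 KB, which matches the original file size column of Table 3.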

CIF sequence: Tempete (YUV 4:2:0). Total number of frames: 260. Frames used: 90. Width: 352. Height: 288. Frame rate: 30 fps.

| QF | Original file size (KB) | Compressed file size (KB) | Compression ratio | Bit rate (KBps) | Y-MSE | Y-PSNR (dB) | Y-SSIM |
|---|---|---|---|---|---|---|---|
| 0 | 13365 | 67 | 199:1 | 22.09 | 398.811 | 22.123 | 0.600 |
| 1 | 13365 | 76 | 176:1 | 25.21 | 305.017 | 23.288 | 0.678 |
| 2 | 13365 | 92 | 145:1 | 30.39 | 208.276 | 24.944 | 0.766 |
| 3 | 13365 | 117 | 114:1 | 38.71 | 136.118 | 26.792 | 0.834 |
| 4 | 13365 | 158 | 85:1 | 52.38 | 84.964 | 28.838 | 0.888 |
| 5 | 13365 | 224 | 60:1 | 74.45 | 50.610 | 31.088 | 0.929 |
| 6 | 13365 | 331 | 40:1 | 110.16 | 31.533 | 33.143 | 0.954 |
| 7 | 13365 | 505 | 26:1 | 168.33 | 20.159 | 35.086 | 0.969 |
| 8 | 13365 | 793 | 17:1 | 264.20 | 12.021 | 37.332 | 0.980 |
| 9 | 13365 | 1185 | 11:1 | 394.73 | 6.945 | 39.714 | 0.987 |
| 10 | 13365 | 1776 | 8:1 | 591.94 | 3.808 | 42.324 | 0.991 |
| Lossless | 13365 | 7866 | 2:1 | 2621.98 | 0.000 | 100.000 | 1.000 |

Table 4 Performance of Dirac for Tempete test sequence (90 frames).
[Images: original Tempete frame; reconstructed frames at QF = 0 and in lossless mode]

SDTV sequence: Night move (YUV 4:2:0). Total number of frames: 800. Frames used: 60. Width: 704. Height: 576. Frame rate: 25 fps.

| QF | Original file size (KB) | Compressed file size (KB) | Compression ratio | Bit rate (KBps) | Y-MSE | Y-PSNR (dB) | Y-SSIM |
|---|---|---|---|---|---|---|---|
| 0 | 35640 | 192 | 186:1 | 79.59 | 81.703 | 29.008 | 0.718 |
| 1 | 35640 | 295 | 121:1 | 122.59 | 63.850 | 30.079 | 0.735 |
| 4 | 35640 | 593 | 60:1 | 246.73 | 36.632 | 32.492 | 0.776 |
| 7 | 35640 | 1825 | 20:1 | 760.31 | 24.900 | 34.169 | 0.816 |
| 10 | 35640 | 6572 | 5:1 | 2738.13 | 7.941 | 39.132 | 0.934 |
| Lossless | 35640 | 24007 | 2:1 | 10002.79 | 0.000 | 100.00 | 1.000 |

Table 5 Performance of Dirac for Night move test sequence (60 frames).
[Images: original Night move frame; reconstructed frames at QF = 0 and in lossless mode]

HDTV sequence: Harbor (YUV 4:2:0). Total number of frames: 121. Frames used: 60. Width: 1280. Height: 720. Frame rate: 25 fps.

| QF | Original file size (KB) | Compressed file size (KB) | Compression ratio | Bit rate (KBps) |
|---|---|---|---|---|
| 0 | 81000 | 179 | 453:1 | 74.18 |
| 1 | 81000 | 226 | 358:1 | 94.03 |
| 4 | 81000 | 633 | 128:1 | 263.62 |
| 7 | 81000 | 2272 | 36:1 | 946.42 |
| 10 | 81000 | 8754 | 9:1 | 3647.39 |
| Lossless | 81000 | 44424 | 2:1 | 18509 |

Table 6 Performance of Dirac for Harbor test sequence (60 frames).
[Images: original Harbor frame; Harbor at QF = 0 and in lossless mode]

7.2 Performance analysis of AVS-China video codec
The performance of the AVS-China video codec is tabulated for the Akiyo (QCIF), Tempete (CIF), Night move (SDTV) and Harbor (HDTV) sequences. The chroma sub-sampling format is YUV 4:2:0 for all four sequences, and only the luma component is used for the metric calculations, since it carries most of the information. B frames were not used. Frame 45 is displayed for the Akiyo and Tempete sequences, and frame 30 for the Night move and Harbor sequences.
QCIF sequence: Akiyo (YUV 4:2:0). Total number of frames: 300. Frames used: 150. Width: 176. Height: 144. Frame rate: 30 fps.

| QP | Original file size (KB) | Compressed file size (KB) | Compression ratio | Bit rate (KBps) | Y-MSE | Y-PSNR (dB) | Y-SSIM |
|---|---|---|---|---|---|---|---|
| 63 | 5569 | 9 | 619:1 | 1.69 | 235.363 | 24.413 | 0.678 |
| 60 | 5569 | 10 | 557:1 | 1.91 | 191.958 | 25.299 | 0.705 |
| 55 | 5569 | 12 | 464:1 | 2.43 | 111.858 | 27.644 | 0.788 |
| 50 | 5569 | 16 | 348:1 | 3.27 | 63.432 | 30.108 | 0.852 |
| 40 | 5569 | 32 | 174:1 | 6.53 | 21.116 | 34.885 | 0.934 |
| 30 | 5569 | 67 | 83:1 | 13.57 | 6.555 | 39.965 | 0.975 |
| 20 | 5569 | 153 | 36:1 | 31.25 | 1.834 | 45.498 | 0.991 |
| 10 | 5569 | 376 | 15:1 | 76.85 | 0.602 | 50.333 | 0.996 |
| 5 | 5569 | 480 | 12:1 | 98.27 | 0.438 | 51.714 | 0.997 |
| 0 | 5569 | 984 | 6:1 | 201.47 | 0.056 | 60.629 | 0.999 |

Table 7 Performance of AVS for Akiyo sequence (150 frames)
[Images: original Akiyo frame; reconstructed frames at QP = 0, QP = 40 and QP = 63]
CIF sequence: Tempete (YUV 4:2:0). Total number of frames: 260. Frames used: 90. Width: 352. Height: 288. Frame rate: 30 fps.

| QP | Original file size (KB) | Compressed file size (KB) | Compression ratio | Bit rate (KBps) | Y-MSE | Y-PSNR (dB) | Y-SSIM |
|---|---|---|---|---|---|---|---|
| 63 | 13365 | 23 | 581:1 | 7.59 | 523.890 | 20.938 | 0.486 |
| 60 | 13365 | 31 | 431:1 | 10.36 | 408.054 | 22.024 | 0.559 |
| 55 | 13365 | 51 | 262:1 | 17.35 | 255.269 | 24.061 | 0.687 |
| 50 | 13365 | 86 | 155:1 | 29.04 | 157.281 | 26.164 | 0.790 |
| 40 | 13365 | 268 | 50:1 | 91.23 | 52.194 | 30.955 | 0.918 |
| 30 | 13365 | 780 | 17:1 | 266.02 | 14.987 | 36.374 | 0.971 |
| 20 | 13365 | 1960 | 7:1 | 668.68 | 3.447 | 42.756 | 0.991 |
| 10 | 13365 | 4259 | 3:1 | 1453.68 | 0.681 | 49.801 | 0.998 |
| 5 | 13365 | 5085 | 2.63:1 | 1735.43 | 0.494 | 51.195 | 0.998 |
| 0 | 13365 | 7197 | 2:1 | 2456.31 | 0.058 | 60.515 | 0.999 |

Table 8 Performance of AVS for Tempete sequence (90 frames)
[Images: original Tempete frame; reconstructed frames at QP = 0 and QP = 63]
SDTV sequence: Night move (YUV 4:2:0). Total number of frames: 800. Frames used: 60. Width: 704. Height: 576. Frame rate: 25 fps.

| QP | Original file size (KB) | Compressed file size (KB) | Compression ratio | Bit rate (KBps) | Y-MSE | Y-PSNR (dB) | Y-SSIM |
|---|---|---|---|---|---|---|---|
| 0 | 35640 | 21506 | 2:1 | 9175.72 | 0.046 | 61.486 | 0.999 |
| 10 | 35640 | 13777 | 3:1 | 5877.81 | 0.560 | 50.652 | 0.996 |
| 25 | 35640 | 4857 | 7:1 | 2072.06 | 7.475 | 39.395 | 0.938 |
| 40 | 35640 | 308 | 116:1 | 131.23 | 32.752 | 32.978 | 0.777 |
| 55 | 35640 | 49 | 727:1 | 20.89 | 91.752 | 28.505 | 0.704 |
| 63 | 35640 | 30 | 1188:1 | 12.40 | 190.515 | 25.332 | 0.640 |

Table 9 Performance of AVS for Night move sequence (60 frames)
[Images: reconstructed Night move frames at QP = 0 and QP = 63]
HDTV sequence: Harbor (YUV 4:2:0). Total number of frames: 121. Frames used: 60. Width: 1280. Height: 720. Frame rate: 25 fps.

| QP | Original file size (KB) | Compressed file size (KB) | Compression ratio | Bit rate (KBps) | Y-PSNR (dB) |
|---|---|---|---|---|---|
| 0 | 81000 | 35779 | 2:1 | 15265.69 | 60.86 |
| 10 | 81000 | 19367 | 4:1 | 8263.25 | 50.64 |
| 25 | 81000 | 4949 | 16:1 | 2111.26 | 40.81 |
| 40 | 81000 | 1033 | 78:1 | 440.41 | 33.41 |
| 55 | 81000 | 228 | 355:1 | 97.06 | 26.51 |
| 63 | 81000 | 117 | 692:1 | 49.70 | 22.95 |

Table 10 Performance of AVS for Harbor sequence (60 frames)
[Images: reconstructed Harbor frames at QP = 0 and QP = 63]

[Figure: QF vs. bit rate (KBps) for Dirac, and QP vs. bit rate (KBps) for AVS-China, for the SDTV and HDTV sequences]

7.3 Performance analysis of the AAC codec

Results:

File format | No. of frames in sequence | Encoding time (s) | Decoding time (s) | Original size (MB) | Compressed size (MB) | Compression ratio
ADTS | 6257 | 7.1 | 1.16 | 24.4 | 2.01 | 12:1
Table.11 Performance of AAC audio codec

Length of audio sequence = 2.13 minutes (127.8 s).
Bit rate before encoding = (24.4*8)/127.8 = 1.527 Mbps
Bit rate after encoding = (2.01*8)/127.8 = 0.126 Mbps (about 126 kbps)

Snapshots of the encoded and decoded audio sequences are shown below.
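A bit rate is a file size divided by the playback duration of the audio, so the encoding time does not enter the calculation. The sketch below reads the stated length "2.13 minutes" as a decimal value (127.8 s) and uses the same MB × 8 conversion as the report; both are assumptions. If "2.13" instead means 2 min 13 s, the duration becomes 133 s and both rates drop by about 4%.

```python
def audio_bit_rate_mbps(size_mb: float, duration_s: float) -> float:
    """Average bit rate in Mbps: size in MB times 8 bits per byte, divided by
    the playback duration (not the encoding time)."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return size_mb * 8 / duration_s


if __name__ == "__main__":
    # Assumption: "2.13 minutes" is read as a decimal value, i.e. 127.8 s.
    duration_s = 2.13 * 60
    before = audio_bit_rate_mbps(24.4, duration_s)  # original file
    after = audio_bit_rate_mbps(2.01, duration_s)   # ADTS-encoded file
    print(f"before: {before:.3f} Mbps, after: {after:.3f} Mbps")
    print(f"rate ratio: {before / after:.1f}:1")
```

The ratio of the two rates equals the compression ratio (about 12:1) whatever duration is used, because the duration cancels.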

8. Conclusion

The performance of Dirac was analyzed by varying the QF, and the quality metrics MSE, PSNR and SSIM were calculated. In lossless mode, MSE = 0, PSNR = 100 dB and SSIM = 1 were obtained, as expected (PSNR is unbounded when MSE = 0, so 100 dB is a capped value). At low QF, compression is highest and the fewest bits are needed for encoding; the reverse holds at high QF. It is also observed that high-resolution sequences show fewer artifacts than low-resolution ones.

The performance of AVS-China was analyzed by varying the quantization parameter (QP), and MSE, PSNR and SSIM were again calculated. Quality degrades as QP increases and improves as QP decreases. QP therefore acts in the opposite direction to Dirac's QF: a higher QF gives better quality, whereas a higher QP gives worse quality. Artifacts are clearly visible at the maximum QP (63).

The performance of the AAC audio codec at constant bandwidth was analyzed. The encoding and decoding times (7.1 s and 1.16 s for a 2.13-minute sequence) are short, which indicates low computational complexity.

9. Future Work

To compare the Dirac and AVS-China video codecs with other competing video codecs.
To compare AAC with other audio codecs.
To multiplex the Dirac/AVS-China video bit-stream with the AAC audio bit-stream, demultiplex them and achieve synchronization during playback.

10. References

[1] T. Borer and T. Davies, "Dirac video compression using open technology," BBC EBU Technical Review, July 2005.
[2] MPEG-2 Advanced Audio Coding, AAC, International Standard IS 13818-7, ISO/IEC JTC1/SC29/WG11, 1997.
[3] MPEG, "Information technology - Generic coding of moving pictures and associated audio information, Part 4: Conformance testing," International Standard IS 13818-4, ISO/IEC JTC1/SC29/WG11, 1998.
[4] M. Bosi and R. E. Goldberg, Introduction to Digital Audio Coding and Standards, Boston: Kluwer Academic Publishers, 2003.
[5] A. Puri, X. Chen and A. Luthra, "Video coding using the H.264/MPEG-4 AVC compression standard," Signal Processing: Image Communication, vol. 19, issue 9, pp. 793-849, Oct. 2004.
[6] K. Brandenburg, "MP3 and AAC explained," AES 17th International Conference, Florence, Italy, Sep. 1999.
[7] P. A. Sarginson, "MPEG-2: Overview of systems layer," BBC RD 1996/2.
[8] Dirac software download and source code: http://diracvideo.org/download/dirac-research/
[9] AVS-China software download: ftp://159.226.42.57/public/avs_doc/avs_software
[10] H. Murugan, "Multiplexing H.264 video bit-stream with AAC audio bit-stream, demultiplexing and achieving lip sync during playback," M.S.E.E. Thesis, University of Texas at Arlington, TX, May 2007.
[11] AVS-China official website: http://www.avs.org.cn
[12] M. Uehara, "Application of MPEG-2 systems to terrestrial ISDB (ISDB-T)," Proceedings of the IEEE, vol. 94, pp. 261-268, Jan. 2006.
[13] MSU Video Quality Measurement Tool: http://compression.ru/video/quality_measure/vqmt_download_en.html#start
[14] A. Ravi and K. R. Rao, "Performance analysis and comparison of the Dirac video codec with H.264/MPEG-4 Part 10 AVC," submitted to Journal of VCIR, Sept. 2009.
[15] L. Fan, Mobile Multimedia Broadcasting Standards, ISBN 978-0-387-78263-8, Springer US, 2009.
[16] L. Yu, S. Chen and J. Wang, "Overview of AVS-video coding standards," Special issue on AVS, Signal Processing: Image Communication, vol. 24, pp. 247-262, April 2009.
[17] Dirac video codec - A programmer's guide: http://dirac.sourceforge.net/documentation/code/programmers_guide/toc.htm
[18] Digital audio compression standard (AC-3, E-AC-3), Revision B, ATSC Document A/52B, Advanced Television Systems Committee, Washington, D.C., Jun. 14, 2005.
[19] Video test sequences (QCIF and CIF): http://trace.eas.asu.edu/yuv/index.html
[20] Z. Wang et al., "Image quality assessment: From error visibility to structural similarity," IEEE Trans. on Image Processing, vol. 13, pp. 600-612, Apr. 2004. http://www.ece.uwaterloo.ca/~z70wang/
[21] L. Yu et al., "Overview of AVS-Video: Tools, performance and complexity," SPIE VCIP, vol. 5960, pp. 596021-1 to 596021-12, Beijing, China, July 2005.
[22] C. C. Todd et al., "AC-3: Perceptual coding for audio transmission and storage," presented at the 96th Conv. Audio Engineering Soc., 1994, Preprint 3796.
[23] PowerPoint slides by L. Yu, chair of AVS video: http://www-ee.uta.edu/dip/courses/ee5351/ispacsavs.pdf
[24] A. Ravi, "Performance analysis and comparison of Dirac video codec with H.264/MPEG-4 Part 10 AVC," M.S.E.E. Thesis, University of Texas at Arlington, TX, Aug. 2009.
[25] www.atsc.org
[26] MPEG-4 Part 2, ISO/IEC 14496-2, International Organization for Standardization, http://www.iso.ch
[27] Dirac developer support: Wavelet transform: http://dirac.sourceforge.net/documentation/algorithm/algorithm/wlt_transform.xht
[28] K. Onthriar, K. K. Loo and Z. Xue, "Performance comparison of emerging Dirac video codec with H.264/AVC," IEEE International Conference on Digital Telecommunications (ICDT '06), p. 22, 29-31 Aug. 2006.
[29] T. Davies, The Dirac Algorithm: http://dirac.sourceforge.net/documentation/algorithm/, 2005.
[30] H. Eeckhaut et al., "Speeding up Dirac's entropy coder," Proc. 5th WSEAS Int. Conf. on Multimedia, Internet and Video Technologies, pp. 120-125, Greece, Aug. 2005.
[31] CMPT 365 Course Slides, School of Computing Science, Simon Fraser University, fig3:
[32] T. Davies, "A modified rate-distortion optimization strategy for hybrid wavelet video coding," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), vol. 2, p. II, 14-19 May 2006.
[33] M. Tun, K. K. Loo and J. Cosmas, "Semi-hierarchical motion estimation for the Dirac video codec," 2008 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, pp. 1-6, March 31-April 2, 2008.
[34] M. Tun and W. A. C. Fernando, "An error-resilient algorithm based on partitioning of the wavelet transform coefficients for a DIRAC video codec," Tenth International Conference on Information Visualization (IV 2006), pp. 615-620, 5-7 July 2006.
[35] W. Gao et al., "AVS - The Chinese next-generation video coding standard," NAB, Las Vegas, 2004.