H.264 to VP6 Transcoder


EE 5359 Multimedia Processing Summer 2008 Interim Project Report on H.264 to VP6 Transcoder Submitted by Jay R. Padia 1000 60 5145 Date: July 17, 2008

Abstract

VP6 is a video coding standard developed by On2 Technologies, Inc. It is the preferred codec for Macromedia Flash 8 video, and its importance has grown as Flash has emerged as a widely adopted technology for streaming video over the internet. H.264 is currently one of the most widely accepted video coding standards in the industry and enables high-quality video at low bitrates. Techniques that convert video from H.264 to VP6 are therefore increasingly important, since they enable high-quality video transmission over the internet using Flash. Existing research shows that H.263 video, the previous-generation standard to H.264, can be transcoded to VP6 with complexity reduced by up to 50%. That work uses the similarities and dissimilarities between the two codecs to reduce complexity by means of a dynamic search range and a dynamic search window. The success of the H.263 to VP6 transcoder, together with the available reference material on transcoding algorithms, motivates a new study to find an algorithm for transcoding from the H.264 coding standard to the VP6 coding standard. It is proposed to explore the similarities and dissimilarities between the two standards in order to find a suitable transcoding technique.

Importance of the H.264 Standard

H.264 [4] was proposed by the Joint Video Team (JVT) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) in 2003. It is currently one of the most widely accepted industry standards. It provides good quality video at substantially lower bitrates than previous standards and is also more robust to errors [1] [2]. H.264 combines a set of innovations that together provide a large improvement in performance over previous generations of video codecs.

MPEG-2 [21] was the most widely used video codec before the emergence of H.264. H.264 provides the same quality as MPEG-2 at a third to half the data rate. At the same data rate, H.264 can provide up to 4 times the frame size, as can be seen in Table 1. H.264 also degrades more gracefully when reaching its limits: rather than breaking into blocks, the image becomes softer as compression increases. H.264 is still an emerging standard, and its quality and performance can be expected to improve over the years, just as other standards have improved [3].

Table 1. H.264 data rate at various resolutions [3]

Overview of H.264 Standard

H.264 introduces many new features that differ significantly from previous-generation codecs and make it much more effective. An overview of these features is given below.

Profiles and levels

Like any comprehensive standard, the H.264 standard defines a set of profiles and levels to set points of conformance for various classes of applications and services. In each profile, specific encoding tools are permitted to best meet the needs of the intended scenario. H.264 includes six profiles as shown in figure 1 [4]:

Baseline. Intended for low-complexity applications such as video conferencing and mobile multimedia.
Main. Intended for the majority of general uses such as the Internet, mobile multimedia, and stored content.
Extended. Intended for streaming applications, where stream switching technologies can be beneficial.
Three High profiles (also known as Fidelity Range Extensions or FRExt). These are three separate High profiles (High, High 10, and High 4:2:2), intended for high-end professional uses [3] [5].

Fig 1. H.264 profile levels [3]

4x4 integer transform

H.264 is designed to operate on much smaller blocks of pixels than other common codecs, which mitigates blocking, smearing, and ringing artifacts, so H.264 video remains clear even in areas of fine detail. Because the transform is a precisely specified integer transform, it provides bit-precise reconstruction (that is, exact-match decoding) rather than statistically generated reconstruction. As a result, there can be no drift among decoder implementations, so any compliant H.264 decoder will decode the video exactly as the content author intended it to look [3] [6]. The forward transform can be written as

$$Y = \left(C_f X C_f^T\right) \otimes E_f$$

where $X$ is the input matrix, $C_f X C_f^T$ is the core 2D transformation of $X$, $E_f$ is the matrix formed by the scaling factors $a$, $b$, $c$, and $\otimes$ denotes element-wise multiplication.

Increased precision in motion estimation

H.264 also benefits from increased precision in motion estimation, which is the process of simplifying redundant data across a series of frames. By expressing motion to 1/4-pixel resolution (fig 2), as opposed to the 1/2-pixel resolution of most other codecs, H.264 represents both fast- and slow-moving scenes more precisely. Objects in motion are therefore reconstructed more crisply during decoding, providing a better representation of the source material [7].
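As an illustration of the core transform $C_f X C_f^T$ described above, the following minimal Python sketch applies the 4x4 core matrix of H.264 to a block using integer arithmetic only. The scaling by $E_f$ is deliberately left out (in practice it is usually folded into quantization), and the helper names `matmul`, `transpose` and `core_transform` are chosen here purely for illustration.

```python
# 4x4 forward core transform matrix C_f used by H.264.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """Compute C_f * X * C_f^T for a 4x4 block X (no E_f scaling applied)."""
    return matmul(matmul(CF, x), transpose(CF))


# A flat block has energy only in the DC coefficient.
flat_block = [[10] * 4 for _ in range(4)]
coeffs = core_transform(flat_block)
```

Because every entry of `CF` is an integer, the result is exact for integer input, which is the property that lets all compliant decoders reconstruct identical pictures.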

Fig 2. Motion vectors in H.264 [7]

Flexible block sizes in motion estimation

During motion estimation, traditional codecs commonly process frames at the macroblock level (16 pixels by 16 pixels). H.264 can process segments within a macroblock, ranging in size from the commonly used 16x16 down to 4x4 as shown in fig 3, which helps to code complex motion in areas of high detail. Because H.264 can perform its processing on a variety of block sizes, scenes with complicated motion are described more expressively, providing higher quality at lower data rates [7].

Fig 3(a). Macroblock partitions 16x16, 16x8, 8x16 & 8x8 [7]
Fig 3(b). Macroblock sub-partitions 8x8, 8x4, 4x8 & 4x4 [7]

Intraframe prediction

H.264 gains much of its efficiency by simplifying redundant data not only across a series of frames but also within a single frame, a technique called intraframe prediction (figure 4). The H.264 encoder uses intraframe prediction with more ways to reference neighboring pixels, so it compresses details and gradients better than previous codecs. Intraframe prediction is especially beneficial in high-motion areas, which are traditionally difficult to encode. With H.264, high-motion video can achieve high quality at much lower data rates [3] [8].

Fig 4. 4x4 block intra prediction modes in H.264 [8]

Adaptively tuned deblocking filter

H.264 also features a robust deblocking filter (figure 5), which operates on 4x4 block boundaries to remove jagged blocking artifacts. Its filtering is adaptively tuned per block boundary, making it a very effective smoothing filter during the decoding of a finished bit stream. In addition to producing smoother pictures for display, the filter is used during the encoding process to provide a more coherent reference picture for subsequent frames, which helps to improve image quality. This filter effectively eliminates blocking artifacts, resulting in a smooth, clean picture [9].

Fig 5(a). H.264 Encoder Basic encoding structure

Fig 5(b). H.264 Decoder Basic decoding structure

VP6 Coding standard

TrueMotion VP6 [10] is a compression technology from On2 Technologies Inc. Macromedia has licensed it for its Flash suite of products [12], and it is the main codec for Flash 8 and later. It is of interest because it gives very good quality at very high compression. According to On2, TrueMotion VP6 is among the best video codecs on the market, offering better image quality and faster decoding performance than Windows Media 9 [22], Real 9 [23], H.264 [4], and QuickTime MPEG-4 [10]. In internal testing at On2 Technologies Inc, TrueMotion VP6 beat many H.264 implementations, Windows Media 9 and Real Networks 10 in PSNR comparisons using standard MPEG-2 test source clips [10]. The VP6 clips were more detailed and contained fewer artifacts than Windows Media 9, and maintained more texture and detail than Real or H.264 [10]. VP6.2, the latest version of TrueMotion VP6, features a drastic increase in performance over previous versions of VP6 [10].

Emerging Importance of VP6 Coding Standard

Flash Video is rapidly changing the landscape of video on the Web. It is emerging as the preferred solution for providing video services online, ahead of Windows Media Player, Apple QuickTime and Real Networks Real Player [11]. The advantages of Flash Player over its rivals are its small size and its completeness as a website development package. Its support for multiple platforms has also made it popular [11]. Macromedia adopted the VP6 coding standard from On2 Technologies, Inc. as the video coding standard for its Flash player in 2005. It listed quality, portability, stability, low memory usage and performance as the main criteria for selecting VP6 [12]. VP6 in Flash 8 gives a significant quality improvement over the Sorenson Spark codec (based on H.263), which was the basis of Flash MX video (fig 6). It performs better with low-contrast video images, removes color oversaturation, and provides a smoother picture that is truer to the original by removing the blockiness of the old format [10].

Improvement in Performance on using VP6

Figure 6 compares the performance of Flash Video using VP6 with Flash MX, the older version, which used the Sorenson Spark codec based on H.263. The images in figure 6 (with the exception of the cartoons) are excerpts from a 12:30 minute video of coral reef exploration. The original source was shot on DVCAM and stored using photo-JPEG compression. The only tool used for compressing this video was Flix Professional, with default settings. The file was preprocessed as follows: since the source came directly from a camera, some over-scan had to be cropped out of the 720x486 DV source. It was also de-interlaced and resized to 320x240. All preprocessing was performed in Flix Professional. In all the comparisons listed, the image on the left is from the VP6 video.

Fig 6(a). Over-saturation of colors in MX (right) [10]
Fig 6(b). Blockiness can be observed in MX (right) [10]

Fig 6(c). Artificial details can be observed in MX (right) [10]
Fig 6(d). Block artifacts in presence of low contrast background. VP6 performs quite well here [10]
Fig 6(e). Absolute mess with MX (right) in low contrast images [10]

VP6 shows significant gains over the old Sorenson Spark codec used in Flash MX. With these advantages, VP6 is finding a place in other applications too and is gaining importance as a coding standard. This creates the need for a transcoding technique to convert video from the H.264 video coding standard to the VP6 video standard.

Comparison of H.264 and VP6

It is interesting to observe how VP6 fares against H.264. A comparative study of Hulu's 360p (VP6 based) and 480p (H.264 based) streams was carried out [13]. The 360p content is VP6 at 700 kbps with a screen resolution of 480x360, while the 480p content is H.264 at 1000 kbps (1 Mbps) with a resolution of 640x480. Screenshots of the two videos played side by side are shown in figure 7.

Fig 7(a). Comparison of Hulu's 360p (VP6 based) and 480p (H.264 based) videos [13]

Fig 7(b). Comparison of Hulu's 360p (VP6 based) and 480p (H.264 based) videos [13]

H.264 at 480p offers better quality than VP6 at 360p. However, at its lower resolution and much lower bitrate, VP6 shows no obvious loss of information in the images, and it also shows less blockiness. The color resolution of the 480p stream is significantly better than that of the lower-resolution stream. Another observation, on a 5 second clip in QuickTime (H.264) at 640x480 and in Flash (VP6) at 720x540, shows that at similar resolutions VP6 can give very high compression gains with insignificant loss in visual quality. Snapshots from each of the clips are shown in figure 8. The .flv clip (5 s) is 610 kbytes, whereas the QuickTime clip (5 s) is 4223 kbytes, so the Flash clip is about one seventh of the size [14]. VP6 thus gives a significant compression gain with very little loss of visual quality, making it an excellent choice for video streaming applications.

Fig 8(a). 720x540 Flash clip, significantly smaller in file size [14]

Fig 8(b). 640x480 H.264 Clip on Quicktime [14]

Existing Research work

A transcoding technique to convert from the previous-generation H.263 standard to the VP6 standard has been proposed [15]. The transcoder is designed on the basis of the similarities and dissimilarities between the two standards, which are compared in table 2.

Table 2. Comparison of H.263 and VP6 features [15]

This research is particularly important because the older Sorenson Spark codec used in Flash MX was based on the H.263 standard. With the increasing importance of VP6 in streaming media over the internet, the algorithm is also useful for converting old Flash video formats into the new VP6-based formats. The transcoding algorithms reuse the information gathered during the H.263 decoding stage to speed up the VP6 encoding stage. Experimental results show that the proposed algorithms reduce the encoding complexity by up to 52% while reducing the PSNR by at most 0.42 dB in the worst case [15]. The effectiveness of this reuse depends on the similarities and differences between the input and output video formats. The differences between H.263 and VP6 make transform-domain transcoding complex, so the authors employed pixel-domain transcoding [15].

Transcoder H.263 to VP6

VP6 is also a hybrid codec that uses motion-compensated transform coding at its core. The codec has Intra and Inter pictures similar to the MPEG video codecs. Intra pictures are coded independently of other coded pictures, and Inter pictures use previously coded pictures for prediction. Motion compensation supports 16x16 and 8x8 blocks as in H.263, but the Inter 8x8 macroblocks can have mixed blocks; that is, one or more 8x8 blocks can be coded in Intra mode without using any prediction. The Inter MBs in VP6 can be coded using 9 different modes. The modes are characterized by the number of motion vectors (1 vs. 4), the reference frame used, and whether motion vectors are coded. Where motion vectors are not coded, they are predicted from previously decoded MBs. The VP6 codec uses an 8x8 integer DCT for transform coding, and a de-blocking filter is applied at the block boundaries [15]. Many features of VP6 differ from H.263 but are similar to H.264; a comparison between these standards is presented later.

The similarities and differences between H.263 and VP6 provide opportunities for reusing H.263 MB coding mode details to reduce the transcoder complexity. Because both H.263 and VP6 support 1 MV and 4 MV modes, motion vectors can be reused to some extent. However, because VP6 supports a larger number of MB modes than H.263, the H.263 MB mode and motion vectors cannot be used directly. The differences between the codecs mean that an Inter 16x16 MB in H.263 is not necessarily coded as an Inter 16x16 MB in VP6. Table 3 shows a typical example of MB coding modes when encoding H.263 decoder output using VP6. For this example, a Foreman video sequence at 352x288 resolution and 297 frames is encoded using H.263 at 384 Kbps and then transcoded to VP6 using full re-encoding at 291 Kbps. The full details of the VP6 modes are not given here due to space considerations. In brief, the Nearest and Near MB modes do not code motion vectors and derive their MVs from previously coded MBs; Golden frames are long-term reference frames; and Inter 0,0 forces the use of a (0, 0) motion vector. Each row corresponds to an H.263 MB coding mode, and the columns give the VP6 mode used to code those MBs. For example, of all the MBs that are coded as Inter 4V in H.263, 3% were coded in Inter 0,0 mode, 1% as Intra, 30% as Inter+MV, 11% as Nearest, 7% as Near, and 47% as Inter 4V MBs. Thus, if an Inter 4V MB in H.263 is mapped directly to Inter 4V in VP6, it maps correctly in only about half of the cases (47%). Direct mode mapping therefore leads to poor results, and more efficient algorithms are necessary [15].

Table 3. MB mode mapping H.263 to VP6 in [15]

The large mismatch of MB coding modes creates poor RD performance if direct mapping of modes and motion vectors is used. In [15], the authors evaluate the mapping patterns that allow them to restrict the modes evaluated for each H.263 input mode. Near and Nearest are computationally inexpensive to evaluate and are allowed in all cases. Inter 4V, on the other hand, takes significant computation and is evaluated only when the input MB is also in the Inter 4V mode.
The transcoding algorithms thus reduce the complexity by placing constraints on the MB modes evaluated, and reduce it further by using 1) a dynamic search range and 2) a dynamic search window.

Complexity Reduction Using Dynamic Search Range

The dynamic search range approach sets the search range used for motion estimation for each MB. Typically this range is fixed throughout the encoding process; it is set to 15 in the experiments. With knowledge of the H.263 motion vectors, the search range no longer has to be fixed and is instead changed based on the maximum motion vector component of the current MB. Figure 9(a) shows the dynamic search range selection based on H.263 motion vectors. The RD performance is compared to that of the baseline transcoder. Results are shown for three of the sequences evaluated, and the performance of the algorithm closely tracks the RD performance of the baseline transcoder. The PSNR drop is higher for the Stefan sequence because of the large motion in that sequence [15].
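The idea of deriving the search range from the incoming motion vector can be sketched in a few lines of Python. This is only an illustrative sketch, not the exact rule used in [15]: the `margin` parameter and the function names are assumptions made here, while the upper limit of 15 is the fixed range quoted in the text.

```python
MAX_SEARCH_RANGE = 15  # fixed range of the baseline transcoder, from the text


def dynamic_search_range(h263_mv, margin=1, max_range=MAX_SEARCH_RANGE):
    """Pick a per-MB search range from the H.263 motion vector (dx, dy).

    The range follows the largest MV component; 'margin' is a hypothetical
    safety allowance and is not taken from [15].
    """
    largest_component = max(abs(h263_mv[0]), abs(h263_mv[1]))
    return min(max_range, largest_component + margin)


def full_search_positions(search_range):
    """Number of integer-pixel positions in a +/- search_range full search."""
    return (2 * search_range + 1) ** 2


# Example: a macroblock whose H.263 motion vector is (3, -7).
mb_range = dynamic_search_range((3, -7))
baseline_cost = full_search_positions(MAX_SEARCH_RANGE)
dynamic_cost = full_search_positions(mb_range)
```

For a full search, the number of candidate positions grows with the square of the range, which is why shrinking the range for slow-moving macroblocks saves so much computation.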

Fig 9(a). Dynamic Search Range [15]
Fig 9(b). Dynamic Search Window [15]

Complexity Reduction Using Dynamic Search Window

Using a dynamic refinement window further reduces the complexity by reusing the H.263 motion vectors. Unlike the dynamic search range method, where the window location is fixed and the window size (search range) is varied, the dynamic search window approach uses the H.263 motion vectors to determine the position of a fixed-size window. Window sizes of 1x1 and 3x3 for the new motion vector search were evaluated by the authors (fig 9(b)). This approach reduces the complexity more than the dynamic range approach because the search space is even smaller, at the cost of a slight increase in PSNR loss. Figure 9(b) shows the dynamic window derived from the H.263 motion vectors of an MB. Figure 10(b) shows an RD plot comparing the dynamic window approach to the baseline approach [15].

In [15], the TMN 3.2 H.263 encoder from the University of British Columbia, which is based on Telenor's H.263 implementation, was used. The input video is coded at 384 Kbps in baseline profile with advanced motion options and one I frame (the first frame). A decoder based on the same H.263 implementation is used in the decoding stage of the transcoder. The VP6 encoding stage is based on the optimized VP6 encoder software provided by On2 Technologies. The VP6 video is encoded with an I frame frequency of 120 and at multiple bitrates to assess the RD performance of the transcoder. The results are compared with the baseline transcoder, which performs full encoding in the VP6 stage.

Fig 10(a). RD performance - Dynamic Search [15]
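The dynamic search window described in this section can be illustrated with a short Python sketch that lists the candidate motion vectors inside a fixed-size window centred on the incoming H.263 motion vector. The function name and the integer-pixel representation of motion vectors are assumptions made here for illustration; only the 1x1 and 3x3 window sizes come from the text.

```python
def window_candidates(h263_mv, window=3):
    """Return candidate MVs in a window x window grid centred on h263_mv.

    h263_mv is an (dx, dy) pair; 'window' is expected to be odd (1 or 3 in
    the experiments described in the text).
    """
    half = window // 2
    return [
        (h263_mv[0] + dx, h263_mv[1] + dy)
        for dy in range(-half, half + 1)
        for dx in range(-half, half + 1)
    ]


# A 1x1 window simply reuses the H.263 vector; a 3x3 window adds 8 neighbours.
reuse_only = window_candidates((4, -2), window=1)
refined = window_candidates((4, -2), window=3)
```

With at most nine positions to test per macroblock, the search space is far smaller than that of a range-based full search, which matches the larger complexity reduction reported for this approach.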

Fig 10(b). RD performance - Dynamic Window [15]

The results show that the proposed transcoder is able to reduce the complexity by more than 50% without a significant loss in PSNR. Given that the VP6 implementation used is highly optimized, the resulting savings of 50% are considered significant. Transcoders based on this approach will be able to transcode at least 50% more streams on the same hardware configuration.

Comparison of H.264 with the current research work

The authors in [15] show a comparison between the H.263 baseline profile and the VP6 codec. The similarities and dissimilarities between the two codecs help in designing the right transcoder for the application. Along the same lines, a similar comparison is provided in Table 4. It compares the VP6 features with the H.264 baseline features. Features that are available only in the Main and High profiles of H.264 are not included here. There are many similarities between VP6 and the H.264 baseline profile, especially in the features where H.264 differs from other codecs. VP6 supports the use of an integer DCT. It also has a deblocking filter like H.264 and supports ¼ pixel accuracy in the motion vectors.

Feature                          H.263 Baseline   VP6           H.264 Baseline
Picture type                     I, P             I, P          I, P
Transform size                   8x8              8x8           4x4
Transform                        DCT              Integer DCT   Integer DCT
Intra prediction                 None             None          Yes
Motion compensation block size   16x16, 8x8       16x16, 8x8    16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
Total MB modes                   4                10            7 inter + (9 + 4) intra
Motion vectors                   ½ pixel          ¼ pixel       ¼ pixel
Deblocking filter                None             Yes           Yes
Reference frames                 1                Max 2         Multiple

Table 4. Comparison of features in H.263 Baseline profile, VP6 and H.264 Baseline profile

Various Transcoding techniques and their applications in H.264 transcoding

A review paper [16] on the techniques and research issues involved in video transcoding (fig 11) compares the open-loop and closed-loop transcoder architectures.

Fig 11. Selection of transcoding function for various applications [16]

Open-Loop Transcoding architecture

The open-loop architecture is the most straightforward transcoding architecture. A decoder and an encoder are directly cascaded, as shown in figure 12(a). The incoming video stream is fully decoded and re-encoded into the target video with the desired bit rate or format, so transcoding causes little degradation in visual quality. However, decoding the transcoded video results in errors if the predictors of the decoder differ from those in the original encoder. These errors accumulate through the whole group of pictures (GOP). The error accumulation resulting from encoder/decoder predictor mismatch is called drift error. Open-loop transcoders contain no feedback loop for compensating the drift error. Closed-loop transcoders contain a feedback loop that corrects the transcoding distortion by compensating the drift in the transcoder [16] [17].

Fig 12(a). Cascaded decoder and encoder transcoder [16]

Fig 12(b). Cascaded decoder and encoder transcoder [16]

Hybrid Domain Closed-Loop Transcoding Architecture

Transcoding algorithms trade off computational complexity against reconstructed video quality. To reduce the computational complexity while maintaining the reconstructed video quality, ME should be omitted and DCT/IDCT should be avoided where possible. One such architecture uses MC for P frames only. I frames are intra coded and need no ME or MC, so IDCT/DCT for I frames can be omitted in principle. However, since I frames are the anchors for subsequent P and B frames, the IDCT at the decoder stage and the inverse quantization and IDCT at the encoder stage are still needed for I frames to reconstruct the reference frames, while the DCT at the encoder stage can be omitted. Since P frames are also the anchors for the following P and B frames, MC, DCT, and IDCT cannot be omitted for them. Transcoding delay can be further reduced in this architecture without degrading the video quality. P frames with frequent scene changes and rapid motion may contain a large number of INTRA blocks, and the IDCT/DCT and MC operations for these INTRA blocks can also be omitted. In other words, blocks of I and B pictures and INTRA blocks of P pictures are transcoded in the frequency domain, and spatial-domain motion compensation is done only for inter blocks in P frames. This architecture is known as the hybrid domain transcoding architecture (HDTA), as shown in Fig. 13.

Heterogeneous Transcoder

A heterogeneous transcoder provides conversion between different standards (fig 14). It needs a syntax conversion module and may change the picture type, picture resolution, directionality of MVs, and picture rate. A heterogeneous transcoder must adjust the features of the incoming video to enable the features of the outgoing video. Because of spatial-temporal subsampling and the different encoding format of the output sequence, the encoder and decoder motion compensation loops in a heterogeneous transcoder are more complex [17].

Fig 13. Hybrid domain closed-loop transcoder [16]

Generic Heterogeneous Transcoder

A generic heterogeneous transcoder is shown in Fig 14. In this architecture, syntax conversion (SC) is needed to convert the syntax of the source video to that of the target video. A higher-resolution decoder decodes the incoming bitstream. The extracted MVs are then post-processed according to the desired output encoding structure and, if required, scaled down to suit the lower spatial-temporal resolution of the encoder. Where post-processing is not sufficient, the extracted MVs are refined to improve the encoding efficiency. The decoded pictures are down-sampled spatially or temporally as needed, and the down-sampled images are encoded with the new MVs. Since the incoming MVs are re-employed and other encoding decisions, such as macroblock types, can be extracted from the incoming bitstream, the architecture of this transcoder can be further simplified.

Because the MVs of the incoming bitstream are employed in the outgoing one, the extracted MVs have to be converted to be compatible with the encoding nature of the output bitstream. The way the MVs are extracted and used depends on the picture type. The algorithm assumes that the motion between pictures is uniform, such that the forward and reverse MVs are mirror images of each other, or that an inter-frame MV is a scaled version of an MV over a larger picture distance, and so on. If no MV is found, one might either use a (0, 0) MV or, in the worst case, encode the underlying macroblock using intra-frame coding. The incoming motion parameters of a sub-GOP of several frames can produce several candidate MVs for the outgoing picture. All the candidate MVs are compared, and the one that gives the least coding error in terms of the sum of absolute differences (SAD) is chosen. The best MV can then be refined to produce near-optimum results.
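The candidate comparison described above, where the MV with the smallest SAD is kept, can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions: frames are lists of rows of luma samples, motion vectors are integer-pixel `(dx, dy)` pairs, and the function names are hypothetical. The fallback to a (0, 0) vector mirrors the option mentioned in the text.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def extract_block(frame, x, y, size):
    """Cut a size x size block whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in frame[y:y + size]]


def best_candidate_mv(cur, ref, x, y, size, candidates):
    """Return (mv, cost) for the candidate MV with the lowest SAD.

    Candidates that point outside the reference frame are skipped; if none
    is usable, the (0, 0) vector is evaluated instead.
    """
    cur_block = extract_block(cur, x, y, size)
    best = None
    for mv in list(candidates) + [(0, 0)]:
        rx, ry = x + mv[0], y + mv[1]
        if rx < 0 or ry < 0 or rx + size > len(ref[0]) or ry + size > len(ref):
            continue
        cost = sad(cur_block, extract_block(ref, rx, ry, size))
        if best is None or cost < best[1]:
            best = (mv, cost)
    return best


# Toy frames: the current frame is the reference shifted left by one pixel.
ref_frame = [[8 * r + c for c in range(8)] for r in range(8)]
cur_frame = [[ref_frame[r][c + 1] if c < 7 else 0 for c in range(8)] for r in range(8)]
chosen = best_candidate_mv(cur_frame, ref_frame, 2, 2, 4, [(1, 0), (0, 1)])
```

In a real transcoder the candidates would come from the post-processed MVs of the incoming bitstream, and the chosen vector could then be refined in a small window around it.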

Fig 14. Heterogeneous video transcoder [16]

Analysis of current topic based on available literature

The main issues in transcoding H.264 to or from other standards arise from the differences between H.264 and previous-generation standards. VP6 has many features that are similar to those of H.264 (table 4). One important aspect of H.264 is its use of an integer discrete cosine transform instead of the conventional DCT. Codecs based on the conventional DCT suffer residual losses because precision is lost in the conversion to integers; H.264 overcomes this. VP6 also uses an integer DCT, like H.264 [15] (table 4). The main issue with the block transform is therefore the difference in size: a 4x4 integer DCT in H.264 versus an 8x8 integer DCT in VP6.

In [24], a method is proposed for converting an 8x8 DCT block (from an MPEG-2 video stream) to the 4x4 integer DCT blocks used in H.264/AVC. Instead of using IDCT and DCT blocks in cascade, the conversion is carried out directly in the DCT domain (fig 15). This can reduce the computational complexity significantly, as shown in table 5. A similar approach can be used in the current scenario to perform the conversion in the DCT domain itself.

Fig 15. DCT block conversion in DCT domain compared to a cascade pixel domain transcoder [24]
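For readers unfamiliar with the 4x4 integer transform mentioned above, the following Python sketch applies the widely documented H.264 forward core transform matrix to a 4x4 residual block. Only the integer core stage is shown; the scaling that H.264 folds into quantization is omitted. The helper names (`matmul`, `transpose`, `forward_core_transform`) are illustrative and not part of any codec API.

```python
# Forward 4x4 core transform matrix of H.264 (integer entries only).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def forward_core_transform(block):
    """Compute Y = CF * X * CF^T for a 4x4 integer block X.

    All arithmetic stays in integers, so the encoder and the decoder can
    reproduce exactly the same values.
    """
    return matmul(matmul(CF, block), transpose(CF))
```

A flat block concentrates all its energy in the top-left (DC) coefficient, which is a quick sanity check for any implementation of the transform.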

Table 5. Reduction in the number of operations when using the proposed method shown in fig 15 [24] (M = multiplication operation; A = addition operation)

The DCT conversion can be obtained in several steps, as shown below.

$$B_i = L_i \, B \, R_i, \qquad i = 0, 1, 2, 3$$

where $B$ is the 8x8 DCT matrix, $B_i$ is a 4x4 matrix, and

$$L_0 = L_1 = \begin{pmatrix} I_{4\times 4} & 0_{4\times 4} \end{pmatrix}_{4\times 8}, \qquad L_2 = L_3 = \begin{pmatrix} 0_{4\times 4} & I_{4\times 4} \end{pmatrix}_{4\times 8}$$

$$R_0 = R_2 = \begin{pmatrix} I_{4\times 4} \\ 0_{4\times 4} \end{pmatrix}_{8\times 4}, \qquad R_1 = R_3 = \begin{pmatrix} 0_{4\times 4} \\ I_{4\times 4} \end{pmatrix}_{8\times 4}$$

The remaining steps follow [24]; their equations are not reproduced here. First, the distributive property of the DCT is applied. Next, a matrix H is introduced that yields the integer DCT from the DCT. To get the H.264 coefficients, however, a modified H matrix is needed. The H.264 transform coefficients are then obtained using this modified matrix.

The result is the 4x4 integer DCT coefficient matrix used in the H.264 standard, obtained from an 8x8 DCT. A similar technique, with slight changes, can be used to get the 4x4 H.264 integer DCT from the 8x8 VP6 integer DCT.
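The role of the matrices L i and R i can be checked numerically. The Python sketch below builds them from 4x4 identity and zero blocks and confirms that L i * B * R i picks out one 4x4 quadrant of an 8x8 matrix. It covers only this block-selection step, not the DCT-domain conversion of [24], and all helper names are assumptions made for illustration.

```python
def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def zeros(rows, cols):
    return [[0] * cols for _ in range(rows)]


def hstack(a, b):
    """Place two matrices side by side (same number of rows)."""
    return [ra + rb for ra, rb in zip(a, b)]


def vstack(a, b):
    """Place matrix a on top of matrix b (same number of columns)."""
    return [row[:] for row in a] + [row[:] for row in b]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


I4, Z4 = identity(4), zeros(4, 4)

# L_i are 4x8 row selectors, R_i are 8x4 column selectors.
L = [hstack(I4, Z4), hstack(I4, Z4), hstack(Z4, I4), hstack(Z4, I4)]
R = [vstack(I4, Z4), vstack(Z4, I4), vstack(I4, Z4), vstack(Z4, I4)]


def sub_block(big, i):
    """Return B_i = L_i * B * R_i for an 8x8 matrix B and i in 0..3."""
    return matmul(matmul(L[i], big), R[i])
```

With this indexing, i = 0, 1, 2, 3 correspond to the top-left, top-right, bottom-left and bottom-right quadrants respectively.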

The presence of a deblocking filter in H.264 is another common issue considered in the various transcoding techniques. VP6 also supports a deblocking filter [15], so a comparative study of the deblocking filters in H.264 and VP6 is required. The unavailability of the VP6 standard definition and source code, due to licensing, delays this study. The applicability of the H.264 deblocking filter to VP6 transcoding will be investigated.

The H.264 baseline profile does not support B frames. The absence of B frames in VP6 is therefore not an issue, since the present study concerns the conversion of the H.264 baseline profile to VP6. H.264 supports multiple reference frames, whereas VP6 supports up to 2 reference frames [15]. It would be interesting to study the reuse of the reference frames and the selection of at most 2 of them. Research in [18] shows that the use of multiple reference frames and the use of quarter-pel accuracy achieve similar RD results; that is, multiple reference frames are not necessary if quarter-pel interpolation is used.

Unlike most other codecs, and like H.264, VP6 allows 1 or 4 motion vectors with up to quarter-pixel resolution. However, the difference in block sizes and the large number of block size combinations make it difficult to reuse the motion vectors directly. The techniques used in [15] for H.263 to VP6 transcoding can be useful for searching for motion vectors based on the available motion vectors, thereby reducing complexity. The dynamic window search and dynamic range search techniques used in [15] to reuse MV information when encoding VP6 were discussed earlier. The research described in [19] and [20] also provides a basis for making decisions on MB modes and motion vectors in the context of the present problem. [20] explains block type conversion and motion vector mapping for transcoding from H.264 to MPEG-4, as shown in the next section. A similar approach can be used in the context of the current problem.

Block Type Conversion and Motion Vector Mapping

Performing brute-force ME and mode decision for each MB gives a transcoder high computational complexity. To reduce this complexity, the incoming motion vectors are used for motion vector mapping. In the transcoder of [20], the MPEG-4 encoder uses the motion vectors and MB information contained in each MB of the H.264 bitstream. Table 6 lists the MB modes in H.264 and MPEG-4 and how they are converted when a cascaded pixel domain transcoder is used.

Table 6. MB mode conversions observed in cascaded pixel domain H.264 to MPEG-4 transcoder [20]

Fig 16. Block type conversion and motion vector mapping from H.264 to MPEG-4 [20]

The information in Table 6 is used to decide the MB mode conversion in [20]. Fig 16 shows the conversion criteria used and the resulting conversion of MB modes from H.264 to MPEG-4. Similar decision criteria can be used in the proposed transcoder.
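To make the idea of motion vector mapping concrete, here is a hypothetical Python sketch that collapses the MVs of an H.264 macroblock, stored on a 4x4 grid of 4x4 luma blocks, into either one MV or four MVs (one per 8x8 block), the two options available in MPEG-4 and VP6. The component-wise median rule, the `threshold` parameter and the function names are assumptions chosen for illustration; they are not the criteria used in [20].

```python
from statistics import median_low


def merge(mvs):
    """Component-wise (low) median of a list of (dx, dy) motion vectors."""
    return (median_low([mv[0] for mv in mvs]),
            median_low([mv[1] for mv in mvs]))


def map_macroblock(mv_grid, threshold=1):
    """Map a 4x4 grid of H.264 MVs to a 1-MV or 4-MV macroblock.

    mv_grid[r][c] is the MV of the 4x4 luma block in row r, column c.
    Returns (mode, mvs) where mode is "1MV" or "4MV" (hypothetical labels).
    """
    quads = []
    for qy in (0, 2):            # top half, then bottom half
        for qx in (0, 2):        # left half, then right half
            sub = [mv_grid[qy + r][qx + c] for r in range(2) for c in range(2)]
            quads.append(merge(sub))

    single = merge(quads)
    # Largest city-block distance between any 8x8 MV and the single MV.
    spread = max(abs(q[0] - single[0]) + abs(q[1] - single[1]) for q in quads)
    if spread <= threshold:
        return "1MV", [single]
    return "4MV", quads
```

The mapped MVs would normally serve only as starting points for a small refinement search, since the target codec uses different interpolation filters and reference pictures.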

H.264 supports intra prediction, as shown in figure 3; VP6, like most other codecs, does not. According to the study in [18], however, the most probable intra-coding modes in H.264 are vertical, horizontal and DC. This information can be leveraged in designing the transcoder.

The available references and the study of various transcoding algorithms will help in designing the transcoder to convert H.264 video to VP6 video. Once the license agreement is completed and the algorithm for the VP6 codec is available, comparison between H.264 and VP6 will be easier. A new transcoding algorithm can be proposed by making use of the results available in the literature and drawing inferences to apply the various techniques to the present problem.

VP6 is a proprietary codec of On2 Technologies, Inc. It is licensed by Adobe Systems, Inc. for Flash 8 and later versions. The Multimedia Laboratory, Electrical Engineering Department, University of Texas at Arlington, is in the process of acquiring an evaluation license for VP6 from On2 Technologies, Inc. for research on an H.264 to VP6 transcoder.

References:

1. S. Kwon, A. Tamhankar and K. R. Rao, Overview of H.264 / MPEG 4 Part 10, J VCIR, vol 17, pp 186-216, April 2006
2. I. Richardson, V-Codex, White Paper An overview of H.264 Advanced Video Coding, www.vcodex.com, 2007
3. Apple Inc., Technology Brief Quicktime and MPEG-4, http://www.apple.com, 2008
4. ITU-T Recommendation H.264 Advanced Video Coding for Generic Audio-Visual services
5. G. J. Sullivan, P. Topiwala and A. Luthra, The H.264/AVC Advanced Video Coding Standard: Overview and Introduction to the Fidelity Range Extensions, SPIE Conference on Applications of Digital Image Processing XXVII, vol 5558, pp 53-74, Special Session on Advances in the New Emerging Standard: H.264/AVC, August 2004
6. I. Richardson, V-Codex, White Paper - H.264 / MPEG-4 Part 10 : Transform & Quantization, www.vcodex.com, 2007
7. I. Richardson, V-Codex, White Paper - H.264 / MPEG-4 Part 10 : Inter Prediction, www.vcodex.com, 2007
8. I. Richardson, V-Codex, White Paper - H.264 / MPEG-4 Part 10 : Intra Prediction, www.vcodex.com, 2007
9. I. Richardson, V-Codex, White Paper - H.264 / MPEG-4 Part 10 : Intra Prediction Loop Filter, www.vcodex.com, 2007
10. On2 Technologies, Inc., White Paper On2 VP6 for Flash 8 Video, http://www.on2.com, September 12, 2005
11. J. Emigh, New Flash Player rises in the Web-Video Market, IEEE Computer, vol 39, pp 14-16, 2006
12. T. Uro, The quest for a new video codec in Flash 8, http://www.kaourantin.net/2005/08/quest-fornew-videocodec-in-flash-8.html, August 13, 2005
13. A. Beach, Real World Video Compression, realworldvideocompression.com
14. A. Hall, alexandtia.com
15. C. Holder and H. Kalva, H.263 to VP6 Video Transcoder, SPIE, vol 6822 (VCIP), pp 68222B-68222B, San Jose, CA, January 2008
16. I. Ahmad, et al, Video Transcoding: An Overview of Various Techniques and Research Issues, IEEE Transactions on Multimedia, vol 7, pp 793-804, October 2005
17. J. Xin, C. Lin and M. Sun, Digital Video Transcoding, Proceedings of the IEEE, vol 93, pp 84-96, January 2005
18. J. Bialkowski, M. Barkowsky and A. Koup, Overview of Low-Complexity Video Transcoding from H.263 to H.264, IEEE Conference on Multimedia and Expo 2006, vol 9, pp 49-52, July 2006
19. S. Kim, J. Han and J. Kim, Efficient Motion Estimation Algorithm for MPEG-4 to H.264 Transcoder, IEEE Conference on Image Processing, ICIP 2005, vol 3, pp 656-659, September 2005
20. J. Hur and Y. Lee, H.264 to MPEG-4 Transcoding using Block-Type Information, IEEE Region 10 TENCON 2005, pp 1-6, November 2005
21. S. Eckart and C. Fogg, ISO-IEC MPEG-2 software video codec, SPIE Proceedings, vol 2419, pp 100-109, October 2004
22. J. Loomis and M. Wasson, VC-1 Technical Overview, http://www.microsoft.com/windows/windowsmedia/howto/articles/vc1techoverview.aspx, Microsoft Corporation, October 2007
23. Real Video 10 Technical Overview, version 1.0, Real Networks, http://docs.real.com/docs/rn/rv10/rv10_tech_overview.pdf, 2003
24. J. Lee and K. Chung, DCT Block Conversion for H.264/AVC Video Transcoding, Euro-Par 2005, LNCS 3648, pp 919-927, 2005