EE 5359 Multimedia Processing Summer 2008 Interim Project Report on H.264 to VP6 Transcoder Submitted by Jay R. Padia 1000 60 5145 Date: July 17, 2008
Abstract
VP6 is a video coding standard developed by On2 Technologies, Inc. It is the preferred codec for Macromedia Flash 8 video, and its importance has grown as Macromedia Flash has become a widely adopted technology for streaming video over the internet. H.264 is currently one of the most widely accepted video coding standards in the industry and enables high quality video at low bitrates. Techniques that convert video from H.264 to VP6, and thereby enable high quality video transmission over the internet using Flash, are therefore of increasing importance. Existing research shows that H.263 video, the generation of standard preceding H.264, can be transcoded to VP6 with complexity reduced by up to 50%. That work exploits the similarities and dissimilarities between the two codecs to reduce complexity using a dynamic search range and a dynamic search window. The success in reducing complexity in the H.263 to VP6 transcoder, together with the available reference material on transcoding algorithms, motivates a new study to find an algorithm for transcoding from the H.264 coding standard to the VP6 coding standard. It is proposed to explore the similarities and dissimilarities between the two standards to find the right transcoding technique.
Importance of the H.264 Standard
H.264 [4] was proposed by the Joint Video Team (JVT) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) in 2003. It is currently one of the most widely accepted industry standards. It can provide good quality video at substantially lower bitrates than previous standards, and it also shows more error robustness [1] [2]. H.264 has a set of innovations which together provide a large improvement in performance over previous generations of video codecs. MPEG-2 [21] was the most widely used video codec before the emergence of H.264. H.264 provides the same quality as MPEG-2 at a third to half the data rate. At the same data rate, H.264 can provide up to 4 times the frame size, as can be seen in Table 1. H.264 also provides better image quality when reaching its limits: it does not break into blocks but degrades much more smoothly, making the image softer as compression increases. H.264 is an emerging standard, and its performance can be expected to improve over the years, just as other standards have improved in quality and performance [3].
Table 1. H.264 data rate at various resolutions [3]

Overview of H.264 Standard
H.264 introduces many new features that differ significantly from previous generation codecs and make it much more effective. An overview of the features of the H.264 video codec is given below.

Profiles and levels
Like any comprehensive standard, the H.264 standard defines a set of profiles and levels to set points of conformance for various classes of applications and services. In each profile, specific encoding tools are permitted to best meet the needs of the intended scenario. H.264 includes six profiles as shown in figure 1 [4]:
Baseline. Intended for low-complexity applications such as video conferencing and mobile multimedia.
Main. Intended for the majority of general uses such as the Internet, mobile multimedia, and stored content.
Extended. Intended for streaming applications, where stream switching technologies can be beneficial.
Three High profiles (also known as Fidelity Range Extensions or FRExt). Three separate High profiles (High, High 10, and High 4:2:2), intended for high-end professional uses [3] [5].
Fig 1. H.264 profile levels [3]

4x4 integer transform. H.264 is designed to operate on much smaller blocks of pixels than other common codecs, which mitigates blocking, smearing, and ringing artifacts, so H.264 video stays clear even in areas of fine detail. Because the transform is a precisely specified integer transform, it provides bit-precise reconstruction (that is, exact-match decoding) rather than statistically generated reconstruction. As a result, there can be no drift among various decoder implementations, so any compliant H.264 decoder will decode the video exactly as the content author intended it to look [3] [6]. The forward transform can be written as
$$Y = (C_f X C_f^T) \otimes E_f$$
where X is the input matrix, C_f X C_f^T is the core 2D transformation of X, E_f is the matrix formed by the scaling factors a, b, c, and \otimes denotes element-by-element multiplication.

Increased precision in motion estimation. H.264 also benefits from increased precision in motion estimation, which is the process of removing redundant data across a series of frames. By expressing motion to 1/4-pixel resolution (fig 2), as opposed to the 1/2-pixel resolution of most other codecs, H.264 represents both fast and slow moving scenes more precisely. Objects in motion are therefore reconstructed more crisply during decoding, providing a better representation of the source material [7].
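The 4x4 integer transform described above can be illustrated with a minimal Python sketch. The core matrix below is the forward core transform matrix of H.264; the scaling matrix is derived here from the row norms of that matrix, which is an illustrative way of producing the element-wise scaling and is not how an encoder applies it (in practice the scaling is folded into quantization). The function names are hypothetical.

```python
import numpy as np

# Forward core transform matrix C_f of the H.264 4x4 integer transform.
CF = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]], dtype=np.int64)


def core_transform_4x4(block):
    """Integer-only core transform W = C_f X C_f^T of a 4x4 block."""
    x = np.asarray(block, dtype=np.int64)
    return CF @ x @ CF.T


def scaling_matrix():
    """Element-wise scaling that normalises the rows of C_f (illustrative).

    The squared row norms of C_f are 4, 10, 4, 10, so entry (i, j) is
    1 / (norm_i * norm_j).
    """
    row_norms = np.sqrt((CF * CF).sum(axis=1).astype(np.float64))
    return 1.0 / np.outer(row_norms, row_norms)


def forward_transform_4x4(block):
    """Scaled transform: core transform followed by element-wise scaling."""
    return core_transform_4x4(block) * scaling_matrix()


# A flat block has energy only in the DC coefficient.
flat = np.ones((4, 4), dtype=np.int64)
w_flat = core_transform_4x4(flat)
```

Because the core stage uses only integer additions, subtractions and doublings, every decoder computes exactly the same values, which is the property the text refers to as bit-precise reconstruction.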
Fig 2. Motion vectors in H.264 [7] Flexible block sizes in motion estimation. During motion estimation, traditional codecs commonly process frames at the macroblock level (16 pixels by 16 pixels). H.264 can process on segments within a macroblock, ranging in size from the commonly used 16x16 to as small as 4x4 as shown in fig 3, which helps to code complex motion in areas of high detail. The ability of H.264 to perform its processing on a variety of block sizes means that scenes with complicated motion are more expressively described, providing higher quality in lower data rates [7]. Fig 3(a). Macroblock partitions 16x16, 16x8, 8x16 & 8x8 [7] Fig 3(b). Macroblock sub-partitions 8x8, 8x4, 4x8 & 4x4 [7] Intraframe prediction. H.264 is able to gain much of its efficiency by simplifying redundant data not only across a series of frames, but also within a single frame, a technique called intraframe prediction (figure 4). The H.264 encoder uses intraframe prediction with more ways to reference neighboring pixels, so it compresses details and gradients better than previous codecs. Intraframe prediction is especially beneficial in high motion areas, which are traditionally difficult to encode. With H.264, high-motion video can achieve stunning quality at much lower data rates [3] [8].
Fig 4. 4x4 block intra prediction modes in H.264 [8] Adaptively tuned deblocking filter. H.264 also features a robust deblocking filter as observed in figure 5, which operates on 4x4 block boundaries to remove jagged blocking artifacts. Its filtering is adaptively tuned per block boundary, making it a very effective smoothing filter during the decoding of a finished bit stream. In addition to making smoother pictures for display, this filter is used during the encoding process to provide a more coherent reference picture for subsequent frames, which helps to improve image quality. This advanced filter technology effectively eliminates blocking artifacts, resulting in a smooth, clean picture [9]. Fig 5(a). H.264 Encoder Basic encoding structure
Fig 5(b). H.264 Decoder Basic encoding structure VP6 Coding standard TrueMotion VP6 [10] is a new compression technology from On2 Technologies Inc. Macromedia has licensed it for its Flash suite of products [12]. It features as the main codec for Flash 8 and onwards. It has interesting features as it gives a very good quality at very high compression. TrueMotion VP6 is among the best video codecs on the market today. It offers better image quality and faster decoding performance than Windows Media 9 [22], Real 9 [23], H.264 [4], and QuickTime MPEG- 4 [10]. In internal testing at On2 Technologies Inc, TrueMotion VP6 could beat many H.264 implementations, Windows Media 9 and Real Networks 10 in PSNR comparisons using standard MPEG- 2 test source clips [10]. The VP6 clips were more detailed and contained fewer artifacts than Windows Media 9 and maintained more texture and detail than Real or H.264 [10]. VP6.2, the latest version of TrueMotion VP6, features a drastic increase in performance from the previous versions of VP6 [10]. Emerging Importance of VP6 Coding Standard Flash Video is rapidly changing the landscape of video on the Web. It is emerging as the preferred solution for providing video services online over Windows Media Player, Apple Quicktime and Real Networks Real Player [11]. The advantages of Flash Player over its rivals are its small size and its completeness as a website development package. Its ability to support multiple platforms has made it popular [11]. Macromedia adopted the VP6 coding standard from On2 Technologies, Inc. as the video coding standard for its Flash player in 2005. It listed quality, portability, stability, low memory usage and performance as the main criteria for selecting VP6 [12]. It can be observed that significant quality improvement can be obtained with VP6 in Flash 8 over the Sorenson Spark codec (based on H.263) which was the basis of Flash MX video (as shown in fig 6). 
It provides better performance with low contrast video images, removes color oversaturation and also provides a smoother picture true to the original by removing blockiness in the old format [10].
Improvement in Performance on using VP6
Figure 6 compares the performance of Flash Video using VP6 with Flash MX, the older version, which used the Sorenson Spark codec based on H.263. The images in Figure 6 (with the exception of the cartoons) are excerpts from a 12:30 minute video of coral reef exploration. The original source was shot on DVCAM and stored using Photo-JPEG compression. The only tool used for compressing this video was Flix Professional, using default settings. The file was preprocessed as follows: since the source came directly from a camera, the 720x486 DV source needed to have some over-scan cropped out. It was also de-interlaced and resized to 320x240. All preprocessing was performed in Flix Professional. In all the comparisons listed, the image on the left side is from VP6 video.
Fig 6(a). Over-saturation of colors in MX (right) [10]
Fig 6(b). Blockiness can be observed in MX (right) [10]
Fig 6(c). Artificial details can be observed in MX (right) [10]
Fig 6(d). Block artifacts in presence of low contrast background. VP6 performs quite well here [10]
Fig 6(e). Absolute mess with MX (right) in low contrast images [10]
It can be observed that VP6 shows significant gains over the old Sorenson Spark codec used in Flash MX. With these advantages, VP6 is finding a place in other applications too and is gaining importance as a coding standard. This creates the need for a transcoding technique to convert video from the H.264 video coding standard to the VP6 video standard.
Comparison of H.264 and VP6
It is interesting to observe how VP6 fares against H.264. A comparative study of Hulu's 360p (VP6 based) and 480p (H.264 based) streams was done (fig 7). The 360p content is VP6 at 700 kbps with a screen resolution of 480x360, while the 480p content is H.264 at 1000 kbps (1 Mbps) with a resolution of 640x480. Some screenshots of the videos played side by side are shown in figure 7.
Fig 7(a). Comparison of Hulu's 360p (VP6 based) and 480p (H.264 based) videos [13]
Fig 7(b). Comparison of Hulu's 360p (VP6 based) and 480p (H.264 based) videos [13]
It can be observed that H.264 with its 480p resolution offers better quality than VP6 at 360p. However, at the lower resolution and a much lower bitrate, VP6 shows no apparent loss of information in the images, and it also shows less blockiness. The color resolution at 480p is significantly better than at the lower resolution. Another observation, on a 5 second clip in QuickTime (H.264) at 640x480 and in Flash (VP6) at 720x540, shows that at similar resolutions VP6 can give very high compression gains with insignificant loss in visual quality. Snapshots from each of the clips are shown in figure 8. The 5 s .flv clip is 610 kbytes, compared with 4223 kbytes for the 5 s QuickTime clip [14], a reduction by a factor of roughly 6.9. VP6 thus gives a significant compression gain at very little loss of visual quality, making it an excellent choice for video streaming applications.
Fig 8(a). 720x540 Flash clip, significantly smaller in file size [14]
Fig 8(b). 640x480 H.264 Clip on Quicktime [14]
Existing Research Work
A transcoding technique to convert from the previous generation H.263 standard to the VP6 standard has been proposed [15]. The transcoder was designed on the basis of the similarities and dissimilarities between the two standards; a comparison can be found in table 2.
Table 2. Comparison of H.263 and VP6 features [15]
This research is particularly important because the older Sorenson Spark codec used in Flash MX was based on the H.263 standard. With the increasing importance of VP6 in streaming media over the internet, the algorithm is also useful for converting old Flash video formats into the new VP6 based formats. The transcoding algorithms reuse the information from the H.263 decoding stage to accelerate the VP6 encoding stage. Experimental results show that the proposed algorithms are able to reduce the encoding complexity by up to 52% while reducing the PSNR by at most 0.42 dB in the worst case [15]. The effectiveness of this reuse depends on the similarities and differences between the input and output video formats. The differences between H.263 and VP6 make transform domain transcoding complex, so the authors employed pixel domain transcoding [15].

Transcoder H.263 to VP6
VP6 is also a hybrid codec that uses motion compensated transform coding at its core. The codec has Intra and Inter pictures similar to MPEG video codecs. Intra pictures are coded independently of other coded pictures, and Inter pictures use previously coded pictures for prediction. Motion compensation supports 16x16 and 8x8 blocks, as in H.263, but the Inter 8x8 macroblocks can have mixed blocks; i.e., one or more 8x8 blocks can be coded in Intra mode without using any prediction. The Inter MBs in VP6 can be coded using 9 different modes.
The modes are characterized by the number of motion vectors (1 vs. 4), the reference frame used, and whether motion vectors are coded. Where motion vectors are not coded, they are predicted from previously decoded MBs. The VP6 codec uses an 8x8 integer DCT for transform coding, and a de-blocking filter is applied at the block boundaries [15]. It can be observed that many features in VP6 differ from H.263 but are similar to H.264; a comparison between the standards is presented later. The similarities and differences between H.263 and VP6 provide opportunities for reusing H.263 MB coding mode details to reduce the transcoder complexity. The fact that both H.263 and VP6 support 1 MV and 4 MV modes means that motion vectors can be reused to some extent. However, because VP6 supports a larger number of MB modes than H.263, the H.263 MB mode and motion vectors cannot be used directly. The differences between the codecs mean that an Inter 16x16 MB in H.263 is not necessarily coded as an Inter 16x16 MB in VP6. Table 3 shows a typical example of MB coding modes when encoding H.263 decoder output using VP6. For this example, a Foreman video sequence at 352x288 resolution and 297 frames is encoded using H.263 at 384 Kbps and then transcoded to VP6 using full re-encoding at 291 Kbps. The full details of VP6 modes are not given here due to space considerations. In brief, Nearest and Near MB modes do not code motion vectors and derive their MVs from previously coded MBs; Golden frames are long term reference frames; and Inter 0,0 forces the use of a (0,0) motion vector. Each row corresponds to an H.263 MB coding mode and the columns give the VP6 mode used to code those MBs. For example, of all the MBs that are coded as Inter 4V in H.263, 3% were coded as Inter 0,0, 1% as Intra, 30% as Inter+MV, 11% as Nearest, 7% as Near, and 47% as Inter 4V. Thus, if an Inter 4V MB in H.263 is mapped directly to Inter 4V in VP6, the mapping is likely to be correct in only about half (47%) of the cases. Direct mode mapping will therefore lead to poor results, and more efficient algorithms are necessary [15].
Table 3. MB mode mapping H.263 to VP6 in [15]
The large mismatch of MB coding modes will create poor RD performance if direct mapping of motion vectors is used. In [15], patterns that allow the set of evaluated modes to be restricted based on the H.263 mode are identified. Near and Nearest are computationally inexpensive to evaluate and are allowed in all cases. Inter 4V, on the other hand, takes significant computation and is evaluated only when the input MB is also in the Inter 4V mode.
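The mode restriction described above can be sketched in Python as follows. The Near/Nearest and Inter 4V rules come from the text; treating Inter 0,0, Inter+MV and Intra as always evaluated is an assumption made here for illustration, and Golden frame modes are left out. The mode names and function names are hypothetical, and the percentages are the Inter 4V figures quoted above from [15].

```python
# How H.263 Inter 4V macroblocks ended up being coded by a full VP6
# re-encode (Foreman example quoted in the text, percentages from [15]).
INTER4V_OUTCOME_PERCENT = {
    "INTER_0_0": 3,
    "INTRA": 1,
    "INTER_MV": 30,
    "NEAREST": 11,
    "NEAR": 7,
    "INTER_4V": 47,
}

# Cheap modes that the text says are allowed in all cases.
ALWAYS_CHEAP = ("NEAREST", "NEAR")
# ASSUMPTION: these modes are also always evaluated (not stated in the text).
ASSUMED_BASE_MODES = ("INTER_0_0", "INTER_MV", "INTRA")


def candidate_vp6_modes(h263_mode):
    """Return the VP6 modes to evaluate for a given H.263 MB mode."""
    modes = list(ALWAYS_CHEAP) + list(ASSUMED_BASE_MODES)
    if h263_mode == "INTER_4V":
        # Expensive mode, evaluated only when the input MB is Inter 4V.
        modes.append("INTER_4V")
    return modes


def coverage_percent(candidates, outcome_percent):
    """Share of MBs whose full-search mode lies inside the candidate set."""
    return sum(p for mode, p in outcome_percent.items() if mode in candidates)


# Direct mapping (Inter 4V -> Inter 4V only) matches the full search 47% of the time.
direct_hit = coverage_percent(["INTER_4V"], INTER4V_OUTCOME_PERCENT)
```

Restricting the candidate list rather than mapping one mode to one mode keeps the cheap modes available for every MB while avoiding the Inter 4V evaluation where it is unlikely to be chosen.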
The transcoding algorithms thus reduce the complexity by placing constraints on MB modes evaluated and further reduce the complexity by using: 1) Dynamic search range and 2) Dynamic search window. Complexity Reduction Using Dynamic Search Range The dynamic search range approach sets the search range used for motion estimation for each MB. Typically this range is fixed throughout the encoding process and is set to 15 in the experiments. With the knowledge of motion vectors in H.263, the search range no longer has to be fixed. The search range is changed based on the maximum motion vector component for the current MB. Figure 9 shows the dynamic search range selection based on H.263 motion vectors. The RD performance is compared to the baseline transcoder. The results for three of the sequences evaluated are shown and the performance of the algorithm closely tracks the RD performance of the baseline transcoder. The PSNR drop is higher for the Stefan sequence because of large motion in the sequence [15].
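A minimal Python sketch of the dynamic search range idea follows. The text states only that the range is based on the maximum motion vector component of the current MB and that the fixed range is 15; the one-pel margin, the clamping to the fixed range, and the fallback to the full range when no incoming vector exists are assumptions made for illustration, and the function names are hypothetical.

```python
FIXED_SEARCH_RANGE = 15  # fixed range used by the baseline encoder in the text


def dynamic_search_range(h263_mvs, max_range=FIXED_SEARCH_RANGE, margin=1):
    """Pick a per-MB search range from the incoming H.263 motion vectors.

    h263_mvs is a list of (mvx, mvy) pairs in integer pels: one entry for a
    1 MV macroblock, four for a 4 MV macroblock, empty for an intra MB.
    ASSUMPTION: range = largest |component| + margin, clamped to max_range.
    """
    if not h263_mvs:
        return max_range  # no motion information available: full search range
    largest = max(max(abs(mvx), abs(mvy)) for mvx, mvy in h263_mvs)
    return min(max_range, largest + margin)


def search_points(search_range):
    """Number of integer-pel positions in a full search of +/- search_range."""
    return (2 * search_range + 1) ** 2
```

For an MB whose incoming vector is (2, -3), this sketch gives a range of 4, so a full search visits 81 positions instead of the 961 positions of the fixed range of 15.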
Fig 9(a). Dynamic Search Range [15]
Fig 9(b). Dynamic Search Window [15]

Complexity Reduction Using Dynamic Search Window
Using a dynamic refinement window further reduces the complexity by reusing the H.263 motion vectors. Unlike the dynamic search range method, where the window location is fixed and the window size or search range is varied, the dynamic search window approach uses the H.263 motion vectors to determine the position of a fixed size window. Window sizes of 1x1 and 3x3 for the new motion vector search were evaluated by the authors (fig 9(b)). This approach reduced the complexity more than the dynamic range approach because of the even smaller search space. The reduction in complexity comes at a slight increase in PSNR loss. Figure 9(b) shows the dynamic window derived from the H.263 motion vectors of an MB. Figure 10(b) shows an RD plot comparing the dynamic window approach to the baseline approach [15]. In [15] the TMN 3.2 H.263 encoder from the University of British Columbia, which is based on Telenor's H.263 implementation, was used. The input video is coded at 384 Kbps in baseline profile with advanced motion options and one I frame (the first frame). A decoder based on the same H.263 implementation is used in the decoding stage of the transcoder. The VP6 encoding stage is based on the optimized VP6 encoder software provided by On2 Technologies. The VP6 video is encoded with an I frame frequency of 120 and at multiple bitrates to assess the RD performance of the transcoder. The results are compared with the baseline transcoder that performs full encoding in the VP6 stage.
Fig 10(a). RD performance - Dynamic Search [15]
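The dynamic search window approach described above can be sketched in Python as a small SAD search centred on the incoming H.263 vector. This is a sketch under stated assumptions, not the implementation of [15]: integer-pel vectors, a 16x16 block, SAD as the matching cost, and the function names are all assumptions for illustration.

```python
import numpy as np


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())


def refine_mv(cur, ref, x, y, mv, window=3, block=16):
    """Search a window x window area centred on the incoming vector mv.

    cur, ref: 2D luma arrays; (x, y): top-left corner of the current block;
    mv: (mvx, mvy) taken from the H.263 bitstream, in integer pels.
    window=1 reuses mv as is; window=3 also tests its 8 neighbours.
    Returns (best_mv, best_sad), or (None, None) if every candidate falls
    outside the reference frame.
    """
    half = window // 2
    height, width = ref.shape
    target = cur[y:y + block, x:x + block]
    best_mv, best_cost = None, None
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            cx, cy = x + mv[0] + dx, y + mv[1] + dy
            if cx < 0 or cy < 0 or cx + block > width or cy + block > height:
                continue  # candidate block would leave the reference frame
            cost = sad(target, ref[cy:cy + block, cx:cx + block])
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mv[0] + dx, mv[1] + dy), cost
    return best_mv, best_cost
```

With a 3x3 window only 9 positions are tested per block, which is where the additional complexity saving over the dynamic search range comes from; the cost is that a poor incoming vector can no longer be corrected by more than one pel.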
Fig 10(b). RD performance - Dynamic Window [15]
The results show that the proposed transcoder is able to reduce the complexity by more than 50% without a significant loss in PSNR. Given that the VP6 implementation used is highly optimized, the resulting savings of 50% are considered significant. Transcoders based on this approach will be able to transcode at least 50% more streams for the same hardware configuration.

Comparison of H.264 with the current research work
The authors in [15] show a comparison between the H.263 baseline profile and the VP6 codec. The similarities and dissimilarities between the two codecs help design the right transcoder for the application. Along the same lines, a similar comparison is provided in Table 4. It compares the VP6 features with the H.264 baseline features. Features that are available only in the Main and High profiles of H.264 are not included here. It can be observed that there are many similarities between VP6 and the H.264 baseline profile, especially in the features where H.264 differs from other codecs. VP6 supports the use of an integer DCT. Like H.264, it also has a deblocking filter and supports ¼ pixel accuracy in the motion vectors.

Feature                         H.263 Baseline  VP6          H.264 Baseline
Picture type                    I, P            I, P         I, P
Transform Size                  8x8             8x8          4x4
Transform                       DCT             Integer DCT  Integer DCT
Intra Prediction                None            None         Yes
Motion Compensation Block Size  16x16, 8x8      16x16, 8x8   16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
Total MB Modes                  4               10           7 inter + (9 + 4) intra
Motion Vectors                  ½ pixel         ¼ pixel      ¼ pixel
Deblocking filter               None            Yes          Yes
Reference Frames                1               Max 2        Multiple

Table 4. Comparison of features in H.263 Baseline profile, VP6 and H.264 Baseline profile
Various Transcoding techniques and their applications in H.264 transcoding A review paper on various techniques and research issues (fig 11) [16] involved in video transcoding compares the Open-Loop and Closed Loop Transcoder architectures. Fig 11. Selection of transcoding function for various applications [16] Open-Loop Transcoding architecture Open-Loop transcoding Architecture is the most straightforward transcoding architecture. Here a decoder and encoder are directly cascaded as shown in figure 12(a). The incoming video stream is fully decoded and re-encoded into target video with desired bit rate or format. So we find little degradation in visual quality due to transcoding. However here, decoding of a transcoded video would result in errors if the predictors of the decoder are different from those in the original encoder. These errors would accumulate through the whole group of pictures (GOP). The error accumulation resulting from encoder / decoder predictor mismatch is called drift error. Open loop transcoders contain no feedback loop in the transcoding architecture for compensating the drift error. Closed-Loop transcoders contain a feedback loop in the transcoding architecture in order to correct the transcoding distortion by compensating the drift in the transcoder [16] [17]. Fig 12(a). Cascaded decoder and encoder transcoder [16]
Fig 12(b). Cascaded decoder and encoder transcoder [16]

Hybrid Domain Closed-Loop Transcoding Architecture
Various transcoding algorithms provide a tradeoff between computational complexity and reconstructed video quality. In order to reduce the computational complexity while maintaining the reconstructed video quality, ME should be omitted and DCT/IDCT should be avoided if possible. One such architecture uses MC for P frames only. I frames are intra coded and need no ME or MC, so IDCT/DCT for I frames can in principle be omitted. However, since I frames are the anchors for subsequent P and B frames, the IDCT at the decoder stage, and inverse quantization and IDCT at the encoder stage, are still needed for I frames to reconstruct the reference frames, while the DCT at the encoder stage can be omitted. Since P frames are also the anchors for the following P and B frames, MC, DCT, and IDCT cannot be omitted for them. Transcoding delay can be further reduced without degrading the video quality in this architecture. P frames with frequent scene changes and rapid motion may contain a large number of INTRA blocks, and the IDCT/DCT and MC operations of these INTRA blocks can also be omitted. In other words, blocks of I and B pictures and INTRA blocks of P pictures are transcoded in the frequency domain, and spatial-domain motion compensation is done only when the block is an inter block in a P frame. This transcoding architecture is known as the hybrid domain transcoding architecture (HDTA), as shown in Fig. 13.

Heterogeneous Transcoder
A heterogeneous transcoder provides conversion between various standards (fig 14). A heterogeneous transcoder needs a syntax conversion module, and may change the picture type, picture resolution, directionality of MVs, and picture rate. A heterogeneous transcoder must adjust the features of the incoming video to enable the features of the outgoing video.
Due to spatial-temporal subsampling, and different encoding format of the output sequence, the encoder and decoder motion compensation loops in a heterogeneous transcoder are more complex [17].
Fig 13. Hybrid domain closed-loop transcoder [16]

Generic Heterogeneous Transcoder
A generic heterogeneous transcoder is shown in Fig 14. In this architecture, syntax conversion (SC) is needed to convert the syntax of the source video to that of the target video. A higher resolution decoder decodes the incoming bitstream. The extracted MVs are then post-processed according to the desired output encoding structure and, if required, scaled down to suit the lower spatial-temporal resolution encoder. If post-processing is not sufficient, the extracted MVs are refined to improve the encoding efficiency. The decoded pictures are down-sampled spatially or temporally as needed, and the down-sampled images are encoded with the new MVs. Since the incoming MVs are reused and other encoding decisions, such as macroblock types, can be extracted from the incoming bitstream, the architecture of this transcoder can be further simplified. In this architecture, the MVs of the incoming bitstream are employed in the outgoing one, so the extracted MVs have to be converted to be compatible with the encoding nature of the output bitstream. Note that the way the MVs are extracted and used depends on the picture type. The algorithm assumes that the motion between pictures is uniform, such that the forward and reverse MVs are mirror images of each other, or an inter-frame MV is a scaled version of an MV over a larger picture distance, and so on. If no MV is found, one might either use a (0, 0) MV or, in the worst case, encode the underlying macroblock using intra-frame coding. The incoming motion parameters of a sub-GOP spanning multiple frames can produce several candidate MVs for the outgoing picture. All the estimated MVs are compared, and the one that gives the least coding error in terms of sum of absolute differences (SAD) can be chosen. The best MV can then be refined to produce near-optimum results.
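The motion vector reuse steps described above (scaling, mirroring under the uniform-motion assumption, and choosing the candidate with the least SAD) can be sketched in Python. All function names and parameters here are hypothetical, vectors are in integer pels, and the (0, 0) vector is always added as a fallback candidate as the text suggests; the intra fallback and the final refinement step are not shown.

```python
import numpy as np


def scale_mv(mv, src_distance, dst_distance, spatial_scale=1.0):
    """Rescale an incoming MV for a new picture distance and resolution.

    Uniform motion is assumed, so the vector grows linearly with the
    temporal distance between the current and the reference picture.
    """
    factor = spatial_scale * dst_distance / src_distance
    return (round(mv[0] * factor), round(mv[1] * factor))


def mirror_mv(mv):
    """Turn a forward MV into a backward one (or vice versa) under uniform motion."""
    return (-mv[0], -mv[1])


def pick_best_mv(cur, ref, x, y, candidates, block=16):
    """Return (mv, sad) of the candidate with the least SAD.

    (0, 0) is always tried, so a result exists whenever the block itself
    lies inside the frame. Candidates pointing outside ref are skipped.
    """
    height, width = ref.shape
    target = cur[y:y + block, x:x + block].astype(np.int64)
    best_mv, best_cost = None, None
    for mvx, mvy in list(candidates) + [(0, 0)]:
        cx, cy = x + mvx, y + mvy
        if cx < 0 or cy < 0 or cx + block > width or cy + block > height:
            continue
        patch = ref[cy:cy + block, cx:cx + block].astype(np.int64)
        cost = int(np.abs(target - patch).sum())
        if best_cost is None or cost < best_cost:
            best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost
```

For example, an incoming vector of (8, -4) measured over a picture distance of 2 becomes (4, -2) when the outgoing picture references its immediate neighbour, and the same result is obtained when the distance is unchanged but the picture is down-sampled by a factor of 2.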
Fig 14. Heterogeneous video transcoder [16]

Analysis of current topic based on available literature
The main issues in transcoding H.264 to or from other standards arise from the differences between H.264 and previous generation standards. VP6 has many features which are similar to H.264 (table 4). One of the important aspects of H.264 is the use of the integer discrete cosine transform instead of the DCT. DCT based codecs suffer from reduced precision and residual losses caused by rounding to integers; this has been overcome in H.264. VP6 also uses an integer DCT like H.264 [15] (table 4). The main issue with the block transform is the 4x4 integer DCT in H.264 versus the 8x8 integer DCT in VP6. In [24] a method is proposed for converting an 8x8 DCT block (from an MPEG-2 video stream) to the 4x4 integer DCT blocks used in H.264/AVC. Instead of using IDCT and DCT blocks in cascade, the conversion is performed in the DCT domain (fig 15). This can reduce the computational complexity significantly, as shown in table 5. A similar approach can be used in the current scenario to perform the conversion in the DCT domain itself.
Fig 15. DCT block conversion in DCT domain compared to a cascade pixel domain transcoder [24]
Table 5. Reduction in the number of operations using the proposed method shown in fig 15 [24]
M = multiplication operation; A = addition operation

The DCT conversion can be obtained in several steps. First,
$$B_i = L_i \, B \, R_i, \qquad i = 0, 1, 2, 3$$
where B is the 8x8 DCT matrix and B_i is a 4x4 matrix, with
$$L_0 = L_1 = \begin{pmatrix} I_{4\times4} & 0_{4\times4} \end{pmatrix}_{4\times8}, \qquad L_2 = L_3 = \begin{pmatrix} 0_{4\times4} & I_{4\times4} \end{pmatrix}_{4\times8}$$
$$R_0 = R_2 = \begin{pmatrix} I_{4\times4} \\ 0_{4\times4} \end{pmatrix}_{8\times4}, \qquad R_1 = R_3 = \begin{pmatrix} 0_{4\times4} \\ I_{4\times4} \end{pmatrix}_{8\times4}$$
Using the distributive property of the DCT, the sub-blocks are then expressed in terms of H, the matrix used for getting the integer DCT from the DCT. However, to get the H.264 coefficients a modified H matrix is needed, and the H.264 transform coefficients are obtained with this modified matrix [24]. The result is the 4x4 integer DCT coefficient matrix used in the H.264 standard, obtained from an 8x8 DCT. A similar technique can be used, with slight changes, to get the 4x4 H.264 integer DCT from the 8x8 VP6 integer DCT.
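The idea of converting in the transform domain instead of cascading an inverse and a forward transform can be checked numerically with a short Python sketch. This is not the method of [24]: it assumes a floating point orthonormal 8x8 DCT on the input side (not the VP6 integer DCT), produces only the unscaled H.264 core transform coefficients, and precomputes the conversion kernels in the most direct way. The names T8, K_TOP, K_BOTTOM and dct8_to_h264_core are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix T, so that coefficients = T @ x @ T.T."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    t = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    t[0, :] = np.sqrt(1.0 / n)
    return t


# H.264 4x4 forward core transform matrix.
CF = np.array([[1, 1, 1, 1],
               [2, 1, -1, -2],
               [1, -1, -1, 1],
               [1, -2, 2, -1]], dtype=np.float64)

T8 = dct_matrix(8)
I4, Z4 = np.eye(4), np.zeros((4, 4))
SELECT_TOP = np.hstack([I4, Z4])     # 4x8, picks rows/columns 0..3
SELECT_BOTTOM = np.hstack([Z4, I4])  # 4x8, picks rows/columns 4..7

# Precomputed 4x8 kernels: inverse 8x8 DCT, sub-block selection and the
# 4x4 core transform folded into a single matrix per half.
K_TOP = CF @ SELECT_TOP @ T8.T
K_BOTTOM = CF @ SELECT_BOTTOM @ T8.T


def dct8_to_h264_core(coeffs8):
    """Map 8x8 DCT coefficients to four 4x4 core-transform blocks.

    Returned order: top-left, top-right, bottom-left, bottom-right.
    """
    kernels = (K_TOP, K_BOTTOM)
    return [k_row @ coeffs8 @ k_col.T for k_row in kernels for k_col in kernels]
```

Each output block costs two small matrix products on the coefficients, and no pixel-domain block is ever formed; comparing the result with the cascade (inverse 8x8 DCT followed by four 4x4 core transforms) on a random block confirms that the two paths agree.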
The presence of the deblocking filter in H.264 is also a common issue considered in the various transcoding techniques. VP6 also supports a deblocking filter [15], so a comparative study of the deblocking filters in H.264 and VP6 is required. The unavailability of the VP6 standard definition and source code, due to licensing, delays this study. Whether the H.264 deblocking filter can be used in VP6 transcoding will be investigated. The H.264 baseline profile does not support B frames, so the absence of B frames in the VP6 standard is not an issue, as the present basis of study is the conversion of the H.264 baseline profile to the VP6 standard. H.264 supports multiple reference frames whereas VP6 supports up to 2 reference frames [15]. It would be interesting to study the reuse of the reference frames and the selection of up to a maximum of 2 reference frames. Research in [18] shows that the use of multiple reference frames and the use of quarter-pel accuracy achieve similar RD results, and observes that it is not necessary to use multiple reference frames if quarter-pel accuracy interpolation is used. Like H.264, and unlike most other codecs, VP6 allows 1 or 4 motion vectors of up to quarter-pixel resolution. However, the difference in block sizes and the large number of block size combinations in H.264 make it difficult to reuse the motion vectors. The techniques used in [15] for H.263 to VP6 transcoding can be useful for searching the motion vectors based on the available motion vectors and thereby enable complexity reduction. The dynamic window search and dynamic range search techniques used in [15] to reuse the MV information when encoding VP6 were discussed earlier. The research described in [19] and [20] also provides a basis for making decisions on MB modes and motion vectors in the context of the present problem. [20] explains block type conversion and motion vector mapping, as shown in the next section.
It discusses the transcoding from H.264 to MPEG-4. A similar approach can be used in the context of the current problem. Block Type Conversion and Motion Vector Mapping Performing brute-force ME and mode decision for each MB causes a transcoder to have high computational complexity. To reduce this computational complexity, the incoming motion vectors are used for motion vector mapping. In the given transcoder in [20], the MPEG-4 encoder utilizes the motion vectors and MB information contained in each MB in the H.264 bitstream. Table 6 lists the MB modes in H.264 and MPEG 4 and how they are converted when a pixel domain cascade transcoder is used. Table 6. MB mode conversions observed in cascaded pixel domain H.264 to MPEG-4 transcoder [20]
Fig 15. Block type conversion and motion vector mapping from H.264 to MPEG-4 [20] This information is used to decide the MB mode conversion in [20]. Fig 15 shows conversion criteria used and the conversion of MB modes from H.264 to MPEG 4. Similar criteria for decision making can be used in the proposed transcoder.
H.264 supports intra prediction as shown in figure 4, which, as in most other codecs, is not supported in VP6. According to the study by the authors in [18], however, the most probable modes during intra coding in H.264 are vertical, horizontal and DC. This information can be leveraged in designing the transcoder. The available references and the study of various transcoding algorithms will help design the transcoder to convert H.264 video to VP6 video. Once the license agreement is completed and the algorithm for the VP6 codec is available, comparison between H.264 and VP6 will be easier. A new transcoding algorithm can be proposed by making use of the results available in the literature and drawing inferences to apply various techniques to the present problem. VP6 is a proprietary codec of On2 Technologies, Inc. It is licensed by Adobe Systems, Inc. for Flash 8 and later versions. The Multimedia Laboratory, Electrical Engineering Department, University of Texas at Arlington is in the process of acquiring an evaluation license for VP6 from On2 Technologies, Inc. for research on the H.264 to VP6 transcoder.
References:
1. S. Kwon, A. Tamhankar and K. R. Rao, Overview of H.264 / MPEG-4 Part 10, J VCIR, vol 17, pp 186-216, April 2006
2. I. Richardson, V-Codex, White Paper - An overview of H.264 Advanced Video Coding, www.vcodex.com, 2007
3. Apple Inc., Technology Brief - Quicktime and MPEG-4, http://www.apple.com, 2008
4. ITU-T Recommendation H.264 - Advanced Video Coding for Generic Audio-Visual services
5. G. J. Sullivan, P. Topiwala and A. Luthra, The H.264/AVC Advanced Video Coding Standard: Overview and Introduction to the Fidelity Range Extensions, SPIE Conference on Applications of Digital Image Processing XXVII, vol 5558, pp 53-74, Special Session on Advances in the New Emerging Standard: H.264/AVC, August 2004
6. I. Richardson, V-Codex, White Paper - H.264 / MPEG-4 Part 10 : Transform & Quantization, www.vcodex.com, 2007
7. I. Richardson, V-Codex, White Paper - H.264 / MPEG-4 Part 10 : Inter Prediction, www.vcodex.com, 2007
8. I. Richardson, V-Codex, White Paper - H.264 / MPEG-4 Part 10 : Intra Prediction, www.vcodex.com, 2007
9. I. Richardson, V-Codex, White Paper - H.264 / MPEG-4 Part 10 : Intra Prediction Loop Filter, www.vcodex.com, 2007
10. On2 Technologies, Inc., White Paper - On2 VP6 for Flash 8 Video, http://www.on2.com, September 12, 2005
11. J. Emigh, New Flash Player rises in the Web-Video Market, IEEE Computer, vol 39, pp 14-16, 2006
12. T. Uro, The quest for a new video codec in Flash 8, http://www.kaourantin.net/2005/08/quest-fornew-videocodec-in-flash-8.html, August 13, 2005
13. A. Beach, Real World Video Compression, realworldvideocompression.com
14. A. Hall, alexandtia.com
15. C. Holder and H. Kalva, H.263 to VP6 Video Transcoder, SPIE, vol 6822 (VCIP), pp 68222B-68222B, San Jose, CA, January 2008
16. I. Ahmad, et al, Video Transcoding: An Overview of Various Techniques and Research Issues, IEEE Transactions on Multimedia, vol 7, pp 793-804, October 2005
17. J. Xin, C. Lin and M. Sun, Digital Video Transcoding, Proceedings of the IEEE, vol 93, pp 84-96, January 2005
18. J. Bialkowski, M. Barkowsky and A. Kaup, Overview of Low-Complexity Video Transcoding from H.263 to H.264, IEEE Conference on Multimedia and Expo 2006, vol 9, pp 49-52, July 2006
19. S. Kim, J. Han and J. Kim, Efficient Motion Estimation Algorithm for MPEG-4 to H.264 Transcoder, IEEE Conference on Image Processing, ICIP 2005, vol 3, pp 656-659, September 2005
20. J. Hur and Y. Lee, H.264 to MPEG-4 Transcoding using Block-Type Information, IEEE Region 10 TENCON 2005, pp 1-6, November 2005
21. S. Eckart and C. Fogg, ISO-IEC MPEG-2 software video codec, SPIE Proceedings, vol 2419, pp 100-109, October 2004
22. J. Loomis and M. Wasson, VC-1 Technical Overview, http://www.microsoft.com/windows/windowsmedia/howto/articles/vc1techoverview.aspx, Microsoft Corporation, October 2007
23. Real Video 10 Technical Overview, version 1.0, Real Networks, http://docs.real.com/docs/rn/rv10/rv10_tech_overview.pdf, 2003
24. J. Lee and K. Chung, DCT Block Conversion for H.264/AVC Video Transcoding, Euro-Par 2005, LNCS 3648, pp 919-927, 2005