Performance Comparison of leading image codecs: H.264/AVC Intra, JPEG2000, and Microsoft HD Photo

Trac D. Tran, Lijie Liu, and Pankaj Topiwala
FastVDO, LLC, Columbia, MD 210
{trac, lijie, pankaj}@fastvdo.com

ABSTRACT

This paper provides a detailed rate-distortion performance comparison between JPEG2000, Microsoft HD Photo, and H.264/AVC High Profile 4:4:4 I-frame coding for high-resolution still images and high-definition (HD) 1080p video sequences. This work extends our comparative studies presented at previous SPIE conferences [1, 2]. Here we further optimize all three codecs for compression performance. Coding simulations are performed on a set of large-format color images captured with mainstream digital cameras and on 1080p HD video sequences commonly used in H.264/AVC standardization work. Overall, our experimental results show that all three codecs offer very similar coding performance at the high-quality, high-resolution setting. Differences tend to be data-dependent: JPEG2000, with its wavelet technology, tends to be the best performer on smooth spatial data; H.264/AVC High Profile, with its advanced spatial prediction modes, tends to cope best with more complex visual content; Microsoft HD Photo tends to be the most consistent across the board. For the still-image data sets, JPEG2000 offers the best R-D performance, with gains of around 0.2 to 1 dB in peak signal-to-noise ratio over H.264/AVC High Profile intra coding and Microsoft HD Photo. For the 1080p video data set, all three codecs offer very similar coding performance. As in [1, 2], we consider neither scalability nor complexity in this study (JPEG2000 operates in its non-scalable, but optimal-performance, mode).

Keywords: H.264, AVC, High Profile, JPEG2000, Microsoft HD Photo, image coding, video coding.

1. INTRODUCTION

H.264, or MPEG-4 Part 10, is an international video coding standard developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a collective partnership effort known as the Joint Video Team (JVT) [3, 4]. H.264 technology, also known as Advanced Video Coding (AVC), is designed to provide good video quality at substantially lower bit rates compared to previous standards. It is also intended to have a reasonable level of computational complexity and to be flexible enough for a wide range of applications, from broadcasting, DVD storage, and teleconferencing to wireless multimedia communications. JPEG2000, on the other hand, is the wavelet-based still-image compression standard [5, 6, 7] whose aim is not only to improve coding performance over the original DCT-based JPEG standard [8] but also to add or improve features such as scalability, editability, and lossless coding capability. Another recent addition to this area is Microsoft HD Photo [9], a new still-image compression algorithm and file format recently submitted to JPEG ISO/IEC WG1 for standardization consideration.

Although H.264, JPEG2000, and Microsoft HD Photo were developed for different targets and even for different signals, there are several application areas where they overlap: video applications requiring fast, frequent, and convenient frame access for editing purposes, for instance, Digital Cinema; high-quality, high-resolution medical and satellite imaging; video applications requiring simple real-time encoding; etc. There have been several performance comparisons evaluating JPEG2000 and H.264/AVC I-frame coding [10, 11]. The rate-distortion performance of Motion-JPEG2000 and H.264/AVC Main Profile (MP) intra coding was first reported by Marpe et al. in [11].
Using a set of test video sequences with different resolutions, [11] showed that H.264/AVC intra coding achieves PSNR gains of around 0.2 to 2 dB over JPEG2000 for low- and middle-resolution sequences, e.g., CIF and ITU-R 601 720x576i (25 Hz) sequences. However, in their opinion, JPEG2000 has an advantage over H.264/AVC MP for test sequences with very high-resolution content, e.g., 1080p sequences, while for 720p HD sequences both are reported to perform at virtually the same level [10, 11]. By contrast, our previous work [1, 2], presented at SPIE over the past two years, showed that for 720p, AVC High Profile (HP) holds a clear advantage over JPEG2000, while for 1080p the results are more even, with at most a slight advantage for AVC. Recently, it has also been shown that the High Profile of the H.264/AVC Fidelity Range Extensions (FRExt) amendment [13] provides a major breakthrough in compression efficiency over MP generally. Consistent R-D gains in favor of H.264/AVC FRExt intra coding over JPEG2000 for a set of monochrome ISO/IEC images are also reported in [10].

In this paper, we reinvestigate the intra-frame coding performance of H.264/AVC High Profile 4:4:4 intra coding [14] in comparison with JPEG2000 (Part 1) [5] and Microsoft HD Photo [9] for six very high-resolution still color images and four 1080p (1920x1080-pixel resolution) HD sequences. The chosen test images and sequences cover three different classes of still frames based on their level of spatial content, ranging from smooth to moderate to high. We confirm that, although primarily developed for video coding, H.264/AVC High Profile intra coding still consistently offers competitive coding performance in the rate-distortion sense compared to two powerful image codecs on almost the entire test data set.

Key differences between this paper and our previous publications [1, 2] include: (i) we have added Microsoft HD Photo to the comparative study; (ii) we test the three codecs on an entirely new data set, all of which is high-resolution, high-quality, and in the raw unprocessed RGB color space; (iii) we have tried to be as neutral as possible to all three competitors, i.e., no specialized parameter optimization was carried out for any of the three codecs (all three mostly operate at their default high-quality setting); (iv) we offer a comparison in the lossless coding case as well. In fact, H.264/AVC High Profile has so many encoder parameter settings, even for intra coding, that optimizing it parametrically for best coding performance is itself a fine art.
Furthermore, since we are using the latest released source code for H.264 (JM12.4 [14], which supports the 4:4:4 RGB coding mode, instead of JM11), the AVC results may be inadvertently affected by the immaturity of this code base. The important test conditions, coding tools, and coding parameters used in the study are described in more detail in the experimental results section.

The organization of the paper is as follows. To make this paper self-contained, we first provide a quick overview of the codecs in Section 2. Our evaluation methodology, including the various JPEG2000, H.264, and HD Photo settings and our high-definition test images and video sequences, is discussed in Section 3, followed by our experimental results and a discussion of R-D performance in Section 4. Finally, Section 5 concludes the paper with a few remarks on future work.

2. DESCRIPTION OF THE EVALUATED COMPRESSION ALGORITHMS

We provide in this section a brief description of the compression algorithms under evaluation: JPEG2000, H.264/AVC Main Profile intra-frame coding, H.264/AVC High Profile FRExt intra-frame coding, and Microsoft HD Photo. All of these compression schemes are based on the classic three-stage transform-coding paradigm, consisting of a signal decomposition stage followed by scalar quantization and context-adaptive entropy coding. Our tests dispense with the Main Profile (MP) in favor of the High Profile (HP), since HP is a superset of MP and is in fact superior. We discuss MP mainly to illustrate HP better.

2.1. JPEG2000

Unlike its predecessor JPEG [8], which is based on the 8x8 block DCT decomposition, JPEG2000 relies on the wavelet transform as its main de-correlation engine. This multi-resolution transform with basis functions of varying length decomposes an input image into wavelet coefficients grouped by sub-bands, representing different spatial-frequency components.
The set of resulting wavelet coefficients is further split into small coding units called code-blocks, which are independently processed by a coding scheme called Embedded Block Coding with Optimal Truncation (EBCOT), followed by adaptive context-based binary arithmetic coding. JPEG2000 has a few distinctive features that we did not enable in this evaluation. Most notable is the scalability feature, which allows one to extract different regions, components, and images of different fidelities and/or spatial resolutions out of one single compressed bit-stream. The drawback of scalability is its adverse effect on rate-distortion performance. To maximize performance, our comparisons are conducted in the non-scalable single-layer mode. Another feature that we elect to disable is the tiling mode, which partitions the input image into non-overlapping rectangular tiles to be encoded independently. The tiling feature, intended for lower-complexity and parallel processing, also most likely lowers R-D performance.
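As a concrete illustration of the wavelet decomposition described above, the following minimal Python sketch applies one level of the reversible 5/3 lifting transform (the integer filter pair that JPEG2000 Part 1 uses for lossless coding) to a one-dimensional, even-length signal. The function names lift_53_forward and lift_53_inverse are hypothetical and chosen only for this illustration; this is not the Kakadu implementation, and a real codec applies the transform separably along rows and columns over several decomposition levels.

```python
def lift_53_forward(x):
    """One level of the reversible 5/3 lifting transform (1-D sketch).

    x: even-length list of ints. Returns (low, high) sub-band lists.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("this sketch handles even-length signals only")
    half = n // 2
    # Predict step: odd sample minus the floor-average of its even neighbours.
    # Symmetric extension at the right edge: x[n] mirrors x[n - 2].
    high = []
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        high.append(x[2 * i + 1] - (x[2 * i] + right) // 2)
    # Update step: even sample plus a rounded quarter of the adjacent details.
    # Symmetric extension at the left edge: high[-1] mirrors high[0].
    low = []
    for i in range(half):
        prev = high[i - 1] if i > 0 else high[0]
        low.append(x[2 * i] + (prev + high[i] + 2) // 4)
    return low, high


def lift_53_inverse(low, high):
    """Undo lift_53_forward exactly by reversing the two lifting steps."""
    half = len(low)
    n = 2 * half
    x = [0] * n
    # Undo the update step to recover the even samples.
    for i in range(half):
        prev = high[i - 1] if i > 0 else high[0]
        x[2 * i] = low[i] - (prev + high[i] + 2) // 4
    # Undo the predict step to recover the odd samples.
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = high[i] + (x[2 * i] + right) // 2
    return x


if __name__ == "__main__":
    signal = [10, 12, 14, 200, 13, 11, 9, 7]
    low, high = lift_53_forward(signal)
    print(low, high, lift_53_inverse(low, high) == signal)
```

Because each lifting step only adds an integer function of the other samples, the inverse can subtract exactly the same quantity, which is why integer-to-integer transforms of this kind support lossless coding.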

2.2. H.264/AVC Main Profile Intra-Frame Coding

Both codecs follow the transform-coding paradigm; the main difference between H.264/AVC Main Profile intra coding and JPEG2000 lies in the transformation stage. Other differences in the quantization and entropy coding stages are dictated by the characteristics of the resulting transform coefficients. While JPEG2000 employs the global wavelet transform (tiling is its only option for image partitioning), H.264 follows the block-coding philosophy, which is more in line with the block-translational motion model employed in its inter-frame coding framework. Unlike all of its video coding standard predecessors, H.264 reduces the transform block size from 8x8 to 4x4. As a pre-processing step, H.264 relies on spatial prediction using neighboring pixels from previously encoded blocks to take advantage of inter-block spatial correlation. The residual prediction error is de-correlated by a low-complexity, multiplier-less 4x4 integer transform that approximates the original 4x4 DCT well but can be implemented in 16-bit fixed-point architectures. The DC coefficients of neighboring blocks are collected into 4x4 blocks and then further processed using the same 4x4 integer transform (2x2 blocks and the 2x2 Hadamard transform are used in the chrominance space). The combination of spatial prediction and the wavelet-like two-level transform iteration has proven to be very effective in smooth image regions, which is one reason why H.264 can stay competitive with JPEG2000 in high-resolution, high-quality applications whereas the block-coding-based JPEG cannot. This H.264 R-D performance result is consistent with a few recent reports that the block DCT coding framework can be very competitive with the global wavelet coding framework if inter-block correlation is properly taken into account and coupled with appropriately designed context-adaptive entropy coding [15, 16].
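To make the 4x4 integer transform described above more concrete, the sketch below applies the commonly documented H.264/AVC forward core transform matrix to a residual block using integer arithmetic only. The helper names (CF, matmul, core_transform_4x4) are hypothetical and chosen for illustration; the normalization that turns this into a true DCT approximation is assumed to be folded into the quantizer, so it does not appear here, and this is not code from the JM reference software.

```python
# Forward core matrix of the H.264/AVC 4x4 integer transform.
# Entries are 1 and 2 only, so every product reduces to an add or a shift.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Plain 4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform_4x4(block):
    """Return Y = CF * X * CF^T for a 4x4 block of integer residuals."""
    return matmul(matmul(CF, block), transpose(CF))


if __name__ == "__main__":
    flat = [[3] * 4 for _ in range(4)]
    # A flat block concentrates all of its energy in the DC coefficient.
    print(core_transform_4x4(flat))
    # The rows of CF are mutually orthogonal but not unit-norm.
    print(matmul(CF, transpose(CF)))
```

The rows of the matrix are orthogonal with squared norms 4 and 10, which is why a per-coefficient scaling has to be applied somewhere; placing it in the quantization stage keeps the transform itself exact in integer arithmetic.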
After transformation, the H.264 transform coefficients are scalar quantized, zig-zag scanned, and entropy coded by Context-based Adaptive Binary Arithmetic Coding (CABAC). Another entropy coding choice, which provides a faster and simpler implementation but sacrifices some coding efficiency, is Context-Adaptive Variable-Length Coding (CAVLC), which switches between different VLC tables designed from exponential-Golomb codes based on locally available contexts collected from neighboring blocks.

2.3. H.264/AVC FRExt High Profile Intra-Frame Coding

The JVT completed the development of some extensions to the original H.264 standard in July, 2003. The resulting codec is known as the H.264 Fidelity Range Extensions (FRExt), also known as the High Profile [13, 14]. Among the extensions, as expected from the naming, are support for higher-fidelity video pixel resolution (including 10-bit and 12-bit video samples) and support for higher-resolution color spaces such as YUV 4:2:2 and YUV 4:4:4. The main FRExt feature that improves coding efficiency (our top criterion in this paper) is the addition of the 8x8 integer transform (another DCT approximation), together with all coding modes and prediction schemes associated with the adaptive selection between the 4x4 and 8x8 integer transforms. The addition of the larger 8x8 block size is critical in high-resolution, high-bit-rate applications, as shown in later sections.

2.4. Microsoft HD Photo

HD Photo is a still-image coding algorithm and file format for continuous-tone photographic images developed by Microsoft as part of the Windows Media family [9]. It was officially launched in March 2007 and was formerly known under a couple of different names: Windows Media Photo and Photon. The main target of HD Photo is the high-quality consumer digital photography market.
Hence, compression capability aside, HD Photo, like JPEG2000, offers many advanced features for current and future digital-imaging applications: lossy-to-lossless compression, bit-rate scalability, high-fidelity pixel format support and editing, region-of-interest decoding, integer implementation without division, etc. Microsoft has submitted HD Photo to the JPEG standardization committee. In fact, JPEG recently announced a new work item for the standardization of HD Photo as a new file format called JPEG XR (XR is short for extended range, a reference to its most desirable feature: high-dynamic-range image coding and processing).

HD Photo is a block-based image coder sharing many familiar features of the traditional image-coding paradigm: color conversion, transform, coefficient scanning, scalar quantization, and entropy coding. The two central components of HD Photo are the transformation stage and the coefficient-encoding stage. As its de-correlation engine, HD Photo employs a reversible, integer-to-integer lapped bi-orthogonal transform (LBT) [17] implemented via 4x4 pre-/post-filtering operators [18] (called the overlap operators) in conjunction with a traditional 4x4 DCT-like block transform (called the core transform). HD Photo's encoding scheme for transform coefficients contains many adaptive elements: adaptive coefficient scanning, flexible quantization, inter-block coefficient prediction, adaptive VLC table switching, etc.

2.5. Major Differences

At a high level, all three coders still strictly follow the classic transform-coding paradigm with three basic building blocks: transformation, quantization, and entropy coding. All three choose a simple scalar quantization strategy for the transform coefficients, and all employ context-based adaptive entropy coding as the final compression stage. All three coders are capable of delivering lossless compression. However, at the time of testing, H.264's lossless coding algorithm had a bug (only the luminance channel was encoded; the two chrominance channels were ignored), so we were not able to include H.264 in the lossless coding comparison.

The main difference between the three coders is at the transformation stage. JPEG2000 de-correlates image data via the global discrete wavelet transform (DWT) or the more general wavelet packet decomposition, while H.264 and HD Photo choose the block-based coding framework with the same 16x16 macro-block size and a core 4x4 block transform that is very similar to the discrete cosine transform (DCT). The major difference between the transformation stages of H.264 and HD Photo is the way the two coders handle inter-block de-correlation. While H.264 relies heavily on adaptive spatial prediction of the current block from its neighbors, HD Photo employs an overlap operator that pre-processes [18] pixels along the block boundaries before feeding them into the core DCT-like 4x4 block transform. Equivalently, the combination of the overlap operator and the core block transform generates a lapped transform, which has been studied extensively in the past [19]. As in JPEG2000, the entire transform step of HD Photo is constructed with dyadic-rational lifting steps such that it maps integers to integers with perfect reversibility, allowing a unified lossless-to-lossy coding framework. In contrast, H.264 achieves lossless compression through residue coding.
Another obvious difference is at the entropy coding stage, where each coder tunes its context-based adaptive model to take advantage of the specific behavior of its transform coefficients and/or parameters.

3. EVALUATION METHODOLOGY

3.1. Test Images

In our performance evaluation, we select six high-resolution images captured by current prosumer to professional digital cameras. All of the test images are in the raw 24-bit RGB color format. These six test images represent popular types of image content usually seen in today's digital photography world, with various levels of spatial content:

- Harbor: image of Santa Monica's harbor with many sailboats, 2268 x 12 resolution
- Bridge: image of an old covered bridge in Vermont, 2268 x 12 resolution
- Dog: close-up snapshot of a dog on a dirt road in Ferndale, 64 x 2304 resolution
- Woman: portrait of a young smiling lady on the beach of Santa Monica, 3008 x 2000 resolution
- Boy: snapshot of a boy named Anthony holding balloons in Malibu, 3264 x 24 resolution
- Building: image of a glass skyscraper in Tokyo, 2128 x 2832 resolution

Due to the H.264 frame size restriction of 96 x 2304, some of the images may have to be down-sampled and cropped. For verification purposes, all actual images that we employed in our comparison are available for download on our web site at www.fastvdo.com. To illustrate the diverse nature of the image content used in this comparative study, we present in Figure 1 the thumbnails of all six test images.

3.2. Video Test Sequences

To evaluate the performance of the three codecs in intra-frame video coding, we select four high-quality progressive-scan high-definition video sequences (60 Hz) at 1080p resolution.
All four test sequences are in the original RGB 4:4:4 color format:

- Traffic: slow panning sequence of a busy road with many trucks and cars passing by; relatively smooth spatial details except at the end, when the camera pans into several flowery bushes by the roadside
- Ducks: sequence of several wild ducks swimming and then taking off from a pond; moderate spatial details
- Old Town: a typical landscape sequence of a Swedish old town on a cloudy day; high spatial details
- Running Crowd: sequence of numerous people participating in a running competition; very high spatial details

Since we are conducting I-frame coding, we feel that 30 frames for each sequence are adequate to establish the trend. We have also decided to have a quarter of a second delay between encoded frames; in other words, we only test frames whose indices are divisible by 15. Again, all original sequences are available for download on our web site at www.fastvdo.com.
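The frame-selection rule above amounts to simple arithmetic, sketched below in Python. The sketch assumes that the "30 frames" counts the encoded frames and that selection starts at frame index 0; neither detail is stated explicitly in the text, and the names are hypothetical.

```python
FRAME_RATE_HZ = 60         # frame rate of the test sequences, as stated
FRAME_STRIDE = 15          # only indices divisible by 15 are coded
FRAMES_PER_SEQUENCE = 30   # assumed to count the encoded frames


def selected_frame_indices(count=FRAMES_PER_SEQUENCE, stride=FRAME_STRIDE):
    """Indices of the frames that are intra coded (assumes a start at index 0)."""
    return [k * stride for k in range(count)]


def spacing_seconds(stride=FRAME_STRIDE, rate_hz=FRAME_RATE_HZ):
    """Time between two consecutive coded frames."""
    return stride / rate_hz


if __name__ == "__main__":
    indices = selected_frame_indices()
    print(indices[:4], "...", indices[-1])
    print(spacing_seconds(), "seconds between coded frames")
```

At 60 Hz a stride of 15 frames corresponds to the quarter-second spacing quoted in the text.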

Figure 1: Six test images used in our comparative study. From left to right, top row: Harbor and Bridge; Middle row: Dog and Woman; Bottom row: Boy and Building.

Figure 2: The first frame of the four test video sequences used in our comparative study. From left to right, top row: Traffic and Ducks; Bottom row: Old Town and Running Crowd.

3.3. Codec Settings

In our coding experiments, we use publicly available software implementations of JPEG2000, Microsoft HD Photo, and H.264/AVC. The latest release of the reference software (JM 12.4) [14] is used as the H.264/AVC encoder, and each frame of the test sequences is coded in the intra coding mode. For JPEG2000 coding, D. Taubman's "Kakadu" (version 2.2) software [7] is used to code each frame to reach the target bit rates. Note that we have turned off the visual weighting. For HD Photo, we use the standard porting kit [9], which is available for download and testing at http://www.microsoft.com/whdc/xps/hdphotodpk.mspx

The "Kakadu" JPEG2000 encoder [7] is driven in default mode (except that visual weighting is turned off):

- One tile per frame (no tiling)
- 9/7-tap biorthogonal Daubechies wavelet filters (default floating-point transform) for lossy compression
- 5/3-tap biorthogonal Daubechies wavelet filters (default integer transform) for lossless compression
- 5 levels of wavelet decomposition
- Single-layer mode (no scalability option)
- Code-block size of 64x64 wavelet coefficients
- EBCOT encoding scheme
- R-D optimization for a given target bit rate
- No_weights on (in particular, visual weighting is off)

For Microsoft HD Photo [9], all options are set to their default values, with the only control coming from the quality factor setting:

- No tiling
- One level of overlap in the transformation stage
- No color space sub-sampling
- Spatial bit-stream order
- All sub-bands are included without any skipping

The configuration of the H.264/AVC JM12.4 encoder [14] is chosen as follows:

- High Profile FRExt 4:4:4 coding mode
- Activate intra coding profile for FRExt
- Activate RGB coding mode
- Fast chroma intra mode decision
- No chroma offset
- 8x8 transform mode: enabled, allowing adaptive choice between the 4x4 and 8x8 transforms and all associated prediction modes
- CABAC: enabled
- R-D optimization: enabled
- De-blocking filter: enabled

3.4. Evaluation Criteria

To compare objective performance, we plot the average PSNR of each RGB component, as well as the average of all three, over all encoded images or video frames versus the final bit rate (represented by the compression ratio in the image case and by the bit rate in Mbps in the video case). For each experiment, H.264/AVC 4:4:4 HP codes each image/frame in intra mode with one fixed quantization step size. Also, the quantization values for all three color components are the same (which is the default mode of the High Profile). For JPEG2000 and HD Photo, we code each frame with a target bit rate derived from the set total bit rates, frame rate, and sequence resolution such that the R-D curves mostly match those of H.264/AVC. Since this experiment concerns high-bit-rate scenarios, we choose the test PSNR points to cover the range between ~ 55dB.

4. EXPERIMENTAL RESULTS

4.1. Image Coding Comparison

Our lossless coding results are tabulated in Table 1 below, where all file sizes are listed in bytes. We only have two participants in this comparison since the H.264/AVC JM12.4 lossless coding mode for 4:4:4 RGB was not ready at the time of testing. JPEG2000 is clearly superior in lossless coding, and its average bit rate saving over Microsoft HD Photo is around 13%. We observe that JPEG2000 offers a significantly higher compression ratio when the input image contains large smooth regions with low spatial content (for example, Dog, Woman, and Building).
For images with highly complex spatial content (for instance, Harbor and Bridge), HD Photo is much more competitive.

Figs. 3 to 5 depict the rate-distortion curves obtained from our lossy coding experiments for each test image. Overall, JPEG2000 still yields the best coding performance, whereas HD Photo is a close second in both PSNR results and asymptotic behavior. H.264/AVC HP takes a distant third place, with a PSNR gap between H.264 and JPEG2000 of around 2-3 dB. This surprising finding contradicts our past results [1, 2], where H.264/AVC HP was shown to consistently outperform JPEG2000 across all luminance and chrominance components and across a wide range of bit rates (especially in the high-bit-rate region). In our opinion, there are a few possible explanations for the difference: (i) the previous comparison was carried out with YUV 4:2:0 data, for which H.264/AVC is well tuned; (ii) we have not tested H.264/AVC at this high a resolution before; in fact, a few of the test images are right at H.264's resolution limit; (iii) the JM12.4 code base was not yet mature enough to handle RGB 4:4:4 at the time of testing. We plan to carry out a further comparison with input data in the YUV 4:4:4 format in the near future.

Table 1: Lossless image coding comparison between JPEG2000 and Microsoft HD Photo

Image      Original bitmap   JPEG2000          JPEG2000      HD Photo          HD Photo
           file size         compressed size   compression   compressed size   compression
                                               ratio                           ratio
Harbor     10287704          10588             2.27          888               2.
Bridge     10287704          4                 2.1210        91302             2.0206
Dog        28017272          96701             2.9074        1108              2.04
Woman      180054            218               3.2801        623               2.7714
Boy        270872            92279             2.5978        1028              2.33
Building   180795            72028             3.55          66863             2.70
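The two quantities used throughout this section, per-component PSNR and compression ratio, can be sketched in a few lines of Python. This is an illustrative sketch rather than the measurement code used for the experiments: it assumes 8-bit samples (peak value 255, consistent with the 24-bit RGB test images), and it assumes that the "average of all three" components is the arithmetic mean of the three per-component PSNR values. All function names are hypothetical.

```python
import math


def mse(original, decoded):
    """Mean squared error between two equal-length sample sequences."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)


def psnr_db(original, decoded, peak=255):
    """PSNR in dB for one color component; infinite for a lossless match."""
    err = mse(original, decoded)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / err)


def rgb_psnr(original_planes, decoded_planes):
    """Per-component PSNR for (R, G, B) and their arithmetic mean (assumed)."""
    per_component = [psnr_db(o, d)
                     for o, d in zip(original_planes, decoded_planes)]
    return per_component, sum(per_component) / len(per_component)


def compression_ratio(original_bytes, compressed_bytes):
    """Original file size divided by compressed file size."""
    return original_bytes / compressed_bytes


if __name__ == "__main__":
    r = ([10, 20, 30, 40], [11, 19, 30, 42])
    g = ([50, 60, 70, 80], [50, 61, 69, 80])
    b = ([90, 90, 90, 90], [92, 88, 90, 91])
    originals, decodeds = zip(r, g, b)
    print(rgb_psnr(originals, decodeds))
    print(compression_ratio(1000, 250))
```

An alternative convention pools the squared error of all three components before taking the logarithm; the two conventions differ slightly whenever the components have unequal distortion.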

Figure 3: R-D curves for each color component (R, G, B), as well as the average of all three, of the Harbor test image (left) and the Bridge test image (right) comparing JPEG2000, Microsoft HD Photo, and H.264/AVC High-Profile 4:4:4 intra coding.

Figure 4: R-D curves for each color component (R, G, B), as well as the average of all three, of the Dog test image (left) and the Woman test image (right) comparing JPEG2000, Microsoft HD Photo, and H.264/AVC High-Profile 4:4:4 intra coding.

Figure 5: R-D curves for each color component (R, G, B), as well as the average of all three, of the Boy test image (left) and the Building test image (right) comparing JPEG2000, Microsoft HD Photo, and H.264/AVC High-Profile 4:4:4 intra coding.

4.2. Intra-Frame Video Coding Comparison

Fig. 6 and Fig. 7 show the rate-distortion curves of all three coders in our comparative study for intra-frame compression of the four 1080p video sequences. In this experiment, except for the peculiarity in the Traffic sequence, the results are much more in line with those in our previous publications [1, 2]. All three coders offer very similar R-D performance on the Duck, Running Crowd, and Old Town sequences. Since H.264/AVC HP JM11 has consistently outperformed JPEG2000 on 1080p YUV 4:2:0 data in the past [2], we speculate that proper encoder parameter tuning of JM12 for RGB 4:4:4 will improve H.264 coding performance further. As with the image coding experiment, we also plan to conduct extensive future tests with YUV 4:4:4 data.

5. CONCLUSION

This comparative study reports the objective R-D performance of two leading image codecs, JPEG2000 and Microsoft HD Photo, versus that of the latest H.264/AVC High Profile I-frame coding scheme in high-resolution, high-bit-rate image/video coding applications where fast and convenient frame access is of the highest priority. Along with the benchmarks in [1, 2], our experiment again confirms that, as far as R-D performance is concerned, H.264/AVC High Profile, despite being designed specifically for video compression, is very competitive in still-image compression as well. At the 1080p resolution in raw RGB 4:4:4 mode, H.264 already offers peak signal-to-noise ratio performance similar to that of JPEG2000 and Microsoft HD Photo. Furthermore, we remark that modern codecs, especially when used in high-quality, high-bandwidth applications, are constrained more by memory, and memory bandwidth in particular, than by processing power. In this context, since the wavelet transform is a global transform (whereas AVC and HD Photo use block transforms), the memory bandwidth requirements of JPEG2000 far exceed those of AVC and HD Photo.
When tiling is used in JPEG2000 to constrain memory bandwidth (e.g., with 128x128 tiles, a 1080p frame would have 15x9=135 tiles, counting the partial tiles along the bottom edge), H.264 HP and Microsoft HD Photo may in fact be superior.

For future work, we are planning a much more comprehensive comparison of the three codecs with a more diverse set of test images and video sequences across a wider range of bit rates, data resolutions, bit depths, and color formats. We are particularly interested in the high-quality 4:4:4 setting with 10-bit and 12-bit video data for high-definition applications. Lossless coding performance of H.264, scalability performance of JPEG2000 versus HD Photo, and complexity as well as memory requirement

6. ACKNOWLEDGEMENTS

We would like to thank Microsoft Corporation, especially Dr. Sridhar Srinivasan and Dr. Chengjie Tu, for providing us with the HD Photo porting kit with the latest encoder/decoder. We would also like to thank Karsten Sühring, Dr. Gary Sullivan, Dr. Haoping Yu, and Dr. Alexis Tourapis for discussions on appropriate encoder settings for H.264/AVC JM12.4 intra coding.

Figure 6: R-D curves for three components (R, G, B), as well as the average of all three, of the 1080p Traffic sequence (left) and the Duck sequence (right) comparing JPEG2000, Microsoft HD Photo, and H.264/AVC High-Profile 4:4:4 intra coding. Horizontal axis: bit rate (Mbps).

Figure 7: R-D curves for three components (R, G, B), as well as the average of all three, of the 1080p Crowd sequence (left) and the Old Town sequence (right) comparing JPEG2000, Microsoft HD Photo, and H.264/AVC High-Profile 4:4:4 intra coding.
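To relate the bit rates in Mbps used for the video results to the compression ratios used for the still images, the following Python sketch converts between the two for 1080p, 60 Hz, 24-bit RGB 4:4:4 material. It assumes that a quoted bit rate corresponds to every frame of a 60 Hz stream being intra coded at the same average size as the tested frames, and that 1 Mbps means 10^6 bits per second; both are assumptions of this illustration, and the names are hypothetical.

```python
WIDTH, HEIGHT = 1920, 1080   # 1080p frame size
FRAME_RATE_HZ = 60           # frame rate of the test sequences
RAW_BITS_PER_PIXEL = 24      # 8 bits for each of R, G, B


def raw_bitrate_mbps():
    """Bit rate of the uncompressed 1080p60 RGB 4:4:4 stream."""
    return WIDTH * HEIGHT * RAW_BITS_PER_PIXEL * FRAME_RATE_HZ / 1e6


def bits_per_pixel(bitrate_mbps):
    """Average coded bits per pixel at a given stream bit rate."""
    return bitrate_mbps * 1e6 / (WIDTH * HEIGHT * FRAME_RATE_HZ)


def compression_ratio_at(bitrate_mbps):
    """Raw bit rate divided by coded bit rate."""
    return raw_bitrate_mbps() / bitrate_mbps


if __name__ == "__main__":
    for rate in (60, 100, 125):
        print(rate, "Mbps:", bits_per_pixel(rate), "bpp, ratio",
              compression_ratio_at(rate))
```

Under these assumptions the uncompressed stream runs at roughly 2986 Mbps, so bit rates around 100 Mbps correspond to compression ratios of about 30, the same order as the upper end of the ratios shown on the still-image plots.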

REFERENCES

[1] P. Topiwala, "Comparative Study of JPEG2000 and H.264/AVC FRExt I-Frame Coding on High Definition Video Sequences," Proc. SPIE Int'l Symposium, Digital Image Processing, San Diego, Aug. 2005.
[2] P. Topiwala, T. Tran, and W. Dai, "Performance Comparison of JPEG2000 and H.264/AVC High Profile Intra Frame Coding on HD Video Sequences," Proc. SPIE Int'l Symposium, Digital Image Processing, San Diego, Aug. 2006.
[3] ITU-T Recommendation H.264 and ISO/IEC 1-10 MPEG-4 Part 10, Advanced Video Coding (AVC), 2003.
[4] I. E. G. Richardson, H.264 and MPEG-4 Video Compression, John Wiley & Sons, Sep. 2003.
[5] ITU-T Rec. T.800 and ISO/IEC 154-1, JPEG2000 Image Coding System: Core Coding System (JPEG2000 Part 1), 2000.
[6] ISO/IEC 154-3, Motion-JPEG2000 (JPEG2000 Part 3), 2002.
[7] D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards, and Practice, Kluwer Academic Publishers, 2001.
[8] W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard, Kluwer Academic, Jan. 1993.
[9] Microsoft HD Photo Specification: http://www.microsoft.com/whdc/xps/wmphotodwn.mspx
[10] D. Marpe, V. George, and T. Wiegand, "Performance comparison of intra-only H.264/AVC HP and JPEG2000 for a set of monochrome ISO/IEC test images," JVT-M014, 18-22 Oct. 2004.
[11] D. Marpe, V. George, H. L. Cycon, and K. U. Barthel, "Performance Evaluation of Motion-JPEG2000 in Comparison with H.264/AVC Operated in Intra Coding Mode," Proc. SPIE Int'l Symposium, vol. 66, pp. 129-1, Feb. 2004.
[12] M. Ouaret, F. Dufaux, and T. Ebrahimi, "On comparing JPEG2000 and Intraframe AVC," Proc. SPIE Int'l Symposium, Digital Image Processing, San Diego, Aug. 2006.
[13] G. J. Sullivan, P. N. Topiwala, and A. Luthra, "The H.264/AVC Advanced Video Coding standard: overview and introduction to the Fidelity Range Extensions," Proc. SPIE, Aug. 2004.
[14] H.264/AVC Latest Reference Software (JM12) Website: http://iphome.hhi.de/suehring/tml/download/
[15] C. Tu and T. D. Tran, "Context based entropy coding of block transform coefficients for image compression," IEEE Trans. on Image Processing, vol. 11, pp. 1271-1283, Nov. 2002.
[16] W. Dai, L. Liu, and T. D. Tran, "Adaptive block-based image coding with pre-/post-filtering," IEEE Data Compression Conference, pp. 73-82, Snowbird, UT, Mar. 2005.
[17] H. S. Malvar, "Biorthogonal and nonuniform lapped transforms for transform coding with reduced blocking and ringing artifacts," IEEE Trans. Signal Processing, pp. 10 10, Apr. 1998.
[18] T. D. Tran, J. Liang, and C. Tu, "Lapped transform via time-domain pre- and post-filtering," IEEE Trans. on Signal Processing, vol., pp. 1557-1571, Jun. 2003.
[19] H. S. Malvar, Signal Processing with Lapped Transforms, Boston, MA: Artech House, 1992.