White Paper
An Introduction to High Dynamic Range (HDR) and Its Support within the H.265/HEVC Standard Extensions
By Raul Diaz, Vanguard Video CEO
December 17, 2014

Vanguard Video, LLC.
974 Commercial Street, Suite 200
Palo Alto, CA 94303
(650) 961-3098 (voice)
(650) 292-2340 (fax)
ask@vanguardvideo.com
www.vanguardvideo.com
TABLE OF CONTENTS
Introduction
Improving Video Quality
Increasing the Spatial Resolution
Increasing the Temporal Resolution
Increasing the Information Carried by Each Pixel
Color and Brightness
The Emerging Standards for High-Dynamic Range Color and Brightness
More Colors
Brighter Images
Dolby Vision and Dual-Layer HDR
Emerging Standards for Single-Layer HDR
HDR-Related SEI Messages in HEVC
Using the HEVC HDR SEI Messages
Conclusion
INTRODUCTION

At the Society of Motion Picture and Television Engineers (SMPTE) annual conference in October 2014, I presented a paper co-authored with Dolby Laboratories outlining some challenges and solutions for the integration of Dolby Vision and H.265/HEVC video compression. Dolby Vision is a system that provides for the encoding and display of High Dynamic Range (HDR) video. In that same presentation, I briefly explained that the H.265/HEVC standard extensions approved by the JCT-VC [1] in July 2014 also provide for compression and signaling of HDR video content. In this white paper, I provide more details about the support for HDR within the H.265/HEVC standard. Before exploring how H.265/HEVC supports HDR, I would like to describe what HDR is, what benefits it provides, and some of the challenges of managing HDR video.

[1] The two international standards bodies, ISO/IEC Moving Picture Experts Group (MPEG) and ITU-T Video Coding Experts Group (VCEG), established the Joint Collaborative Team on Video Coding (JCT-VC) to develop the HEVC standard.

IMPROVING VIDEO QUALITY

There are three ways that one can improve the perceived quality of video displayed to the human eye:

1. Increase the spatial resolution. Add more pixels to each displayed frame of video, such as increasing from HD at 1920x1080 pixels to UHD or 4K at 3840x2160 pixels.
2. Increase the temporal resolution. Display more frames every second. This is particularly helpful for high-speed motion such as sports.
3. Increase the information carried by each pixel. This one is harder to explain and is what this white paper is all about. Increasing the amount of data delivered by each pixel involves increasing the minimum and maximum absolute brightness that can be represented, as well as displaying a wider range of colors.

INCREASING THE SPATIAL RESOLUTION

If you have more pixels in each frame, you can display more detailed, life-like images. The addition of more pixels per frame has been going on for as long as moving pictures and television have existed. Traditional film used larger film negatives to add resolution, going from 35mm to 70mm film, for example. In television, analog broadcasts displayed approximately 540 horizontal pixels, which increased to 1920 horizontal pixels in the transition to digital television. The corresponding number of vertical lines also increased.
Most movies and many television shows today are already filmed in high resolution of about 4000 horizontal pixels, and 4K displays are becoming available at increasingly low prices. [2]

[2] See, for example, http://www.whas11.com/story/tech/2014/11/26/discounted-4k-tvs-among-early-holiday-shoppingdeals/19562203/

Figure 1: Increasing spatial resolution (low resolution vs. high resolution)

INCREASING THE TEMPORAL RESOLUTION

It may not be obvious at first glance how increasing the number of frames displayed every second improves video quality, but consider a sport such as football or a car race. In these sports, objects such as the ball in the game or the cars in the race are moving very fast. If they are filmed at higher frame rates, then you are more likely to see the actual moment that a ball is caught or that a car crosses the finish line, all with more clarity and less blurriness. In effect, the video sends more pixels to the eye every second, delivering more information to the eye.

Figure 2: Increasing temporal resolution: few frames per second to many frames per second

Sportscasts and even movies are being filmed at higher frame rates. The technology is readily available, but the infrastructure required to deliver higher frame rates to movie cinemas and to homes needs to be updated to handle the higher frame rate video.
INCREASING THE INFORMATION CARRIED BY EACH PIXEL

To transmit the color and brightness information from a camera or recorded source to a display, each pixel to be displayed is described by three numbers representing the brightness of three primary colors, normally red, green and blue. When these numbers are received by the display, the corresponding pixel on the screen has chemicals that, when illuminated, reproduce the primary colors. Since the chemicals for the primary colors are very close together for each pixel, the human eye perceives them as one combined color and brightness level, allowing the display to represent a wide range of colors and brightness levels.

Figure 3: A single pixel's primary colors grouped closely together

In practice, most video distributed today, including that on DVDs, Blu-ray discs, television broadcasts, and Internet streaming, describes each primary color with an 8-bit value, so three bytes represent the pixel. Some professional systems use 10 bits per color, and theaters with digital cinema equipment use 12 bits per color. The more bits per pixel available, the more color and brightness information can be sent. However, the cameras and displays must interpret the numbers in the same way to represent the same absolute colors and brightness levels.
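To make the bit-depth arithmetic concrete, here is a minimal C sketch (the names and the storage assumption are ours, not taken from any standard or SDK) that prints how many distinct levels each component can take and roughly what one RGB pixel costs to store at 8, 10 and 12 bits per component.

```c
#include <stdio.h>

/* Illustrative sketch: representable levels per component and storage cost
 * of one RGB pixel at common bit depths. Names and assumptions are ours. */
int main(void)
{
    const int depths[] = { 8, 10, 12 };

    for (int i = 0; i < 3; i++) {
        int  bits           = depths[i];
        long levels         = 1L << bits;   /* distinct values per component */
        int  bits_per_pixel = 3 * bits;     /* three components: R, G, B     */
        /* Components deeper than 8 bits are commonly stored in 16-bit words,
         * so the in-memory cost is often larger than the raw bit count.     */
        int  bytes_in_memory = (bits > 8) ? 6 : 3;

        printf("%2d bits/component: %5ld levels, %2d bits/pixel, ~%d bytes stored\n",
               bits, levels, bits_per_pixel, bytes_in_memory);
    }
    return 0;
}
```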
COLOR AND BRIGHTNESS

To accommodate all combinations of cameras and displays, the numbers used to represent the pixel values must be standardized so that every display shows the same colors and brightness levels. In practice, displays cannot be calibrated perfectly to a specific standard, so there is some variation, but it is relatively minor, and we tend to be satisfied with the color and brightness reproduction. As a result, for movies and videos to be displayed properly, every device and processing step in the creation and delivery of the movie or video must agree on the following properties:

1. The exact colors of each primary color in the pixel
2. The absolute, real-world brightness for each pixel when the number representation of each color is at its maximum
3. The absolute, real-world brightness (or darkness, in this case) for each pixel when the number representation is at its minimum (generally zero)

The motion pictures shown at the cinema and the video delivered to the home today rely on standards based on compromises made decades ago on how moving pictures could be captured, prepared, delivered and displayed at the time the standards were created and, sometimes, updated.

The first standards were created for the display of emulsion film through a light projector. Because the light bulbs in projectors also produce heat, a maximum bulb brightness was needed to prevent the film from melting as it spooled through the projector. The colors in film were based on the chemical processes used to create and develop the film. When color television was invented, the pixel colors used rare earth phosphors in the cathode ray tube (CRT), which were only able to represent a relatively restricted range of colors. This is the reason that colors are more vibrant at the cinema than on your television. On the other hand, television CRTs are brighter than cinema projection.

As a result of these mid-20th century standards, maximum film brightness was set at 55 nits (1 nit = 1 candela/m²) and television and video were set to a maximum brightness of 120 nits (in practice, most CRTs were calibrated for a maximum brightness of 80 to 100 nits). [3] For comparison, a bright sunny day (not looking directly at the sun) is more than 1,000,000 nits.

[3] See SMPTE RP 166: http://standards.smpte.org/content/978-1-61482-180-9/rp-166-1995/sec1.body.pdf+html?sid=4be88866-39c0-4b26-9b35-ff55b9b6732f

Film for cinemas can display quite a wide range of colors, but it is still a subset of all the colors that the human eye can see. Television and video colors were, and continue to be, much more restricted, covering only a fairly small subset of the colors that the human eye can see. For comparison, consider the color space diagram below.

Figure 4: CIE 1931 color space showing the REC 709 color space [4]

[4] See http://www.cie.co.at/index.php/publications and obtain a detailed summary at http://en.wikipedia.org/wiki/rec._709
The ellipsoid-shaped area of color represents the total color spectrum visible to the human eye under optimal viewing conditions. The small triangle with the D65 white point inside it is the REC 709 [5] color standard, which defines the color primaries and white point (i.e. the point where each primary color value is equal). This standard was created for CRTs and updated for HDTV in the 1990s. This limited color range is the color range used to produce all television and video, including broadcast television, analog and digital cable television, satellite television, streaming video, DVDs and Blu-ray discs.

[5] REC 709 is the commonly used term for ITU-R Recommendation BT.709, which can be found at http://www.itu.int/rec/r-REC-BT.709/en. Image is from Wikipedia: http://en.wikipedia.org/wiki/rec._709

Today's display technology far surpasses these mid-20th century standards. Modern digital cinema and emerging laser light projectors can emit much brighter light. Liquid crystal displays (LCDs) are brighter even than CRTs and can display richer colors. The sensors in modern digital cameras have also evolved dramatically and can capture a tremendous range of light and color, far in excess of the capabilities of displays and, in some cases, even the human eye.

To really understand how limiting the old brightness standards are, consider the following minimum and maximum brightness (or luminance) values for human beings and a range of technologies.

Figure 5: Luminance dynamic range for various technologies [6]

[6] Image from a presentation by Vanguard Video LLC at the SMPTE 2014 conference on October 21, 2014 in Hollywood, California, in reference to a co-authored paper by Vanguard Video and Dolby Laboratories to be published by SMPTE. Presentation available upon request from ask@vanguardvideo.com.

The human eye uses the pupil to adapt to darker or brighter scenes. If the pupil gets smaller, less light enters the eye, so the eye can look at brighter areas without being overwhelmed by too much light. But this comes with a tradeoff: when the pupil is small, less light means that the eye cannot see dark areas as well. The total range from darkest to brightest visible at the same time does not change significantly for the eye, but the eye can adjust the pupil to optimize the overall viewing conditions and adapt to varying amounts of light. Cameras do much the same thing by adjusting the aperture of the camera, which is a mechanism that adjusts the size of the hole that allows light to enter the camera.
So eyes and cameras can see almost the whole range of images available to us in our everyday experience, but not at the same time. They must adjust the pupil or the aperture to the overall illumination of a scene.

Historically, cameras and displays could only capture a small part of the dark-to-light range in a scene, i.e. the dynamic luminance range, or just dynamic range for short. Modern cameras rival, and in some cases exceed, the capabilities of the human eye, so they are able to capture a very wide luminance range in scenes, i.e. high dynamic range or HDR. Modern displays, while still not able to display as wide a dynamic range as cameras, have also improved dramatically in the last decade, and they are now able to display approximately the dynamic range of the human eye at a fixed pupil size.

So far we have discussed brightness dynamic range and color dynamic range separately, and, in general, separate standards are used to govern these two ranges. However, these ranges are somewhat related. The absolute colors are set by the chemicals that determine the primary colors, but the absolute brightest possible color for each primary color, and all their maximum combinations (max red, max green, max blue, etc.), require knowledge of the absolute physical maximum brightness that the maximum pixel value for that color represents. Furthermore, the numerical representations from minimum brightness to maximum brightness are not linear, because the human eye itself does not perceive brightness linearly. Another non-linearity arises when displaying a wider captured dynamic range (say, from a digital camera) on a lower dynamic range display (such as an LCD). This is a very common situation, and it requires non-linear dynamic range compression. These non-linearities are managed using a gamma correction curve. We will generally not refer to this level of detail in the discussion of HDR, but it is important to understand that these subjects are addressed in most of the standards discussed, which provide for these effects and for transformations in dynamic range and gamma between different standards. As a result, you may see references to elements such as white points, maximum per-color brightness, and gamma curves.
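As a rough sketch of the idea (assuming a simple power-law with an exponent of 1/2.4 purely for illustration; the actual transfer characteristics are defined in REC 709, REC 2020 and the newer HDR curves), the following C fragment gamma-encodes normalized linear light before quantizing it to 8-bit code values, so that dark values receive proportionally more code values than bright ones.

```c
#include <math.h>
#include <stdio.h>

/* Illustrative only: encode normalized linear light (0.0..1.0) into an 8-bit
 * code value with a simple power-law "gamma" curve. Real pipelines use the
 * transfer functions defined by REC 709 / REC 2020 or the newer HDR curves. */
static unsigned char encode_gamma_8bit(double linear_light, double gamma)
{
    if (linear_light < 0.0) linear_light = 0.0;
    if (linear_light > 1.0) linear_light = 1.0;
    double encoded = pow(linear_light, 1.0 / gamma); /* compress highlights */
    return (unsigned char)(encoded * 255.0 + 0.5);   /* quantize to 8 bits  */
}

int main(void)
{
    /* Dark values receive far more code values than a linear mapping would
     * give them, matching the eye's greater sensitivity in the shadows.    */
    for (int i = 0; i <= 10; i++) {
        double linear = i / 10.0;
        printf("linear %.1f -> 8-bit code %3u\n",
               linear, (unsigned)encode_gamma_8bit(linear, 2.4));
    }
    return 0;
}
```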
THE EMERGING STANDARDS FOR HIGH-DYNAMIC RANGE COLOR AND BRIGHTNESS

Once it became evident that the technology of cameras and displays had advanced and was able to deliver a wider dynamic range, new standards were needed to reliably represent the wider luminance range and the greater chrominance range, i.e. the greater number of colors that can be represented.

MORE COLORS

New color standards have already been created to address a wider color range, which itself requires more bits. By way of example, consider digital cinema, which relies on the CIE XYZ color space at 12 bits per component. The CIE XYZ color space is much, much larger than REC 709, and even larger than what the human eye can see. [7] In effect, CIE XYZ can represent every possible color visible to a person. For display monitors, including LCDs, a new color standard was created, which is generally referred to as REC 2020 [8]. This standard also provides for more bits per component, supporting 10 bits and 12 bits per component.

[7] See ISO (http://www.iso.org) and CIE (http://www.cie.co.at) standard ISO 11664-1:2007(E)/CIE S 014-1/E:2006. Also see especially ISO 11664-2:2007(E)/CIE S 014-2/E:2006 and ISO 11664-3:2012(F)/CIE S 014-3/F:2011. For an in-depth introduction, see http://en.wikipedia.org/wiki/cie_1931_color_space#meaning_of_x.2c_y.2c_and_z
[8] https://www.itu.int/rec/r-rec-bt.2020-1-201406-i/en

Figure 6: REC 2020 triangle in the CIE 1931 color space [9]

[9] Wikipedia: http://en.wikipedia.org/wiki/rec._2020

As can be seen by comparing Figure 4 to Figure 6, REC 2020 allows the display of more of the colors that the eye can see than REC 709 makes possible. These new color standards share one element in common: they require more bits to represent the additional color information, which allows for a higher color dynamic range.
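To sketch how the two gamuts relate, the C fragment below maps linear-light REC 709 RGB into REC 2020 RGB using an approximate 3x3 matrix; the coefficients are rounded values of the commonly published conversion and should be treated as illustrative, with ITU-R BT.2087 and BT.2020 as the normative references. Because every REC 709 color lies inside the REC 2020 triangle, the converted components stay within range, while the reverse conversion can produce out-of-range values that must be clipped or gamut-mapped.

```c
#include <stdio.h>

/* Illustrative only: map linear-light REC 709 RGB (0.0..1.0) into the wider
 * REC 2020 gamut with an approximate, rounded 3x3 matrix. Consult ITU-R
 * BT.2087 / BT.2020 for the normative conversion and exact coefficients.   */
static void rec709_to_rec2020(const double in[3], double out[3])
{
    static const double m[3][3] = {
        { 0.6274, 0.3293, 0.0433 },
        { 0.0691, 0.9195, 0.0114 },
        { 0.0164, 0.0880, 0.8956 },
    };
    for (int r = 0; r < 3; r++)
        out[r] = m[r][0] * in[0] + m[r][1] * in[1] + m[r][2] * in[2];
}

int main(void)
{
    /* Pure REC 709 red is not pure red in REC 2020: it lands inside the
     * larger triangle, so all three REC 2020 components are non-zero.   */
    const double red709[3] = { 1.0, 0.0, 0.0 };
    double out[3];
    rec709_to_rec2020(red709, out);
    printf("REC 709 red -> REC 2020 (%.4f, %.4f, %.4f)\n", out[0], out[1], out[2]);
    return 0;
}
```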
BRIGHTER IMAGES

If you examine Figure 5, you can see that modern LCDs can display a dynamic range just smaller than that of the human eye, and there are laboratory displays that can exceed the dynamic range of the eye. Unfortunately, LCDs are still optimized for the maximum 120 nit standard defined by SMPTE, even though LCDs can display much brighter levels in the hundreds of nits and even over 1,000 nits, and professional LCDs can display several thousand nits. Since LCDs can already display brighter whites than CRTs, LCDs often have a manual or automatic dynamic range expansion. The expansion is needed because the video that LCDs display is still exclusively encoded as per REC 709.

Today's brightness flow from creation to display generally proceeds in this manner:

Figure 7: Workflow and delivery of standard dynamic range to LCDs

In Figure 7, the camera can capture a very high dynamic range and the LCD can display a reasonably high dynamic range. Unfortunately, the processing of the video during workflow and transmission still follows the historical REC 709 standard, using 8 bits per component and a maximum brightness level of 120 nits. The expansion of brightness levels performed by the LCD will yield brighter colors, but it cannot add back any of the information lost when the dynamic range was compressed into the REC 709 standard. The expansion can even yield unwanted artifacts such as color banding. Color banding happens when the gradation between one color brightness and the next representable color brightness is too large, and the viewer sees a stepped color transition instead of smooth color shading.

Figure 8: Example of color and luminance banding effect
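To see why the banding appears, the minimal sketch below (the numbers are illustrative only) quantizes a smooth luminance ramp that spans a small slice of the signal range to 8-bit code values and counts how few distinct steps survive; when a bright display then stretches those steps across a wider brightness range, each one becomes a visible band.

```c
#include <stdio.h>

/* Illustrative only: quantize a smooth luminance ramp that spans a narrow
 * slice of the signal range to 8-bit code values and count the steps left. */
int main(void)
{
    const int    samples = 1000;          /* a smooth gradient across the screen */
    const double lo = 0.20, hi = 0.25;    /* ramp covers only 5% of signal range */
    int previous = -1, steps = 0;

    for (int i = 0; i < samples; i++) {
        double value = lo + (hi - lo) * i / (samples - 1);
        int    code  = (int)(value * 255.0 + 0.5);    /* 8-bit quantization */
        if (code != previous) { steps++; previous = code; }
    }
    /* Only about a dozen distinct code values cover the whole ramp; once a
     * bright display stretches the range, each step becomes a visible band. */
    printf("distinct 8-bit steps across the ramp: %d\n", steps);
    return 0;
}
```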
If the LCD monitor tries to expand the color dynamic range as well, then even worse effects may become visible, with actual color distortions from the originally intended colors.

Given this bottleneck of dynamic range compression between the captured images and the display, a new standard for carrying additional dynamic range is needed. Standards such as REC 2020 are helpful for providing additional color information, but that standard and others like it deal almost exclusively with colorimetry and formats; they do not address luminance dynamic range.

DOLBY VISION AND DUAL-LAYER HDR

At the Consumer Electronics Show in January 2014 in Las Vegas, Dolby Laboratories demonstrated their new Dolby Vision technology privately, and at the International Broadcasting Convention (IBC) in September 2014 in Amsterdam [10], Dolby demonstrated Dolby Vision publicly. Dolby Vision is a proprietary format that contains standard dynamic range information in one layer and additional dynamic range information in another layer. [11] This dual-layer HDR format has the advantage of working transparently with existing, non-HDR-capable equipment.

[10] See, for example, press announcements at http://www.display-central.com/free-news/display-daily/dolby-reveals-detailsbehind-dolby-vision-2/ and http://www.display-central.com/trade-shows/infocomm-2014/imaging/pat-griffiths-dolby-explainsdolby-vision-ibc-2014/
[11] Dolby Vision technical information was first described publicly at the SMPTE 2014 annual conference in Hollywood, California on October 21, 2014 by Vanguard Video. See footnote [6] for additional information.

Figure 9: Dolby Vision dual-layer HDR encoder. Metadata pathways not shown. [12]

[12] Image from the presentation and paper described in footnote [11].
Figure 10: Dolby Vision dual-layer HDR decoder. Metadata pathways not shown. [13]

[13] Ibid.

EMERGING STANDARDS FOR SINGLE-LAYER HDR

Other companies and organizations are working on single-layer HDR format proposals. [14] As of December 2014, there is no HDR standard format approved by an international standards organization. However, MPEG and VCEG have approved a set of extensions that provide the necessary tools to support HDR functionality within the H.265/HEVC standard. These tools were approved in July 2014 at the 109th MPEG meeting in Sapporo, Japan. [15]

[14] See, for example, http://www.bbc.co.uk/rd/blog/2014/07/bbc-r-d-at-the-ebu-dvb-high-dynamic-range-workshop and http://www.4-traders.com/philips-6289/news/philips--graphics-processing-for-high-dynamic-range-video-in-patent-Application-Approval-Process-18481721/
[15] http://mpeg.chiariglione.org/meetings/109

The HDR toolset is part of a set of new extensions to the HEVC standard, sometimes referred to as HEVC version 2. These extensions cover a broad range of subjects, including higher bit depth profiles that carry more information per pixel, scalable video quality (temporal, spatial, etc.), multi-view extensions including support for 3D, and a number of new metadata messages. [16] The higher bit depth profiles were described previously on Vanguard Video's blog in a white paper titled "An Evaluator's Guide to Understanding the Newly Approved H.265/HEVC Range Extensions and Profiles." [17] Several of the metadata messages, called Supplemental Enhancement Information (SEI) messages, carry information about how to manage the higher bit depth profiles and signal that the component data is HDR data.

[16] Boyce, Jill, et al., "Edition 2 Draft Text of High Efficiency Video Coding (HEVC), Including Format Range (RExt), Scalability (SHVC), and Multi-View (MV-HEVC) Extensions," available at http://phenix.itsudparis.eu/jct/doc_end_user/current_document.php?id=9466
[17] http://vanguardvideo.com/blog/what-you-need-to-know-about-the-h-265hevc-range-extensions-and-new-profilesapproved/
HDR-RELATED SEI MESSAGES IN HEVC [18]

[18] Ibid., Boyce.

HEVC version 2 includes three SEI messages related to HDR data.

The first is the Chroma resampling filter hint SEI message. This message contains information necessary to map from one color space to another. [19] For example, if the compressed video is encoded in REC 2020, which may have 10 or 12 bits per component, and the display monitor can only render REC 709 8-bit-per-component video, then some transformation must be specified. In this example, data from the source will be lost through a compression of the color space. To minimize artifacts and to respect the source's colors as much as possible, this SEI message can be used to specify a transformation that is optimal for the source's content. Multiple messages of this type may be included to handle a wide range of different displays.

[19] Andrivon, Pierre, et al., "SEI message for Colour Mapping Information," http://phenix.itsudparis.eu/jct/doc_end_user/current_document.php?id=8878

Another message is the Knee function information SEI message. This message contains information about how to transform from one brightness or luminance dynamic range to another. For example, if the compressed source contains HDR data of, say, 12 bits per component with a maximum luminance level of 500 nits, but the display can only reach 120 nits, then some type of dynamic range compression function needs to be specified to minimize contrast distortions. Since the human eye perceives contrast differently in bright areas than in darker areas, the transformation of luminance dynamic range will generally be non-linear. To simplify the transformation functions, the concept of knees is introduced so that multiple line segments can be joined together to approximate a non-linear curve.

Figure 11: Non-linear curve and piecewise linear approximation
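A minimal sketch of the knee idea shown in Figure 11: the tone-mapping curve is carried as a handful of (input, output) knee points and evaluated by linear interpolation between them. The point values, structure and function names below are invented for illustration; the actual syntax and semantics of the Knee function information SEI message are defined in the HEVC specification.

```c
#include <stdio.h>

/* Illustrative only: evaluate a tone-mapping curve described by knee points,
 * as in Figure 11. The point values and names are invented for illustration;
 * the real syntax of the Knee function information SEI message is defined in
 * the HEVC specification.                                                    */
typedef struct { double in, out; } KneePoint;   /* normalized 0.0..1.0 */

static double apply_knee_curve(const KneePoint *pts, int n, double x)
{
    if (x <= pts[0].in)     return pts[0].out;
    if (x >= pts[n - 1].in) return pts[n - 1].out;
    for (int i = 1; i < n; i++) {
        if (x <= pts[i].in) {           /* linear interpolation on this segment */
            double t = (x - pts[i - 1].in) / (pts[i].in - pts[i - 1].in);
            return pts[i - 1].out + t * (pts[i].out - pts[i - 1].out);
        }
    }
    return pts[n - 1].out;              /* not reached */
}

int main(void)
{
    /* Hypothetical curve: keep shadows and midtones, roll off the highlights. */
    const KneePoint curve[] = {
        { 0.00, 0.00 }, { 0.40, 0.50 }, { 0.70, 0.80 }, { 1.00, 1.00 }
    };
    for (int i = 0; i <= 10; i++) {
        double x = i / 10.0;
        printf("source %.1f -> display %.2f\n", x, apply_knee_curve(curve, 4, x));
    }
    return 0;
}
```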
The final message is the Mastering display colour volume SEI message. This message provides information on the color primaries and the luminance dynamic range of the display that was used to author the source. This information is necessary to build the Knee function information SEI message that maps the source's luminance and chrominance intent to the best possible match on an output display.

USING THE HEVC HDR SEI MESSAGES

The HEVC HDR SEI messages allow the construction of practically any arbitrary HDR workflow from content creation to final display. However, for these SEI messages to become useful and actively used, everyone in the content creation and delivery pipeline must agree on a specific set of HDR mappings that should be commonly available; in other words, a new industry-wide HDR standard must be developed.

The display manufacturers, in particular, need to support the additional decoding complexity of the HDR signals. This complexity is not limited to managing a consistent and agreed-upon set of HDR SEI message settings and configurations; it also requires that the HEVC decoder handle the HEVC range extensions in H.265/HEVC version 2. Those range extensions offer formats that handle not only higher bit depth video components, such as 10 and 12 bits per color, but also less chroma decimation, with formats such as 4:2:2 and 4:4:4 that preserve more color detail in the video signal. [20] As a result, HDR-capable displays will certainly need additional HEVC decoder capabilities. Even Dolby Vision, which uses a dual-layer approach, requires at least a second decoder for the HDR layer, and so Dolby Vision also requires updated HDR-capable displays.

[20] For reference, see footnote [17].
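As a rough illustration of what these chroma formats mean for a decoder, the sketch below (our own helper, not an SDK API) counts the luma and chroma samples one UHD frame carries in 4:2:0, 4:2:2 and 4:4:4; moving from 4:2:0 to 4:4:4 doubles the sample count even before any increase in bits per sample.

```c
#include <stdio.h>

/* Illustrative helper, not an SDK API: samples per frame for the common
 * chroma formats. Chroma is subsampled 2x2 in 4:2:0 and 2x1 in 4:2:2.   */
static long long samples_per_frame(int width, int height, int cdiv_w, int cdiv_h)
{
    long long luma   = (long long)width * height;
    long long chroma = 2LL * (width / cdiv_w) * (height / cdiv_h);
    return luma + chroma;
}

int main(void)
{
    const int w = 3840, h = 2160;   /* one UHD frame */
    printf("4:2:0 %lld samples per frame\n", samples_per_frame(w, h, 2, 2));
    printf("4:2:2 %lld samples per frame\n", samples_per_frame(w, h, 2, 1));
    printf("4:4:4 %lld samples per frame\n", samples_per_frame(w, h, 1, 1));
    return 0;
}
```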
Most professional cameras can already output raw HDR video data. However, the workflow tools to create the encoded and distributable HDR video content are still in development. These tools will need to support the new HDR standards that emerge, and they will need to integrate new HDR encoders with the appropriate range extensions needed to manage the HDR source and to produce the HDR compressed output.

CONCLUSION

Video quality improvements are already arriving in the form of higher resolution, such as 4K displays, and increased temporal resolution in the form of more frames per second to capture more detailed high-speed action. HDR standards are being actively proposed by key industry companies and international organizations to provide a richer, more vibrant video experience. Those who have had the good fortune to see HDR video on an HDR display already know the breathtaking beauty that it offers. The entire video industry is coming together to make HDR video technology widely available for professional production and for consumer viewing. Vanguard Video is actively involved in this effort with publicly announced support for Dolby Vision, internal development to support the HEVC HDR SEI messages, and involvement in additional HDR initiatives.

About Vanguard Video

Founded in 1995, Vanguard Video is a supplier of professional, broadcast-quality H.265/HEVC and H.264 codec SDKs to top-tier customers around the world. With deep codec expertise, unparalleled performance and quality, and world-class support, Vanguard Video has helped its customers capitalize on many first-to-market opportunities by pioneering top-tier encoding solutions, including the release of the world's first commercially deployed H.265/HEVC service. Vanguard Video codecs support a wide range of platforms, including software implementations for x86 and ARM microprocessors as well as OpenCL acceleration for GPUs. For more information about our technologies, please visit our website: www.vanguardvideo.com

Corporate Headquarters
974 Commercial Street, Suite 200
Palo Alto, CA 94303 USA
Phone: +1 (650) 961-3098
Fax: +1 (650) 230-4904
www.vanguardvideo.com

Copyright 2014 Vanguard Video, LLC. All rights reserved.