Taos - A Revolutionary H.264 Video Codec Architecture For 2-Way Video Communications Applications

WHITE PAPER

Author: Kishan Jainandunsing, PhD, VP Marketing

Abstract

As the video conferencing and telepresence market continues its transition from CIF and D1 resolutions to full HD, driven by broadband proliferation and declining HD display prices, OEMs are realizing that their only option in this transition is H.264/MPEG-4 AVC (Part 10). Traditionally, however, this has come at a price: high costs, high power dissipation, high encode-decode latencies and low channel densities. The Taos H.264 video codec architecture addresses all these issues. It implements unique features, such as zero latency, high channel density and HD video quality, while keeping costs and power dissipation competitive with or below incumbent solutions.

In addition, Taos addresses equally important system-level issues that are inherent to digital video communications, such as noise filtering, optimal network bandwidth usage, and error resiliency and concealment. Taos builds upon the first-generation WW10K low-delay, multi-channel, HD codec chipset from W&W Communications. As such, Taos is a tried and proven video codec architecture for practical solutions in real-time video systems.

Introduction

The Taos H.264 video codec architecture addresses crucial requirements for latency, channel count, video resolution and video quality in 2-way video communications applications, such as video conferencing, video telephony and telepresence. Taos implements unique features, such as zero latency, flexible channel resource allocation and HD video quality, which meet the most demanding requirements in these applications. Taos also addresses equally important system-level issues, such as noise filtering, optimal network bandwidth usage, and error resiliency and concealment.

Latency and Zero Latency Defined

Video codec latency is defined here as the time lapse between the first pixel of video appearing at the source and the first pixel of decoded video appearing at the destination. Latency-sensitive video applications require this time lapse to be extremely small. How small depends on the application, but as a guideline, keeping latency below 10 ms is a good idea. For convenience we call such low latency "zero latency". This is in contrast with the orders of magnitude higher latency found in applications that are not latency-sensitive.

[Figure: Latency between source video and decoded video]

True Multi-Channel Defined

True multi-channel is defined here as independently encodable and decodable video streams. Each video stream is encoded with its own set of encoding parameters. Changing the parameters of one stream does not affect other streams and can be done dynamically during the encoding process. Similarly, decoding one stream does not affect the decoding of other streams, including error propagation and concealment.

True HD Defined

The HD, or High-Definition, moniker is used in the video industry for resolutions of 1280x720 and upwards. The term "true HD" refers here to 1920x1088 resolution at 60 frames per second in progressive scan mode. This represents the highest resolution defined at present for high-definition video.

[Figure: Video resolution chart]

The Taos Architecture

A high-level block diagram representation of the Taos architecture is shown below. At the heart of the architecture is the multi-stream, zero latency, high-definition H.264 codec.

The I/O subsystem supports eight physical video ports, which can be configured as eight inputs, eight outputs or a combination of inputs and outputs. Each video port supports multiplexed video streams. This allows Taos to support up to 32 independent video streams simultaneously in encode or decode mode.

Dual DDR controllers support external DDR-2 memory and provide sufficient memory bandwidth and storage capacity to support 32 independent video streams at up to 1920x1088 resolution.

A video pre-processor subsystem supports several functions, such as de-multiplexing of input video streams, frame rate adaptation, content-adaptive noise filtering, duplication and downscaling.
A video post-processor subsystem provides several functions, such as multiplexing several decoded streams onto a single video display port and on-screen display (OSD) support.

An I2C master interface allows video peripheral circuits to be controlled, such as PAL/NTSC encoders and decoders, HDMI receivers and transmitters, and CMOS and CCD sensors.

A flash memory interface controller supports flash devices over a serial interface for storage of Taos configuration settings.

A 32-bit/66MHz PCI bus and a 32-bit generic host bus provide communication with an external host processor for network connectivity, audio, driver, operating system and application software support.

A high-performance, multi-channel DMA controller handles high-speed transfers of encoded streams between the codec and the external host processor's memory. The DMA engine supports scatter/gather data transfers, significantly reducing overhead on the host side.

[Figure: Taos high-level architecture block diagram]

Zero Latency Encode-Decode

In mainstream implementations the encoding process starts only when a complete frame of video is present, introducing at least 33 ms of latency (one frame period at 30 frames/second) at the encoder and another 33 ms at the decoder. Together with multi-pass motion estimation, multi-pass rate control and frame-based source filtering, traditional implementations can easily exhibit more than 200 ms of encode-decode latency.

In contrast, Taos implements fine-grain pipelining at the macro-block level, advanced bit rate prediction and in-loop source filtering. The encoding process starts as soon as the first lines of video in a frame are available, so the encoder does not need to wait for an entire frame to be present before it starts encoding. This comes with the extra benefit that very little memory is needed for buffering.

[Figure: Affecting latency through implementation choices. a) Frame-based pipelining, high latency implementation; b) Fine-grain pipelining, zero latency implementation]

In addition, Taos performs single-pass motion estimation, single-pass rate control and in-loop, content-adaptive, motion-compensated temporal filtering. In combination with the macro-block level fine-grain pipelining, this results in encode-decode latency below 2 ms for 1080p30 video and below 4 ms for D1 video at 30 frames/second.

Latency is inversely proportional to frame rate, since it depends mainly on the pixel clock of the Taos video ports. For instance, latency drops below 2 ms for a D1 stream at 60 frames/second and below 1 ms for a 1080p60 stream. Conversely, latency for a D1 stream at 15 frames/second rises to just under 8 ms, and for a 1080p15 stream to just under 4 ms. Higher latencies at lower frame rates can be avoided by downsampling the frame rate inside Taos, prior to encoding, using the Taos frame rate adaptation functionality. Finally, operation in Baseline, Main or High Profile does not affect latency or video quality.

Two-way video communications applications are highly sensitive to latency. With noticeable delay a conversation becomes impossible unless a walkie-talkie style protocol is strictly followed, in this case not just for speech but also for motion. This makes the conversation unnatural and cumbersome. With Taos zero latency, a video conferencing or video telephony session can progress spontaneously and naturally, without the need for awkward and artificial communication protocols between the participants. Latency below 33 ms is required in this case.

[Figure: Implications of latency on video conferencing applications. a) High latency implementation: unnatural and non-spontaneous conversations; b) Taos zero latency implementation: natural and spontaneous conversations]

Multi-channel Encoding

The Taos input video ports support temporally and spatially multiplexed streams.

With temporal multiplexing, a single stream is created by time division multiplexing the frames of the individual streams. In this case the resolution and frame size of all streams must be the same; only the frame rate may differ between streams. In this mode a single input port can support multiplexing of up to 32 separate video streams. Alternatively, the 32 streams can be distributed across the eight ports.

[Figure: Temporal multiplexing of video streams. a) Same resolution and frame size, same frame rates; b) Same resolution and frame size, different frame rates]

With spatial multiplexing, a single stream is created by combining the frames of the individual streams into single frames. In this case the frames may be of different resolution and size, but they must have the same frame rate. In this mode a single input port can support multiplexing of up to 16 CIF streams, 4 D1 streams, one 720p stream, or one 1080i/p stream. The aggregate across all 8 ports must not exceed 32 streams.

[Figure: Spatial multiplexing of video streams]

The high channel density and large number of video ports make DVR and video server implementations based on Taos very cost effective. Video capture subsystems can be kept very simple for 8-port systems. Temporal and spatial multiplexing can be performed with readily available, standard off-the-shelf video decoder-multiplexers.

The high number of channels specifically benefits video conferencing applications. The number of cameras can be expanded drastically compared to existing solutions. This allows each participant to have their own camera, which in turn promotes a better overall experience for the participants.
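The input port multiplexing rules described under Multi-channel Encoding can be summarized as a small configuration check. The sketch below is illustrative only and is not a Taos API: the names `InputPort`, `port_errors` and `config_errors` are hypothetical, and the handling of mixed resolutions on a spatially multiplexed port (each stream occupies 1/limit of the port) is an assumption, since the text states the per-port limits only for uniform resolutions.

```python
from dataclasses import dataclass, field
from fractions import Fraction

MAX_PORTS = 8
MAX_TOTAL_STREAMS = 32
MAX_TEMPORAL_STREAMS = 32
# Per-port stream limits in spatial mode, as stated in the text.
SPATIAL_LIMITS = {"CIF": 16, "D1": 4, "720p": 1, "1080i/p": 1}


@dataclass
class InputPort:
    mode: str  # "temporal" or "spatial"
    streams: list = field(default_factory=list)  # (resolution, fps) tuples


def port_errors(port):
    """Return a list of rule violations for one input port."""
    errors = []
    resolutions = {res for res, _ in port.streams}
    rates = {fps for _, fps in port.streams}
    if port.mode == "temporal":
        # Temporal mode: same resolution everywhere, frame rates may differ.
        if len(port.streams) > MAX_TEMPORAL_STREAMS:
            errors.append("temporal: more than 32 streams on one port")
        if len(resolutions) > 1:
            errors.append("temporal: all streams must share one resolution")
    elif port.mode == "spatial":
        # Spatial mode: same frame rate everywhere, resolutions may differ.
        if len(rates) > 1:
            errors.append("spatial: all streams must share one frame rate")
        # Assumption: a stream uses 1/limit of the port's capacity.
        occupancy = sum(Fraction(1, SPATIAL_LIMITS[res]) for res, _ in port.streams)
        if occupancy > 1:
            errors.append("spatial: streams exceed the port's capacity")
    else:
        errors.append(f"unknown mode {port.mode!r}")
    return errors


def config_errors(ports):
    """Check the port count, the 32-stream aggregate and each port's rules."""
    errors = []
    if len(ports) > MAX_PORTS:
        errors.append(f"{len(ports)} ports exceeds {MAX_PORTS}")
    total = sum(len(p.streams) for p in ports)
    if total > MAX_TOTAL_STREAMS:
        errors.append(f"total streams {total} exceeds {MAX_TOTAL_STREAMS}")
    for i, port in enumerate(ports):
        errors += [f"port {i}: {e}" for e in port_errors(port)]
    return errors
```

For example, `config_errors([InputPort("spatial", [("D1", 30)] * 4)])` returns an empty list, while a fifth D1 stream on the same port is reported as exceeding the port's capacity.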

Input Stream Duplication and Scaling

A video input port can duplicate its stream, scale the copy down and compress it simultaneously with the original stream. For instance, a D1 or 720p30 stream can be copied, scaled down to CIF or QCIF and subsampled to 15 frames/second. The copy is then compressed separately from the original stream.

This function allows OEMs to offer highly innovative features. For instance, a high-resolution video stream can be transmitted simultaneously with a scaled-down copy of itself at lower resolution and frame rate. This copy can be transmitted via a mobile telephony network to a remote participant's cell phone.

[Figure: Stream duplication for simultaneous cell phone transmission]

Multi-channel Decoding

Taos output video ports support multiplexing of decoded video streams. Several modes are supported, such as picture-by-picture (PxP), picture-in-picture (PiP) and picture-on-picture (PoP). Up to 16 video streams can be multiplexed per port, with a maximum of 32 streams in aggregate over the 8 video output ports. In the case of PxP (tiling), an integer number of multiplexed streams must fit within the output resolution, for instance 4 D1 or 16 CIF frames in a 1920x1088 frame.

The multi-stream display functions of Taos drastically simplify system design. Additional OSD (on-screen display) functions further enhance the functionality and simplify system design.

HD Encoding and Decoding

Taos has the horsepower to encode or decode HD video up to 1080p60, which satisfies the most demanding applications. The quality of the video lies within 2 to 5% of the theoretical performance delivered by the JVT (Joint Video Team) JM (Joint Model) H.264 reference codec.

Continuously increasing broadband coverage, video processing horsepower and image sensor resolutions, set against continuously falling prices, are causing OEMs to rapidly incorporate HD resolutions in video conferencing and telepresence equipment.
HD conferencing and telepresence provide a much more gratifying experience to participants than VGA or SD resolution. Return on investment for corporations is therefore likely to be higher.

[Figure: Taos HD video quality compared to JVT JM results]

Frame Rates and Resolutions

Each video port may operate at its own frame rate and resolution, completely independently of the other ports. The up to 32 streams mentioned earlier can be distributed across the video ports. The relationship between frame rate and number of streams at a given resolution is given in the table below for 1080, 720, D1 and CIF resolutions, where n is the number of streams.

Relationship Between Frame Rate and Streams By Resolution

Resolution    Frame Rate (n = streams)
1080i/p       60/n, n ≤ 32
720p          120/n, n ≤ 32
D1            300/n, n ≤ 32
CIF           1200/n, n ≤ 32

Two conditions apply to multi-stream, multi-port distribution for different frame rates and resolutions:

1. The total number of frames per second cannot exceed the equivalent of one 1080p60 stream or the equivalent of 1200 CIF frames/second.
2. The total number of streams cannot exceed 32.

In this example the conversion factor used between CIF and the other resolutions is according to the table below.

CIF Conversion Factor Between Resolutions

Resolution    CIF Conversion Factor
CIF           1
D1            4
720p          8
1080i/p       16
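The two conditions above can be checked with a few lines of arithmetic. The following is a minimal sketch, not vendor code: it uses the single-stream frame rate ceilings from the first table (60, 120, 300 and 1200 frames/second) and assumes that the shares of streams at different resolutions simply add up, which the text states explicitly only for streams of one resolution. The function names `load` and `fits` are hypothetical.

```python
from fractions import Fraction

MAX_STREAMS = 32
# Frame rate ceiling for a single stream (n = 1), taken from the table.
MAX_FPS_SINGLE_STREAM = {"1080i/p": 60, "720p": 120, "D1": 300, "CIF": 1200}


def load(streams):
    """Fraction of total capacity used by (resolution, fps) streams.

    Assumption: each stream consumes fps / ceiling of the capacity and
    the shares of all streams are additive.
    """
    return sum(Fraction(fps) / MAX_FPS_SINGLE_STREAM[res] for res, fps in streams)


def fits(streams):
    """True if the stream mix respects both conditions."""
    return len(streams) <= MAX_STREAMS and load(streams) <= 1


# One 1080p30 stream plus two 720p30 streams use the full capacity.
mix = [("1080i/p", 30), ("720p", 30), ("720p", 30)]
print(load(mix), fits(mix))
```

Under these assumptions, adding even a single CIF stream at 30 frames/second to `mix` pushes the load above 1, so one of the existing streams would have to drop in frame rate or resolution first.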

[Figure: Example of frame rate and resolution distribution across Taos video ports]

Frame rates and resolutions can be changed dynamically, as long as the maximum processing capacity provided by Taos is not exceeded.

[Figure: Changing resolutions and frame rates dynamically]

Dynamic control of resolution and frame rate allows a video conferencing or telepresence system to increase the resolution and frame rate on the fly for a camera feed whose priority has been raised, at the expense of video streams whose priority has been lowered.

Error Resiliency and Concealment

Taos provides a series of powerful error resiliency features. Among these are variable GoP (Group of Pictures) size, I-frame forcing, macro-block intra-refresh and multiple slices.

[Figure: Various error resiliency techniques supported by Taos]

Variable GoP size can be used to make transmission of the compressed video more robust under noisy channel conditions.

I-frame forcing can be used for reasonably noise-free transmission channels, which permit very long or infinite GoP sizes. On the few occasions that packets are corrupted or dropped, the decoder requests the encoder to transmit an I-frame, so that the decoder can recover from the problem.

Macro-block intra-refresh allows an I-frame to be distributed across multiple frames, thus smoothing out the bit rate peaks of I-frame forcing and making I-frames more robust under noisy channel conditions: an error occurring in an I-frame slice does not corrupt an entire I-frame in this case, but only the slice in which it occurred.

Multiple slices are another method to contain errors and recover from them quickly. When frames are divided into multiple slices, an error in a slice does not propagate across the slice's boundary and is thus contained. Multiple slices and macro-block intra-refresh both have the effect of lowering overall bit error rates.
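To illustrate how macro-block intra-refresh spreads an I-frame over several frames, the sketch below intra-codes a few macro-block rows per frame in a cyclic pattern. This is an assumption made for illustration: the text does not say which refresh pattern Taos uses, and the function names and the `rows_per_frame` parameter are hypothetical. The only fixed facts used are that H.264 macro-blocks are 16x16 pixels, so a 1088-line frame has 68 macro-block rows.

```python
def intra_refresh_rows(frame_index, mb_rows, rows_per_frame):
    """Macro-block rows to intra-code in the given frame (cyclic schedule)."""
    if not 1 <= rows_per_frame <= mb_rows:
        raise ValueError("rows_per_frame must be between 1 and mb_rows")
    start = (frame_index * rows_per_frame) % mb_rows
    return [(start + i) % mb_rows for i in range(rows_per_frame)]


def refresh_period(mb_rows, rows_per_frame):
    """Number of consecutive frames after which every row has been refreshed."""
    return -(-mb_rows // rows_per_frame)  # ceiling division


MB_ROWS_1080 = 1088 // 16  # 68 macro-block rows of 16 lines each

for frame in range(3):
    print(frame, intra_refresh_rows(frame, MB_ROWS_1080, 4))
print("full refresh every", refresh_period(MB_ROWS_1080, 4), "frames")
```

With 4 rows per frame the whole picture is refreshed every 17 frames, so the cost of one I-frame is spread over 17 frames instead of producing a single bit rate peak.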
On the decode side, the decoder can either freeze on the frame immediately preceding the corrupted frame, or substitute skips for corrupt macro-blocks to cover them up.

Taos provides support for the implementation of H.241 protocols on a host processor for communication between the encoder and decoder. Through this, the decoder can signal the encoder to change the GoP size, force an I-frame, or change the macro-block intra-refresh and multiple-slices parameters.

[Figure: H.241 protocol support for error resiliency]

These error resiliency and concealment features are very important in two-way video communications applications. The tolerance for errors is very low in these applications and recovery must happen fast.

August 2007

Bit Rate Control

Taos implements constant bit rate (CBR) control for network transmission applications as well as variable bit rate (VBR) control for storage applications. Bit rate control does not affect zero latency. The variance of the bit rate in VBR mode can be set so as not to exceed the available bandwidth of the storage interface.

Motion Information

Access to motion information adds another dimension of innovation to video conferencing and telepresence systems. This information can be used, for instance, to detect gesturing by a participant, whereupon the camera can promptly zoom in. In another example, where each participant has their own camera, the frame rate and resolution of a video feed can be increased instantly upon detection of gesturing by the participant on that feed.

Taos provides raw motion information in two ways. One is by providing motion vector statistics (average, minimum, maximum and variance) across definable regions, which are allowed to overlap. The other is by providing complete motion vector maps and SAD (Sum of Absolute Differences) information for entire frames.

Both motion information methods are highly compute-intensive. Taos therefore relieves the external host CPU of these calculations. Instead, the host may run OEM-specific algorithms on the raw motion information, which interpret whether or not motion is occurring, what relevance the motion has and what action to take.

Noise Filtering

Taos implements in-loop, content-adaptive motion-compensated temporal filtering (CA-MCTF). This reduces noise levels in the source video, with filter strengths changing adaptively based on the content. Subjective quality improves greatly because fine detail in the video is left unaffected while random noise is removed. Sharpness and clarity of the video are maintained, while encoder bit rates are reduced by up to 45%.
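Returning to the region-based motion vector statistics described under Motion Information, the sketch below shows what the four statistics mean and how a host-side, OEM-specific rule might consume them. It is illustrative only: in a real system Taos supplies the statistics, whereas here they are computed in software to make the data concrete. The data layout (a 2-D list of per-macro-block `(dx, dy)` vectors), the use of vector magnitude, and the names `region_stats` and `is_gesturing` with its threshold are all assumptions, not part of any Taos interface.

```python
from math import hypot
from statistics import mean, pvariance


def region_stats(mv_map, region):
    """Statistics of motion vector magnitudes inside a rectangular region.

    mv_map: 2-D list with one (dx, dy) motion vector per macro-block.
    region: (row0, col0, row1, col1), with row1 and col1 exclusive.
    """
    row0, col0, row1, col1 = region
    magnitudes = [
        hypot(*mv_map[r][c])
        for r in range(row0, row1)
        for c in range(col0, col1)
    ]
    if not magnitudes:
        raise ValueError("region contains no macro-blocks")
    return {
        "average": mean(magnitudes),
        "minimum": min(magnitudes),
        "maximum": max(magnitudes),
        "variance": pvariance(magnitudes),
    }


def is_gesturing(stats, threshold):
    """Hypothetical host-side rule: strong average motion means gesturing."""
    return stats["average"] > threshold


# Two overlapping regions on a tiny 2x2 macro-block map.
mv_map = [[(3, 4), (0, 0)],
          [(6, 8), (0, 0)]]
left_column = region_stats(mv_map, (0, 0, 2, 1))
whole_frame = region_stats(mv_map, (0, 0, 2, 2))
print(left_column, is_gesturing(left_column, threshold=5.0))
```

Because regions may overlap, the same macro-block can contribute to several regions, as with `left_column` and `whole_frame` here.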
The single-pass, in-loop operation of the CA-MCTF noise filter maintains zero encode-decode latency.

[Figure: CA-MCTF for noise filtering. Noisy source: 100% bit rate; CA-MCTF filtered: 55% bit rate]

Network Efficiency

The Taos encoder takes the maximum transmission unit (MTU) size into consideration. Slices can be defined as a function of the number of bytes that optimally fits in the MTU. This avoids fragmentation and segmentation. As a result, network bandwidth is not wasted but is used optimally, without the need for expensive over-provisioning.

Programmability and Time-To-Revenue

Taos strikes a good balance between programmability and hardwired functionality. Its rich register set provides extensive control over many of the video processing and system interface functions. Developers therefore do not have to take on the arduous, time-consuming and expensive task of porting application software and programming video compression algorithms, as is the case with integrated host CPU and programmable DSP architectures. This in turn means low-risk development and quick time-to-revenue for OEMs.

Low Power Dissipation and Cost

Taos is designed with low power dissipation in mind. Total power dissipation is below 500 mW in single-channel 1080p30 mode and below 25 mW in single-channel CIF mode at 30 frames/second. This addresses the most stringent power dissipation requirements.

At the same time, the Taos architecture has been designed with low cost in mind. This is achieved through a combination of efficient logic implementation, a 90nm silicon process and high channel densities. The result is the most competitive cost per channel in the industry.

Conclusions

Taos is a revolutionary H.264 codec architecture, which provides video-processing functionality highly optimized for two-way video communications applications. Its zero latency, true multi-channel and true HD capabilities meet the most difficult-to-satisfy requirements in these applications.
The zero latency capabilities address the fundamental problem of real-time operation in communications applications. Features such as motion information extraction, stream copying and scaling, and dynamic changes of channel resolution and frame rate open up opportunities for OEMs to innovate on top of them. Beyond video compression and decompression, Taos also addresses important system aspects, such as error resiliency, error concealment, noise filtering and network bandwidth utilization.

Taos builds on the legacy of the proven W&W Communications WW10K H.264 HD codec chipsets for multi-channel, HD and zero latency applications. This makes Taos a sure bet for OEMs in the two-way video communications market.

For More Information

For more information on Taos, contact W&W Communications at www.wwcoms.com or send an email to info@wwcoms.com.

W&W Communications, Inc. reserves the right to make changes to its products and product specifications at any time without notice. W&W Communications is a trademark of W&W Communications, Inc. All other trademarks and registered trademarks are property of their respective holders. Copyright 2001-2006 W&W Communications, Inc. All rights reserved.

USA & International
W&W Communications, Inc.
2903 Bunker Hill Lane, Suite 107
Santa Clara, CA 95054, USA
Tel: +1.408.481.0264
Fax: +1.408.213.2951
Email: info@wwcoms.com

Europe
W&W Communications, Inc.
Gran Via 6, 4
Madrid, 28013, Spain
Tel: +34.91.524.7467
Fax: +34.91.524.7499

China
Beijing WWComs Info Technology Ltd.
Shangdi DongLu #5-1
JingMeng GaoKe Bldg. A, Suite 201
Beijing, China 100085
Tel: +86.10.6296.8780
Fax: +86.10.6296.5943

www.wwcoms.com
ww-wp-taos-vc-r-1