Frame-accurate Compressed Domain Splicing

Dr Kevin W. Moore (Mediaware International)

Abstract

Is the seamless, frame-accurate splicing of compressed digital video a myth, or is it madness? Many believe that it is impossible to seamlessly and frame-accurately splice together two streams of long-GOP MPEG-2 or MPEG-4 AVC video without taking both streams back to baseband and re-encoding, or rendering to all I frames, since both formats rely heavily on temporal compression. In this paper we show how splicing can occur on any frame, not just I frames; how to fix frame dependencies which are broken during splicing by frame type conversion and re-coding; and how to ensure that video buffer verifier (VBV) constraints are preserved by massaging the video bit rate in a region around the splice point.

Compressed domain splicing technology offers many potential benefits to broadcasters, including improvements to video and audio quality, a streamlined technical architecture and workflow, and a significant reduction in cost when properly implemented. To show how compressed domain splicing can be used in a real-world broadcast network, a case study is presented describing how Prime Television deployed a compressed domain splicing solution to deliver locally targeted advertising and region-specific content in each of their markets for both their SD and HD services.

Introduction

Compressed domain splicing is the process of digitally switching from one compressed digital video signal to another without decoding either signal back to baseband (uncompressed). Seamless, frame-accurate compressed domain splicing is desirable in a commercial broadcast environment because it can reduce costs, streamline operations and improve the quality of the digital broadcast. Compressed domain video splicing can occur between two live video streams, between a live stream and a file stored on a server, and between two files stored on a server (during play-out). The diagrams below show several of the basic configurations.

The perception that MPEG-2 and MPEG-4 AVC video streams cannot be spliced or edited frame accurately at any frame has been a popularly held belief since their adoption, despite the fact that theoretical papers describing how long-GOP MPEG-2 could be natively edited have been around since 1995, shortly after the MPEG-2 standard was published.

Theory aside, frame-accurately splicing MPEG video streams seamlessly without going back to baseband is not easy, and there are many pitfalls. While compressed domain editing has been available commercially since the late 1990s, the technology has only reached mainstream products in recent years. Even today, most Digital Program Insertion (DPI) systems on the market require the original content to be encoded with special encoders that place ad insertion markers in the MPEG-2 transport stream to trigger splicing at pre-determined frames in the video stream.

So why is it hard? It is hard because, unlike baseband video signals, MPEG-2 and MPEG-4 employ interframe coding. They exploit the temporal redundancies in the video frames by storing the differences between frames rather than storing every frame in its entirety. These dependencies make it difficult to splice at any frame, since doing so may break the encoding structure and prevent the clean decoding of frames around the splice point.

It is also hard because the MPEG standards define buffer models that place constraints on the dynamic behaviour of the video stream. These buffer models are required because the number of bytes used to encode each frame may vary significantly depending on frame type and scene content, and the stream must therefore be buffered appropriately by the decoder. When splicing or editing MPEG streams, if the dynamic buffering behaviour is not properly handled, downstream decoders may exhibit jerky playback when their buffers overflow or underflow.

This paper reviews a number of different solutions for dealing with frame dependencies and managing buffer levels, in the hope of convincing the reader that frame-accurate compressed domain splicing of MPEG video is possible. To convince the reader that it is also a practical and cost-effective alternative to baseband switching, a case study with Prime Television will be presented, describing how Prime Television integrated an HD compressed domain splicing solution into their existing regional broadcast network.

Frame-accurate Splicing

Unlike baseband video signals, where every video frame is independently decodable and discrete, both MPEG-2 and MPEG-4 encoding employ interframe coding, where frames are stored as differences from their past and/or future frames (in display order). In MPEG-2, three frame types are defined:

Intra Coded Pictures (I Pictures) are coded without reference to other pictures and provide only moderate compression.

Predictive Coded Pictures (P Pictures) are coded more efficiently using motion-compensated prediction from a past I or P frame.

Bidirectionally Predictive Coded Pictures (B Pictures) provide the highest degree of compression but require both past and future reference pictures for motion compensation.

[Figure: I, P and B frame dependencies]

For MPEG-2, the typical frame pattern uses a half-second GOP (12 frames for 25 fps video, 15 frames for 29.97/30 fps video), as shown.

[Figure: typical GOP structure in display order, e.g. I B B P B B P B B P B B]

Splicing or cutting the video at either a P or B frame will break these dependencies. The only safe place to splice in an MPEG stream is at the I frame beginning a closed Group of Pictures (GOP); a closed GOP is one whose frames do not reference any frames from the previous GOP. If the splice points are known in advance, a process known as I-frame insertion can be used. In I-frame insertion, the encoder is told ahead of time which frame the splice will occur at, and it alters the frame structure of the MPEG video during encoding to ensure that an I frame and a closed GOP occur at the splice point. Markers or flags in the transport stream precede the splice point to ensure the splice takes place at the appropriate time. The main limitations of this approach are that the encoder has to be under the control of the system inserting the interstitial content, and that the splice points must be known in advance, which is not always the case.

When splice points are not known in advance, a common solution is to wait and switch at the next available I frame. The I-frame switching model is not highly desirable, as it does not guarantee that the switch lands exactly on a program or scene boundary. While these approaches are able to replace a baseband switching system under some circumstances, their limitations prevent them from being more widely deployed.
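
Where waiting for the next I frame is acceptable, the switching logic is straightforward. The following sketch (hypothetical frame metadata and function names, not a description of any particular product) scans forward from the requested splice frame to the first I frame that begins a closed GOP:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    display_index: int       # position in display order
    frame_type: str          # 'I', 'P' or 'B'
    starts_closed_gop: bool  # True for an I frame that begins a closed GOP

def next_safe_splice_point(frames, requested_index):
    """Return the display index of the first frame at or after the requested
    splice frame where a cut cannot break dependencies, i.e. an I frame that
    begins a closed GOP."""
    for frame in frames:
        if (frame.display_index >= requested_index
                and frame.frame_type == 'I'
                and frame.starts_closed_gop):
            return frame.display_index
    return None  # no safe point before the end of the stream

# Example: two 12-frame GOPs; a splice requested at frame 30 slips to frame 36.
def make_gop(start):
    return [Frame(start, 'I', True)] + [
        Frame(start + i, 'B' if i % 3 else 'P', False) for i in range(1, 12)]

frames = make_gop(24) + make_gop(36)
print(next_safe_splice_point(frames, 30))  # -> 36
```

With a half-second GOP, this fallback can land the switch up to roughly half a second after the intended program boundary, which is exactly why it is considered undesirable.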

When splicing two arbitrary but compatible streams together, the most likely case is that the splice points do not line up with I frames, as in the following diagram:

[Figure: two streams whose desired splice points fall mid-GOP rather than on I frames]

If the streams are spliced at this point, the resulting stream will contain a number of broken frames, as shown in the following diagram.

[Figure: broken frames around the splice point]

There are a number of published approaches, ranging from straight re-encoding of the broken frames in the region of the splice point, to smart techniques using frame type conversion that re-use much of the original compression information to minimise image degradation. The following diagram shows several solutions to the splice problem above.

[Figure: several possible re-coded/restructured frame patterns around the splice point]

While any decoding and re-encoding introduces loss, a properly designed splicing engine will minimise the image degradation around the splice point, and frames outside the region of the splice will suffer no loss of image quality; this is unlike baseband splicing solutions, which require a full decode and re-encode of the streams.

One of the main unspoken issues with re-coding splice points is that the coding efficiency of the new region is invariably lower than that of the original structure. Any new structure requires the creation of a new I frame and the creation of shorter GOPs. Since B and P frames are more efficient than I frames, reducing the ratio of B and P frames to I frames reduces the coding efficiency of the video in the region of the splice. Increasing the bits used to represent the new GOP will counter the reduction in coding efficiency; however, as the next section shows, increasing the bits may cause buffer issues in downstream decoders, which in turn cause playback problems.

[Figure: the splice produces shorter, less efficient GOPs]
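
To make the frame type conversion concrete, the sketch below (illustrative only; the re-coding decisions in a production splicer are more involved) applies one of the restructuring options shown above: trailing B frames in the outgoing stream lose their forward reference and are re-coded as P frames, and the first incoming frame becomes a new I frame that starts a short, closed GOP.

```python
def restructure_for_splice(out_types, in_types):
    """out_types: display-order frame types of the outgoing stream up to and
    including the splice-out frame, e.g. 'IBBPBBPB'.
    in_types: display-order frame types of the incoming stream starting at
    the splice-in frame, e.g. 'BPBBPBBI'.
    Returns the re-coded frame types on each side of the cut."""
    out = list(out_types)
    # Trailing B frames needed a future reference that no longer exists
    # after the cut, so re-code them as P frames.
    i = len(out) - 1
    while i >= 0 and out[i] == 'B':
        out[i] = 'P'
        i -= 1
    new_in = list(in_types)
    # The first incoming frame must decode on its own: make it an I frame
    # that starts a short, closed GOP.
    new_in[0] = 'I'
    # Any B frames immediately after it referenced frames before the cut,
    # so re-code those as P frames too.
    j = 1
    while j < len(new_in) and new_in[j] == 'B':
        new_in[j] = 'P'
        j += 1
    return ''.join(out), ''.join(new_in)

print(restructure_for_splice('IBBPBBPB', 'BPBBPBBI'))
# -> ('IBBPBBPP', 'IPBBPBBI')
```

Each converted frame still has to be re-coded, but much of the original compression information can be reused, which is what keeps the quality loss confined to the splice region.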

Video Buffer Constraints

The MPEG-2 standard defines the Video Buffer Verifier (VBV) requirement on the video bit stream, and the MPEG-4 AVC standard defines a similar Hypothetical Reference Decoder (HRD) model. In both standards, the dynamic behaviour of the video bit stream must conform to the constraints imposed by these buffer models. Bytes are fed into the buffer at a certain rate (the bit rate), and all the bytes for each video frame are removed from the buffer at its decoding time. If the buffer never overflows or underflows over the duration of the video sequence, the stream is well behaved. If overflow or underflow does occur, downstream decoders may suffer from playback issues. Normally the rate control mechanism of the encoder operates over the entire duration of the video sequence to ensure that the buffer model parameters are respected. When splicing together two video streams, possibly from different encoders, the integrity of the original rate control may be lost.

For MPEG-2, I frames are typically several times larger than P frames, which in turn are several times larger than B frames. For scenes with very little movement, the best coding quality is obtained by using larger I frames and smaller P and B frames. For scenes with a lot of movement, using smaller I frames and larger P and B frames produces the best quality. The following diagram shows a typical decoder buffer level for an MPEG-2 video stream using an IBP structure and a GOP size of 12.

[Figure: typical decoder buffer level over time, with large drops at I frames and smaller drops at P and B frames]
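
The buffer model can be checked with a few lines of arithmetic. The following simplified constant-bit-rate sketch (hypothetical frame sizes; the 1,835,008-bit buffer is the MPEG-2 MP@ML VBV size) feeds bits in at the stream bit rate, removes each frame's bits at its decode time, and flags an underflow:

```python
def simulate_vbv(frame_sizes_bits, bit_rate, frame_rate,
                 buffer_size_bits, initial_fullness_bits):
    """Simplified constant-bit-rate decoder buffer model: bits arrive at the
    stream bit rate and all bits for a frame leave at its decode time.
    Returns the buffer level after each frame, raising on underflow."""
    fullness = initial_fullness_bits
    bits_per_frame_period = bit_rate / frame_rate
    levels = []
    for n, size in enumerate(frame_sizes_bits):
        # Bits delivered during one frame period (clamped at the buffer size;
        # a stricter CBR model would treat hitting the limit as overflow).
        fullness = min(fullness + bits_per_frame_period, buffer_size_bits)
        if fullness < size:
            raise ValueError(f"buffer underflow at frame {n}")
        fullness -= size
        levels.append(fullness)
    return levels

# Hypothetical 4 Mb/s, 25 fps stream: three regular GOP fragments followed by
# a large spliced-in I frame that drains the buffer faster than it refills.
sizes = [400_000, 60_000, 60_000, 120_000] * 3 + [900_000, 60_000, 60_000]
try:
    simulate_vbv(sizes, bit_rate=4_000_000, frame_rate=25,
                 buffer_size_bits=1_835_008, initial_fullness_bits=600_000)
except ValueError as err:
    print(err)  # -> buffer underflow at frame 12
```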

If this sequence is spliced into another sequence mid-way through the GOP, and the next sequence starts with a large I frame, the decoder buffer may underflow, as shown in the diagram below.

[Figure: the second sequence causes the decoder buffer to underflow]

By massaging the bit rate of the frames around the splice point, this underflow can be avoided, as shown below.

[Figure: the bit rate of a neighbourhood of frames is massaged to prevent underflow]

Whether the entire splice region is re-encoded or smart frame restructuring is performed, it is critical that the VBV/HRD buffer levels are taken into account and managed.
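
Reusing simulate_vbv and the frame sizes from the previous sketch, one simple way to massage the bit rate is to shrink the frames in a window around the splice and re-check the buffer model until it passes. A real splicer would achieve the reduction by re-quantising those frames during re-coding rather than by naive scaling, but the control loop is the same idea:

```python
def massage_bit_rate(frame_sizes_bits, splice_index, window, check, step=0.9):
    """Shrink the frames in [splice_index - window, splice_index + window]
    by `step` until `check` (a buffer-model test such as simulate_vbv above)
    stops reporting underflow."""
    sizes = list(frame_sizes_bits)
    lo = max(0, splice_index - window)
    hi = min(len(sizes), splice_index + window + 1)
    for _ in range(20):                       # give up after 20 attempts
        try:
            check(sizes)
            return sizes                      # buffer model now satisfied
        except ValueError:
            for i in range(lo, hi):           # shave bits off the splice region
                sizes[i] = int(sizes[i] * step)
    raise RuntimeError("could not satisfy the buffer model")

check = lambda s: simulate_vbv(s, 4_000_000, 25, 1_835_008, 600_000)
fixed = massage_bit_rate(sizes, splice_index=12, window=3, check=check)
print(fixed[12])  # the spliced-in I frame is now small enough to avoid underflow
```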

One additional note: if the GOP resulting from the splice is small, the reduction in coding efficiency may mean that any attempt at reducing the bit rate results in undesirable quantisation artefacts. Under these conditions, the short GOP can be merged with the neighbouring GOP (turning that GOP's I frame into a P frame), which improves the coding efficiency and the resulting image quality.

Splicing in a commercial environment

For a frame-accurate compressed domain splicing system to be successfully deployed into a broadcast environment, it must address a number of engineering issues.

If the coding profiles and bit rates of the splicing sources do not match, the system has to be able to convert one source. Typical conversions include aspect ratio conversion, changing SD to HD or HD to SD, matching bit rates (transrating), and transcoding and replicating audio streams.

If either source carries ancillary streams or service information, these must be spliced or passed through as required. For instance, the Teletext stream can be passed through from the primary source, follow the spliced source, or both (e.g. the Teletext captions can be spliced from the current spliced source and the remaining Teletext channels taken from the primary source).

The system must be able to interface with existing automation systems, so that they can control precisely when splices occur and receive status information. It must also be able to support the workflows of existing baseband switching systems.

Prime HD Television Case Study

To demonstrate how compressed domain splicing can be used in practice, a case example will be described: a splicing system installed at Prime Television in late 2008 to provide a cost-effective move to regionalised high definition. The business objective for Prime was to add high definition services in line with their centralised operation and existing market regionalisation capability, without significantly impacting operational cost or operational workflow. The key requirements for project success were:

To improve the quality of the direct-to-home service with the addition of regionalised HD services

To provide a simple mechanism for the addition of new commercial markets

To add minimal additional operational expense

Prime's original architecture performed ad insertion by first decoding the content, switching in baseband, and subsequently re-encoding for transmission. The major drawback of this approach is that the equipment required is costly, needing decoders, encoders, video servers and baseband up-converters in each market. In addition, the distribution feed has to be encoded at a higher bit rate than necessary for broadcast in order to offset the quality loss due to the decode and re-encode cycle. The combined expense of distribution bandwidth, compression hardware and baseband infrastructure limited the use of local ad insertion to only those markets that could justify the cost.

These costs are multiplied for each new SD service introduced, and are higher again for new HD services. Prime, however, would not consider a compressed domain splicing architecture a viable option unless the insertion and switching paradigm that their operators were used to working with could be supported. In simple terms, the splicing solution had to replicate their existing baseband operation to be a success.

To meet the project's objectives, frame-accurate compressed domain splicers were deployed into all of Prime's local markets, near the transmitters. Conversion servers were deployed to take the ad content from the existing SD video server/library and conform it to match the HD broadcast profile. A centrally controlled schedule is used to coordinate the splicing of local ads and region-specific linear content directly into the main network feed.

Individual splicers are controlled through a TCP/IP interface using a proprietary protocol (ASCP, Asynchronous Splicer Control Protocol) derived from the traditional Video Disk Control Protocol (VDCP) and GVG TenXL switcher protocols. The protocol translation is bi-directional, providing feedback into the automation system using the traditional protocols without any changes to the automation being required. ASCP provides a command set to initiate splices and to return the status and contents of the local library of interstitial content. Splice commands are referenced against the SMPTE timecode carried in the video, allowing the splicer to frame-accurately insert material into the primary stream.

[Figure: system architecture showing the existing SD video server/library, SD-to-HD conversion server, splicer and automation interface, with the HD service MPTS and SPTS carried over ASI and IP and local content held as HD TS files]

This solution allowed Prime to architect their network to deliver simulcast locally targeted advertising and region-specific content in each of their markets for HD services, matching the SD service and enabling them to maintain continuity without compromising their operational flexibility and workflow. For viewers, this approach has provided a traditional feel when watching HD, with seamless delivery of program and short-form material.
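
Because splice commands are referenced against the SMPTE timecode in the video, the automation layer needs an unambiguous timecode-to-frame mapping. The details of ASCP are proprietary and not reproduced here; the sketch below shows only the non-drop-frame timecode arithmetic for a 25 fps (PAL) service such as Prime's:

```python
def timecode_to_frame(tc: str, fps: int = 25) -> int:
    """Convert a non-drop-frame SMPTE timecode 'HH:MM:SS:FF' into an
    absolute frame count for scheduling a frame-accurate splice."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

# Hypothetical example: cut away to a 30-second local ad at 10:15:30:00.
splice_in = timecode_to_frame("10:15:30:00")   # 923,250 at 25 fps
splice_out = splice_in + 30 * 25               # return to the network feed
print(splice_in, splice_out)
```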

Based on the experience gained with the HD solution, Prime have now commenced deployment and validation of a new platform to replace the SD baseband insertion, which will introduce both MPTS and multiple-ASI switching. Future automation integration with the splicers will be via direct ASCP, removing the need for protocol translators from traditional broadcast devices.

Conclusion

This paper has identified and presented solutions to the key challenges in performing seamless, frame-accurate compressed domain splicing of MPEG-2 and MPEG-4 AVC streams. It has shown how frame dependencies which are broken during the splice can be corrected by re-coding and restructuring GOPs at the splice point. It has highlighted the potential problems with video buffer levels and shown how these can be addressed by massaging the bit rate in a region around the splice point. To further show that seamless, frame-accurate compressed domain splicing is a practical and cost-effective solution in a commercial broadcast environment, a case study was presented showing how Prime Television deployed a splicing solution to deliver locally targeted advertising and region-specific content in each of their markets for both their SD and HD services.

Bibliography

[1] ISO/IEC 13818, MPEG-2 Standards.
[2] ISO/IEC 14496, MPEG-4 Standards.
[3] Jianhao Meng and Shih-Fu Chang, "Buffer Control Techniques for Compressed-Domain Video Editing", IEEE Proc. International Symposium on Circuits and Systems (ISCAS), Vol. 2, pp. 600-603, 1996.
[4] R. Egawa, A. A. Alatan and A. N. Akansu, "Compressed Domain MPEG-2 Video Editing with VBV Requirement", IEEE Proc. International Conference on Image Processing (ICIP), Vol. 1, pp. 1016-1019, 2000.
[5] Akio Yoneyama, Yasuhiro Takishima and Yasuyuki Nakajima, "A Fast Frame-Accurate H.264/MPEG-4 AVC Editing Method", IEEE International Conference on Multimedia and Expo (ICME), Vol. 6, pp. 1298-1301, 2005.
[6] K. Talreja and P. V. Rangan, "Editing Techniques for MPEG Multiplexed Streams", IEEE International Conference on Multimedia Computing and Systems '97, pp. 278-285, 1997.
[7] P. J. Brightwell, S. J. Dancer and M. J. Knee, "Flexible Switching and Editing of MPEG-2 Video Bitstreams", International Broadcasting Convention (IBC 97), IEE Conference Publication, pp. 547-552.