Full Interactive Functions in MPEG-based Video on Demand Systems

Kostas Psannis, Marios Hadjinicolaou
Dept of Electronic & Computer Engineering, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK

Argy Krikelis
Aspex Technology Ltd

Abstract: - In this paper, we present an efficient method for supporting full interactive functions of an MPEG-2 video stream. This method is based on storing multiple differently encoded versions of the same movie at the server. A normal version is used for normal playback, while several interactive versions are used for Fast/Slow/Jump Play. Interactive versions are produced by encoding every N-th frame of the original uncompressed video using a GOP pattern different from that of the normal version. The effectiveness of our approach in terms of reducing the allocated bandwidth of the interactive stream is evaluated through extensive simulations. Moreover, the visual quality during the interactive mode is analyzed and the effect of re-quantization error is examined.

Key-Words: - MPEG-2, Full Interactive Functions/Operations, VCR control operation

1 Introduction

The maturing of video compression technologies, magnetic storage subsystems, and broadband networking has made Video On Demand (VOD) over computer networks more visible than ever. To improve the marketability of VOD services and accelerate their wide-scale deployment, these services must support user interactivity at affordable cost. Attempts to make television interactive are almost as old as television itself. Interactive television has different meanings for different people. Some people already regard the choice among many different channels as interactivity. Others would require the selections of a local video store to be available in real time, on demand, through a broadband connection. It is generally believed that an interactive program is a program or service in which the content itself, the presentation manner of the content, or even the presentation order of the content can be affected by the viewer.
Near Interactive Video On Demand services support a limited number of interactive functions, reducing the Quality of Service (QoS) of client interactivity. To improve the marketability of VOD service, the client should interact with the content of the presentation, deciding the viewing schedule with the full range of interactive functions. This gives the desired multilevel QoS to the client. The full range of interactive functions includes play/resume, stop, pause, Jump Forward (JF)/Jump Backward (JB), Fast Forward (FF)/Fast Rewind (FR), Slow Down (SD), Slow Reverse (SR), and Rewind.

This research was supported by British Council. Corresponding Author: K E Psannis, Email: Kpsannis@hotmail.com

The difficulty of supporting interactivity varies from one interactive function to another. A stop or pause followed by resume is relatively easy to support since it does not require more bandwidth than what is already allocated for normal playback. On the other hand, Fast Scanning (FF, FR, JF, JB) involves displaying frames at multiple times the normal rate. Transporting and decoding frames at such a rate is prohibitively expensive and is not feasible with today's hardware decoders. In addition, the implementation of full VCR functionality with MPEG coded video is not a trivial task. MPEG video compression is based on motion compensated predictive coding. The MPEG structure [1] allows straightforward realization of forward normal play, but imposes several constraints on the other trick modes.

Several approaches have been proposed to support interactivity in a VOD system. Interactive functions can be supported by dropping parts of the original MPEG-2 video stream [2], [3]. Typically, dropping is performed after compression and aims to reduce the transport and decoding requirements without causing significant degradation in video quality. Alternatively, interactive functions can be supported using separate copies of the movie that are encoded at lower quality than the normal playback copy [4], [5]. Other conventional schemes that support interactive functions either display frames at a rate much higher than normal playback, for example 90 fps [6], or involve downloading the video data to a player device located at the customer premises (not real-time play; downloading can be done prior to viewing) so that the customer can view it without further intervention from the network [7]. Moreover, interactive functions can be supported by transmitting additional data of the same movie from the server, using additional storage at the customer premises [8], [9]. Rewind operations can be supported by transcoding the predictively coded frames at the client [10]. Note that none of the methods mentioned above supports full interactive operations with minimum additional resources.

In this work, we introduce an efficient approach for supporting the full range of interactive functions in a Video on Demand (VoD) system. Our method is based on storing multiple differently encoded versions of the same movie at the server. Interactive versions are produced by encoding every N-th frame of the original video using a GOP pattern different from that of the normal version.

The rest of the paper is organized as follows. In Section 2 we describe the generation of the interactive bitstream. Section 3 presents the rate control schemes for the interactive bitstream. Section 4 analyzes the network bandwidth allocation for normal playback and the interactive mode. Section 5 describes the storage overhead. In Section 6 the full range of interactive functions is presented. In Section 7 the visual quality is analyzed through extensive simulations. Section 8 presents the effects of re-quantization error. Finally, conclusions and some points for open research are given in Section 9.

2 Preprocessing of Video Movies

To support interactive functions, the server maintains multiple, differently encoded versions of each movie.
One version, which is referred to as the normal version, is used for normal-speed playback. The other versions, which are referred to as interactive versions, are used for fast play. Each interactive version is used to support Fast/Jump Forward/Backward, Slow Down/Reverse, and Reverse at a given speedup. The server switches between the various versions depending on the requested interactive function. Note that only one version is transmitted at any given instant. The corresponding interactive version is obtained by encoding every N-th (i.e., uncompressed) frame of the original movie as a sequence of I-P(M)-frames ($N_{interactive}$ = variable, $M_{interactive}$ = 1). Effectively this results in repeating the previous I-frame in the decoder, enhancing the visual quality during the interactive mode. It is necessary to encode the alternative stream with a GoP structure of its own, in an independent fashion. This GoP should have the proper bit rate and frame rate so that there is neither additional bit rate during the interactive mode nor an increase in the complexity of the decoder module.

3 Rate Control Schemes

A common approach to control the size of an MPEG frame is to vary the quantization factor on a per-frame basis. This results in variable video quality during the Fast Forward/Rewind mode (however, the quality is still constant during the normal version). The amount of quantization may be varied. This is the mechanism that provides constant quality rate control. The quantized coefficients QF(u,v) are computed from the DCT coefficients F(u,v), the quantization scale MQUANT, and a quantization matrix w(u,v), according to the following equation:

$$QF(u,v) = \frac{16\,F(u,v)}{MQUANT \cdot w(u,v)} \qquad (1)$$

The quantization step makes many of the values in the coefficient matrix zero, and it makes the rest smaller. The result is a significant reduction in the number of coded bits with no visually apparent differences between the decoded output and the original source data.
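The effect of the quantization step can be illustrated with a minimal Python sketch. It assumes that equation (1) has the form QF(u,v) = 16 F(u,v) / (MQUANT · w(u,v)); the rounding rule, the flat quantization matrix, and the sample coefficients are assumptions of this sketch and are not taken from the paper.

```python
import numpy as np

def quantize(F, w, mquant):
    """Quantize an 8x8 block of DCT coefficients in the spirit of equation (1).

    F      : DCT coefficients F(u,v)
    w      : quantization matrix w(u,v)
    mquant : quantization scale MQUANT
    Rounding to the nearest integer is an assumption of this sketch.
    """
    F = np.asarray(F, dtype=float)
    w = np.asarray(w, dtype=float)
    return np.rint(16.0 * F / (mquant * w)).astype(int)

# Hypothetical block: a large DC term and two small AC terms.
F = np.zeros((8, 8))
F[0, 0], F[0, 1], F[7, 7] = 800.0, 40.0, 3.0
w = np.full((8, 8), 16.0)          # flat matrix, for illustration only

fine = quantize(F, w, mquant=8)    # finer quantization
coarse = quantize(F, w, mquant=16) # coarser quantization
print(fine[0, 0], fine[0, 1], fine[7, 7])
print(np.count_nonzero(fine), np.count_nonzero(coarse))
```

Raising MQUANT (or the entries of w) shrinks the surviving coefficients and pushes the small ones to zero, which is what reduces the number of coded bits.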
The quantization factor may be varied in two ways: (a) by varying the quantization scale (MQUANT), or (b) by varying the quantization matrix (w(u,v)). To bound the size of transcoded I-frames of the interactive mode, the encoder uses two predefined (upper and lower) thresholds. An I-frame is re-encoded such that its size satisfies

$$I^{lower}_{Threshold} \le I_{transcoded} \le I^{upper}_{Threshold} \qquad (2)$$

After an I-frame of a scan version has been re-encoded as an I-frame, the encoding algorithm checks whether the size of the compressed frame is between the two predefined thresholds. If it is not, the quantization factor for the corresponding frame is increased or decreased and the frame is re-encoded.

Two different schemes can be used to initialise the quantization factor when an I-frame is to be re-encoded. In the first scheme, when an I-frame is to be re-encoded for the first time, the encoder starts with a fixed quantization factor. The main problem with this method is that the quantization value might be kept unnecessarily high, resulting in low quality during the interactive operations. Moreover, the encoder in this scheme uses only one predefined upper threshold ($I_{transcoded} \le I^{upper}_{Threshold}$).

In the second scheme, the encoder tries to track the nominal quantization value that was used in encoding the same type of frame in the normal version. In the first encoding attempt, the encoder starts with the nominal quantization value that was used to encode the same I-frame of the normal version. After the first encoding attempt, if the resulting frame size is between the two predefined thresholds (2), the encoder proceeds to the next frame. Otherwise, the quantization factor (the quantization matrix w(u,v)) is varied and the same frame is re-encoded. The quantization matrix can be modified by maintaining the same value at the near-DC coefficients but with a different slope towards the higher frequency coefficients. This procedure is repeated until the size of the compressed frame is between the two predefined thresholds (2). The advantage of this scheme is that it tries to produce interactive operations with the same constant quality as normal playback; when that is not possible, it minimizes the fluctuation in video quality during the interactive mode.
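The re-encoding loop described above can be sketched in Python as follows. This is a simplified illustration, not the authors' encoder: `encode` is a hypothetical callback standing in for a real I-frame encoder, a single scalar quantization factor is adjusted instead of reshaping the quantization matrix, and the toy size model and threshold values are invented for the example.

```python
def rate_control(encode, q_start, lower, upper, q_min=1, q_max=31, max_iter=32):
    """Re-encode one I-frame until its size lies in [lower, upper].

    encode(q) is a hypothetical callback that returns the compressed size
    in bytes for quantization factor q; the size is assumed to shrink as q
    grows. Returns the (q, size) pair of the last encoding attempt.
    """
    q = q_start
    size = encode(q)
    for _ in range(max_iter):
        if lower <= size <= upper:
            break                      # size is within the thresholds
        # Too large: quantize more coarsely; too small: more finely.
        step = 1 if size > upper else -1
        q_next = min(q_max, max(q_min, q + step))
        if q_next == q:
            break                      # range of q exhausted, give up
        q = q_next
        size = encode(q)
    return q, size

# Toy size model (an assumption): the frame size is inversely proportional to q.
toy_encode = lambda q: 120000 // q

# Second scheme: start from the nominal value of the normal version (here 8).
print(rate_control(toy_encode, q_start=8, lower=9000, upper=10000))
# First scheme: start from a fixed, possibly too coarse, value (here 20).
print(rate_control(toy_encode, q_start=20, lower=9000, upper=10000))
```

The only difference between the two schemes in this sketch is the starting value `q_start`; the first scheme as described in the text would additionally drop the lower bound, which can be emulated by passing `lower=0`.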
4 Network Bandwidth Allocation

Since network bandwidth is usually the highest concern, the video of the normal version is coded with all I-P-B frames in order to achieve high compression ratios for transport over the network with minimum bandwidth resources. In addition, it is desirable to support interactive operations with minimum requirements in network bandwidth.

4.1 Normal Playback

Assume that the MPEG-2 file has a playback rate of 30 frames per second. However, there are no hardware or software MPEG decoders that can run faster than 30 frames per second. For our experiment we used an MPEG-2 encoder developed by the Software Simulation Group (SSG) [11]. We encoded 80 frames of the football clip. The Group of Pictures (GoP) format was N=5 and M=3. Therefore we had a larger I- or P-frame, followed by two smaller B-frames. Figure 1 depicts the bit rate allocation for the football sequence.

[Figure 1: Bit rate allocation for the football sequence. Frame size (bytes) versus frame number, shown for I-, P-, and B-frames.]

According to Figure 1, the bandwidth during normal playback is given by

$$30\,\text{fps} \times Average(IPB)_{Size} \times \frac{8\,\text{bits}}{\text{byte}} = .98\ \text{MBps}$$

4.2 Interactive mode

The total number of I-P(M) frames in the interactive mode can be computed as follows:

$$TF^{interactive}_{I,P(M)} = TF_I \times N_{interactive} \qquad (3)$$

where $TF_I$ is the number of I-frames ($TF_I = TF/N$), and $N_{interactive} = 5$, $M_{interactive} = 1$ are the new re-encoding parameters. The bit rate allocation during the interactive mode depends on the proposed encoding scheme. Figure 2 depicts the bit rate allocation for the proposed rate control schemes.

[Figure 2: Frame sizes of the interactive bitstream. Frame size (bytes) versus frame number, shown for the 1st and 2nd schemes.]
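The bandwidth expression used above (frame rate × average frame size × 8 bits per byte) can be written as a small Python helper. The frame sizes in the example are hypothetical and are not the measurements behind Figures 1 and 2; the interactive GoP is modelled as one I-frame followed by zero-size P(M) frames, which is an assumption of this sketch.

```python
def stream_bandwidth_bps(frame_sizes_bytes, fps=30):
    """Average bandwidth in bits per second of a stream of coded frames.

    frame_sizes_bytes : sizes of the coded frames, in bytes
    fps               : playback rate in frames per second
    """
    average_size = sum(frame_sizes_bytes) / len(frame_sizes_bytes)
    return fps * average_size * 8   # 8 bits per byte

# Hypothetical normal-version GoP: 1 I-frame, 4 P-frames, 10 B-frames.
normal_gop = [30000] + [15000] * 4 + [5000] * 10
# Hypothetical interactive GoP: 1 I-frame followed by 4 empty P(M) frames.
interactive_gop = [12000] + [0] * 4

print(stream_bandwidth_bps(normal_gop))       # bits per second
print(stream_bandwidth_bps(interactive_gop))  # bits per second
```

Because the P(M) frames carry no residual data in this model, the average frame size of the interactive GoP is the I-frame size divided by the GoP length, which is why bounding the I-frame size bounds the bandwidth of the interactive stream.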
1st Scheme {w(u,v) = fixed}, with $I_{transcoded} \le I^{higher}_{Threshold}$. The bandwidth is given by

$$30\,\text{fps} \times Average(I\,P(M))_{Size} \times \frac{8\,\text{bits}}{\text{byte}} = 0.53\ \text{MBps}$$

2nd Scheme {w(u,v) = variable}, with $I_{transcoded}$ bounded by the two thresholds of (2). The bandwidth is given by

$$30\,\text{fps} \times Constant(I\,P(M))_{Size} \times \frac{8\,\text{bits}}{\text{byte}} = .3\ \text{MBps}$$

5 Storage Overhead at the server

Generating separate copies for interactive operations comes at the expense of extra storage at the server. Let TF be the number of frames in the normal version. The storage requirements of the normal version and of the interactive mode are given by

$$Storage^{normal}_{server} = TF \times Average(IPB)_{Size} \qquad (4)$$

$$ExtraStorage^{interactive} = TF^{interactive}_{I,P(M)} \times (I\,P(M))_{Size} \qquad (5)$$

where $(I\,P(M))_{Size}$ = Average{I P(M)} for the 1st algorithm and Constant(I P(M)) for the 2nd algorithm.

Using (4) and (5), the following ratio can be derived:

$$\frac{Storage^{normal}_{server}}{ExtraStorage^{interactive}} = \frac{N \times Average(IPB)_{Size}}{N_{interactive} \times (I\,P(M))_{Size}} \qquad (6)$$

Table 1 shows the increased ratio using equation (6) and typical characteristics of MPEG-2 compressed video (Figure 5).

Table 1: Increased ratio for the two rate control schemes.

                        1st scheme    2nd scheme
server / ExtraStorage   6.68          5.66

Note that for n interactive versions the relative increase in the storage overhead at the server is given by

$$\sum_{i=1}^{n} \frac{ExtraStorage(N_i)}{Storage^{normal}_{server}}$$

6 Full range of Interactive Functions

To improve the marketability of VOD service, the client should interact with the content of the presentation, deciding the viewing schedule with the full range of interactive functions.

Fast Forward/Rewind is an operation in which the client browses the presentation in the forward/backward direction with normal sequences of pictures. This function is supported by extracting all the I-frames of the normal stream in forward/backward order.

Jump Forward/Backward is an operation in which the client jumps to a target time of the presentation in the forward/backward direction with a normal sequence of pictures, so that the user jumps directly to a particular video location. The Jump Forward/Backward operation is supported by skipping some I-frames of the normal stream in the forward/backward direction.
The Rewind operation can be supported by extracting all the I-frames in reverse order and generating as many P(M) frames as there are P- and B-frames in a Group of Pictures (GoP) of normal playback ($N_{interactive} = N$).

Slow Down (SD) is an operation in which the video sequence is presented forward with a lower playback rate. This function can be supported by extracting all the I-frames in forward order and generating as many P(M) frames as there are P- and B-frames in a Recording Ratio (RR) of normal playback ($N_{interactive} = RR$).
Slow Reverse is an operation in which the video sequence is presented backward with a lower playback rate. This function can be supported by extracting all the I-frames in backward order and generating as many P(M) frames as there are P- and B-frames in a double Group of Pictures (GoP) of normal playback ($N_{interactive} = 2N$).

P(M) frames are produced by applying motion compensation between identical frames, generating Marionette P(M)-frames (0 bits). Thus the output rates $R^{I}$ and $R^{P(M)}$ are given by

$$R^{I} = Q_2[DCT(x_n)] \qquad (7)$$

$$R^{P(M)} = Q_2[DCT(e_n)] \qquad (8)$$

7 Analyze the Visual Quality

We measure the visual quality of the interactive mode using the Peak Signal-to-Noise Ratio (PSNR). The two approaches can be compared with respect to video quality using the PSNR. We use the PSNR of the Y-component of a decoded frame. The PSNR is obtained by comparing the original raw frame with its decoded version, with encoding being done using the proposed schemes. Figure 3 depicts the PSNR values for the motor race movie. Both schemes achieve acceptable visual quality because the PSNR is sufficiently large. The PSNR value for the 60 I-frames is 46.03 dB for the first scheme and 47.06 dB for the second scheme, i.e., the quality is slightly better under the second scheme.

[Figure 3: PSNR for encoded frames in the interactive mode (motor race trace). PSNR (dB) versus frame index, shown for the 1st and 2nd schemes.]

8 Re-quantization error

If the server has only the forward bitstream (i.e., the original sequence is unavailable), we can decode the forward bitstream and re-encode every N-th frame of the video sequence using a different encoding structure (I-P(M)-frames). The I-frames (N-th frames) are transcoded (decoded/re-encoded) using the simplified transcoder (Figure 4). Its lower complexity is achieved by two factors. I-frames are encoded and decoded without reference to any other frame.
[Figure 4: Proposed simplified transcoder]

The decoded pictures $x_n$ and the corresponding prediction error $e_n$ are given by

$$x_n = DCT^{-1}[Q_1^{-1}(R_{in})] \qquad (9)$$

$$e_n = x_n - M_P(x_{n-M}) \qquad (10)$$

Considering the orthonormality of the DCT, the following equations can be derived:

$$R^{I} = Q_2[Q_1^{-1}(R_{in})] \qquad (11)$$

$$R^{P(M)} = Q_2[Q_1^{-1}(R_{in}) - DCT(M_P(x_{n-M}))] \qquad (12)$$

where $M_P(x_{n-M})$ is the motion compensation function. Since the first quantization is performed independently of the subsequent re-quantization, the re-quantization error cannot be avoided [12]. The effect of the re-quantization error is depicted in Figure 5. The figure shows the PSNR difference of the I-P(M)-frames between the following bit streams:

Encoded-only: every N-th frame of the original uncompressed video sequence is encoded as a sequence of I-P(M)-frames using the two rate control schemes.

Transcoded: the intra coded frames are transcoded (encoded/decoded/re-encoded) as a sequence of I-P(M) frames using equations (11) and (12) and the two rate control schemes.
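The difference between the two bit streams listed above can be illustrated with a small Python sketch of cascaded quantization. It uses plain uniform scalar quantizers as a stand-in for the MPEG quantizers; the step sizes (5 for the first quantizer, 7 for the second) and the coefficient values are hypothetical and chosen only to make the effect visible.

```python
import numpy as np

def quantize(x, step):
    """Uniform scalar quantizer: map coefficients to integer levels."""
    return np.rint(np.asarray(x, dtype=float) / step)

def dequantize(levels, step):
    """Inverse quantizer: map integer levels back to coefficient values."""
    return levels * step

def encode_only(x, step2):
    """Quantize the original coefficients once with the second quantizer."""
    return dequantize(quantize(x, step2), step2)

def transcode(x, step1, step2):
    """Quantize with the first quantizer, then re-quantize with the second."""
    r_in = quantize(x, step1)                 # first-generation levels
    restored = dequantize(r_in, step1)        # inverse of the first quantizer
    return dequantize(quantize(restored, step2), step2)

x = np.array([11.0, 24.0, 3.0, 40.0])         # hypothetical DCT coefficients
direct = encode_only(x, step2=7)
cascaded = transcode(x, step1=5, step2=7)

mse_direct = float(np.mean((x - direct) ** 2))
mse_cascaded = float(np.mean((x - cascaded) ** 2))
print(direct, mse_direct)
print(cascaded, mse_cascaded)
```

In this example the first quantizer moves three of the four coefficients across a decision boundary of the second quantizer, so the transcoded values end up further from the originals than the encoded-only values. This is the mechanism behind the PSNR gap between the encoded-only and transcoded streams.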
[Figure 5: PSNR difference due to re-quantization error for the two rate control schemes. PSNR difference (dB) versus frame index, shown for the 1st and 2nd schemes.]

From Figure 5 it can be concluded that the re-quantization error leads to a drop in picture quality of about .4 dB in all I-P(M)-frames for the 1st rate control scheme and approximately .7 dB for the 2nd rate control scheme.

9 Conclusions

In this paper, we presented an approach for supporting interactive operations in a Video On Demand system. The full range of interactive operations is supported by storing multiple differently encoded versions of the same movie at the server. The corresponding interactive version is obtained by encoding every N-th (i.e., uncompressed) frame of the original movie as a sequence of I-P(M)-frames. Effectively this results in repeating the previous I-frame in the decoder. Two rate control algorithms have been proposed to bound the size of the interactive stream. Simulation results show that the visual quality of the secondary bit stream is acceptable for the two rate control algorithms. Future research will focus on applications of the proposed concept to MPEG-4 streams.

References:
[1] International Standard 13818-2. Information Technology - Generic Coding of Moving Pictures and Associated Audio: Video.
[2] Banu Ozden, Alexandros Biliris, Rajeev Rastogi, Avi Silberschatz. A Low-Cost Storage Server for Movie on Demand Databases. In: Proc. of the 20th VLDB Conference, Santiago, Chile, 1994, pp. 594-605.
[3] H.J. Chen, A. Krishnamurthy, T.D.C. Little, and D. Venkatesh. A Scalable Video-on-Demand Service for the Provision of VCR-Like Functions. In: Proceedings IEEE Multimedia, 1995, pp. 65-7.
[4] Marwan Krunz, George Apostolopoulos. Efficient Support for Scanning Operations in MPEG-based Video on Demand. Multimedia Systems, vol. 8, no. 1, Jan. 2000, pp. 20-36.
[5] Prashant J. Shenoy, Harrick M. Vin. Efficient Support for Scan Operations in Video Servers. In: Proc. ACM Multimedia Conference, November 1995, ACM Press, San Francisco, CA, pp. 131-140.
[6] Jayanta K. Dey-Sircar, James D. Salehi, James F. Kurose, Don Towsley. Providing VCR Capabilities in Large-Scale Video Servers. In: Proc. ACM Multimedia Conference, Oct. 1994, San Francisco, CA, pp. 25-32.
[7] M.-S. Chen, D.D. Kandlur. Downloading and Stream Conversion: Supporting Interactive Play of Video in a Client Station. In: Proc. IEEE Multimedia Conference, 1995.
[8] Kostas Psannis, Marios Hadjinicolaou. Transmitting Additional Data of MPEG-2 Compressed Video to Support Interactive Operations. In: Proc. IEEE ISIMP01, Hong Kong, May 2001, pp. 308-3.
[9] Kostas Psannis, Marios Hadjinicolaou. A New Methodology for Implementing Interactive Operations of MPEG-2 Compressed Video. In: Proc. IEEE Second International Workshop on Digital and Computational Video, Tampa, USA, Feb. 2001, pp. -8.
[10] Kostas Psannis, Marios Hadjinicolaou. Support Backward Interactive Functions of MPEG Stream. In: Proc. WSES Evolution Computation, Feb00, Switzerland.
[11] MPEG Software Simulation Group. MPEG-2 Encoder/Decoder Version 1.2. ftp://ftp.mpeg.org/pub/mpeg/mssg
[12] P. Assuncao and M. Ghanbari. A frequency-domain video transcoder for dynamic bit-rate reduction of MPEG-2 bitstreams. IEEE Trans. Circuits Syst. Video Technol., Dec. 1998.