DISCOVER Monoview Video Codec
Fernando Pereira, Instituto Superior Técnico, Portugal, on behalf of the DISCOVER project
DISCOVER Workshop on Recent Advances in Distributed Video Coding, November 6, 2007, Lisboa, Portugal
Outline
1. DVC Before DISCOVER
  PRISM Solution (Univ. Berkeley)
  Feedback-channel based Solution (Univ. Stanford)
2. DISCOVER Threats and Opportunities
  Distributed Video Coding: The Challenges
  Promising Applications
3. DISCOVER Monoview Video Codec
  Architecture
  Problems to Address
  Encoder Modules
  Decoder Modules
4. DISCOVER Performance
5. DISCOVER Future
DVC Before DISCOVER
The DVC World in 2004
PRISM (Power-efficient, Robust, hIgh compression, Syndrome-based Multimedia coding) solution, developed at Univ. Berkeley by Prof. Ramchandran's team.
2004: Feedback-channel based solution, developed at Univ. Stanford by Prof. Girod's team.
PRISM: Encoder
Encoder:
Divides frame n into blocks.
Selects skip/intra/WZ coding, with different syndrome modes, based on frame differences.
For WZ blocks, sends a syndrome and a CRC (hash).
PRISM: Encoder
[Figure: bitplanes of a coefficient from MSB to LSB, with the channel code applied to the LSBs.]
Encoding of a WZ block:
Compute the DCT.
Compute a syndrome on the LSBs of the low-frequency coefficients.
Compute a CRC for these low-frequency coefficients.
Use conventional coding for the high-frequency coefficients.
The position of the low/high frequency frontier depends on the correlation strength.
PRISM: Decoder
Decoder, for every WZ block in frame n:
Performs a motion search at the decoder by trying every side information candidate block in frame n-1.
Corrects the (lower frequency) DCT coefficients with the syndrome.
Checks whether the CRC is correct.
Keeps the first predictor that provides a correct CRC.
Joins the high-frequency coefficients, which have been conventionally coded.
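The encode/CRC/candidate-search loop of the two PRISM slides can be illustrated with a toy sketch. It is a simplification under stated assumptions: instead of a channel-code syndrome on the LSB bitplanes it uses plain scalar coset coding (the encoder sends only the k least significant bits of each quantized low-frequency coefficient), and zlib's CRC-32 stands in for the hash. All function names and parameters are hypothetical.

```python
import zlib
from typing import List, Optional, Sequence, Tuple


def crc_of(coeffs: Sequence[int]) -> int:
    """Hypothetical block hash: CRC-32 (zlib) over the quantized coefficients."""
    data = b"".join(int(c).to_bytes(4, "big", signed=True) for c in coeffs)
    return zlib.crc32(data)


def encode_block(coeffs: Sequence[int], k: int) -> Tuple[List[int], int]:
    """Send only the k least significant bits (coset index) of each coefficient, plus a CRC."""
    mask = (1 << k) - 1
    return [c & mask for c in coeffs], crc_of(coeffs)


def decode_with_predictor(cosets: Sequence[int], predictor: Sequence[int], k: int) -> List[int]:
    """For each coefficient, pick the member of its coset closest to the predictor value."""
    step = 1 << k
    out = []
    for s, y in zip(cosets, predictor):
        below = y - ((y - s) % step)  # largest coset member <= y
        above = below + step          # smallest coset member > y
        out.append(below if y - below <= above - y else above)
    return out


def decode_block(cosets: Sequence[int], crc: int, candidates: Sequence[Sequence[int]],
                 k: int) -> Optional[Tuple[int, List[int]]]:
    """Try the side information candidates in order; keep the first that passes the CRC."""
    for index, predictor in enumerate(candidates):
        decoded = decode_with_predictor(cosets, predictor, k)
        if crc_of(decoded) == crc:
            return index, decoded
    return None  # no candidate passed the CRC check
```

With this scalar coset code, a candidate recovers the block only when every predictor value lies within about half a coset step (2^(k-1)) of the true coefficient, which mirrors why the number of coded bits has to follow the correlation strength.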
Feedback-channel Solution: Encoder
Encoder:
Creates groups of frames with one key frame and (N-1) WZ coded frames.
For every WZ frame: DCT + bitplanes (transform domain) or bitplanes of the pixel values (pixel domain).
The bitplanes are fed to a turbo encoder, and the parity bits generated are sent on (decoder) request.
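A minimal sketch of the frame splitting and bitplane extraction described above, assuming the key frame is the first frame of each group of N frames and that the values are already non-negative quantization indices; the helper names are hypothetical, and the turbo encoding of each bitplane is not shown.

```python
from typing import List, Tuple


def split_gop(num_frames: int, gop_size: int) -> Tuple[List[int], List[int]]:
    """Return (key frame indices, WZ frame indices): one key frame per GOP, the rest WZ."""
    key = [i for i in range(num_frames) if i % gop_size == 0]
    wz = [i for i in range(num_frames) if i % gop_size != 0]
    return key, wz


def bitplanes(values: List[int], num_bits: int) -> List[List[int]]:
    """Split non-negative quantized values into bitplanes, most significant bitplane first."""
    return [[(v >> b) & 1 for v in values] for b in range(num_bits - 1, -1, -1)]


key, wz = split_gop(9, 4)
print(key, wz)  # [0, 4, 8] [1, 2, 3, 5, 6, 7]
```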
Feedback-channel Solution: Decoder
Decoder:
Constructs an estimate of the WZ frame by motion-compensated interpolation between the previous (n-1) and next (n+1) key frames, which are sent conventionally (for GOP size = 2).
Corrects the bitplanes of the estimate using the received parity bits and a correlation noise model.
Requests more parity bits from the encoder through the feedback channel when needed for the decoding process to converge.
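The interpolation step can be sketched with symmetric bidirectional block matching: for each block of the frame to be estimated, search a displacement d so that the block at -d in the previous key frame best matches (lowest SAD) the block at +d in the next key frame, then average the two. This is a simplified stand-in, not the actual Stanford or DISCOVER interpolation (which add further refinements); block size, search range, and names are assumptions.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def _inside(h: int, w: int, y: int, x: int, bs: int) -> bool:
    return 0 <= y and y + bs <= h and 0 <= x and x + bs <= w


def _sad(prev: Frame, nxt: Frame, y: int, x: int, dy: int, dx: int, bs: int) -> int:
    """SAD between the block displaced by -d in prev and by +d in nxt."""
    return sum(abs(prev[y - dy + i][x - dx + j] - nxt[y + dy + i][x + dx + j])
               for i in range(bs) for j in range(bs))


def interpolate_si(prev: Frame, nxt: Frame, bs: int = 4, search: int = 2) -> Frame:
    """Symmetric block matching; assumes height and width are multiples of bs."""
    h, w = len(prev), len(prev[0])
    si = [[0] * w for _ in range(h)]
    for y in range(0, h, bs):
        for x in range(0, w, bs):
            best: Optional[Tuple[int, int, int]] = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    # both displaced blocks must lie inside the frame
                    if not (_inside(h, w, y - dy, x - dx, bs) and _inside(h, w, y + dy, x + dx, bs)):
                        continue
                    cost = _sad(prev, nxt, y, x, dy, dx, bs)
                    if best is None or cost < best[0]:
                        best = (cost, dy, dx)
            _, dy, dx = best  # d = (0, 0) is always valid, so best is set
            for i in range(bs):
                for j in range(bs):
                    a = prev[y - dy + i][x - dx + j]
                    b = nxt[y + dy + i][x + dx + j]
                    si[y + i][x + j] = (a + b + 1) // 2  # rounded average of the two matches
    return si
```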
Pros and Cons
PRISM (Univ. Berkeley): block-based approach; no need for a feedback channel for rate control; a fixed, high number of bits per WZ coded block; more complex encoder (mode decision); more complex decoder (motion search).
Feedback-based codec (Univ. Stanford): finer rate control; simpler encoder and decoder (?); frame-based approach; feedback channel, latency.
DISCOVER Threats and Opportunities
The Challenges
The Conceptual Challenge
The Coding Efficiency Challenge
The Complexity Challenge
The Error Robustness Challenge
The Scalability Challenge
The Multiview Video Challenge
Conventional Coding is a Winner
Emerging Challenges
Applications (from down-link to up-link): wireless digital video cameras; multimedia mobile phones and PDAs; low-power video sensors and surveillance cameras; wireless video teleconferencing systems.
Requirements: light and flexible distribution of codec complexity; robustness to packet/frame losses; high compression efficiency; low latency.
Target: Inter coding efficiency, Intra coding complexity (encoder), Intra coding robustness.
[Figure: light encoder, heavy transcoding, light decoder.]
DISCOVER Studied Applications
1. Wireless Video Cameras
2. Wireless Low-Power Surveillance
3. Visual Sensor Networks
4. Networked Camcorders
5. Distributed Video Streaming
6. Mobile Document Scanner
7. Video Conferencing with Mobile Devices
8. Mobile Video Mail
9. Disposable Video Cameras
10. Multiview Image Acquisition
11. Wireless Capsule Endoscopy
DISCOVER Monoview Codec
Selecting an Architecture
PRISM (Univ. Berkeley): block-based approach; no need for a feedback channel for rate control; a fixed, high number of bits per WZ coded block; more complex encoder (mode decision); more complex decoder (motion search). DIFFICULT TO OBTAIN A DETAILED SPECIFICATION.
September 2007
Feedback-based codec (Univ. Stanford): finer rate control; simpler encoder and decoder (?); frame-based approach; feedback channel, latency. A BIT LESS DIFFICULT TO OBTAIN A DETAILED SPECIFICATION; SOFTWARE IMPLEMENTATION AVAILABLE (from IST and VISNET).
DISCOVER Architecture
[Figure: DISCOVER codec architecture. Wyner-Ziv encoder: WZ and conventional video splitting, transform (T), quantization (Q), channel encoder, buffer, minimum rate estimation. Wyner-Ziv decoder: side information extraction, transform (T), virtual channel model, soft input computation, channel decoder, decoder success/failure, inverse quantization (Q^-1) and reconstruction, inverse transform (T^-1). Key frames: conventional video encoder and conventional video decoder.]
Based on the feedback-channel solution from Univ. Stanford.
Based on a split between Wyner-Ziv (WZ) frames and key frames.
Key frames used with a regular (GOP size) or dynamic periodicity.
Key frames coded with H.264/AVC Intra.
Main Problems to Address
Elimination of architectural limitations: coded key frames (not lossless); no original frames for decoder request control; no original frames at the decoder for correlation noise modeling.
Efficient exploitation of temporal correlation at the encoder by controlling the GOP size.
Improvement of the accuracy of side information interpolation/extrapolation.
Improvement of the accuracy of correlation noise estimation at the decoder.
Elimination or reduction of feedback channel usage through encoder or hybrid rate control.
Encoder Modules: Adaptive GOP Size
To better exploit the temporal redundancy in the video, the encoder selects the GOP length depending on the motion activity in the sequence:
High motion, low correlation: smaller GOP sizes.
Low motion, high correlation: longer GOP sizes.
Encoder Modules: Adaptive GOP Size
To perform GOP size control, it is proposed to:
Measure at the encoder the amount of motion in the video sequence using adequate (low-complexity) metrics.
Perform hierarchical clustering of the motion activity data, grouping frames that accumulate less motion, using four (simple) motion activity metrics.
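The idea of grouping frames that accumulate little motion can be sketched as below. This is a deliberately simplified substitute, not the DISCOVER method: it uses a single metric (mean absolute frame difference) and a greedy accumulation rule instead of hierarchical clustering over four metrics; the names, the `budget` threshold, and the `max_gop` cap are hypothetical.

```python
from typing import List

Frame = List[List[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """One simple, low-complexity motion activity metric: mean absolute frame difference."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def choose_gop_sizes(frames: List[Frame], budget: float, max_gop: int = 8) -> List[int]:
    """Greedy grouping: close the current GOP when accumulated activity exceeds the budget."""
    sizes: List[int] = []
    start, acc = 0, 0.0
    for i in range(1, len(frames)):
        acc += mean_abs_diff(frames[i - 1], frames[i])
        if acc > budget or i - start >= max_gop:
            sizes.append(i - start)  # GOP covers frames start .. i-1
            start, acc = i, 0.0
    sizes.append(len(frames) - start)
    return sizes
```

Static content yields long GOPs (capped by `max_gop`), while strongly changing content yields GOPs of size 1, matching the high motion / smaller GOP rule above.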
Enc. Modules: Transform and Quantization
Transform: Wyner-Ziv frames are transformed using a 4×4 discrete cosine transform (the one from H.264/AVC), whose coefficients are organized in 16 bands (one per coefficient position in the 4×4 block).
Independent quantization: each DCT band is quantized separately using a predefined number of levels, depending on the target quality for the WZ frame.
DC quantization: a uniform scalar quantizer is used for the DC band, assuming a fixed data range.
AC quantization: for the AC bands, a dead-zone quantizer with a doubled zero interval is applied. The dynamic data range is calculated separately for each band b > 1 to be quantized and is transmitted to the decoder in the coded bitstream.
Bitplane coding: the quantization indices of each DCT band b are then organized in bitplanes and fed to the channel encoder.
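A sketch of the transform and quantization steps, assuming the H.264/AVC 4×4 core transform without its post-scaling, a raster band ordering (b = 4i + j, where DISCOVER may well use a different scan), and an AC step size of 2|V|max/(levels - 1) derived from the band's dynamic range; the step formula and all names are assumptions for illustration.

```python
from typing import List

Matrix = List[List[int]]

# Forward core transform matrix of the H.264/AVC 4x4 integer DCT.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def core_transform(block: Matrix) -> Matrix:
    """Y = CF * X * CF^T; the H.264/AVC post-scaling step is omitted in this sketch."""
    cf_t = [list(col) for col in zip(*CF)]
    return _matmul(_matmul(CF, block), cf_t)


def group_bands(coeff_blocks: List[Matrix]) -> List[List[int]]:
    """Band b collects coefficient (i, j) of every block; b = 4*i + j (raster order, assumed)."""
    return [[blk[i][j] for blk in coeff_blocks] for i in range(4) for j in range(4)]


def quantize_dc(v: float, levels: int, max_value: float) -> int:
    """Uniform scalar quantizer over [0, max_value) for the non-negative DC band."""
    step = max_value / levels
    return min(int(v // step), levels - 1)


def quantize_ac(v: float, step: float) -> int:
    """Dead-zone quantizer: the zero bin (-step, step) is twice as wide as the others."""
    q = int(abs(v) // step)
    return q if v >= 0 else -q


def ac_step(band: List[float], levels: int) -> float:
    """Step from the band's dynamic range |V|max (the value that would be sent to the decoder)."""
    vmax = max(abs(c) for c in band) or 1.0
    return 2 * vmax / (levels - 1)
```

For a flat block of ones, `core_transform` puts all the energy in the DC coefficient (16) and leaves the 15 AC coefficients at zero.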
Encoder Modules: Channel Coding
Turbo codes:
Turbo encoder: two identical recursive systematic convolutional (RSC) encoders; pseudo-random interleaver; puncturing for lower rates.
Turbo decoder: two soft-input soft-output (SISO) decoders; maximum a posteriori (MAP) algorithm; Laplacian distribution to model the X, Y correlation.
LDPC (low-density parity-check) codes:
LDPC Accumulate (LDPCA) codec as developed by D. Varodayan et al. in "Rate-Adaptive Codes for Distributed Source Coding," EURASIP Signal Processing Journal, Special Issue on Distributed Source Coding, vol. 86, no. 11, pp. 3123-3130, Nov. 2006.
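The LDPCA encoder side can be illustrated by its two operations: a syndrome s = Hx (mod 2) of the bitplane x, followed by an accumulator that outputs running mod-2 sums of the syndrome bits (the accumulated syndromes that are buffered and sent on request). The small parity-check matrix below is a toy assumption, not one of the actual LDPCA codes, and the belief-propagation decoder is not shown.

```python
from typing import List


def syndrome(check_rows: List[List[int]], bits: List[int]) -> List[int]:
    """s = H x (mod 2); each row lists the bit positions involved in one parity check."""
    return [sum(bits[k] for k in row) % 2 for row in check_rows]


def accumulate(s: List[int]) -> List[int]:
    """Accumulator: a_i = a_(i-1) XOR s_i, i.e. running mod-2 sums of the syndrome."""
    out, acc = [], 0
    for bit in s:
        acc ^= bit
        out.append(acc)
    return out


def deaccumulate(a: List[int]) -> List[int]:
    """Inverse of accumulate: s_i = a_i XOR a_(i-1)."""
    return [bit ^ (a[i - 1] if i > 0 else 0) for i, bit in enumerate(a)]


H_TOY = [[0, 1, 2], [2, 3, 4], [4, 5, 0], [1, 3, 5]]  # hypothetical 4 x 6 parity checks
s = syndrome(H_TOY, [1, 0, 1, 1, 0, 0])
print(s, accumulate(s))  # [0, 0, 1, 1] [0, 0, 1, 0]
```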
Encoder Modules: Minimum Rate Estimator
To reduce the number of requests made by the decoder (which strongly affects decoding complexity), the encoder can estimate a minimum number of accumulated syndromes to be sent per bitplane and per band.
The DISCOVER solution is based on the Wyner-Ziv rate-distortion bound for two correlated Gaussian sources. Given that the second source Y (the side information) is perfectly known at the decoder, the minimal rate at which the first source X can be transmitted at a given distortion $D_X$ is $R_{WZ}(D_X) = \frac{1}{2}\log_2\left(\frac{\sigma^2}{D_X}\right)$ for $D_X \le \sigma^2$ (and zero otherwise), where $\sigma^2$ is the variance of the correlation noise between the two sources.
A separate rate for each bitplane can be obtained by estimating the reduction in distortion brought by each bitplane with respect to the previously decoded bitplanes (for each band).
$\sigma^2$ is a parameter of the correlation noise channel model; it is estimated at the decoder and sent back to the encoder via the feedback channel.
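A small sketch of the bound and of the per-bitplane split. The list of distortions after each decoded bitplane is an assumed input (how DISCOVER estimates it is not detailed here), and the result is in bits per coefficient, so it would have to be multiplied by the number of coefficients in the band to give a minimum syndrome count.

```python
import math
from typing import List


def wz_rate(sigma2: float, d: float) -> float:
    """Gaussian Wyner-Ziv bound in bits per sample: max(0, 0.5 * log2(sigma2 / d))."""
    if d <= 0:
        raise ValueError("distortion must be positive")
    return max(0.0, 0.5 * math.log2(sigma2 / d))


def bitplane_rates(sigma2: float, distortions: List[float]) -> List[float]:
    """Minimum rate of each bitplane from the distortion before and after decoding it.

    distortions[0] is the distortion with no bitplane decoded, distortions[k] after k bitplanes.
    """
    return [wz_rate(sigma2, after) - wz_rate(sigma2, before)
            for before, after in zip(distortions, distortions[1:])]


print(wz_rate(16.0, 1.0))                      # 2.0
print(bitplane_rates(16.0, [16.0, 4.0, 1.0]))  # [1.0, 1.0]
```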
Encoder Modules: Encoder Rate Control
The DISCOVER codec assumes decoder rate control based on a feedback channel, but:
In some applications, the feedback channel is not available.
The feedback channel introduces delay in the system.
It may therefore be important to perform efficient encoder rate control (ERC) for transform domain (TD) WZ video coding.
Encoder Modules: Encoder Rate Control
An estimate of the SI frame, $\hat{Y}$, is generated at the encoder using a low-complexity estimation technique (the adjacent original key frames are used as input).
The same 4×4 DCT is applied to the SI frame estimate, and each DCT band is uniformly quantized.
The conditional entropy $H(X^B \mid \hat{Y}^B)$ is computed for each bitplane.
The relative error probability p between corresponding DCT band bitplanes of the SI frame estimate and the WZ frame X is computed.
The parity rate $\hat{R}$ associated with each DCT band bitplane is computed as a function of p and $H(X^B \mid \hat{Y}^B)$.
[Figure: encoder rate control diagram; the WZ frame X and the SI frame estimate $\hat{Y}$ (obtained from the adjacent key frames $X_b$ and $X_f$) are DCT transformed, and the per-bitplane rate estimate $\hat{R}$ is derived from $H(X^B \mid \hat{Y}^B)$ and p.]
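A minimal sketch of how p and an entropy-based parity estimate can be computed for one bitplane. It assumes a binary symmetric channel between the SI-estimate bitplane and the WZ bitplane, so that the conditional entropy per bit reduces to the binary entropy H(p) and a bitplane of N bits needs about N·H(p) parity bits; the exact DISCOVER ERC expression is not reproduced here, and the function names are hypothetical.

```python
import math
from typing import List


def binary_entropy(p: float) -> float:
    """H(p) in bits; zero at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bitplane_error_probability(wz_bits: List[int], si_bits: List[int]) -> float:
    """Fraction of positions where the WZ and SI-estimate bitplanes disagree."""
    mismatches = sum(a != b for a, b in zip(wz_bits, si_bits))
    return mismatches / len(wz_bits)


def estimate_parity_bits(wz_bits: List[int], si_bits: List[int]) -> float:
    """BSC approximation: about N * H(p) parity bits for a bitplane of N bits."""
    p = bitplane_error_probability(wz_bits, si_bits)
    return len(wz_bits) * binary_entropy(p)
```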
The Clever Guy
But, unlike in conventional video coding, the decoder (no longer the encoder!) is the KING.
Decoder Modules: Side Information Creation
Since the RD performance is highly dependent on the quality of the side information, it is essential to find efficient encoder and decoder tools that generate the highest-quality side information.
Decoder Modules: Side Information Creation
[Figure panels: Trajectory-based Motion Interpolation; Hash-based Motion Estimation.]
Dec. Modules: Correlation Noise Estimation
Performing efficient (online) correlation noise estimation at the decoder for WZ video coding:
Is essential for a more realistic, practical pixel domain WZ (PDWZ) video coding scenario.
Implies the dynamic estimation of the correlation noise distribution parameter, assuming a Laplacian distribution.
Aims to be as efficient as offline estimation based on the original information.
Dec. Modules: Correlation Noise Estimation
Correlation noise estimation for WZ video coding:
Made at the decoder, based on the key frames (a realistic scenario).
Exploits temporal correlation by using the motion-compensated residual.
Different spatial granularity levels may be used to better adapt to the correlation noise statistics: frame level, block level, and pixel level.
[Flowchart, starting from the motion-compensated residual frame R:
Frame level: compute the variance of the R frame, then the correlation noise (CN) parameter at frame level as a function of that variance; go to the next frame.
Block level: for each block of R, compute the block variance, then the CN parameter at block level as a function of it; after the last block, go to the next frame.
Pixel level: for each pixel of R, compute the CN parameter at pixel level; after the last pixel, go to the next frame.]
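The three granularity levels can be sketched as follows, assuming the residual is half the difference between the backward and forward motion-compensated frames, R = (X_B - X_F)/2, and the Laplacian parameter is α = sqrt(2/σ²). The pixel-level rule (use the frame variance unless the pixel's squared distance D² from the mean absolute residual exceeds it, in which case use D²) is one formulation along the lines of the correlation noise papers in the references, assumed here rather than taken from these slides; names are hypothetical.

```python
import math
from typing import List

Frame = List[List[float]]


def laplacian_alpha(variance: float) -> float:
    """Laplacian parameter alpha = sqrt(2 / variance); variance floored to avoid division by 0."""
    return math.sqrt(2.0 / max(variance, 1e-12))


def residual(xb: Frame, xf: Frame) -> Frame:
    """R = (X_B - X_F) / 2 from already motion-compensated backward/forward frames."""
    return [[(b - f) / 2.0 for b, f in zip(rb, rf)] for rb, rf in zip(xb, xf)]


def _variance(values: List[float]) -> float:
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def frame_alpha(r: Frame) -> float:
    return laplacian_alpha(_variance([v for row in r for v in row]))


def block_alpha(r: Frame, y: int, x: int, bs: int) -> float:
    vals = [r[i][j] for i in range(y, y + bs) for j in range(x, x + bs)]
    return laplacian_alpha(_variance(vals))


def pixel_alpha(r: Frame, y: int, x: int) -> float:
    """Frame variance, unless the pixel deviates more (D^2 > variance), then use D^2."""
    flat = [v for row in r for v in row]
    var = _variance(flat)
    mean_abs = sum(abs(v) for v in flat) / len(flat)
    d2 = (abs(r[y][x]) - mean_abs) ** 2
    return laplacian_alpha(var) if d2 <= var else laplacian_alpha(d2)
```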
Dec. Modules: Request Stopping Criteria
To establish whether decoding is successful, decoder convergence is tested by computing the syndrome check error, i.e., the Hamming distance between the received syndrome and the one generated from the decoded bitplane, followed by a cyclic redundancy check (CRC).
If the Hamming distance is nonzero, the decoder proceeds to the next iteration. If the Hamming distance is still nonzero after a certain number of iterations (100), the bitplane is assumed to be erroneously decoded, and the LDPCA decoder requests more syndromes via the return channel.
If the Hamming distance is zero, the success of the decoding operation is verified using an 8-bit CRC sum. If the CRC sum computed on the decoded bitplane matches the value received from the encoder, decoding is declared successful and the decoded bitplane is sent to the reconstruction module. Otherwise, the decoder requests more accumulated syndromes; thus a low final error probability is always guaranteed.
If the compression factor reaches 1, no further requests are made, since the code is then invertible.
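The decision logic above can be sketched as a single function. The CRC-8 generator polynomial is not given on the slide, so 0x07 (initial value 0) is an assumed choice here; the names and return labels are hypothetical, and the compression-factor-1 case is left out.

```python
from typing import List


def crc8(bits: List[int], poly: int = 0x07) -> int:
    """Bit-serial CRC-8, MSB first, initial value 0 (polynomial 0x07 is an assumption)."""
    crc = 0
    for b in bits:
        feedback = ((crc >> 7) & 1) ^ b
        crc = (crc << 1) & 0xFF
        if feedback:
            crc ^= poly
    return crc


def hamming(a: List[int], b: List[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


def stopping_decision(received_syndrome: List[int], recomputed_syndrome: List[int],
                      iteration: int, decoded_bits: List[int], received_crc: int,
                      max_iterations: int = 100) -> str:
    """Return 'iterate', 'request' (more accumulated syndromes) or 'success'."""
    if hamming(received_syndrome, recomputed_syndrome) != 0:
        return "iterate" if iteration < max_iterations else "request"
    # syndrome check passed: confirm with the CRC sent by the encoder
    return "success" if crc8(decoded_bits) == received_crc else "request"
```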
Decoder Modules: Reconstruction
The decoded value is reconstructed in a mean-squared-error optimal way as the expectation of x given the decoded quantization index q and the side information value y, that is, $\hat{x} = E[x \mid q, y]$.
This expectation is computed with closed-form expressions derived for a Laplacian correlation model.
Frequency bands for which no information was transmitted by the encoder are taken directly from the side information.
The inverse 4×4 DCT is then applied, and the whole WZ frame is restored in the pixel domain.
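The conditional expectation can be checked numerically. The sketch below evaluates E[x | x in the bin of q, y] for a Laplacian density proportional to exp(-α|x - y|) with midpoint integration, as a stand-in for the closed-form expressions mentioned on the slide; the bin bounds [low, high) and α are assumed inputs (α would come from the correlation noise model), and the function name is hypothetical.

```python
import math


def reconstruct(y: float, low: float, high: float, alpha: float, steps: int = 10000) -> float:
    """E[x | x in [low, high), y] for f(x | y) proportional to exp(-alpha * |x - y|)."""
    dx = (high - low) / steps
    num = den = 0.0
    for k in range(steps):
        x = low + (k + 0.5) * dx  # midpoint of the k-th sub-interval
        w = math.exp(-alpha * abs(x - y))
        num += x * w
        den += w
    return num / den
```

When y falls well inside the bin and α is large, the result stays close to y. When y lies outside the bin, the estimate is pulled to the bin boundary nearest to y rather than to the bin center.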
DISCOVER Performance
Test Conditions
Frames: all frames, i.e., 299 for Foreman, 329 for Hall Monitor, 299 for Coast Guard, and 299 for Soccer.
Spatial resolution: QCIF.
Temporal resolution: 15 Hz and 30 Hz, which means 7.5 Hz or 15 Hz for the WZ frames when GOP = 2 is used.
GOP length: 2, 4, and 8.
[Figure: sample frames (a) to (h) of the test sequences.]
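The WZ frame rates quoted above follow from each GOP of size N containing one key frame and N - 1 WZ frames; a one-line helper (hypothetical name) makes the arithmetic explicit.

```python
def wz_frame_rate(frame_rate: float, gop_size: int) -> float:
    """Rate of WZ frames: (N - 1) of every N frames are WZ coded."""
    return frame_rate * (gop_size - 1) / gop_size


print(wz_frame_rate(15, 2), wz_frame_rate(30, 2))  # 7.5 15.0
```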
Evaluation Metrics
Forward channel performance evaluation: overall rate-distortion performance; quality evolution of WZ decoded frames; bitplane compression factor; decoded quality versus side information quality.
Feedback channel performance evaluation: number of requests; feedback channel rate; number of errors versus number of requests; number of requests versus side information quality.
Complexity performance evaluation: encoding complexity; decoding complexity.
RD Performance (GOP 2), QCIF, 15 Hz
[Figure: PSNR [dB] versus rate [kbps] for Coast Guard, Soccer, Hall Monitor, and Foreman; curves for DISCOVER, H.264/AVC (Intra), H.263+ (Intra), and H.264/AVC (No Motion).]
RD Performance (GOP 2, 4, 8), QCIF, 15 Hz
[Figure: PSNR [dB] versus rate [kbps] for Coast Guard, Soccer, Hall Monitor, and Foreman; DISCOVER with LDPC codes for GOP 2, 4, and 8.]
LDPC versus Turbo Codes
[Figure: PSNR [dB] versus rate [kbps] for Coast Guard, Soccer, Hall Monitor, and Foreman; LDPC versus turbo codes (TC) for GOP 2, 4, and 8.]
Bitplane Compression Factor (Qi = 8)
[Figure: compression factor versus bitplane number for each DCT band (DC, AC1 to AC14), for Coast Guard, Foreman, Hall Monitor, and Soccer.]
Number of Requests (Qi = 8)
[Figure: number of requests versus bitplane number for each DCT band (DC, AC1 to AC14), for Coast Guard, Foreman, Hall Monitor, and Soccer.]
Encoding Complexity (GOP 2)
[Figure: encoding time (sec) versus Qi (1 to 8) for Coast Guard, Soccer, Hall Monitor, and Foreman; curves for DISCOVER (WZ frames), DISCOVER (key frames), H.264/AVC (Intra), and H.264/AVC (No Motion).]
Decoding Complexity (GOP 2)
[Figure: decoding time (sec) versus Qi (1 to 8) for Coast Guard, Soccer, Hall Monitor, and Foreman; curves for DISCOVER (WZ frames), DISCOVER (key frames), H.264/AVC (Intra), and H.264/AVC (No Motion).]
Performance Conclusions
In terms of RD performance, the DISCOVER codec already outperforms the H.264/AVC Intra codec for most test sequences with GOP = 2; for quieter sequences, it also outperforms the H.264/AVC No Motion codec.
For longer GOP sizes, outperforming H.264/AVC Intra is more difficult, which highlights the importance and difficulty of side information creation, notably when the key frames are farther apart.
The total bitrate of the feedback channel is rather low, but the feedback adds delay and requires a real-time setup.
DISCOVER encoding complexity is always much lower than H.264/AVC Intra encoding complexity, even for GOP = 2, where DISCOVER also performs better in RD terms.
DISCOVER (the) Future
Main Conclusions
Since the DISCOVER monoview codec performs better than H.264/AVC Intra for GOP = 2 for most sequences, Wyner-Ziv coding is already a credible solution when encoding complexity is a very critical requirement (even at the cost of some additional decoding complexity).
The results achieved during the lifetime of DISCOVER improved the compression performance of monoview WZ codecs, but much research remains to be done to approach the theoretical limits.
Further research should address side information creation, correlation noise modeling, channel codes, rate control, reconstruction, WZ selective coding, etc.
DISCOVER for the World
The DISCOVER codec may be downloaded at http://www.discoverdvc.org/!
The executable codec, along with sample configuration and test files, can be downloaded for Windows, Linux/32-bit, and Linux/64-bit.
An overview paper and a detailed performance evaluation with precise test conditions are also available.
Main References
General
J. Ascenso, C. Brites, and F. Pereira, "Improving frame interpolation with spatial motion smoothing for pixel domain distributed video coding," Proc. 5th EURASIP Conf. on Speech and Image Processing, Multimedia Communications and Services, Smolenice, Slovak Republic, July 2005.
J. Ascenso, C. Brites, and F. Pereira, "Content adaptive Wyner-Ziv video coding driven by motion activity," IEEE International Conference on Image Processing, Atlanta, USA, October 2006.
X. Artigas, J. Ascenso, M. Dalai, S. Klomp, D. Kubasov, and M. Ouaret, "The DISCOVER codec: architecture, techniques and evaluation," Picture Coding Symposium, Lisboa, Portugal, November 2007.
C. Guillemot, F. Pereira, L. Torres, T. Ebrahimi, R. Leonardi, and J. Ostermann, "Distributed monoview and multiview video coding," IEEE Signal Processing Magazine, vol. 24, no. 5, pp. 67-76, September 2007.
Codec
J. Ascenso, C. Brites, and F. Pereira, "Content adaptive Wyner-Ziv video coding driven by motion activity," IEEE International Conference on Image Processing, Atlanta, USA, October 2006.
J. Ascenso and F. Pereira, "Adaptive hash based side information exploitation for efficient Wyner-Ziv video coding," IEEE International Conference on Image Processing, San Antonio, USA, September 2007.
Encoder
C. Brites and F. Pereira, "Encoder rate control for transform domain Wyner-Ziv video coding," IEEE International Conference on Image Processing, San Antonio, Texas, USA, September 2007.
D. Kubasov, K. Lajnef, and C. Guillemot, "A hybrid encoder/decoder rate control for a Wyner-Ziv video codec with a feedback channel," IEEE Multimedia Signal Processing Workshop (MMSP), Chania, Crete, Greece, October 2007.
Decoder
C. Brites, J. Ascenso, and F. Pereira, "Modeling correlation noise statistics at decoder for pixel based Wyner-Ziv video coding," Picture Coding Symposium, Beijing, China, April 2006.
C. Brites, J. Ascenso, and F. Pereira, "Studying temporal correlation noise modeling for pixel based Wyner-Ziv video coding," IEEE International Conference on Image Processing, Atlanta, USA, October 2006.
D. Kubasov, J. Nayak, and C. Guillemot, "Optimal reconstruction in Wyner-Ziv video coding with multiple side information," IEEE Multimedia Signal Processing Workshop (MMSP), Chania, Crete, October 2007.
IST DISCOVER Team
Thanks for your attention!
More information at http://www.discoverdvc.org/