View Sequence Coding using Warping-based Image Alignment for Multi-view Video




Yanwei Liu 1, Qingming Huang 1,2, Wen Gao 3
1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
2 Graduate School of the Chinese Academy of Sciences, Beijing, China
3 School of Electronics Engineering and Computer Science, Peking University, Beijing, China
{ywliu, qmhuang, wgao}@jdl.ac.cn

Abstract

Multi-view video sequences are recorded by multiple cameras simultaneously. In the view direction, the images captured by the different cameras at the same time instant form view image sequences; a view sequence represents the geometric changes of the scene in space. Compressing the view sequence contributes greatly to the overall multi-view video compression. This paper discusses view sequence coding (VSC) and proposes a warping-based alignment technique to improve the compression performance. The proposed method caters to the traits of the view sequence and adopts an affine model to warp the reference image. An image inpainting step that uses information from the adjacent warped image is proposed to handle the missing regions formed during warping, and aligned multi-hypothesis disparity prediction is employed to further improve the prediction. Experiments show that the coding efficiency of the proposed scheme is far above that of all-intra coding in the simulcast scheme, and that the proposed scheme also improves the coding efficiency by up to 0.3~0.5 dB at low bit rates compared with the standard coding method.

Keywords: View Sequence Coding, Image Alignment, Inpainting, Multi-view Video

1 Introduction

With the rapid development of 3D display and capturing technology, multi-viewpoint video is becoming popular because it captures a single dynamic scene simultaneously from different view directions and provides a 3D depth impression of the observed visual scene (A. Smolic 2005). A variety of applications, such as interactive free-viewpoint video, 3D television (3DTV) and surveillance, are typical and potential markets for multi-view systems. For efficient transmission of the multiple signals, many new compression technologies beyond the traditional video coding methods have been explored. Inter-view prediction and disparity compensation are key technologies for multi-view video compression. Most coding schemes built on traditional block-based hybrid coding adopt the coding structure of Figure 1 (ISO/IEC JTC1/SC29/WG11 N6909 2005). Because the same scene is captured by multiple cameras, the images recorded at the same time instant by the different cameras form a view direction sequence which represents the geometric change of the scene, such as time T0 in Figure 1. In Figure 1, view direction coding is used at time instants T0 and TN, and the reconstructed view sequence images then serve as reference images for coding each view with temporal prediction and inter-view prediction. Compared with simulcast coding, where a standard video coding technique is applied to compress each view independently, view sequence coding (VSC) contributes greatly to the overall multi-view video compression.

Figure 1 The general multi-view coding structure (8 views, N is the temporal GOP length)

Image alignment, namely establishing spatial correspondence between two or more images, has been extensively studied in the computer vision literature (H. Sawhney 1997 and Y. Caspi 2000). These methods focus on improving the visual quality of sequences in space and time. A geometry-transform-based alignment method was first proposed in (ISO/IEC JTC1/SC29/WG11 M700 2005) to preprocess the input multi-view sequences and improve the coding performance, but that method resamples and changes the original data, so it is not well suited to video compression.

In this paper we improve the coding efficiency of the view direction sequence by adopting warping-based alignment to compensate the global camera shift, which simplifies the correspondence search and improves the disparity coding efficiency. The warped image is used only as the coding reference image; the original image is not changed. An affine or perspective model, as employed in MPEG-4 for background sprite coding (ISO/IEC 14496, 2000), is used here to warp one image toward another. Though the alignment is not perfectly accurate, it compensates the global disparity between two adjacent images and reduces the bits spent on coding disparity, thereby increasing the coding efficiency. The paper is organized as follows. In Section 2 we propose a novel view sequence coding scheme utilizing warping-based image alignment and disparity compensation; in Section 3 we explain in detail the key technologies used in the proposed coding solution; experimental results are provided in Section 4. Finally, Section 5 concludes the paper with some remarks.

2 Proposed view sequence coding scheme

In a multi-view system all cameras look at the same central scene from different positions, and the images captured simultaneously by the multiple cameras reflect the geometric structure of the scene. Disparity arises from the relative geometry and the differing parameters of the cameras, and it reflects the depth of the scene, so its physical model differs somewhat from an object's temporal motion. Standard video coding methods, which are suited to coding motion information, are therefore not fully efficient for coding the disparities in a view sequence. According to the characteristics of multiple-camera geometry, we propose a warping-based image alignment to compensate the global disparity. We first establish the global relation between two adjacent images by estimating an affine model and projecting one image onto the other image plane; this process is named warping. Secondly, because the warped image misses some pixels at the image boundary (holes), the missing areas are filled via an inpainting technique that uses information from the adjacent warped image. Finally, the aligned image is used as the reference image for coding the current image. The additional warping model coefficients are coded with an entropy coder.

The general VSC structure is shown in Figure 2. In the figure, variable-size block-based disparity estimation is performed just like block-based motion estimation, but with an enlarged search range. To further strengthen the disparity compensation, warping-aligned multi-hypothesis disparity prediction is also introduced into the VSC scheme.

Figure 2 Proposed VSC scheme

3 Key technologies for VSC

Section 2 illustrated the proposed coding scheme. The whole coding structure is compatible with the standard video coding method, but this section presents in detail the techniques, catering to the traits of the view sequence, that enhance the coding efficiency: the image warping process, the inpainting process and warping-aligned multi-hypothesis disparity prediction.

3.1 Affine model estimation and reference image warping

In the current study, the affine model is employed to represent the geometric relation of adjacent views. The model can describe camera motions such as translation, pan and zoom using six parameters, and is given by the following equation:

x' = a1 x + a2 y + a3,  y' = a4 x + a5 y + a6    (1)

Equation (1) transforms a coordinate (x, y) in the reference image into a coordinate (x', y') in the current image. In multi-view geometry the model parameters (a1, a2, a3, a4, a5, a6) could be derived directly from the camera parameters, but accurate camera parameters are generally difficult to obtain, so we adopt a global motion estimation method to acquire the six parameters (F. Dufaux and J. Konrad 2000). Since the warping aims at aligning the different view images toward one base image, the images recorded by the other cameras are warped toward the current view image via the estimated affine model parameters. Pixels that do not fall on integer coordinates are processed by bilinear interpolation.
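To make the warping step concrete, the following minimal Python sketch applies the six-parameter model of equation (1) and fills each pixel by bilinear interpolation, as described above. It is an illustrative reimplementation under stated assumptions, not the authors' code: images are plain lists of rows, and None marks the hole pixels that the inpainting step of Section 3.2 would later fill.

```python
def affine_map(x, y, a):
    """Apply the six-parameter affine model of equation (1)."""
    a1, a2, a3, a4, a5, a6 = a
    return a1 * x + a2 * y + a3, a4 * x + a5 * y + a6

def bilinear(img, x, y):
    """Bilinearly interpolate img (a list of rows) at the float position (x, y)."""
    h, w = len(img), len(img[0])
    x0, y0 = int(x), int(y)
    if not (0 <= x0 < w - 1 and 0 <= y0 < h - 1):
        return None                       # no source pixel: leave a hole
    dx, dy = x - x0, y - y0
    return (img[y0][x0] * (1 - dx) * (1 - dy)
            + img[y0][x0 + 1] * dx * (1 - dy)
            + img[y0 + 1][x0] * (1 - dx) * dy
            + img[y0 + 1][x0 + 1] * dx * dy)

def warp(img, a):
    """Warp the reference image toward the current view.

    For every pixel (x, y) of the warped image we look up the source
    position given by the affine map and interpolate; positions that
    fall outside the reference become holes (None).
    """
    h, w = len(img), len(img[0])
    return [[bilinear(img, *affine_map(x, y, a)) for x in range(w)]
            for y in range(h)]
```

With the identity parameters (1, 0, 0, 0, 1, 0) the warp reproduces the interior of the image, while a pure translation (a3, a6) shifts it; holes appear wherever the mapped coordinate leaves the reference, exactly the blank regions discussed in Section 3.2.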
Figure 3 illustrates the warping process, and the warping result for the ballroom sequence (ISO/IEC JTC1/SC29/WG11 M077 2005) is shown in Figure 4.

Figure 3 Warping process: the left view image, the current view image, and the image warped from the left view toward the current view
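The six parameters themselves come from global motion estimation (F. Dufaux and J. Konrad 2000). As a simplified stand-in for that method, the sketch below fits (a1, ..., a6) to a handful of point correspondences by least squares; the normal-equation formulation and all helper names are illustrative assumptions, not the paper's procedure.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [row[:] + [v] for row, v in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(3):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][3] / M[i][i] for i in range(3)]

def fit_affine(src, dst):
    """Least-squares fit of (a1..a6) so that equation (1) maps src -> dst.

    With design rows (x, y, 1), the x' and y' components decouple into
    two independent 3x3 normal-equation systems A^T A p = A^T b.
    """
    AtA = [[0.0] * 3 for _ in range(3)]
    Atx = [0.0] * 3
    Aty = [0.0] * 3
    for (x, y), (xp, yp) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                AtA[i][j] += row[i] * row[j]
            Atx[i] += row[i] * xp
            Aty[i] += row[i] * yp
    return tuple(solve3(AtA, Atx) + solve3(AtA, Aty))
```

Three non-collinear correspondences determine the model exactly; with more, the fit averages out per-point noise, which is what a robust global motion estimator exploits.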

3.2 Inpainting missing regions

During warping, blank regions such as the white areas in the middle image of Figure 3 may appear in the warped image, because some pixels in one image have no corresponding pixels in the other view. These regions are known as occluded regions in computer vision. In MPEG-4 object-based video coding, low-pass extrapolation (LPE) padding is used to fill such blank regions (ISO/IEC 14496, 2000). Here, we instead exploit the redundancy among the multi-view images and fill the missing regions with a simplified inpainting technique. Our method has roughly two phases: copying and smoothing. Assume that we process the intermediate image of three adjacent views. We first warp the left image toward the intermediate image, and then warp the right image toward the intermediate image. Because the fixed viewing areas extend in opposite directions between two adjacent views, the missing areas in the image warped from the left view are definitely different from those in the image warped from the right view. Thus, we can copy the corresponding pixels from the right warped image to fill the missing regions in the left warped image, and conversely copy pixels from the left warped image to fill the missing areas in the right warped image. The copying is defined as

I_new(i, j) = (1 - χ_Ω(i, j)) I(i, j) + χ_Ω(i, j) I'(i, j)    (2)

where I is the warped image being filled, I' is the oppositely warped image, and χ_Ω is the characteristic function of the missing region Ω (χ_Ω(i, j) = 1 inside Ω and 0 elsewhere). The first term of (2) keeps the information of the non-filling region from image I, and the second term copies the missing information from I'.

Figure 4 The original images and warped images: (a), (b) and (c): the first images of views 0, 1 and 2 of the ballroom sequence. (d): (a) warped toward (b). (e): (c) warped toward (b). (f): the inpainted version of (d).
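A minimal sketch of the two-phase copy-and-smooth procedure: holes (the region Ω, marked None) are filled from the oppositely warped image as in equation (2), and a small mean filter then softens the seam. For brevity this sketch assumes the other view covers every hole and smooths the whole image rather than only the boundary line; all names are illustrative.

```python
def fill_holes(warped, other):
    """Equation (2): keep warped outside Omega, copy from other inside it.

    Hole pixels are None; `other` is the image warped from the opposite
    neighbour, assumed here to have valid data wherever `warped` does not.
    """
    return [[other[y][x] if v is None else v
             for x, v in enumerate(row)]
            for y, row in enumerate(warped)]

def smooth(img, radius=2):
    """Mean filter over a (2*radius + 1)^2 window (5x5 by default),
    clamping the window at the image border. Assumes holes are filled."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out
```

In the paper the smoothing is applied around the previously missing region only; restricting the filter to a band around Ω would be a straightforward extension of this sketch.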
Though LPE padding improves the quality of the warped image, it uses only spatial information within the current warped image and, unfortunately, does not faithfully represent the original image areas. Inpainting is a common method for filling missing image regions. Its basic idea is to propagate local structure from the image into the region to be filled while keeping the global or local consistency of colour/intensity, so the technique is also called image completion. In the literature (Wen-Yi Zhao 2004 and Y. Matsushita 2005), sophisticated inpainting methods, including space-time partial differential equations (PDEs) and motion inpainting, have been proposed to plausibly reconstruct missing regions, but these methods do not fit our coding purpose: since the warped image is used as a reference image during both encoding and decoding, a complicated filling method would require additional bits to be signalled to the decoder in order to keep the encoder and decoder matched.

After the copying process, an undesired boundary line may appear around the previously missing region and badly affect the inpainting result. This boundary line is due to pixels that fail to join up at the region boundary of the optical flow field because of the imprecise warping. A spatial smoothing pass with a 5x5 filter removes the boundary line; the filtering is very efficient and makes the resulting image look natural and coherent. Figure 4 shows the inpainting result for a warped image.

3.3 Multi-hypothesis disparity prediction

In the coding solution of Figure 2, the reference image is warped toward the current image while the current image is being coded. The warping-based alignment improves the correspondence between the two view images, but the warping cannot achieve perfectly accurate pixel correspondence, and this inaccuracy strongly influences the disparity prediction.
In standard video coding, multi-hypothesis motion prediction linearly combines multiple signals to generate a prediction signal (Markus Flierl, Thomas Wiegand, and Bernd Girod 2000); the combined signal limits the noise error and improves the prediction efficiency. In our work we retain multi-hypothesis prediction in B frames, but the reference images of a B frame are warping-aligned toward the B frame, and the two jointly estimated hypothesis signals from the two warped reference images compensate the inaccuracy of warping to some extent and reduce the bit rate. To increase the coding gains we also introduce multi-hypothesis disparity prediction in P frames, where a frame is allowed to combine multiple past warped images into a prediction signal for the current image.

4 Experimental results

To investigate the proposed coding scheme for VSC, we extract the view images recorded at the same time instant from standard multi-view video sequences and organize them into a view sequence. The test sequences are ballroom from MERL and Akko&Kayo from Nagoya University (Tanimoto Laboratory). The coding conditions for the ballroom sequence are shown in Table 1. Figure 5 shows the coding performance for the first ballroom view sequence (all the view images at T0 in Figure 1). The coding performance of the proposed scheme is far above that of standard intra-frame coding (when simulcast coding is used to encode multi-view video, the corresponding frames at regular temporal access positions, such as all views' frames at time T0 or TN, are encoded as intra frames), and about 0.3 dB higher than that of the standard coding; at very low bit rates the warping-based alignment contributes even more to the coding performance. As shown in Figure 6, the coding gain reaches 0.5 dB compared with H.264/AVC coding for the Akko&Kayo view sequence. The coding conditions for Figure 6 use 5 view images with a parallel camera setup (the Akko&Kayo sequences are captured by a 2D camera array, and the 5 views in the horizontal direction are utilized in the experiment). In the experiments we observe that accurate warping has a great influence on the prediction efficiency, so efficient estimation of the warping model is the key to the whole compression scheme. Though multi-hypothesis disparity prediction improves the prediction, its benefit relies on sufficiently precise warping.
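The warping-aligned multi-hypothesis combination evaluated in these experiments (Section 3.3) amounts to averaging two disparity-compensated blocks taken from two warped references. The toy sketch below shows why the combined signal can beat either single hypothesis when their warping errors have opposite signs; the equal 1/2 weights and all helper names are illustrative assumptions, not the paper's exact predictor.

```python
def block(img, x, y, n):
    """Extract the n x n block whose top-left corner is (x, y)."""
    return [row[x:x + n] for row in img[y:y + n]]

def combine(h1, h2):
    """Equal-weight combination of two hypothesis blocks (B-frame style)."""
    return [[(a + b) / 2 for a, b in zip(r1, r2)]
            for r1, r2 in zip(h1, h2)]

def sad(b1, b2):
    """Sum of absolute differences, used here to judge prediction quality."""
    return sum(abs(a - b) for r1, r2 in zip(b1, b2) for a, b in zip(r1, r2))
```

If one warped reference predicts a block too dark and the other too bright, the average lands near the true signal and its SAD against the current block drops below that of either single hypothesis, which is the noise-limiting effect the section describes.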
Table 1 Coding parameters in the experiment for the ballroom sequence

View number: 8
Setup of views: parallel
Baseline between views: 19.5 cm
Search range: 32 (full pixel)
Basis QP: 36, 39, 42
Entropy coding: CABAC
Rate control: no
RD optimization: yes
Direct mode: spatial

Figure 5 The R-D curves of the ballroom view sequence at T0 (the first images of all views)

Figure 6 The R-D curves of the Akko&Kayo view sequence at T0

5 Conclusions

In this paper we have presented a warping-based image-aligned VSC scheme to improve the coding performance. During encoding, the reconstructed adjacent view image is warped toward the current image using the affine model, and the warped images are placed in the coding buffer as reference images for the current image. A novel image inpainting technique fills the missing regions formed by the warping transformation, so the inpainted warped image brings more information for predicting the coded image. To further improve the prediction, aligned multi-hypothesis prediction compensates the inaccuracy of the warping. The experimental results show that the proposed coding method performs far better than all-intra coding in the simulcast scheme and achieves coding gains of up to 0.3~0.5 dB compared with standard H.264/AVC-based coding. In future work, we will investigate how to improve the whole multi-view video compression performance via the proposed method.

Acknowledgement. The research of this paper has been supported by the National Natural Science Foundation of China (No. 6033300) and the Natural Science Foundation of Beijing, China (No. 404003). The authors would like to thank the MERL Lab and the Tanimoto Lab of Nagoya University for providing the test sequences used in the simulation.

References

A. Smolic and P. Kauff (2005) Interactive 3D Video Representation and Coding Technologies, Proceedings of the IEEE, Special Issue on Advances in Video Coding and Delivery, vol. 93.

ISO/IEC JTC1/SC29/WG11, Survey of Algorithms used for Multi-view Video Coding (MVC), Doc. N6909, Hong Kong, China, January 2005.

H. Sawhney and R. Kumar (1997) True multi-image alignment and its application to mosaicing and lens distortion correction. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 450-456.

Y. Caspi and M. Irani (2000) A step towards sequence-to-sequence alignment. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 682-689, Hilton Head Island, South Carolina.

ISO/IEC JTC1/SC29/WG11, Response to Call for Evidence on Multi-View Video Coding, Doc. M700, Hong Kong, China, January 2005.

ISO/IEC 14496, Part 2 (Visual) (2000), Amendment, Information Technology - Coding of Audio-Visual Objects (MPEG-4).

F. Dufaux and J. Konrad (2000) Efficient, robust, and fast global motion estimation for video coding, IEEE Trans. Image Processing, vol. 9, pp. 497-501.

ISO/IEC JTC1/SC29/WG11, Multiview Video Test Sequences from MERL, Doc. M077, Busan, Korea, April 2005.

Wen-Yi Zhao (2004) Motion-based spatial-temporal image repairing, in IEEE Proceedings of the 2004 International Conference on Image Processing (ICIP 2004), Singapore.

Y. Matsushita, E. Ofek, X. Tang and H. Shum (2005) Full-frame Video Stabilization, in IEEE Conference on Computer Vision and Pattern Recognition.

Markus Flierl, Thomas Wiegand, and Bernd Girod (2000) Rate-constrained multi-hypothesis motion-compensated prediction for video coding, in Proceedings of the IEEE International Conference on Image Processing (ICIP-2000), Vancouver, BC, Canada.

Tanimoto Lab: http://www.tanimoto.nuee.nagoya-u.ac.jp/