Latest Results on High-Resolution Reconstruction from Video Sequences

S. Lertrattanapanich and N. K. Bose
The Spatial and Temporal Signal Processing Center
Department of Electrical Engineering
The Pennsylvania State University, University Park, PA 16802, U.S.A.
E-mail: nkb@stspbkn.ee.psu.edu
Fax: (814) 865-7065

This research was supported by AFRL Contract F30602-98-0061.

Abstract

A problem of recurring interest is the construction of a panoramic video mosaic from a video sequence generated by a camera, followed by the generation of high resolution images of regions of interest in the mosaic. The camera motions are estimated by calculating the parameters of a projective model, and superresolution is then attained by an algorithm that constructs a high resolution image from undersampled, noisy and blurred frames. Research is currently in progress to assess and incorporate robustness to errors in the estimation of the camera motion parameters and of the blur.

1 Introduction

The classical noise-free image restoration (deconvolution) problem of finding a multidimensional discrete signal $s$ from the linearly blurred observation $g = s * h$, where $h$ denotes a compactly supported blurring operator, is ill-posed in the sense that the deconvolution algorithm is not robust to errors in estimating $h$. On the other hand, consider the multiple
deconvolution problem involving observations $g_i = s * h_i$, $i = 1, 2, \ldots, n$, resulting from snapshots with multiple cameras (a single camera with displacements is not excluded). It has been established [1] that if there exists a set of compactly supported distributions $f_i$, $i = 1, 2, \ldots, n$, such that

$$\sum_{i=1}^{n} h_i * f_i = \delta, \tag{1}$$

where $\delta$ is the unit impulse function, then the multiple deconvolution problem is well-posed and the original image is given by

$$s = \sum_{i=1}^{n} g_i * f_i.$$

In the transform domain, Eq. (1) leads to the coprimeness condition

$$\sum_{i=1}^{n} H_i(\cdot)\, F_i(\cdot) = 1, \tag{2}$$

where $H_i$ and $F_i$ are, respectively, the transforms of $h_i$ and $f_i$. The preceding condition states that the set $\{H_i\}$ of transforms of the blur operators is devoid of a common zero, so that no information about the signal to be reconstructed is lost. Here, a well-posed problem is one for which a unique solution exists and the inverse operator is continuous. Naturally, the single deconvolution problem fails to satisfy both the uniqueness and the continuity conditions.

The need to reconstruct a high resolution image from multiple undersampled, blurred and noisy frames occurs in many applications, including LANDSAT pictures, medical images and, more recently, video, where a resolution higher than that of the cameras (sensors) is required. It involves upsampling (interpolation) of the input sampling lattice as well as the reduction or elimination of aliasing, blurring and noise. Recall that most images contain sharp edges and are therefore not strictly bandlimited; consequently, digital images suffer from aliasing due to undersampling, loss of high-wavenumber detail due to the low resolution point-spread function of the sensors, and possible blurring due to relative motion, optical aberrations, media turbulence, and a variety of camera motions.
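As a small numerical illustration of Eqs. (1) and (2), the sketch below uses two hypothetical 1-D blurs, $h_1 = [1, 1]$ and $h_2 = [1, -1]$. These are assumptions chosen for simplicity and are not taken from [1]. Each blur alone has a transform zero (at the highest frequency and at zero frequency, respectively), so neither can be inverted on its own, but the two share no common zero, and the filters $f_1 = f_2 = [1/2]$ satisfy Eq. (1). All function and variable names are illustrative.

```python
import numpy as np

# Hypothetical blurs (assumption): H1(z) = 1 + z^-1 vanishes at z = -1,
# H2(z) = 1 - z^-1 vanishes at z = +1, so they have no common zero.
H_BLURS = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]

# Deconvolution filters satisfying Eq. (1): h1*f1 + h2*f2 = delta.
F_FILTERS = [np.array([0.5]), np.array([0.5])]


def blur(signal, kernel):
    """Observation g_i = s * h_i (full linear convolution)."""
    return np.convolve(signal, kernel)


def multichannel_deconvolve(observations, filters):
    """Reconstruction s = sum_i g_i * f_i.

    Assumes every g_i * f_i has the same length, which holds in this example.
    """
    parts = [np.convolve(g, f) for g, f in zip(observations, filters)]
    return np.sum(parts, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.standard_normal(16)
    g = [blur(s, h) for h in H_BLURS]
    s_hat = multichannel_deconvolve(g, F_FILTERS)
    # The first len(s) samples reproduce s; the extra tail sample is zero.
    print("max reconstruction error:", np.max(np.abs(s_hat[: len(s)] - s)))
```

The reconstruction here is a fixed linear combination of the observations, so small perturbations of the observations produce only small perturbations of the result, which is the continuity property that the single-blur problem lacks.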
Dramatic progress has been documented during the last decade in the area of high resolution image sequence processing that encompasses the stages of image registration or camera motion parameter estimation,
deblurring and noise reduction (filtering), and upsampling (interpolation) [2], [3], [4], [5], [6], [7], [8], [9], [10], [11]. In this paper, the latest research activity pertaining to increasing spatial resolution, or to obtaining a panoramic mosaic of the scene at an acceptable resolution, is reported. Because video is used as the source of information, the user obtains a sequence of images that contains both spatial and temporal information. The number of pixels in each frame is fixed at a known resolution. To generate one large snapshot that covers all desired areas of the scene (a panoramic image), one needs to adjust the camera in various ways (zooming, panning, tilting, etc.) as required to capture the entire scene. Usually, the resulting picture suffers from undersampling, or low resolution, because the whole scene has to be represented by a limited number of pixels. Therefore, the bigger the scene, the lower the resolution. On the other hand, a higher resolution can be obtained if the camera is zoomed into a specific region; in this case, however, the panoramic image is not obtained. The trade-off between the size of the scene and its resolution therefore has to be addressed. The idea is to take advantage of the intraframe spatial information along with the interframe temporal information to create the high resolution panoramic image, and then to attain superresolution of regions of interest in the mosaic.

2 Models for Camera Motion Parameters

There are several models for camera motion parameters. The projective model is widely used.
This model is described by the eight parameters $m_i$, $i = 1, 2, \ldots, 8$, through the matrix equation

$$\begin{bmatrix} u \\ v \\ w \end{bmatrix} = \begin{bmatrix} m_1 & m_2 & m_3 \\ m_4 & m_5 & m_6 \\ m_7 & m_8 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \tag{3}$$

where the transformed spatial coordinates $(x', y')$ are

$$x' = \frac{m_1 x + m_2 y + m_3}{m_7 x + m_8 y + 1} = \frac{u}{w}, \qquad y' = \frac{m_4 x + m_5 y + m_6}{m_7 x + m_8 y + 1} = \frac{v}{w}. \tag{4}$$

In the model, $m_1, m_2, m_4, m_5$ represent scaling, rotation and shearing; $m_3$ and $m_6$ represent, respectively, horizontal and vertical translations; and $m_7$ and $m_8$ characterize chirping and keystoning (panning, tilting). The assumptions of a static scene and the absence of parallax are invoked. The projective model
is derivable from its approximating bilinear model [14]

$$x' = q_1 xy + q_2 x + q_3 y + q_4, \qquad y' = q_5 xy + q_6 x + q_7 y + q_8, \tag{5}$$

whose parameter vector $q = [q_1\ q_2\ \ldots\ q_8]^T$ is obtained as described below in the subsection on automatic registration. After calculating $q$, the projective motion parameter vector $m = [m_1\ m_2\ \ldots\ m_8]^T$ can be obtained by relating Eqs. (4) and (5).

3 Image Registration

Image registration is the procedure used to estimate the relative motion parameters between two images, the reference image and the current image, by bringing one into coincidence with the other. In a video sequence there are more than two images, and each pair of successive images $I_i$ and $I_{i+1}$, $i = 1, 2, \ldots, N-1$ ($N$ is the number of images in the sequence), can be registered independently. After all motion parameters are estimated, one can stitch the images into a panoramic mosaic. The existing methods for image registration can be categorized into manual and automatic registration. To make this paper self-contained, each method is discussed briefly.

3.1 Manual registration

The concept of manual registration is straightforward. To estimate the eight motion parameters $m_1, m_2, \ldots, m_8$ in Eq. (4), it is necessary to establish at least eight equations by manually selecting at least four pairs of corresponding points, $(x_i, y_i)$ and $(x'_i, y'_i)$, $i = 1, 2, \ldots, P$ with $P \ge 4$, in the reference and the current images, respectively. From Eq. (4), each pair of points $(x_i, y_i)$ and $(x'_i, y'_i)$ gives two linear equations, which can be written in the generic form

$$\begin{bmatrix} x_i & y_i & 1 & 0 & 0 & 0 & -x'_i x_i & -x'_i y_i \\ 0 & 0 & 0 & x_i & y_i & 1 & -y'_i x_i & -y'_i y_i \end{bmatrix} m = \begin{bmatrix} x'_i \\ y'_i \end{bmatrix}. \tag{6}$$

After substituting all pairs of corresponding points into Eq. (6), one can write the set of linear equations in the compact form

$$A m = b, \tag{7}$$
where the corresponding points are assumed to be such that the $(2P \times 8)$ matrix $A$ is of full rank. Then the minimum norm least-squares solution is given by

$$m = (A^T A)^{-1} A^T b. \tag{8}$$

3.2 Automatic registration

Because manual registration is too tedious to be useful in large scale composition applications, the automatic approach is more suitable in terms of both speed and accuracy. However, manual registration is more robust than the automatic approach. Automatic registration methods can be categorized as either optimization theory based (such as the least squares method (LSE) [13]) or spatio-temporal derivative and optical flow theory based.

3.2.1 Optimization theory based method

This method directly minimizes the sum of squares of discrepancies in intensities over overlapping pixel locations in a pair of images that are to be registered. The cost function is

$$E = \sum_i \left[ I'(x'_i, y'_i) - I(x_i, y_i) \right]^2 = \sum_i e_i^2, \tag{9}$$

where $(x'_i, y'_i)$ is given by Eq. (4) and $I'$ is the transformed current image with respect to the reference image $I$. To perform the minimization, several gradient-based optimization methods could be used. However, the Levenberg-Marquardt iterative non-linear algorithm is recommended by Szeliski [13] because it provides a good balance between speed of convergence and computational complexity. The Levenberg-Marquardt algorithm involves computing the partial derivatives of $e_i$ with respect to each unknown motion parameter $m_k$, $k = 1, 2, \ldots, 8$, and forming an approximate Hessian matrix $A$ and a weighted gradient vector $b$. The estimated motion parameter vector $m$ is then updated iteratively until a preset error criterion is satisfied for local convergence. The detailed implementation can be found in [13].

3.2.2 Optical flow based method

An optical flow method was proposed by Mann and Picard [14]. The optical flow is the velocity field in the image plane due to the motion of the camera and the motion of objects in the scene. The 2-D
optical flow equation is given by

$$u_f E_x + v_f E_y + E_t \approx 0, \tag{10}$$

where $u_f$ and $v_f$ are the flow velocities along the $x$ and $y$ directions, respectively, and $E_x$, $E_y$, $E_t$ are the partial derivatives of $I(x, y, t)$ with respect to $x$, $y$ and $t$, respectively. The goal of this method is to fit model velocities $u_m$ and $v_m$ to the optical flow equation. These model velocities are defined by

$$u_m = x' - x, \qquad v_m = y' - y, \tag{11}$$

where $x'$ and $y'$ are given in Eq. (4). The cost function is

$$\varepsilon = \sum_{x,y} \left( u_m E_x + v_m E_y + E_t \right)^2. \tag{12}$$

Due to the complexity of the projective model motion parameters, the bilinear model in Eq. (5) can be used as an approximant to the projective model. To minimize $\varepsilon$, one differentiates $\varepsilon$ with respect to each element of the parameter vector $q = [q_1\ q_2\ \ldots\ q_8]^T$ and sets the derivatives to zero. Taking the bilinear form to model the velocities directly, so that $u_m E_x + v_m E_y = \Phi^T(x, y)\, q$ (which corresponds to subtracting 1 from $q_2$ and $q_7$ of Eq. (5)), the resulting linear system of equations from which $q$ is calculated is

$$\left( \sum_{x,y} \Phi(x, y)\, \Phi^T(x, y) \right) q = -\sum_{x,y} E_t\, \Phi(x, y), \tag{13}$$

where $\Phi(x, y) = [\, xyE_x \;\; xE_x \;\; yE_x \;\; E_x \;\; xyE_y \;\; xE_y \;\; yE_y \;\; E_y \,]^T$.

4 Panoramic Video Mosaic

After estimating the motion parameters of each pair of successive images in the sequence, one gets a set of motion parameter matrices $\{T_{i,i+1} : i = 1, 2, \ldots, N-1\}$, where the matrix $T_{i,i+1}$ contains all motion parameters $m_1, m_2, \ldots, m_8$ of image $I_{i+1}$ with respect to $I_i$, as shown in Eq. (3). These are called differential motion parameters because they relate successive images. Next, one selects one image in the sequence as the absolute reference image and then calculates a set of absolute motion parameter matrices for the frames with respect to this chosen absolute reference frame [15]. For example, if the image $I_r$ is chosen as the absolute reference image, the absolute motion parameter set is $\{T_{r,i} : i = 1, 2, \ldots, N,\ i \ne r\}$. Note that $T_{r,r}$ is the $3 \times 3$ identity matrix.
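The projective mapping of Eqs. (3)-(4), the point-based fit of Eqs. (6)-(8), and the chaining of differential matrices into absolute ones can be sketched together as follows. This is a minimal sketch under stated assumptions rather than the procedure of [15]: it assumes that $T_{i,i+1}$ maps homogeneous coordinates of frame $i$ to those of frame $i+1$ (the reference-to-current direction of Eq. (3)), it uses 0-based frame indices, and all function names are hypothetical.

```python
import numpy as np


def params_to_matrix(m):
    """Arrange m = [m1..m8] into the 3x3 matrix of Eq. (3)."""
    return np.append(np.asarray(m, dtype=float), 1.0).reshape(3, 3)


def apply_projective(T, pts):
    """Map an array of (x, y) points to (x', y') via Eqs. (3)-(4)."""
    pts = np.asarray(pts, dtype=float)
    uvw = np.column_stack([pts, np.ones(len(pts))]) @ T.T
    return uvw[:, :2] / uvw[:, 2:3]


def fit_projective(ref_pts, cur_pts):
    """Least-squares fit of m from P >= 4 point pairs, Eqs. (6)-(8)."""
    rows, rhs = [], []
    for (x, y), (xp, yp) in zip(ref_pts, cur_pts):
        rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y])
        rhs.append(xp)
        rows.append([0, 0, 0, x, y, 1, -yp * x, -yp * y])
        rhs.append(yp)
    A = np.array(rows, dtype=float)
    b = np.array(rhs, dtype=float)
    # lstsq agrees with (A^T A)^-1 A^T b when A has full column rank.
    m, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params_to_matrix(m)


def absolute_from_differential(diffs, r):
    """Return [T_{r,0}, ..., T_{r,N-1}] from diffs[i] = T_{i,i+1}.

    Assumption: T_{i,i+1} maps frame-i coordinates to frame-(i+1) coordinates.
    """
    n = len(diffs) + 1
    abs_T = [None] * n
    abs_T[r] = np.eye(3)
    for i in range(r + 1, n):            # frames after the reference
        abs_T[i] = diffs[i - 1] @ abs_T[i - 1]
    for i in range(r - 1, -1, -1):       # frames before the reference
        abs_T[i] = np.linalg.inv(diffs[i]) @ abs_T[i + 1]
    # Rescale so the lower-right entry is 1, as in Eq. (3)
    # (assumes that entry is not close to zero).
    return [T / T[2, 2] for T in abs_T]
```

Under the opposite mapping convention the products would be taken in the reverse order, so the direction assumed for $T_{i,i+1}$ should be checked against the registration routine actually used.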
After the absolute motion parameters have been calculated, each image in the sequence can be transformed and aligned with respect to the absolute reference frame. The resulting image is a panoramic video mosaic. An example of panoramic video mosaic construction is given next. In Figure 1(a), some images in the Alan Alda sequence are shown. This sequence, which contains 29 frames, was originally obtained from Mann and Picard [14]. In this computer simulation, the authors estimate the differential motion parameters using automatic registration based on the optical flow method and choose frame 15 as the absolute reference image. The resulting panoramic video mosaic is shown in Figure 1(b).

[Figure 1: An example of panoramic video mosaic from the Alan Alda sequence. (a) Some frames in the Alan Alda sequence. (b) Panoramic video mosaic from the Alan Alda sequence.]

5 Superresolution for Region of Interest

Since there is a lot of redundant information over the overlapping parts in a panoramic video mosaic, it is possible to improve the resolution of any Region of Interest (ROI) in the panoramic image. The term superresolution is usually used for this area of research. It involves the reconstruction of a high resolution image from a sequence of low resolution frames. The superresolution technique can be directly applied to get a higher resolution of the ROI in the panoramic image. The procedure is summarized as follows:

1. Construct the panoramic video mosaic from the sequence of low resolution images.
2. Define any ROI on the panoramic image by selecting appropriate corners of an enclosing rectangle.
3. Extract, from the original sequence, the subsequence of low resolution images each of which meets a preset minimum overlap between the part of the chosen ROI that it contains and the same ROI in a reference frame.
4. Apply the superresolution technique to the subsequence of low resolution ROIs.

The earliest work on superresolution was proposed by Tsai and Huang [17] in 1984. They used a transform domain method to eliminate the aliasing due to undersampling. Their algorithm exploits the aliasing relationship between the continuous and discrete Fourier transforms of the original analog image and the undersampled images.
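A one-dimensional version of this aliasing relationship can be written down as an illustration. The notation is an assumption made for exposition and is not that of [17]: let $f(x)$ have continuous Fourier transform $F(\Omega)$, and let frame $k$ consist of samples $y_k[n] = f(nT + \delta_k)$ taken at spacing $T$ with a known shift $\delta_k$. By the shift property and the sampling theorem, the discrete-time Fourier transform of frame $k$ is

```latex
Y_k\!\left(e^{j\omega}\right)
  = \frac{1}{T} \sum_{l=-\infty}^{\infty}
    e^{\,j(\omega + 2\pi l)\,\delta_k / T}\;
    F\!\left(\frac{\omega + 2\pi l}{T}\right)
```

If $F$ is bandlimited so that only a finite number $P$ of the terms are nonzero at each $\omega$, then $K \ge P$ frames with suitably distinct shifts give, at each $\omega$, $K$ linear equations in the $P$ unaliased values of $F$, which is the kind of linear system a transform domain method can solve.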
The main drawbacks of their algorithm are its failure to compensate for blur and noise and its restriction to purely translational motion. Later, Kim, Bose and Valenzuela [2] generalized the work of Tsai and Huang [17] to include filtering simultaneously with interpolation and provided a set of necessary and sufficient conditions for solving a structured system of linear equations. Moreover, a recursive scheme using the weighted least squares algorithm was proposed in [2]. Next, Kim and Su [3] presented the RLMS method for noisy blurred images by using Tikhonov regularization. Subsequently, Bose, Kim, and Valenzuela [4] extended the work in [2] by proposing a recursive total least squares algorithm in order to take into account both observation error and error in motion parameter estimation.

[Figure 2: The procedure of superresolution for ROI. Block diagram with the stages: sequence of low resolution images; image registration; motion parameters; image alignment; panoramic video mosaic; select two points to define ROI; extract sequence of ROIs; superresolution; high resolution ROI.]

Projection Onto Convex Sets (POCS) is another method that can be used to reconstruct a high resolution image. This method exploits convex sets that represent tight constraints on the required image. The POCS-based method was first proposed by Stark and Oskoui [18], and Tekalp et al. [5] then extended their work by incorporating observation noise into the problem. Later, Patti et al. [6] proposed a more general approach by taking the aperture time into account.

Another methodology for the problem of superresolution was suggested by Elad and Feuer [7]. Their superresolution restoration is modeled using sparse matrices and is analyzed from the ML (Maximum Likelihood), the MAP (Maximum a Posteriori), and the POCS points of view. They claimed that their algorithm is a unified method which incorporates POCS into the ML or MAP restoration. Recently, they also proposed superresolution restoration based on adaptive filtering [8].

A different approach towards superresolution was suggested by Irani and Peleg [9], [10]. Rather than the pure translation model, the rigid model (including rotation) was chosen in the image registration process. The iterative back projection (IBP) algorithm was proposed to reconstruct a high resolution image from a sequence of low resolution frames.
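The iterative back-projection idea can be sketched in one dimension as follows. This is a simplified illustration and not the algorithm of [9], [10] as published: it assumes known integer shifts on the high resolution grid, a box blur equal in width to the decimation factor, circular boundaries, and a back-projection kernel equal to the transpose of the forward (imaging) operator. All function names are hypothetical.

```python
import numpy as np


def observe(x, shift, factor):
    """Simulate one low resolution frame: shift, box-blur, then decimate."""
    blurred = sum(np.roll(x, -(shift + j)) for j in range(factor)) / factor
    return blurred[::factor]


def back_project(residual, shift, factor, n_hr):
    """Transpose of `observe`: zero-insert, then spread with the same box."""
    up = np.zeros(n_hr)
    up[::factor] = residual
    return sum(np.roll(up, shift + j) for j in range(factor)) / factor


def iterative_back_projection(frames, shifts, factor, n_iter=50):
    """Refine a high resolution estimate from shifted low resolution frames.

    Returns the estimate and the residual norm recorded before each update.
    """
    n_hr = factor * len(frames[0])
    x = np.zeros(n_hr)                 # initial guess (assumption: zeros)
    step = 1.0 / len(frames)           # safe step: each frame operator has norm <= 1
    history = []
    for _ in range(n_iter):
        # Difference between observed frames and frames simulated from x.
        residuals = [y - observe(x, d, factor) for y, d in zip(frames, shifts)]
        history.append(np.sqrt(sum(np.sum(r ** 2) for r in residuals)))
        # Back-project every residual onto the high resolution grid.
        x = x + step * sum(
            back_project(r, d, factor, n_hr) for r, d in zip(residuals, shifts)
        )
    return x, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.standard_normal(32)
    shifts = [0, 1]
    frames = [observe(truth, d, 2) for d in shifts]
    estimate, history = iterative_back_projection(frames, shifts, 2, n_iter=100)
    print("residual norm: first %.4f, last %.4f" % (history[0], history[-1]))
```

With this transpose kernel the iteration is a gradient descent on the squared simulation error, so the residual norm does not increase from one iteration to the next. The estimate need not equal the true signal, since the box blur assumed here removes the highest-frequency component entirely.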
Recently, this algorithm was directly
applied by Zomet and Peleg [11] to construct a high resolution panoramic mosaic. A polyphase backprojection algorithm was reported in [12]; however, it is restricted to translational motion parameters.

6 Conclusion

A panoramic video mosaic is constructed from a video sequence following the estimation of the projective model motion parameters of the camera. Subsequently, superresolution of any region of interest (ROI) in the panoramic mosaic is generated from an appropriate subsequence, each of whose elements contains a chosen, acceptably significant portion of the ROI. Research is in progress on incorporating into the superresolution algorithm robustness to errors in motion parameter estimation and the identification of blur parameters [16]. The desired robustness must be considered in conjunction with fast implementation, possibly using filter banks. In the use of the algorithm for superresolution by backprojection, the choice of the backprojection kernel is crucial to convergence and to the quality of the solution. This choice ranges from the inverse of the transform of the blur point-spread function (as done here in the simulation) to the square of the blur PSF, assumed to be known, as in [12], where camera motions are restricted to translations. The problem of simultaneously estimating an unknown blur (or even blurs, in the case of multiple cameras) from the observed sequence and attaining superresolution of a region of interest, ideally in real time, remains to be tackled.

References

[1] C.A. Berenstein and E.V. Patrick, Exact deconvolution for multiple convolution operators - An overview, plus performance characterizations for imaging sensors, Proceedings of the IEEE, Vol. 78, No. 4, April 1990, pp. 723-734.
[2] S.P. Kim, N.K. Bose and H.M. Valenzuela, Recursive reconstruction of high resolution image from noisy undersampled multiframes, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 38, 1990, pp. 1013-1027.
[3] S.P. Kim and W.Y. Su, Recursive high-resolution reconstruction of blurred multiframe images, IEEE Proceedings on International Conference Acoustics, Speech, and Signal Processing (ICASSP), Toronto, Canada, 1991, pp. 2977-2980.
[4] N.K. Bose, H.C. Kim and H.M. Valenzuela, Recursive total least squares algorithm for image reconstruction from noisy, undersampled frames, Multidimensional Systems and Signal Processing, Vol. 4, 1993, pp. 253-268.
[5] A.M. Tekalp, M.K. Ozkan and M.I. Sezan, High-resolution image reconstruction from lower-resolution image sequences and space-varying image restoration, IEEE International Conference Acoustics, Speech, and Signal Processing, Vol. III, San Francisco, CA, March 1992, pp. 169-172.
[6] A.J. Patti, M.I. Sezan and A.M. Tekalp, Superresolution video reconstruction with arbitrary sampling lattices and nonzero aperture time, IEEE Transactions on Image Processing, Vol. 6, No. 8, August 1997, pp. 1064-1076.
[7] M. Elad and A. Feuer, Restoration of single super-resolution image from several blurred, noisy and undersampled measured images, IEEE Transactions on Image Processing, Vol. 6, No. 12, December 1997, pp. 1646-1658.
[8] M. Elad and A. Feuer, Superresolution restoration of an image sequence: Adaptive filtering approach, IEEE Transactions on Image Processing, Vol. 8, No. 3, March 1999, pp. 387-395.
[9] M. Irani and S. Peleg, Improving resolution by image registration, CVGIP: Graphical Models and Image Processing, Vol. 53, No. 3, May 1991, pp. 231-239.
[10] M. Irani and S. Peleg, Motion analysis for image enhancement: resolution, occlusion, and transparency, Journal of Visual Communication and Image Representation, Vol. 4, 1993, pp. 324-335.
[11] A. Zomet and S. Peleg, Applying superresolution to panoramic mosaics, IEEE Workshop on Applications of Computer Vision, Princeton, October 1998, pp. 286-287.
[12] B. Cohen and I.A. Dinstein, Resolution enhancement by polyphase back-projection filtering, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 5, 1998, pp. 2921-2924.
[13] R. Szeliski, Video mosaics for virtual environments, IEEE Computer Graphics and Applications, Vol. 16, March 1996, pp. 22-30.
[14] S. Mann and R.W. Picard, Video orbits of the projective group: A simple approach to featureless estimation of parameters, IEEE Transactions on Image Processing, Vol. 6, No. 9, September 1997, pp. 1281-1295.
[15] S. Lertrattanapanich, Image registration for video mosaic, Master's thesis, The Pennsylvania State University, May 1999.
[16] R.L. Lagendijk and J. Biemond, Iterative identification and restoration of images, Kluwer Academic Publishers, Massachusetts, USA, 1991.
[17] R.Y. Tsai and T.S. Huang, Multiframe image restoration and registration, Advances in Computer Vision and Image Processing, Vol. 1, 1984, pp. 317-339.
[18] H. Stark and P. Oskoui, High resolution image recovery from image-plane arrays using convex projection, J. Opt. Soc. Am. A, Vol. 6, 1989, pp. 1715-1726.