View Synthesis by Image Mapping and Interpolation


Farris J. Halim and Jesse S. Jin
School of Computer Science & Engineering, University of New South Wales, Sydney, NSW 2052, Australia
Basser Department of Computer Science, University of Sydney, Sydney, NSW 2006, Australia
{farrisjh,jesse}@cse.unsw.edu.au

Abstract

This paper implements and analyses a strategy for generating intermediate views of a scene from a pair of images taken from different positions and orientations. In particular, the algorithm uses image mapping for pixel registration and a form of interpolation to produce the in-between view. The main purpose is to note the validity and limitations of such a method and to explore potential improvements and developments. This work may also serve as a solid base for future work, including applications to digital video.

Keywords: View Synthesis, Immersive Video, Video

1 Introduction

The concept of virtual reality is growing, and people now experience flexible viewing control in programs such as computer games, modelling programs, and many more. A program written in VRML (Web Consortium 2001) can navigate the objects in a scene using the mouse and control buttons. This means that observers can choose where and how to see the objects any way they like. Having the same sort of experience with a real video scene requires a more elaborate approach. The basic idea is to generate a view of the scene from the angle requested by the observer. Clearly, the observer may request a viewing location where no physical camera exists, so the system needs some way to generate this view from the available cameras. In practice, only a limited number of cameras can be placed, and from these cameras the requested view must be produced. Fortunately, if the information from the multiple cameras is combined, it is possible to predict and therefore generate new viewpoints. The problem of generating in-between views from a set of images is usually called view synthesis.
View synthesis can be applied to both still images and video, where video is simply a sequence of still images or frames.

Copyright 2002, Australian Computer Society, Inc. This paper appeared at the Pan-Sydney Area Workshop on Visual Information Processing (VIP2001), Sydney, Australia. Conferences in Research and Practice in Information Technology, Vol. 11. David Dagan Feng, Jesse Jin, Peter Eades, Hong Yan, Eds. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

1.1 Strategy

According to Pollard et al. (Pollard, Pilu, Hayes & Lorusso 1998), there are various approaches to this problem, divided into three main categories based on the underlying technique employed in the system: reconstruction-projection, projective transfer, and forms of image interpolation/morphing. The first category approaches the problem by reconstructing the objects in the scene as a model; the colour of the objects is obtained by texture mapping. To perform the reconstruction, the program requires some knowledge of the camera calibration, including the coordinate position, the angle and orientation, and the optical characteristics of the camera. Such a method is employed in the Immersive Video system described by Moezzi et al. (Moezzi, Katkere, Kuramura & Jain 1996). Projective transfer, as in the first category, uses dense correspondences to predict where pixels end up in the virtual, projectively distorted images. This means that all pixels in the source images are transferred to the view image using a projection mechanism. The last category uses simple image interpolation and intensity blending to generate in-between views from the original sets. This approach is the basis of this project.

1.2 Related Work

1.2.1 View Synthesis by Edge Transfer

Pollard et al.
(Pollard, Pilu, Hayes & Lorusso 1998) worked on a novel automatic method for view synthesis from a triplet of uncalibrated images, based on trinocular edge matching followed by edge transfer using linear interpolation, occlusion detection and correction, and finally rendering. This method has the advantage of a much simpler computation than projectively calculating the pixel locations or reconstructing the objects. According to Seitz and Dyer (Seitz & Dyer 1995), the in-between views produced by interpolation are physically valid only if the images are first re-projected to conform to parallel camera geometry. Pollard et al. nevertheless took this approach and produced an approximate view using linear interpolation. The method begins by performing registration. It uses three source images, arranged in a triangular form. The edges are extracted in each image and then processed to produce full edge correspondences between the image triplets. The matched edges are then transferred to the desired viewpoint using simple linear interpolation. Finally, rendering is performed using an intensity blending technique. The fact that it uses three images means that the observer can actually move freely

in two dimensions, within the triangular region defined by the cameras.

1.2.2 Immersive Video

Another system, called Immersive Video, was developed by a number of researchers at the University of California (Moezzi, Katkere, Kuramura & Jain 1996). This system is based on building a dynamic reconstruction of the scene using information from multiple cameras to produce the Environment Model. The Environment Model has a global view of the scene, including camera locations and parameters and the state of all objects in the environment. The system consists of four components. The video data analyser detects and tracks dynamic objects, and can be configured to the observer's interest. The Environment Model Builder uses the static information of the scene (i.e. the objects that are not moving) and the information from the previous component to obtain the model of the environment. Finally, the observer can select viewpoints using the viewer interface and see the result produced by the visualiser. The advantage of this approach is that it can produce higher-quality results because of the sophisticated environment builder. It not only reconstructs the shape but also estimates the world position of the objects and finally performs object identification and tracking. Dynamic objects are modelled with voxel representations: the system predicts where each pixel of the image would lie in the world of voxels, or volume elements. A voxel is the equivalent of a pixel in 3D space.

2 Design and Implementation

The goal of this project is to build a system that can generate an in-between view from a pair of images taken by cameras. The in-between view is an approximation of what would be observed if there were a real camera between the existing cameras. The position of the virtual viewpoint is parameterised by the ratio of the distance between the virtual viewpoint and one of the source images to the distance between the two source images.

Figure 1: High level view of the system.
2.1 Methodology

The process of view synthesis described in this paper is broken down into two main stages (see Figure 1). The registration stage is responsible for retrieving the correspondences between the image pair. For this project, a simple image-mapping algorithm is used. More emphasis is put on the second stage. Note that the first and second stages are more or less independent: no matter how the registration is performed, as long as the data passed from the first stage to the second is in a consistent format, the algorithm will work. The second stage actually produces the novel, in-between view using a linear interpolation technique. This is in line with the work done by Pollard et al. (Pollard, Pilu, Hayes & Lorusso 1998). Once again, due to the independent nature of the two processes, a change in the second-stage method should not affect the first one. Thus, both stages could be improved or reworked in the future quite independently, or with little adjustment.

2.2 Image Registration

The aim of registration is to obtain some kind of relationship between the image pair. This relationship could be interpreted in various ways; in all cases, it is necessary to get correspondences between pixels in the left image and pixels in the right image. In the simplest case, the correspondences form a full pixel-to-pixel mapping.

Figure 2: Image Mapping

Image mapping (Tang 2001) is used to produce the full pixel-to-pixel mapping between the first image and the second image. Image mapping is a process that maps a (rectangular) image onto another image of arbitrary quadrilateral shape (see Figure 2). Performing image mapping is like stretching the original image into the new image defined by the quadrilateral. The quadrilateral is specified by the observer through its four corner points. With respect to Figure 2, let d1 and d2 be the perpendicular distances from a mapped point to the left and right edges of the quadrilateral, and d3 and d4 the distances to the top and bottom edges. For all mapped points the following equations must hold:

    X / W = d1 / (d1 + d2)   and   Y / H = d3 / (d3 + d4)

where (X, Y) is the corresponding source-image coordinate and W, H are the source-image dimensions. With this property in mind, it is actually easier if the mapping is done in reverse.
That is, for every pixel p inside the quadrilateral in the destination image, it calculates the respective pixel in the source image that maps to p. The calculation is as follows:

    X = W * d1 / (d1 + d2)
    Y = H * d3 / (d3 + d4)

The values of,,, an are calculate using shortest istance (perpenicular istance) formula from the point (x 0, y 0 ) to a line A*x + B*y + C = 0. The formula is: A* x + = A 0 B * + B y + C The overall process consists of the following step: Obtain the image to be mappe an four corner points efining the quarilateral. etermine the pixels insie the quarilateral. For each pixel insie the quarilateral, calculate the corresponing pixel in the source image. Retrieve the colour of the pixel in the source image to be the colour of the pixel in the quarilateral. etermining that a pixel is insie an arbitrary quarilateral is not straightforwar if it is to be one efficiently. A technique usually use for polygon filling in computer graphics, scanline algorithm, is use for this task (Lambert 00). The formula to get the pixel coorinate in the source image from the pixel in the quarilateral efine earlier prouces non-integer coorinate. One way to solve it is to simply roun the real coorinates into integer coorinates. This is normally calle nearest neighbour sampling. While it is simple to o this, the result is not satisfactory an in orer to prouce smoother image, one nees to o some interpolation. This is escribe in the next section.. Image Sampling Occasionally, program wants to access a pixel at noninteger coorinate. The image mapping proceure is one example an the image interpolation view synthesis in. is another one. The nearest neighbour sampling simply converts the non-integer coorinates into integer coorinates by rouning process. This is not satisfying in a lot of cases. The other way is to use the colour of neighbour pixels to generate the colour at the non-integer coorinate of interest. A reasonably popular metho is the bilinear interpolation (Tang 00). Figure : Bilinear Interpolation 0 Bilinear interpolation linearly interpolates along each row of the image an then uses the result in a linear interpolation own each column in the image. 
This means a linear interpolation is performed in two directions. With this method, each estimated pixel in the output image is a weighted combination of its four nearest neighbours in the input image, according to the following equation (refer to Figure 3):

    f(x, y) = (1-p)(1-q) * f00 + p(1-q) * f10 + (1-p)q * f01 + pq * f11,   p, q in [0, 1]

where the colours of the neighbouring pixels are

    fnm = f(x0 + n, y0 + m),   n, m in {0, 1}.

2.4 Image Interpolation

The image interpolation process requires four inputs. Firstly, it needs the image pair, which is needed for rendering pixel colours. Secondly, the output of the image registration process, which is the dense pixel mapping. Finally, it needs to know from where the observer intends to see the scene. As noted earlier, this is defined by the ratio of the distance between the virtual viewpoint and one of the source images to the distance between the two source images. Call this ratio lambda (λ), which ranges from 0 to 1 inclusive.

Figure 4: Image Interpolation

For each pair of points from the set given by the first-stage process, a linear interpolation is performed. Say the point p1 in the first source image maps to p2 in the second source image. Then the point P where they end up in the virtual viewpoint defined by λ is calculated using linear interpolation as follows:

    P = (1 - λ) * p1 + λ * p2

Once the interpolated point P is known, it is a matter of determining what colour this point should be. This is done by combining, or blending, the colour of the pixel of the first source image (at point p1) and the pixel of the second source image (at point p2), as suggested by Pollard et al. (Pollard, Pilu, Hayes & Lorusso 1998). The contribution of each colour is determined by the distance of the virtual viewpoint to each of the source images. Thus, if the colour of pixel p1 is a and the colour of pixel p2 is b, then the resulting colour of pixel P is:

    C = (1 - λ) * a + λ * b
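The bilinear sampling and the per-pixel view interpolation described above can be sketched as below. This is a minimal illustration under assumed conventions (grey-level 2-D lists indexed img[row][col]; names are not from the paper), not the project's implementation.

```python
import math

def bilinear_sample(img, x, y):
    """Bilinear interpolation at a non-integer coordinate (x, y):
    f = (1-p)(1-q) f00 + p(1-q) f10 + (1-p)q f01 + pq f11."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    p, q = x - x0, y - y0                    # fractional parts, in [0, 1)
    f00 = img[y0][x0]
    f10 = img[y0][x0 + 1]
    f01 = img[y0 + 1][x0]
    f11 = img[y0 + 1][x0 + 1]
    return ((1 - p) * (1 - q) * f00 + p * (1 - q) * f10
            + (1 - p) * q * f01 + p * q * f11)

def interpolate_view(pairs, colours, lam, w, h):
    """Place each correspondence p1 -> p2 at the virtual viewpoint lam
    via P = (1-lam) p1 + lam p2, and blend the two source colours
    via C = (1-lam) a + lam b."""
    out = [[None] * w for _ in range(h)]
    for ((x1, y1), (x2, y2)), (a, b) in zip(pairs, colours):
        x = round((1 - lam) * x1 + lam * x2)   # rounding causes the holes
        y = round((1 - lam) * y1 + lam * y2)   # fixed by the correction step
        if 0 <= x < w and 0 <= y < h:
            out[y][x] = (1 - lam) * a + lam * b
    return out
```

Pixels left as None after this pass are exactly the missing pixels handled by the correction routine described in the next section of the paper.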

Notice the similarity of the calculation procedure for the interpolated point and for the colour. It is clear that the success of this method of view synthesis depends heavily on the accuracy of the pixel mapping produced by the registration process. A mapping of p1 to p2 is said to be accurate if the pixel at p1 in one image shows the same object (or part of an object) as the pixel at p2 in the other image. If this condition is satisfied for every mapping, the interpolation result will be correct. Note that, due to the linear interpolation routine, the distance and angular displacement between the two cameras must not be too large; otherwise it is necessary to re-project the source images onto a parallel camera plane before performing the interpolation (Seitz & Dyer 1995). This ensures the validity of the interpolated shape of objects.

2.5 Interpolation Correction

Without any further work, the result of the interpolation procedure would be incomplete, in the sense that not all pixels in the interpolated image are filled. The incompleteness is due to the rounding of the interpolated coordinates. For example, some pixel-to-pixel mappings may be interpolated to the same destination pixel, while some pixels in the interpolated image are the target of no mapping at all.

Figure 5: Image Interpolation correction

Due to these missing pixels, it is necessary to perform a correction routine. In this project, for each unfilled (missing) pixel, the corresponding pixels in both source images are determined, and their colours are interpolated as usual to get the colour of the missing pixel. Figure 5 illustrates a situation in the interpolated image where a series of consecutive pixels (in scanline order) needs to be filled after the interpolation routine. The first filled points at each end are denoted by (x1, y) and (x2, y) respectively. Since the calculation of the points in the source images is based on the relative distance from the end points of the missing region, namely (x1, y) and (x2, y), it is necessary to know which pair of pixels produced (x1, y) and (x2, y) in the first place. Since this correction procedure is done only after all mappings in the dense correspondence set have been interpolated, this information must be stored somewhere; a table serves just this purpose. Thus, during the first interpolation stage, this table is filled throughout the process. The table can be realised as an array of the size of the interpolated image, where each entry is a pair of pixel origins; an uninitialised entry means the pixel has not been filled. With the knowledge of the end points in all three images (the sources and the interpolated image), it is straightforward to complete the procedure. Suppose the left source image (see Figure 5) has end points (xa1, ya1) and (xa2, ya2). Note that the segment in the interpolated image is always on one scanline, but the segments in the source images are most likely not horizontal, i.e. ya1 ≠ ya2 and yb1 ≠ yb2. The coordinate of the pixel in the left source image that should map to (x, y) is:

    (xa, ya) = (xa1 + (xa2 - xa1) * λx, ya1 + (ya2 - ya1) * λx),   where λx = (x - x1) / (x2 - x1)

(xb, yb) can be calculated in similar fashion. After this, the colour of pixel (xa, ya) in the left image is interpolated with the colour of pixel (xb, yb) in the right image using the interpolation formula as before.

2.6 User Interface

The purpose of the user interface is to guide the user through the step-by-step routine of view synthesis, from loading the images to viewing the interpolated novel view. The mapping procedure is quite difficult, as it requires the user to choose the best corner points by trial and error. This user interface helps the user to easily compare the result of the image-mapping routine with the source image that it should be mapped to. The graphical user interface is developed in Java using the Swing library.

Figure 6: The graphical user interface

The main interface consists of two canvas panels placed side by side. They are used to load the source image pair. Both panels have the basic ability to scroll the image (move it in two dimensions) and zoom in and out. The right panel has an additional feature that allows the user to choose the corners required for image mapping. For this reason, the user must load the image used as the mapping source on the left and the other image on the right.
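The scanline gap-filling of the correction routine described above might look like the following sketch. The signature is hypothetical: the paper's origin table is assumed to have already yielded the source-image end points of the gap.

```python
def fill_gap(x1, x2, y, left_ends, right_ends, left, right, lam, out):
    """Fill the unfilled pixels strictly between (x1, y) and (x2, y) on one
    scanline of out. left_ends/right_ends are the source-image pixel pairs
    that produced the two filled endpoints (looked up in the origin table)."""
    (xa1, ya1), (xa2, ya2) = left_ends
    (xb1, yb1), (xb2, yb2) = right_ends
    for x in range(x1 + 1, x2):
        t = (x - x1) / (x2 - x1)        # relative distance along the gap
        # Corresponding source pixels, by the same linear ratio.
        la = (round(xa1 + (xa2 - xa1) * t), round(ya1 + (ya2 - ya1) * t))
        rb = (round(xb1 + (xb2 - xb1) * t), round(yb1 + (yb2 - yb1) * t))
        a = left[la[1]][la[0]]
        b = right[rb[1]][rb[0]]
        out[y][x] = (1 - lam) * a + lam * b   # the usual colour blend
    return out
```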

The panel on the right can load two images simultaneously, although they can be viewed only one at a time. This is very helpful for comparing the result of the mapping with the target image. Remember that the goal is to produce a mapped image from the left source image that is as close as possible to the right source image. Direct comparison can be done by toggling the images using the available button; the two images are shown alternately at the same position and the same scale factor. Finally, after a reasonably good mapping has been obtained, the interpolation can be performed. The user needs to specify the interpolation factor, namely λ. The result of the interpolation is shown on a similar panel in a new window.

3 Experimental Results

3.1 Test Configuration

All tests were performed on an Intel Pentium III 733 MHz machine with 256 MB PC133 SDRAM, running Red Hat Linux 7.1. Some interpolation results can be found in Figures 7 and 8. Images are in GIF format with a size of 352 x 288 pixels.

3.2 Result Analysis

3.2.1 Painting scene

This test case presents a painting hung on a wall (Figure 7). The objects here are relatively flat and the result of the mapping is quite good; consequently, the interpolated views are mostly fine. Note that the right part of the image contains a non-flat region (beyond the wall), which causes bad effects in the interpolated views: image mapping is not aware of this, and hence the mapping in those regions is simply wrong. This test shows that the system works on a real flat scene and produces reasonable in-between views from the image pair.

3.2.2 Desk scene

This time the scene is quite complicated (Figure 8). It shows a lot of objects taken from close range. The mapping was chosen with the monitor shape as reference. As a result, there are many artefacts in the objects surrounding the monitor, most prominently the keyboard drawer and the desk lamp. The shadows are quite obvious.
It appears that there are two copies of each object in the interpolated view. This is simply because of mismatches in the registration produced by image mapping: the lamp shape from the left view ended up in a different position from the lamp shape from the right view. The test shows that this system is not really suitable for complicated, non-flat scenes taken from close range, because at short distances a small camera displacement causes large changes in the way an object is seen. Relying on direct image mapping for the registration produces many mismatches, and the resulting interpolated views cannot be good. Thus, it is really necessary to insert an additional stage before the interpolation that re-adjusts the image-mapping result. Despite all that, the interpolated views are in general quite reasonable. The effect of looking at the scene from the intermediate, in-between virtual viewpoint is clearly visible. This means that simple linear interpolation is actually a reasonable method for generating the novel views. Improvements need to be made in the registration part; once that is done, with the same interpolation process, the resulting interpolated views could be made better.

4 Conclusion

This project has successfully implemented a system to generate in-between views from a pair of images. In order to get a satisfying interpolation result, it is important to produce a good mapping of pixels between the image pair. The current solution, which uses an image-mapping algorithm, is generally acceptable for a flat scene, but does not work so well on real scenes. Even for a conventional flat scene, it is quite difficult to get the best mapping possible. This system is an early phase of a comprehensive flexible-viewing system. Obviously, a number of aspects could be improved in order to get more accurate results and more efficient usage. To name a few of them:

- Automation of the mapping procedure, essential to increase the system's usability.
- A mapping-correction procedure for non-flat scenes.
- Alternative solutions for obtaining pixel correspondences, since image mapping has its own limitations.
- View synthesis by object reconstruction.
- Application of view synthesis to generating video.

5 References

Moezzi, S., Katkere, A., Kuramura, D. & Jain, R. (1996): Immersive Video. Proc. IEEE Virtual Reality Annual International Symposium.

Pollard, S., Pilu, M., Hayes, S. & Lorusso, A. (1998): View Synthesis by Trinocular Edge Matching and Transfer. Proc. Ninth British Machine Vision Conference.

Seitz, S. & Dyer, C. (1995): Physically-Valid View Synthesis by Image Interpolation. Proc. IEEE Representation of Visual Scenes.

Tang, T. (2001): Software Based Video Processing Using Microsoft DirectShow. Master of Information Technology thesis, University of Sydney, Australia.

Web Consortium (2001): Web Consortium. http://www.vrml.org/

Lambert, T. (2001): Polygon Filling. http://www.cse.unsw.edu.au/~cs/slides/bres/scanline.html

Figure 7: The interpolation results of the painting scene

Figure 8: The interpolation results of the desk scene