WIDE-BASELINE MATTE PROPAGATION FOR INDOOR SCENES


M. Sarim¹, A. Hilton², J.-Y. Guillemaut³
University of Surrey, Guildford, UK.
¹ m.farooqui@surrey.ac.uk, ² a.hilton@surrey.ac.uk, ³ j.guillemaut@surrey.ac.uk

Abstract

Digital image matting is the process of extracting foreground objects from an image. This is extremely challenging for natural images and videos because of the ill-posed nature of the problem. Initial user interaction is required to aid the algorithms in identifying the definite foreground and background regions. Recently, techniques have been developed to estimate the alpha matte of an image using multi-view images of a foreground object. However, these algorithms can only handle narrow-baseline views with small intensity and structural variations in the foreground. In this paper, we propose a novel non-parametric approach to generate alpha mattes for wide-baseline multi-view images with different inter-view foreground appearance.

Keywords: Digital matting, alpha matte, multiple view, trimap, wide-baseline.

1 Introduction

Digital image matting is a classical problem of computer vision in which a foreground object is extracted from an image along with its pixel-wise opacity, to form a composite with a desired background. The problem has been extensively studied because of the increasing use of special effects in the media industry. An image can be thought of as a composite of three layers, namely foreground, background and opacity, the latter generally referred to as an alpha matte. A composite image was first mathematically formulated in terms of these layers by Porter and Duff [17] as

I = αF + (1 − α)B.   (1)

Equation (1) is known as the compositing equation, where I, F and B are the composite, foreground and background layers, while α represents the alpha matte. The alpha matte is an image layer giving each pixel's foreground opacity in the range [0, 1].
The value α = 0 or α = 1 marks a definite background or foreground pixel respectively, while 0 < α < 1 represents a mixed pixel whose blending proportion is defined by α. Equation (1) cannot be solved directly because it is under-constrained: in RGB colour space we have to solve for seven unknowns (α and the three colour channels of each of F and B) given only the three equations corresponding to the RGB channels. The equation is constrained in a studio environment by using a homogeneous known background colour, typically blue or green [20]. The assumption that the foreground colour distribution differs from the background colour then provides a straightforward solution of the compositing equation for alpha. However, in natural scenes these constraints are not available and the solution of equation (1) becomes extremely challenging. In natural images the constraints on the foreground and background regions are provided by the user in the form of a trimap. A trimap is typically a hand-drawn segmentation of an image into three regions, namely definite foreground, definite background and the unknown region, represented by white, black and gray on the trimap lattice respectively. A typical trimap of a natural image is shown in Fig 1 along with the estimated alpha matte and the new composite. Matting algorithms then utilise the statistics of the definite known regions to estimate alpha values in the unknown region, where pixels are usually a blend of foreground and background colour. Recently, techniques [10, 12, 13, 16, 25, 26] have been developed that exploit multiple-view statistics to estimate alpha. The main limitations of these approaches are their inability to handle wide-baseline views and their requirement for epipolar constraints. These algorithms work on the fundamental assumption that the foreground appearance, in terms of intensity and shape, is invariant across the views.
This assumption only holds for narrow-baseline views with similar projections, such as those from a camera array. The problem becomes more difficult for wide-baseline views captured by a surrounding camera setup because: (1) there is a significant change in foreground projection even between adjacent views, due to occlusion and projective distortion; (2) variations in luminance due to incident light and shadows cause the appearance to change with viewpoint; and (3) the background changes with viewpoint. In this paper we present a novel non-parametric approach to estimate an alpha matte for wide-baseline views. Previously, inpainting techniques [8, 7] and view interpolation [9] have successfully used similar non-parametric approaches to represent local image statistics in a single view. Our algorithm uses mean shift clustering [5, 6] to propagate a key-view trimap across multiple views without using epipolar constraints. Once a trimap is transferred to a neighbouring view of the key image, template-based non-parametric matting algorithms are applied to extract the alpha matte.

Figure 1: (a) Original image, (b) trimap, (c) estimated alpha matte and (d) new composite. Images are taken from the data-set provided by [24].

Since our technique only relies on the user-aided information available in the key view, it can handle images captured by ordinary uncalibrated cameras without fixation constraints. A fixation constraint is the assumption that the foreground object is centrally located across multiple views. The approach significantly reduces the user interaction required to extract alpha mattes for multiple wide-baseline views, which can later be used for 3D modelling and reconstruction or object insertion.

2 Related work

2.1 Single view matting

Natural image matting is a well-studied field of computer vision. Unlike studio images, natural images place no constraints on foreground and background colour. Therefore, to initialise an algorithm, user interaction is required to aid the definition of foreground and background regions in an image. Once the definite foreground and background layers are identified by the user, algorithms exploit the global or local statistics of these regions to estimate the alpha values of the undefined region. Approaches like [4, 11, 18] fit statistical models to the local foreground and background pixels; alpha values for the unknown pixels are computed using these local models. An isotropic mixture-of-Gaussians approach was proposed by Ruzon and Tomasi [18] to model the local known regions; the alpha value for an unknown pixel is computed using these mixture-of-Gaussians distributions. Hillman et al. [11] extended the idea of [18] by using anisotropic distributions, as the intensity variation in an image forms prolate rather than spherical clusters in colour space. They utilised principal component analysis to identify the major axes of these anisotropic clusters, which are then used to estimate the alpha value of a local unknown pixel. Chuang et al.
[4] formulated the matting problem in the well-known Bayesian framework. They used a modelling approach similar to [11] for the local known pixels, but unlike [11] they also considered the already estimated foreground and background colours of unknown pixels within a predefined spatial window. The technique developed by Berman et al. [2], now available as a Corel plug-in named Knockout, assumed the nearby regions to be locally smooth; the alpha value of an unknown pixel is computed as a weighted average of the local foreground and background colour values. The strong assumptions made by these techniques regarding the smoothness and correlation of the nearby known pixels introduce a requirement for a precise trimap. Since these techniques are heavily biased toward the colour distribution of the local known regions, they tend to suffer when the local foreground and background clusters overlap. To avoid the errors caused by this local dependency, approaches like [1, 23] fit Gaussian mixture models to the known foreground and background regions globally. Misclassification of colour samples is the fundamental limitation of sample-based techniques. Algorithms like [15, 22] use local affinities to alleviate this problem. Poisson matting [22] assumes that the intensity variations in the foreground and background regions are locally smooth; the alpha values are computed by solving the Poisson equation with the matte gradient field. Levin et al. [15] utilised the local smoothness assumption to fit a linear model to the foreground and background colours, resulting in a closed-form solution for alpha. Robust matting [24] uses local colour sampling as well as an affinity approach similar to [15]. The algorithm uses optimised colour sampling to extract high-confidence colour samples, which are then combined with the affinity term to obtain a matting energy function; alpha values are estimated by minimising this energy function.
Although affinity-based approaches overcome the limitation of sample misclassification, they are prone to the accumulation of small errors in the final alpha matte because of the propagating manner in which they estimate alpha values. Recently, a non-parametric template-based matting technique was proposed in [19], which uses a known or globally inpainted background plate. The foreground colour for an unknown pixel is estimated as the median colour of the centre pixels of the few most similar local foreground templates. Since template matching preserves spatial information along with colour, the algorithm is robust on highly textured natural images. The template-based approach tends to produce errors in regions where the inpainted background is not similar to the true background.

Figure 2: Flow chart of the wide-baseline multi-view alpha matting.

2.2 Multiple view matting

All the techniques mentioned above use single-view information to extract an alpha matte. If multiple views are available, an algorithm has more information at its disposal to better estimate an alpha matte for each view. Nearly all multi-view matting techniques assume the foreground is invariant across the views while the background differs. Using this assumption, they formulate the matting problem in a triangular fashion [20], with pixels having a single foreground colour and multiple background colours. Approaches [12, 13] have used pixel variance across the views to extract a variance image; this reference image is then thresholded to generate the trimap, and the nearby variance information of the known regions is used to estimate the final alpha matte. Both of these techniques have a very narrow baseline of around 5 cm. Joshi et al. [13] estimate an alpha matte for a single reference view, while Hyun et al. [12] extended their approach by sharing the trimap across multiple views; alpha mattes are generated by merging the foreground edges in the normal and histogram-equalised views. Wexler et al. [25] estimated the alpha matte using the relative motion of foreground and background. They assume that a rigid foreground is sweeping over a background in multiple images. They constructed a clean background plate from the planar background motion and then formulated the problem in a Bayesian framework to estimate the alpha matte. Their technique suffers for blurred foreground regions and requires planar motion to construct the clean background plate. Won et al. [26] build a high-dimensional feature space from multi-resolution rectified stereo pairs in a Gaussian pyramid. They use locally linear embedding to construct a trimap, after which Bayesian matting [4] is employed to extract the final alpha matte. Hasinoff et al.
[10] formulate the matting problem as estimating the 3D boundary curve and foreground colour that best fit the multi-view images. They used depth information across multiple views to estimate the background and foreground boundary colours. Their results are prone to stereo inaccuracies and do not exploit the colour statistics of an image. In defocus video matting, McGuire et al. [16] used a specialised setup comprising three imaging sensors sharing the same centre of projection. The multiple views captured from this setup, which are focused separately on the foreground and background regions, aid automatic trimap generation. They assume that the foreground and background depths and the camera parameters are known, and formulate the matting problem as the minimisation of a quadratic error function of alpha and foreground colour. Their method cannot handle fast-moving blurred regions. A graph-cut optimisation approach is used by Campbell et al. [3] to perform a binary segmentation of a given view into foreground and background. They used fixation constraints to obtain seed pixels from wide-baseline views, from which the initial Gaussian mixture model of the foreground colour is constructed. The process is iterated to improve the colour model until convergence. Image edges are then combined with the obtained colour model and a graph-cut algorithm is applied to achieve the final segmentation. Since they rely on a Gaussian mixture colour model, their technique suffers when the foreground and background colour distributions overlap. All multi-view matting algorithms to date assume a narrow baseline (small separation in view orientation) between views, such that the foreground has similar appearance. In this paper we address the problem of wide-baseline matting for camera views with large (> 30°) separation, such that there are large changes in foreground appearance.
Our approach does not require any camera calibration or specialised setup and does not make hard assumptions on the foreground colour and position across multiple views. The algorithm can handle wide-baseline views having significantly different foreground appearance.

3 Wide-baseline multi-view alpha matte estimation

Our algorithm is composed of two main steps: (1) inter-view trimap propagation, and (2) alpha matte estimation using a non-parametric matting algorithm. An overview of our algorithm is shown in Fig 2. We have used three high definition cameras in a roughly 90° arc with 45° between views to capture the foreground. To represent the multiple views we use the notation [I_l, I_c, I_r] for the left, centre and right view respectively. The main assumption of our approach is a static and known background for all the views, represented correspondingly as [B_l, B_c, B_r].

3.1 Inter-view trimap propagation

Figure 3: (a) Centre view I_c, (b) centre view background B_c, (c) bimap and (d) user defined trimap T_c.

To propagate a trimap through multiple views, the user initially has to define a trimap T_c for the centre view I_c, which defines the definite foreground and background pixels. The available backgrounds are free of foreground shadow, which causes errors in the proper labelling of regions in the views [I_l, I_r] that are contaminated by the foreground shadow. To overcome this problem we model the shadow region present in the centre view as background by difference keying the view I_c against the pure background. Since we are dealing with wide-baseline views with an orientation difference of 45°, making the backgrounds across the views largely different, difference keying also helps to narrow down the search region for trimap propagation. Since the pure background of every view is known, the trimap T_c can be initialised by performing a binary segmentation of I_c, removing the background B_c from it. The centre view I_c and its background B_c are shown in Fig 3(a,b). This segmentation splits I_c into two regions: (1) the foreground, blended and shadow-contaminated background pixels, and (2) the definite background pixels; let us call the result a bimap. The subtraction is performed patch-wise rather than pixel-wise, to avoid background noise and the mis-labelling of fine blended foreground structures as background. Image I_c can be represented as a function of Euclidean coordinates as I_c(x, y), and likewise its background as B_c(x, y). If a square patch of size m is used, the subtraction is performed according to the equation

S(x, y) = Σ_{s=−g}^{g} Σ_{t=−h}^{h} (I(x + s, y + t) − B(x + s, y + t))²,   (2)

where S(x, y) is the subtraction map, a sum-of-squared-differences function, and g = h = (m − 1)/2. The definite background pixels, represented by blue in Fig 3(c), are labelled using a background distance threshold τ_b as

T_c(x, y) = background if S(x, y) ≤ τ_b, not background otherwise.   (3)

The not-background region B̄, represented by green in Fig 3(c), consists of the definite foreground, blended and shadow-contaminated background pixels.
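The patch-wise subtraction of equations (2) and (3) can be sketched as follows. This is a minimal NumPy illustration rather than the authors' implementation, and the patch size and threshold values are placeholders:

```python
import numpy as np

def subtraction_map(I, B, m=5):
    """Patch-wise sum of squared differences between image I and its
    background plate B (equation (2)); m is an odd patch size and
    g = (m - 1) / 2 is the patch half-width."""
    g = (m - 1) // 2
    H, W = I.shape[:2]
    # Per-pixel squared RGB difference, then summed over each m x m patch.
    diff2 = ((I.astype(np.float64) - B.astype(np.float64)) ** 2).sum(axis=2)
    padded = np.pad(diff2, g, mode="edge")
    S = np.zeros((H, W))
    for s in range(-g, g + 1):
        for t in range(-g, g + 1):
            S += padded[g + s:g + s + H, g + t:g + t + W]
    return S

def definite_background(S, tau_b):
    """Equation (3): a pixel is definite background when S(x, y) <= tau_b;
    everything else belongs to the not-background region of the bimap."""
    return S <= tau_b
```

Thresholding S with τ_b yields the bimap; a suitable τ_b depends on sensor noise and scene contrast.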
The bimap is converted into a trimap T_c by the user manually defining the shadow region as background. To avoid the user interaction of defining the shadow region explicitly, a foreground extraction technique [14] could be used. Fig 3(d) shows the refined trimap T_c, where the foreground, blended and shadow regions are represented in the traditional trimap form of white, gray and black respectively.

3.1.1 Template clustering using mean shift

An inter-view trimap propagation, T_l ← T_c → T_r, could be achieved by brute-force template matching between I_c and [I_l, I_r], where T_l and T_r are the propagated trimaps for the left and right views respectively. Two main limitations make the brute-force implementation prohibitively expensive: (1) the wide-baseline camera setup projects a different aspect of the foreground into the left and right views, and (2) to reliably estimate wide-baseline correspondences an initial surface reconstruction, such as the visual hull, is required [21], which in turn requires the foreground segmentation to be known a priori. Since this work is focused on obtaining that segmentation, a coarse reconstruction cannot be performed to constrain the search and the orientation of surface patches. To alleviate these problems, the mean shift algorithm [5, 6] is employed to reduce the template search space in the centre view I_c. We define two template spaces, namely the foreground template space T^f and the background template space T^b, constructed by placing a square patch of size n at every foreground and background pixel of I_c respectively, in accordance with T_c. Each template effectively has 3n² dimensions in RGB space. Both template spaces are clustered individually using the mean shift algorithm with a spherical cluster-window radius of r_c in RGB colour space and a mean shift vector threshold ε of 0.1.
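As a rough illustration of this clustering step, the sketch below implements a minimal flat-kernel mean shift over template vectors. It is a simplification of [5, 6] (no kernel weighting, naive neighbour search), and the radius value in the usage is hypothetical:

```python
import numpy as np

def mean_shift(points, r_c, eps=0.1, max_iter=100):
    """Minimal flat-kernel mean shift. Each point is shifted to the mean of
    its neighbours within radius r_c until the shift falls below eps; the
    converged modes are then merged into cluster centres (playing the role
    of the mean templates of the foreground/background template spaces)."""
    points = np.asarray(points, dtype=np.float64)
    modes = points.copy()
    for i in range(len(modes)):
        m = modes[i]
        for _ in range(max_iter):
            d = np.linalg.norm(points - m, axis=1)
            new_m = points[d <= r_c].mean(axis=0)
            shift = np.linalg.norm(new_m - m)
            m = new_m
            if shift < eps:
                break
        modes[i] = m
    # Merge modes closer than r_c into a single cluster centre.
    centres = []
    for m in modes:
        if not any(np.linalg.norm(m - c) < r_c for c in centres):
            centres.append(m)
    return np.array(centres)
```

Each 3n²-dimensional template would be one row of `points`; clustering the foreground and background template spaces separately yields the two sets of mean templates.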
This template grouping reduces the template search space to the order of 10² clusters. Mean shift is performed as

[C^f_k, c^{f,m}_k] = meanshift(T^f, r_c, ε)
[C^b_l, c^{b,m}_l] = meanshift(T^b, r_c, ε),   (4)

where the foreground and background clusters are represented by C^f_k and C^b_l respectively, while c^{f,m}_k and c^{b,m}_l denote their mean templates; k = 1, …, n_f and l = 1, …, n_b, where n_f and n_b are the numbers of foreground and background clusters formed.

3.1.2 Trimap label propagation

Although the mean shift clustering reduces the search space considerably, a further reduction in computational cost can be achieved by labelling the definite background in the left and right views, [I_l, I_r], by subtracting their respective backgrounds [B_l, B_r]. We use an approach similar to equation (3) to classify each of these views into definite background and not-background regions, again producing a bimap.

Figure 4: (a) Right image I_r, (b) bimap of the view I_r after subtracting the background B_r, (c) unrefined trimap T_r after label propagation to the green region of the bimap (white: foreground, yellow: background, red: unknown pixels) and (d) the refined trimap in the traditional (white, gray, black) representation.

Fig 4(b) shows the bimap for the right view I_r: blue represents the definite background while green corresponds to the not-background region. The problem is now reduced to populating the trimap labels of the not-background region B̄ in the trimaps [T_l, T_r] corresponding to the views [I_l, I_r]. Consider one view at a time, say I_r. The trimap propagation T_c → T_r is achieved by comparing every template in the not-background region B̄ of T_r to the foreground and background mean template cluster spaces c^{f,m}_k and c^{b,m}_l respectively. For a B̄ pixel p, a template P is extracted by localising a square patch of dimension n; the template must be dimensionally consistent with c^{f,m}_k and c^{b,m}_l for comparison. The patch P is then compared to the foreground and background mean template cluster spaces individually. The minimum normalised sum of squared differences (NSSD) over the two search spaces is given by

d_f(p) = min_{k=1,…,n_f} (1/n²) Δ(P, c^{f,m}_k)
d_b(p) = min_{l=1,…,n_b} (1/n²) Δ(P, c^{b,m}_l),   (5)

where d_f(p) and d_b(p) denote the minimum NSSD of pixel p to the mean foreground and background cluster spaces. The function Δ(A, B) gives the sum of squared differences in RGB space between the templates A and B, while n² is the number of pixels in the patch, used to normalise the SSD. The process is iterated for all the pixels in the not-background region B̄ of T_r. We can visualise these minimum normalised sums of squared differences as difference images; let us denote the foreground and background difference images by D_f and D_b respectively.
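The per-pixel computation of D_f and D_b, and the ratio test that follows in equation (6), can be sketched as below. The threshold defaults mirror the values reported in the results section but are otherwise placeholders:

```python
import numpy as np

def nssd(P, template, n2):
    """Normalised sum of squared differences between two RGB templates."""
    return ((P - template) ** 2).sum() / n2

def propagate_label(P, fg_means, bg_means, n2, eps_f=0.9, eps_b=0.3):
    """Equations (5) and (6): take the minimum NSSD of patch P to the
    foreground and background mean-template clusters, then threshold
    the ratio D_b / D_f to assign a trimap label."""
    d_f = min(nssd(P, c, n2) for c in fg_means)
    d_b = min(nssd(P, c, n2) for c in bg_means)
    ratio = d_b / max(d_f, 1e-12)  # guard against a perfect foreground match
    if ratio >= eps_f:
        return "foreground"
    if ratio <= eps_b:
        return "background"
    return "unknown"
```

A large ratio means the patch is far from every background cluster but close to a foreground one, and vice versa; intermediate ratios are left for the matting stage to resolve.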
The trimap label for every B̄ pixel is assigned by thresholding the ratio of the difference images, separately for the foreground and background. For a B̄ pixel p the label is assigned as

T_r(p) = foreground if D_b/D_f ≥ ε_f, background if D_b/D_f ≤ ε_b, unknown otherwise,   (6)

where ε_f and ε_b are the foreground and background thresholds used for classification. The algorithm is iterated until all the B̄ pixels are assigned a trimap label; an example is shown in Fig 4(c) for the right image I_r of Fig 4(a). For the sake of visibility, the foreground, background and unknown pixels in the B̄ region in Fig 4(c) are represented by white, yellow and red respectively. The trimap obtained suffers from some misclassification which, if not rectified, can lead to an erroneous alpha matte. Therefore a trimap refinement step is necessary prior to the final alpha matte estimation.

3.1.3 Trimap refinement

The misclassification is mainly caused by: (1) image noise, (2) the presence of specular surfaces and (3) overlap of the foreground and background distributions in colour space. We use morphological operations to remove the small errors in the trimap T_r caused by image noise. To rectify the large erroneous regions, caused by specular reflection and the intersection of the foreground and background colour distributions, we assume that the foreground is opaque. Initially, the regions having an area less than a predefined area threshold ε_a are identified. Let us take one of these regions as R, regardless of its type (foreground, background or unknown). The region R is then dilated to obtain the surrounding pixels R_s. If all the pixels in R_s belong to the foreground or to the background, the region R is assigned the corresponding label; otherwise it is labelled as an unknown region. Mathematically this correction can be written as

R = foreground if R_s ⊆ foreground, background if R_s ⊆ background, unknown otherwise.   (7)
A refined trimap of an image, in the traditional white, gray and black representation, is shown in Fig 4(d). Once the trimap is refined we can extract the final alpha matte by estimating the foreground colour for all the unknown pixels and using the background colour from the available background plate.

3.2 Alpha matte estimation

Given a trimap for the wide-baseline views, we estimate the alpha matte using the non-parametric approach introduced in [19]. We have utilised a non-parametric approach because of

its strong mechanism for representing local image features, colours and textures, which attempts to preserve the spatial information of an image.

3.2.1 Foreground colour estimation

A square patch of size n is localised at every unknown and foreground pixel, separately, to construct template spaces for the unknown and foreground regions; let us denote these by U and F respectively. To estimate the foreground colour f(p) for a pixel p in the unknown region, we consider the patch u_p associated with it and find the most similar patch f_q in the foreground template space F. The colour of the foreground pixel q is assigned as the foreground colour f(p) of the unknown pixel p. The templates in F associated with pixels at the foreground boundary also contain unknown pixels; to avoid their effect, the comparison is performed only over the pure foreground pixels present in the patches of F. Like most previous matting techniques, we assume that the foreground colour in the unknown region comes from the nearby known foreground region. We initially define a minimum size r_i for the circular search region of every pixel in the unknown region. To obtain the final size r_s of the search region for a pixel p, the spatial distance r_p between p and the nearest foreground pixel is computed and added to the minimum size r_i; the radius of the circular foreground search region for pixel p is therefore given by r_s = r_i + r_p, as shown by the green region in Fig 5. The main reason to introduce the distance r_p is to avoid wrong estimates of the foreground colour for pixels lying near the far edge of the unknown region. Let us denote by F(r_s) all the patches in the template space F that are spatially contained in the search region r_s.
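The matching, robust median selection and alpha computation of this section can be sketched as follows. This is an illustrative simplification: patches are flat vectors with no pure-foreground masking, and the least-squares form of the alpha estimate is one way to generalise the per-channel equation (10) to RGB vectors:

```python
import numpy as np

def robust_foreground_colour(u_p, fg_patches, fg_centres, N=3):
    """Sort foreground patches by their SSD to the unknown patch u_p and
    return the channel-wise median of the centre-pixel colours of the N
    best matches (the median over the set tau)."""
    costs = ((np.asarray(fg_patches) - u_p) ** 2).sum(axis=1)
    best = np.argsort(costs)[:N]
    return np.median(np.asarray(fg_centres)[best], axis=0)

def estimate_alpha(c, f, b):
    """Rearranged compositing equation: alpha = (c - b) / (f - b), solved
    in a least-squares sense over the RGB channels and clipped to [0, 1]."""
    fb = np.asarray(f, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    denom = float(fb @ fb)
    if denom < 1e-12:
        return 0.0  # foreground and background colours coincide
    cb = np.asarray(c, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    return float(np.clip((cb @ fb) / denom, 0.0, 1.0))
```

Taking the median over the N best matches, rather than the single best, suppresses the influence of noisy or mismatched patches.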
The most similar patch f_q to the unknown patch u_p can be found as

f_q = argmin_{f_i ∈ F(r_s)} (1/n_f) Δ(u_p, f_i),   (8)

where Δ(u_p, f_i) is defined as in equation (5) and n_f is the number of foreground pixels present in the patch f_i, used for normalisation to ensure the costs are comparable. The presence of noise in the foreground region leads to segmentation artifacts, so a more robust approach is required to estimate the foreground colour.

3.2.2 Robust foreground colour estimation

A normalised sum-of-squared-differences vector D in RGB space for a patch u_p is constructed as

D_i = (1/n_f) Δ(u_p, f_i),  f_i ∈ F(r_s).   (9)

To robustly estimate the foreground colour for pixel p, the difference vector D is sorted such that D_j ≤ D_{j+1}. We then consider the centre-pixel colours of the N most similar patches in the foreground template space F(r_s), τ = {f^c_1, f^c_2, …, f^c_N}. The foreground colour f(p) for pixel p is estimated as the median of τ, that is f(p) = µ_{1/2}(τ). In this paper we use the three most similar patches, that is N = 3. The algorithm is iterated for all the pixels in the unknown region.

Figure 5: The green portion shows the search area in the foreground region for the unknown pixel p.

3.2.3 Alpha estimation

The alpha value for a pixel p is estimated by rearranging the compositing equation (1) as

α(p) = (c(p) − b(p)) / (f(p) − b(p)),   (10)

where c(p) and b(p) are the composite and background colours of pixel p, taken from the image I_r and its background image B_r. Equation (10) is evaluated for all the unknown pixels in the trimap T_r to yield the final alpha matte α_r for the right image I_r. The same trimap label propagation and alpha matte estimation processes are performed to generate the alpha matte α_l for the left image I_l.

4 Results and evaluation

We used three different scenes for qualitative and quantitative evaluation. All the images were captured by high definition cameras in a studio.
The cameras are placed in a circular arc in front of the foreground object, with pairs of views roughly 45° apart in angular separation. For this paper, the parameter values of our algorithm are set as m = 5, ε_f = 0.9 and ε_b = 0.3. The algorithm is implemented in Matlab and has a runtime of 12 to 13 minutes for a pair of images, the major proportion of which is spent on high-dimensional template clustering and foreground colour estimation. For evaluating the estimated mattes quantitatively,

Figure 6: Views from two different dance scenes along with their propagated trimaps, estimated alpha mattes and the ground truths.

Figure 7: Views from an office scene along with their propagated trimaps, estimated alpha mattes and the ground truths.

the ground truth mattes are generated using the Closed-form matting technique [15]. Initially, precise trimaps for all the views are drawn by the user, and the Closed-form algorithm is then used to estimate the ground truth matte individually for each view.

4.1 Qualitative evaluation

Figs 6 and 7 show the three different scenes along with their propagated trimaps, estimated alpha mattes and the ground truths. For the dance scenes in Fig 6, our algorithm is able to propagate the labels correctly even though, in the centre view of the first scene, the boy's trousers are largely occluded. Our matting technique easily removed the shadow region present near the model's feet, which is initially identified as an unknown region during trimap propagation. The images in Fig 7 are difficult because of the presence of a large shadow region. The trimap propagation algorithm classified the larger part of the shadow region as background, but the portion close to the chair and the girl's feet has a strong shadow, and the black trousers, shoes and chair made it difficult to tag this region as background. The matting algorithm also performed less well in this region, as there is no colour information available to exploit and the local foreground region also contains black. The matting technique produced good alpha mattes for the rest of the foreground region. The alpha mattes estimated by our technique do not have visible artifacts compared to the ground truth in the dance scenes. The mattes are consistent across views and do not suffer from visible segmentation inaccuracies.
4.2 Quantitative evaluation

For quantitative analysis we use two error measures: (1) the mean absolute error (MAE), and (2) the number of pixels whose error is greater than 90% of the maximum absolute error present in the matte, denoted NME. The MAE reflects the overall error present in the matte, while the NME captures large misclassifications of foreground and background pixels. Fig 8 shows the mean absolute error of the alpha mattes against the ground truth, with alpha values scaled to [0, 255]. It is clear from the chart that the errors are mainly produced in the shadow regions, as the centre view in all the scenes has a small error. The office scene has a larger error than the dance images because the local foreground colour distribution is similar to the strong shadow region.
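The two error measures can be sketched as a short function; this assumes mattes scaled to [0, 255], as in the evaluation:

```python
import numpy as np

def matte_errors(alpha_est, alpha_gt):
    """Mean absolute error (MAE) and the number of pixels whose absolute
    error exceeds 90% of the maximum absolute error in the matte (NME)."""
    err = np.abs(np.asarray(alpha_est, dtype=np.float64) -
                 np.asarray(alpha_gt, dtype=np.float64))
    mae = float(err.mean())
    nme = int((err > 0.9 * err.max()).sum()) if err.max() > 0 else 0
    return mae, nme
```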

Figure 8: Mean absolute error in the three views (right R, centre C and left L) of the different scenes.

Figure 9: Number of pixels having error >90% of the maximum absolute error in the three views (right R, centre C and left L) of the different scenes.

The number of pixels having an error greater than 90% of the maximum absolute error is plotted in Fig 9. For the dance scenes, the algorithm showed robustness against foreground and background pixel misclassification in the regions free from strong shadow. Again, the technique suffered in the office image in classifying pixels in the shadow-contaminated background region with a similar local foreground colour distribution. Overall, the majority of pixels are correctly classified with low error.

5 Conclusion

We have presented a novel approach for matte propagation across wide-baseline views without using epipolar or fixation constraints. Previous multi-view matting techniques are limited to narrow-baseline views having very small variation in foreground appearance, and they require epipolar or fixation constraints to deal with the correspondence problem. It is clear from the evaluation and the visual analysis of the alpha mattes that our technique is capable of producing good alpha mattes for wide-baseline views. The technique is robust against shadows, but has a limitation for background pixels that are contaminated by strong shadow and have black local foreground pixels; for such pixels the estimated foreground and background colours are not distinct enough to compute proper alpha values. Future research will concentrate on building a better model for shadows and optimising the technique for the aforementioned problems that occur in rare scenes. The technique will also be extended to deal with the challenges presented by outdoor scenes.

Acknowledgement

This research was executed with the financial support of the EU IST FP7 project i3dpost.

References

[1] X. Bai and G. Sapiro.
Geodesic matting: A framework for fast interactive image and video segmentation and matting. Int. J. Comput. Vision, 82(2).

[2] A. Berman, A. Dadourian, and P. Vlahos. Method of removing from an image the background surrounding a selected object. U.S. Patent 6,134,346.

[3] N. D. F. Campbell, G. Vogiatzis, C. Hernandez, and R. Cipolla. Automatic 3D object segmentation in multiple views using volumetric graph-cuts. Image and Vision Computing, September.

[4] Y. Y. Chuang, B. Curless, D. H. Salesin, and R. Szeliski. A Bayesian approach to digital matting. In Proceedings of IEEE CVPR '01, volume 2, December.

[5] D. Comaniciu and P. Meer. Robust analysis of feature spaces: Color image segmentation.

[6] D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5).

[7] A. Criminisi, P. Pérez, and K. Toyama. Object removal by exemplar-based inpainting. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2.

[8] A. Efros and T. Leung. Texture synthesis by non-parametric sampling. In IEEE International Conference on Computer Vision.

[9] A. Fitzgibbon, Y. Wexler, and A. Zisserman. Image-based rendering using image-based priors. In International Conference on Computer Vision (ICCV).

[10] Samuel W. Hasinoff, Sing Bing Kang, and Richard Szeliski. Boundary matting for view synthesis, 2004.

[11] P. Hillman, J. Hannah, and D. Renshaw. Alpha channel estimation in high resolution images and image sequences. In IEEE CVPR.

[12] M. H. Hyun, S. Y. Kim, and Y. S. Ho. Multi-view image matting and compositing using trimap sharing for natural 3-D scene generation. In 3DTV08.

[13] N. Joshi, W. Matusik, and S. Avidan. Natural video matting using camera arrays. ACM Trans. Graph., 25(3).

[14] H. Kim and A. Hilton. Region-based foreground extraction. In Conference on Visual Media Production (CVMP).

[15] A. Levin, D. Lischinski, and Y. Weiss. A closed form solution to natural image matting. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1:61–68.

[16] M. McGuire, W. Matusik, H. Pfister, J. F. Hughes, and F. Durand. Defocus video matting. ACM Trans. Graph., 24(3).

[17] T. Porter and T. Duff. Compositing digital images. In ACM SIGGRAPH '84: Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques.

[18] M. A. Ruzon and C. Tomasi. Alpha estimation in natural images. In CVPR, pages 18–25, June.

[19] M. Sarim, A. Hilton, and J.-Y. Guillemaut. Non-parametric patch based video matting. British Machine Vision Conference (BMVC).

[20] A. R. Smith and J. F. Blinn. Blue screen matting. In ACM SIGGRAPH '96: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques.

[21] J. Starck, G. Miller, and A. Hilton. Volumetric stereo with silhouette and feature constraints. British Machine Vision Conference (BMVC), 3.

[22] J. Sun, J. Jia, C.-K. Tang, and H.-Y. Shum. Poisson matting. ACM Transactions on Graphics, 23(3).

[23] J. Wang and M. F. Cohen. An iterative optimization approach for unified image segmentation and matting. In ICCV '05: Proceedings of the Tenth IEEE International Conference on Computer Vision, Washington, DC, USA. IEEE Computer Society.

[24] J. Wang and M. F. Cohen. Optimized color sampling for robust matting. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1–8, 2007.

[25] Y. Wexler, A. W. Fitzgibbon, and A. Zisserman. Bayesian estimation of layers from multiple images. In ECCV '02: Proceedings of the 7th European Conference on Computer Vision, Part III, London, UK. Springer-Verlag.

[26] K. H. Won, S. Y. Park, and S. K. Jung. Natural image matting based on neighbor embedding.

More information

Determining optimal window size for texture feature extraction methods

Determining optimal window size for texture feature extraction methods IX Spanish Symposium on Pattern Recognition and Image Analysis, Castellon, Spain, May 2001, vol.2, 237-242, ISBN: 84-8021-351-5. Determining optimal window size for texture feature extraction methods Domènec

More information

Speed Performance Improvement of Vehicle Blob Tracking System

Speed Performance Improvement of Vehicle Blob Tracking System Speed Performance Improvement of Vehicle Blob Tracking System Sung Chun Lee and Ram Nevatia University of Southern California, Los Angeles, CA 90089, USA sungchun@usc.edu, nevatia@usc.edu Abstract. A speed

More information

Reconstructing 3D Pose and Motion from a Single Camera View

Reconstructing 3D Pose and Motion from a Single Camera View Reconstructing 3D Pose and Motion from a Single Camera View R Bowden, T A Mitchell and M Sarhadi Brunel University, Uxbridge Middlesex UB8 3PH richard.bowden@brunel.ac.uk Abstract This paper presents a

More information

Automated Process for Generating Digitised Maps through GPS Data Compression

Automated Process for Generating Digitised Maps through GPS Data Compression Automated Process for Generating Digitised Maps through GPS Data Compression Stewart Worrall and Eduardo Nebot University of Sydney, Australia {s.worrall, e.nebot}@acfr.usyd.edu.au Abstract This paper

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

LIBSVX and Video Segmentation Evaluation

LIBSVX and Video Segmentation Evaluation CVPR 14 Tutorial! 1! LIBSVX and Video Segmentation Evaluation Chenliang Xu and Jason J. Corso!! Computer Science and Engineering! SUNY at Buffalo!! Electrical Engineering and Computer Science! University

More information

Fusing Time-of-Flight Depth and Color for Real-Time Segmentation and Tracking

Fusing Time-of-Flight Depth and Color for Real-Time Segmentation and Tracking Fusing Time-of-Flight Depth and Color for Real-Time Segmentation and Tracking Amit Bleiweiss 1 and Michael Werman 1 School of Computer Science The Hebrew University of Jerusalem Jerusalem 91904, Israel

More information

Object Tracking System Using Motion Detection

Object Tracking System Using Motion Detection Object Tracking System Using Motion Detection Harsha K. Ingle*, Prof. Dr. D.S. Bormane** *Department of Electronics and Telecommunication, Pune University, Pune, India Email: harshaingle@gmail.com **Department

More information

Probabilistic Latent Semantic Analysis (plsa)

Probabilistic Latent Semantic Analysis (plsa) Probabilistic Latent Semantic Analysis (plsa) SS 2008 Bayesian Networks Multimedia Computing, Universität Augsburg Rainer.Lienhart@informatik.uni-augsburg.de www.multimedia-computing.{de,org} References

More information

Cees Snoek. Machine. Humans. Multimedia Archives. Euvision Technologies The Netherlands. University of Amsterdam The Netherlands. Tree.

Cees Snoek. Machine. Humans. Multimedia Archives. Euvision Technologies The Netherlands. University of Amsterdam The Netherlands. Tree. Visual search: what's next? Cees Snoek University of Amsterdam The Netherlands Euvision Technologies The Netherlands Problem statement US flag Tree Aircraft Humans Dog Smoking Building Basketball Table

More information

Tracking in flussi video 3D. Ing. Samuele Salti

Tracking in flussi video 3D. Ing. Samuele Salti Seminari XXIII ciclo Tracking in flussi video 3D Ing. Tutors: Prof. Tullio Salmon Cinotti Prof. Luigi Di Stefano The Tracking problem Detection Object model, Track initiation, Track termination, Tracking

More information

OBJECT TRACKING USING LOG-POLAR TRANSFORMATION

OBJECT TRACKING USING LOG-POLAR TRANSFORMATION OBJECT TRACKING USING LOG-POLAR TRANSFORMATION A Thesis Submitted to the Gradual Faculty of the Louisiana State University and Agricultural and Mechanical College in partial fulfillment of the requirements

More information

Common Core Unit Summary Grades 6 to 8

Common Core Unit Summary Grades 6 to 8 Common Core Unit Summary Grades 6 to 8 Grade 8: Unit 1: Congruence and Similarity- 8G1-8G5 rotations reflections and translations,( RRT=congruence) understand congruence of 2 d figures after RRT Dilations

More information