Statistical Image Completion

Kaiming He, Member, IEEE, and Jian Sun, Member, IEEE

Abstract

Image completion involves filling missing parts in images. In this paper we address this problem through novel statistics of patch offsets. We observe that if we match similar patches in the image and obtain their offsets (relative positions), the statistics of these offsets are sparsely distributed. We further observe that a few dominant offsets provide reliable information for completing the image. We show that such statistics can be incorporated into both matching-based and graph-based methods for image completion. Experiments show that our method yields better results in various challenging cases, and is faster than existing state-of-the-art methods.

Index Terms: Image completion, image inpainting, natural image statistics

1 INTRODUCTION

Image completion involves filling missing parts in images. This is a non-trivial task in computer vision/graphics: on one hand, the completed images are expected to be visually plausible and to have few noticeable artifacts; on the other hand, the algorithm should be efficient, because in practice an image completion tool is often applied with user interaction and needs to give quick feedback. For today's consumer-level multi-megapixel cameras, high-quality and fast image completion is still a challenging problem.

One category of image completion methods is diffusion-based [1], [2], [3], [4], [5]. These methods solve Partial Differential Equations (PDEs) [1] or similar diffusion systems, so as to propagate colors into the missing regions. They are mainly designed for filling narrow or small holes (also known as inpainting [1]). They work less well for large missing regions due to the lack of semantic texture/structure synthesis.

Another category of image completion methods is exemplar-based. These methods perform more effectively for large holes.
In this paper, we further categorize exemplar-based methods into two groups: matching-based [6], [7], [8], [9], [10], [11], [12], [13], [14] and graph-based [15], [16], [17]. Matching-based methods explicitly match the patches in the unknown region with the patches in the known region, and copy the known content to complete the unknown region. This strategy makes it possible to synthesize textures [6] and more complex structures [7], [8], [9], [10], [11], [12], [13]. Unlike many matching-based methods that proceed greedily, the method proposed by Wexler et al. [11] optimizes a global cost function called the coherence measure. This cost function encourages each patch in the filled region to be as similar as possible to a certain known patch. It has been generalized for image retargeting/reshuffling in [14]. This measure helps to yield more coherent results for image completion. But because this cost function inherently has multiple disconnected local optima, the method is sensitive to initialization and to the optimization strategy.

Matching patches can be a computationally expensive operation. The fast PatchMatch algorithm [18] largely relieves this problem and is combined with the method in [11], [13]. This combination, implemented (see footnote 1) as Content-Aware Fill in Adobe Photoshop, is arguably the current state-of-the-art image completion software in terms of both visual quality and speed.

K. He and J. Sun are with the Visual Computing Group, Microsoft Research Asia, Beijing, China. E-mail: {kahe,jiansun}@microsoft.com

Besides the matching-based strategy, exemplar-based methods can also be realized by optimizing graph-based models like Markov Random Fields (MRFs), as in the works of Priority-BP [15], [16] and Shift-map [17]. Rather than matching patches, these methods rearrange the patch/pixel locations to complete the image. The rearrangement is formulated as an MRF, where each node in the graph takes its value among a set of discrete labels.
These labels can represent the absolute coordinates of the patches/pixels [16], or the relative offsets [17]. The edges in the graph encourage neighboring nodes to have visually coherent content. The MRFs are optimized via well-studied techniques like belief propagation (BP) [20] as in [16], or graph-cuts [21] as in [17].

Although they avoid matching patches, the graph-based methods are still computationally expensive: the complexity is linear in the number of labels and also in the number of unknown pixels, so it is approximately quadratic in the number of image pixels. Existing methods adopt label pruning [16] or hierarchical solvers [17]. But they may still take tens of seconds to process small images (e.g., 400×300 pixels).

Footnote 1. As reported in [19].

Fig. 1. Outline. (a) Input image with a mask overlaid. (b) Matching similar patches in the known region. (c) The statistics of the offsets of the similar patches. The offsets of the highest peaks are picked out. (d) Combining a set of shifted images $I_i(x) = I(x + s_i)$ with the given offsets. (e) Our graph-based result. (f) Result of Content-Aware Fill.

We note that exemplar-based methods, both matching-based and graph-based, should implicitly or explicitly assign each unknown pixel/patch an offset, i.e., the relative location from which it copies the content. Both the coherence measure in [11], [13] and the MRFs in [17] can be viewed as optimization w.r.t. the offsets (Sec. 2). But existing methods do not predict reliable offsets beforehand, and generally accept all possible offsets in the optimization.

We will show that it is beneficial to constrain the offsets using certain statistics of patch offsets. In terms of quality, our constrained offsets may produce better results for both graph-based and matching-based methods. In particular, we find that unexpected biases may be present in graph-based methods if the offsets are not restricted (cf. Sec. 3.3). In terms of speed, a small set of pre-defined offsets can lead to very efficient algorithms.

In this paper, we present novel statistics of patch offsets for high-quality and fast image completion. We observe that if we match similar patches in the image, the statistics of patch offsets are sparsely distributed (Fig. 1(c)): a majority of patches have similar offsets, forming several prominent peaks in the statistics. Such dominant offsets describe how the patterns are most likely repeated, and thus provide reliable clues for completing the missing region. We observe that these offsets are predictive for completing linear structures, textures, and repeated objects.
Then we show that the statistics of patch offsets can be incorporated into both graph-based and matching-based methods for image completion. In our graph-based solution, we create a stack of shifted images corresponding to a few dominant offsets, and combine them via graph-cuts to fill the missing region (Fig. 1(d)). In our matching-based solution, we optimize the coherence measure but only allow patches to be matched to those shifted by the dominant offsets. In experiments, both methods produce high-quality results in various cases that are challenging for many state-of-the-art methods. The graph-based method is also faster than its competitors, including Content-Aware Fill in Adobe Photoshop.

A preliminary version of this work has been published in ECCV '12 [22]. Interestingly, since then the statistics of patch offsets have found new applications beyond image completion. Inspired by our work, Chen et al. [23] use the dominant offsets to initialize optical flows. They present top-ranked results in optical flow benchmarks. Zhang et al. [24] use the dominant offsets with the graph-cuts algorithm to extrapolate images. We believe the statistics of patch offsets, as a kind of natural image statistics, will find more applications in the future.

2 APPROACH

We first introduce a way of computing the statistics of patch offsets. Based on these statistics, we develop both matching-based and graph-based methods for image completion. We provide analysis in the next section.

2.1 Computing the Statistics of Patch Offsets

To compute the statistics, we first match similar patches in the known region and obtain their offsets (Fig. 1(b)). For each patch P in the known region, we find another known patch that is the most similar to P and compute their relative position s. Formally, the offset s is:

$$s(x) = \arg\min_{s} \| P(x + s) - P(x) \|^2 \quad \text{s.t. } |s| > \tau. \tag{1}$$

Here, $s = (u, v)$ is the 2-d coordinates of the offset, $x = (x, y)$ is the position of a patch, and $P(x)$ is a $w \times w$ patch centered at $x$. We represent each patch as a $3w^2$-dimensional vector of its RGB colors. The similarity between two patches is measured by the squared Euclidean distance between their representations. The threshold $\tau$ precludes nearby patches. This constraint avoids trivial statistics, as we will discuss in Sec. 3.1.

The solution to Eqn. (1) can be approximately obtained by nearest-neighbor field algorithms like PatchMatch [18] or its improvements [25], [26]. Because we only use the result to compute statistics, the approximation in these algorithms has almost no impact on the dominant offsets. In this paper we adopt [26] due to its speed.

Given the offsets $s(x)$ for all the known pixels $x$, we compute their statistics by a 2-d histogram $h(u, v)$:

$$h(u, v) = \sum_{x} \delta\big(s(x) = (u, v)\big), \tag{2}$$

where $\delta(\cdot)$ is 1 when the argument is true and 0 otherwise. We pick out the K highest peaks of this histogram. They correspond to the first K dominant offsets. We empirically use K = 60 throughout this paper.

Fig. 1(c) shows an example of this histogram. There are two major peaks in the horizontal direction. The offsets given by these two peaks indicate how the patterns are mostly repeated in the image. In Sec. 3 we will discuss various cases and explain how these dominant offsets influence the image completion algorithms.

2.2 Image Completion Using the Statistics of Patch Offsets

As discussed in the introduction, both matching-based and graph-based methods can be viewed as assigning an offset $s(x)$ to each unknown pixel/patch at $x$. The algorithms copy content from the location $x + s$ and paste it (pixel-wise or patch-wise) into the position $x$. Next we show how the statistics of patch offsets can be applied to both matching-based and graph-based methods.
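The statistics of Sec. 2.1 (Eqns. (1) and (2)) can be illustrated with a minimal sketch. It is not the implementation used in this paper: it assumes a small grayscale image stored as a list of rows, replaces the approximate nearest-neighbor field algorithm of [26] with an exhaustive search, indexes patches by their top-left corner, and simply takes the K most frequent histogram bins as the dominant offsets. The function names `best_offsets` and `dominant_offsets` are hypothetical.

```python
from collections import Counter


def patch(img, x, y, w):
    """Flatten the w-by-w patch whose top-left corner is (x, y)."""
    return [img[y + j][x + i] for j in range(w) for i in range(w)]


def ssd(p, q):
    """Squared Euclidean distance between two flattened patches."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def best_offsets(img, w, tau):
    """Brute-force version of Eqn. (1): for every patch, find the offset
    to its most similar patch subject to |s| > tau."""
    height, width = len(img), len(img[0])
    positions = [(x, y) for y in range(height - w + 1)
                 for x in range(width - w + 1)]
    patches = {pos: patch(img, pos[0], pos[1], w) for pos in positions}
    offsets = {}
    for (x, y) in positions:
        best = None
        for (x2, y2) in positions:
            u, v = x2 - x, y2 - y
            if u * u + v * v <= tau * tau:
                continue  # non-nearby constraint |s| > tau
            d = ssd(patches[(x, y)], patches[(x2, y2)])
            if best is None or d < best[0]:
                best = (d, (u, v))
        if best is not None:
            offsets[(x, y)] = best[1]
    return offsets


def dominant_offsets(offsets, k):
    """Eqn. (2): histogram of offsets, then the k most frequent bins."""
    hist = Counter(offsets.values())
    return [s for s, _ in hist.most_common(k)]
```

On an image whose pattern repeats every 4 pixels horizontally, the most frequent offsets returned by this sketch are horizontal shifts by one period, which is the behavior described for Fig. 1(c).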
2.2.1 Graph-based Image Completion Using the Statistics of Patch Offsets

In our graph-based solution, we treat image completion as a photomontage [27] problem. Given the K dominant offsets, we combine a stack of shifted images corresponding to these offsets (see Fig. 1(d)). Formally, we optimize the following MRF energy function:

$$E(L) = \sum_{x \in \Omega} E_d(L(x)) + \sum_{(x, x') \mid x \in \Omega,\, x' \in \Omega} E_s(L(x), L(x')). \tag{3}$$

Here $\Omega$ is the unknown region (expanded by one pixel to include boundary conditions). The neighboring pixels $(x, x')$ are 4-connected. The argument $L$ is a labeling map. It assigns a label to each unknown pixel $x$, where the labels represent the pre-selected offsets $\{s_i\}_{i=1}^{K}$ or $s_0 = (0, 0)$. Here $s_0$ is valid if and only if $x$ is on the boundary of $\Omega$, so as to impose boundary constraints. Intuitively, $L(x) = i$ means that we copy the pixel at $x + s_i$ to the location $x$.

The data term $E_d$ is 0 if the label is valid for $x$, i.e., $x + s$ is a known pixel; otherwise $E_d$ is $+\infty$. The smoothness term $E_s$ penalizes incoherent seams. Denoting $a = L(x)$ and $b = L(x')$, we define $E_s$ as:

$$E_s(a, b) = \| I(x + s_a) - I(x + s_b) \|^2 + \| I(x' + s_a) - I(x' + s_b) \|^2. \tag{4}$$

Here $I(x)$ is the RGB color of $x$. Note that $I_a(\cdot) = I(\cdot + s_a)$ is an image shifted by $s_a$ (Fig. 1(d)), and likewise $I_b(\cdot) = I(\cdot + s_b)$. If $s_a \neq s_b$, the neighboring pixels $x$ and $x'$ are assigned different offsets, i.e., $L(x) \neq L(x')$, so there is a seam between $x$ and $x'$. In this sense, Eqn. (4) penalizes the neighboring labels if the two shifted images $I(\cdot + s_a)$ and $I(\cdot + s_b)$ are not similar near this seam. This smoothness term is similar to those used in the GraphCuts texture [28], Photomontage [27], and Shift-map [17] methods.

As in the photomontage problem [27], we optimize the energy (3) using multi-label graph-cuts [21] and then further suppress the seams by Poisson blending [29]. More implementation details are in Sec. 4. Fig. 1(e) shows an image completion result of our graph-based solution.
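To make the two terms of Eqn. (3) concrete, the following sketch evaluates the data term and the smoothness term of Eqn. (4) for given pixels and offsets. It only illustrates the definitions under simple assumptions: the image is a list of rows of RGB tuples, `known` is a boolean mask of the same size, pixels are (x, y) tuples, and the caller passes only offsets that stay inside the image for the smoothness term. The function names are hypothetical, and the graph-cuts optimization itself is not shown.

```python
INF = float("inf")


def shift(p, s):
    """Return the position p + s."""
    return (p[0] + s[0], p[1] + s[1])


def data_term(known, x, s):
    """E_d: 0 if x + s is a known pixel inside the image, +inf otherwise."""
    px, py = shift(x, s)
    inside = 0 <= py < len(known) and 0 <= px < len(known[0])
    return 0.0 if inside and known[py][px] else INF


def color_diff(img, p, q):
    """Squared Euclidean distance between the RGB colors at p and q."""
    return sum((a - b) ** 2 for a, b in zip(img[p[1]][p[0]], img[q[1]][q[0]]))


def smoothness_term(img, x, x2, s_a, s_b):
    """E_s of Eqn. (4) for neighbors x, x2 labeled with offsets s_a, s_b.

    The two shifted images I(. + s_a) and I(. + s_b) are compared at both
    pixels next to the potential seam.
    """
    return (color_diff(img, shift(x, s_a), shift(x, s_b))
            + color_diff(img, shift(x2, s_a), shift(x2, s_b)))
```

When both neighbors take the same offset the smoothness term is zero, and it is also zero when two different offsets happen to bring identical content to both sides of the seam, which is why seams are cheap along repeated patterns.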
2.2.2 Matching-based Image Completion Using the Statistics of Patch Offsets

The statistics of patch offsets can also be incorporated into the coherence measure $d_{\text{cohere}}$ proposed in [11], [13]. In this paper we adopt the definition used in [14], [18]:

$$d_{\text{cohere}} = \sum_{P \in \Omega} \min_{Q \in \bar{\Omega}} \| P - Q \|^2, \tag{5}$$

where $P$ is a patch in the synthesized region $\Omega$, and $Q$ is a patch in the known region $\bar{\Omega}$. This measure penalizes any patch $P$ in the synthesized region if its best match $Q$ in the known region is not similar to it. In our current implementation we do not apply the weights used in [13].

We can rewrite Eqn. (5) in terms of offset assignments:

$$d_{\text{cohere}} = \sum_{x \in \Omega} \min_{s;\; x + s \in \bar{\Omega}} \| P(x) - P(x + s) \|^2, \tag{6}$$

where $P(x + s)$ is a known patch. Eqn. (6) clearly shows that each patch in $\Omega$ is assigned an offset $s$. It also implies that there is no constraint on the selection of $s$: it accepts all valid $s$ such that $x + s \in \bar{\Omega}$.

Based on Eqn. (6), it is easy to incorporate the statistics of patch offsets into the coherence measure:

$$\hat{d}_{\text{cohere}} = \sum_{x \in \Omega} \min_{1 \le i \le K} \| P(x) - P(x + s_i) \|^2. \tag{7}$$

This equation means that the offsets can only be chosen from the dominant offsets $\{s_i\}_{i=1}^{K}$ obtained from the statistics. Note that the patch size here need not be the same as the one used for computing the statistics. We denote this patch size as $w' \times w'$.

The coherence measure in Eqn. (7) can be optimized in an EM fashion just as in [14], [18]. In the E-step, the algorithm computes a nearest-neighbor field (each unknown pixel is assigned an offset) that maps a patch $P \in \Omega$ to its most similar patch $Q$. This was done by the PatchMatch algorithm in [18]. In contrast, here we simply exhaust $i \in [1, K]$ for each $P(x)$ to find the most similar patch $P(x + s_i)$. In the M-step, the color of each unknown pixel is reconstructed by voting as in [18]: because each unknown pixel is covered by multiple overlapping patches, all the corresponding pixels in these patches are averaged to give the new color of this pixel. This algorithm is iterated. As in [14], [18], we also adopt a multi-scale strategy. More implementation details are in Sec. 4.

2.2.3 Discussions

We have shown that the statistics of patch offsets can be naturally incorporated into both graph-based and matching-based methods. In Sec. 5 we show the results of both methods.

Between the two methods using the statistics, we recommend the graph-based one. Our graph-based solution does not require patch representations in the optimization step, so it is less affected by the issue of choosing patch sizes (see Sec. 3.4). Besides, we find our graph-based solution is faster (see Sec. 5) thanks to the efficiency of graph-cuts [21].
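One EM iteration for the constrained coherence measure of Eqn. (7) can be sketched as follows. This is a simplified illustration rather than the implementation of this paper: it works on a single scale, on a grayscale image stored as a list of rows, indexes patches by their top-left corner, and only considers source patches that lie entirely in the known region. The name `em_iteration` and its arguments are hypothetical.

```python
def em_iteration(img, hole, offsets, w):
    """One E-step and M-step of Eqn. (7).

    img:     2-d list with the current estimate (hole pixels hold a guess)
    hole:    2-d list of bools, True where the pixel is unknown
    offsets: the dominant offsets [(u, v), ...]
    w:       patch size used in the cost function
    """
    height, width = len(img), len(img[0])

    def is_known_patch(x, y):
        # The short-circuit bounds check runs before the mask is indexed.
        return all(0 <= y + j < height and 0 <= x + i < width
                   and not hole[y + j][x + i]
                   for j in range(w) for i in range(w))

    votes = [[[] for _ in range(width)] for _ in range(height)]
    for y in range(height - w + 1):
        for x in range(width - w + 1):
            if not any(hole[y + j][x + i]
                       for j in range(w) for i in range(w)):
                continue  # patch does not touch the unknown region
            # E-step: exhaustively test the dominant offsets.
            best = None
            for (u, v) in offsets:
                if not is_known_patch(x + u, y + v):
                    continue
                d = sum((img[y + j][x + i] - img[y + v + j][x + u + i]) ** 2
                        for j in range(w) for i in range(w))
                if best is None or d < best[0]:
                    best = (d, u, v)
            if best is None:
                continue
            _, u, v = best
            # Every unknown pixel covered by this patch receives a vote.
            for j in range(w):
                for i in range(w):
                    if hole[y + j][x + i]:
                        votes[y + j][x + i].append(img[y + v + j][x + u + i])

    # M-step: average the votes of the overlapping patches.
    out = [row[:] for row in img]
    for y in range(height):
        for x in range(width):
            if votes[y][x]:
                out[y][x] = sum(votes[y][x]) / len(votes[y][x])
    return out
```

Because the E-step loops over only K candidates per patch instead of searching the whole known region, its cost per patch is fixed by K and the patch size.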
Unless specified, the results in this paper are obtained from our graph-based solution.

3 ANALYSIS

In this section, we analyze the statistics of patch offsets and their impact on image completion.

3.1 Sparsity of the Offsets Statistics

One of our key observations is that the offsets statistics are sparse (Fig. 1(c)). We verify this observation on the MSRA Salient Object Database [30], which contains 5,000 images with manually labeled salient objects. We omit these objects and compute the offsets statistics in the background. Note that the background still contains various structures or less salient objects. We use 8×8 patches and test τ = 16, 24, or 32 in Eqn. (1). For each image, we sort the histogram bins in descending order of their magnitude (i.e., the number of offsets in a bin), and we accumulate the bins. The cumulative distribution, averaged over 5,000 images, is shown in Fig. 2. We can see the offsets are sparsely distributed: e.g., when τ = 32, about 80% of the offsets are in 7% of all possible bins (Fig. 2(b)). We also observe that the cumulative distribution changes only a little with different τ values (16, 24, or 32) (Fig. 2(b)). This means the sparsity is insensitive to τ over a wide range.

Fig. 2. (a): cumulative distributions of offsets, averaged over 5,000 images (horizontal axis: # of bins (%); vertical axis: # of offsets (%); curves for τ = 0, 16, 24, 32 and for a uniform distribution). (b): zoom-in of (a); the marked point is (7, 80).

Fig. 3. Offset statistics of Fig. 1 using (a) τ = 32 or (b) τ = 0 (shown in the same scale).

It is worth mentioning that the non-nearby constraint ($|s| > \tau$) in Eqn. (1) is important. A recent work on natural image statistics [31] shows that the best match of a patch is most probably located near itself. We verify this by setting τ = 0 (thus a patch can match any patch other than itself). We can see that the offsets statistics have a single dominant peak around (0, 0) (e.g., Fig. 3(b)). Although the offsets distribution is even sparser (see Fig. 2 and Fig. 3(a,b)), the zero offset is of no use for inferring the structures in the hole.

3.2 Offsets Statistics for Image Completion

We further observe that the dominant offsets (with the non-nearby constraint) are informative for filling the hole in at least three situations: (i) linear structures, (ii) regular/random textures, and (iii) repeated objects.

Fig. 4. Illustration of completing linear structures. (a): matching patches. (b): ideal dominant offsets. (c): filling the hole by shifting the image using these offsets. This figure is for illustration only. The real cases are in Fig. 5.

Fig. 5. Linear structures. Top: input. Middle: dominant offsets. Bottom: the results obtained by our graph-based solution. (a) Long/thin objects. (b) Linear color edges. (c)(d) Linear textural edges.

3.2.1 Linear structures

As illustrated in Fig. 4, a patch on a linear structure can find a match that is also on this structure. The offsets statistics will exhibit a series of peaks along the direction of the structure (Fig. 4(b)). Here we use a dot to denote a dominant offset in the histogram. These offsets shift the image along the linear structure, so they can reliably complete the missing region.

Fig. 5 shows some real examples of this case. We find our method works well for linear structures including long and thin objects (Fig. 5(a)), color edges (Fig. 5(b)), and textural edges (Fig. 5(c)(d)). Note that our method is tolerant to structures that are not salient (e.g., Fig. 5(c)) or that are not strictly straight (e.g., Fig. 5(d)); it is sufficient if the structures have a trend along one or a few directions.

3.2.2 Textures

Textures can yield prominent patterns in the offset statistics. Ideally, a regular texture should generate a regular pattern of dominant offsets describing how the textures are repeated (Fig. 6). We can complete the texture by shifting the image using these offsets. Fig. 7 shows a real example of regular textures. Because the period of the regular textures can be larger than some predefined patch sizes, completing such textures is a challenging task for other techniques like Content-Aware Fill [19] (Fig. 7(d)).

For irregular textures, we find the dominant offsets generate random patterns (Fig. 8). In this case our method behaves just like the Graphcut texture algorithm [28].

Fig. 6. Illustration of completing textures. The notations are as in Fig. 4.

Fig. 7. Regular textures. (a): input. (b): dominant offsets. They describe how the textures are repeated. (c): the result obtained by our graph-based solution. (d): result of Content-Aware Fill.

Fig. 8. Random textures. (a): input. (b): dominant offsets. (c): the result obtained by our graph-based solution.

3.2.3 Repeated objects

Repeated objects can also generate prominent peaks in the offset statistics. This is helpful in synthesizing semantic content. As shown in Fig. 9, the partially missing circles yield peaks in the offsets that correspond to the relative positions of the circles. We can complete each circle by shifting another circle with these offsets. In Fig. 10 we show a real example in which our algorithm faithfully recovers a fully missing sculpture. We find that Content-Aware Fill might produce unsatisfactory results (see Fig. 9(e) and Fig. 10(f)), mainly because it is unaware of how the objects are repeated.

In sum, the offsets statistics can predict the structures in the missing regions in the cases of linear structures, textures, and repeated objects. Our method has the advantage that it need not consider the above cases separately. It can handle all of them or a mixture of them in the same framework.

3.3 Offsets Selection and the Graph-based Energy Optimization

Our graph-based method has an energy function Eqn. (3) similar to that of the Shift-map method [17]. The main difference is that Shift-map allows all possible offsets. As a result, our solution space is a very small subset of that of Shift-map. Theoretically, Shift-map can achieve a smaller energy than our method (this is observed in experiments).
However, we find that the results of Shift-map may have unexpected bias, and their visual quality can be unsatisfactory even if their energy is much lower than ours.

In Fig. 11 we optimize the energy Eqn. (3) using our selected K dominant offsets (Fig. 11(b)) and using all possible offsets (Fig. 11(c)), respectively. The latter is the way of Shift-map [17] (except that [17] has an extra gradient smoothness term). As expected, our energy ($4.6 \times 10^6$) is much larger than the energy of Shift-map ($1.1 \times 10^6$). But our result is visually superior.

Fig. 9. Repeated circles. (a): input with a missing region in red. (b): offset histogram. (c): dominant offsets found by our algorithm. (d): our result (this is a real result, not a synthetic illustration). (e): result of Content-Aware Fill.

Fig. 10. Repeated objects. (a): input with a sculpture missing. (b): dominant offsets. The arrows indicate the offsets of the relative positions of the sculptures. (c): the label map obtained by graph-cuts: each color represents an offset. (d): the hole is mainly filled by copying the other sculptures, using the offsets indicated in (b). (e): our result and zoom-in. (f): result of Content-Aware Fill and zoom-in.

We investigate this unexpected phenomenon through the graph-cuts label maps (Fig. 11(b)(c)). We find that with a huge number of offsets, the Shift-map method can decrease the energy by inserting a great number of insignificant labels into the seam (see the zoom-in of Fig. 11(c)). These labels correspond to a few isolated pixels that happen to connect the content on both sides of the seam. When the number of offset candidates is large, such coincidental pixels are not rare. We further observe that this problem is inherent and cannot be safely avoided by a hierarchical solver [17] (Fig. 11(d)) or by adding the gradient smoothness term (Fig. 11(e), obtained from the authors' demo [32]).

In contrast, our method is less affected by this problem (Fig. 11(b)). Indeed, an ideal exemplar-based method should fill the missing region by copying large segments (like patches or regions). This means that only a few offsets should take effect, which is ensured by our method. The above experiments and analysis indicate that reliably limiting the solution space is important for improving the quality of image completion.
Although the dominant offsets selected by our method can improve the quality (and also speed), it is non-trivial to select a few reliable candidate offsets (e.g., 60) out of all possible ones (usually $10^4$ to $10^6$). We compare some naive offset selection methods in Fig. 12. We generate the same number (K = 60) of offsets, either on a regular grid (Fig. 12(b)), randomly (Fig. 12(c)), or by our method (Fig. 12(d)). We observe that the alternative methods cannot produce satisfactory results, because the predefined offsets do not capture sufficient information to predict the missing structures.

3.4 Patch Sizes for the Offsets Statistics

Most exemplar-based methods (except [17]) involve the issue of setting suitable patch sizes. Like [17], our graph-based energy does not rely on patch representations; the patch sizes only impact the computation of the offsets statistics (Eqn. (1)). As analyzed above, the dominant offsets in the statistics are mainly determined by how the patterns are repeated in the known regions. Such repeatedness is insensitive to the patch sizes.

In Fig. 13 we show two examples using our graph-based solution. Here we test $w \times w$ = 4×4, 8×8, 16×16, 24×24, and 32×32 patches for computing the offsets statistics. We can see that our method produces visually plausible results over a very wide range of patch sizes. This experiment shows that our method is very robust to patch sizes. In all other experiments in this paper, we fix the patch size as 8×8.

Fig. 11. Offsets sparsity and optimized energy. (a): input. (b): our graph-based result and the label map. Energy: $E = 4.6\ (\times 10^6)$, running time: t = 0.5s. (c): the result using all possible offsets. $E = 1.1\ (\times 10^6)$, t = 4300s. (In this case the number of labels is $K = 3.8 \times 10^5$, and the number of unknown pixels is $N = 2.2 \times 10^4$.) (d): the result of a hierarchical solver [17]. $E = 1.9\ (\times 10^6)$, t = 83s. (e): the result from the authors' demo [32] (with gradient smoothness terms).

Fig. 12. Comparisons of offsets selection methods. (a): input. (b): the result of regularly spaced offsets. (c): the result of randomly distributed offsets. (d): our result. (e): our offsets. The dashed line indicates the offsets used to complete the structure of the roof.

4 IMPLEMENTATION DETAILS

In this section we elaborate on the implementation details.

4.1 Computing the Statistics

To efficiently match the patches as in Eqn. (1), we apply the nearest-neighbor field algorithm in [26] with a slight modification: to handle the non-nearby constraint ($|s| > \tau$), before computing the difference between a pair of patches we first check their spatial distance and reject the pair if the constraint is violated.

We perform this matching step in a rectangular region centered around the bounding box of the hole. This rectangle is 3 times larger (in length) than this bounding box.
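The two steps just described, rejecting nearby candidates and restricting the matching to a rectangle around the hole, can be sketched as below. This is an illustration under assumptions not stated in the text: the bounding box is given as (left, top, right, bottom) with exclusive right and bottom edges, the rectangle is clipped to the image, and the names `is_non_nearby` and `search_region` are hypothetical.

```python
def is_non_nearby(s, tau):
    """Check the constraint |s| > tau of Eqn. (1) for an offset s = (u, v)."""
    u, v = s
    return u * u + v * v > tau * tau


def search_region(hole_bbox, image_size, scale=3):
    """Rectangle centered on the hole's bounding box and `scale` times
    larger in each dimension, clipped to the image (clipping is assumed)."""
    left, top, right, bottom = hole_bbox
    width, height = image_size
    cx, cy = (left + right) / 2, (top + bottom) / 2
    half_w = scale * (right - left) / 2
    half_h = scale * (bottom - top) / 2
    return (max(0, int(cx - half_w)), max(0, int(cy - half_h)),
            min(width, int(cx + half_w)), min(height, int(cy + half_h)))
```

For a hole that covers a large part of the image, the clipped rectangle is simply the whole image.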
The purpose of using such a rectangular region is to avoid unreliable statistics if the hole is too small in practical applications (in most examples in this paper this region is the entire image because the holes are large). The threshold $\tau$ in Eqn. (1) is set as $\tau = \frac{1}{15}\max(w, h)$, where $w$ and $h$ are the width and height of this region. We downsample this region to 800×600 pixels if it is larger than this size. Then we use 8×8 patches to perform the matching step (see footnote 2). This step takes <0.1s.

Fig. 13. Image completion results obtained by our graph-based solution, using various patch sizes for computing the statistics of patch offsets. For each of the two examples, from left to right: input, and results using 4×4, 8×8, 16×16, 24×24, and 32×32 patches.

Given the nearest-neighbor field $s(x)$, we compute the 2-d histogram $h(u, v)$ as in Eqn. (2). We smooth this histogram by a Gaussian filter ($\sigma = \sqrt{2}$). In this smoothed histogram, we consider a peak to be a bin whose magnitude is locally maximal inside a 9×9 window. The highest K peaks are picked out, and their corresponding offsets give the offset candidates $\{s_i\}_{i=1}^{K}$ that will be used in the image completion algorithms.

Footnote 2. The method in [26] only supports patch sizes 4k×4k for some integer k. When computing the offset statistics, we need not use a (2r+1)×(2r+1) patch centered at a certain pixel; instead, we can represent the spatial coordinates of a patch by its top-left corner. This representation is also adopted in the public codes of [18] and [25].

4.2 The Graph-based Method

We adopt a two-scale solver in our graph-based method (Sec. 2.2.1). We first downsample the rectangular region (by a scale $l$) to 800×600 pixels if it is larger than this size. Then we build a graph as in Eqn. (3) and optimize it using graph-cuts [21]. We use the public code of the multi-label graph-cuts in [33]. Its time complexity is $O(NK)$, where $N$ is the number of unknown pixels and $K$ is the number of labels. The time of this graph-cuts step is 0.2-0.5 seconds for an 800×600 image with 10-20% of its pixels missing. As a comparison, it takes over one hour to solve such an 800×600 image if all possible offsets are allowed at this scale ($K = 10^4$ to $10^6$, like Fig. 11(c)), or tens of seconds using a hierarchical solver [17] with the coarsest level as small as 100×100 pixels (like Fig. 11(d)). Thus our method is 1-2 orders of magnitude faster than Shift-map [17].

We upsample the resulting label map to the full resolution by nearest-neighbor interpolation and multiply the offsets by $l$. To correct small misalignments, we optimize a cost similar to Eqn. (3) at the full resolution. We allow each pixel to take 5 offsets: the upsampled offset and 4 relative offsets. If the upsampled shift is $s = (u, v)$, then the other 4 shifts are $(u \pm \frac{l}{2}, v)$ and $(u, v \pm \frac{l}{2})$. In this cost function we only treat the pixels as unknowns if they are within $\frac{l}{2}$ pixels of the seams. This upsampling takes <0.1s for typical 2Mp images.

We have also tested our graph-based method at full resolution without downsampling, and found the visual quality to be similar to that of the two-scale solver. We adopt the two-scale solver because it is faster.

Finally, Poisson fusion [29] is applied to hide the possibly visible seams. We adopt a recent $O(N)$ time Poisson solver proposed in [34]. In our implementation it takes 50ms per megapixel.

4.3 The Matching-based Method

As in [14], [18], we adopt a multi-scale solver in our matching-based method (Sec. 2.2.2). We build an image pyramid of $L$ levels using a scaling factor of 2, with a fixed coarsest size (about 100×100). We start the EM algorithm from the coarsest level, with an initialization discussed below. The result of a coarser level is interpolated into the next finer level. We interpolate the color image and start from the E-step at the next level. (Alternatively, we can interpolate the nearest-neighbor field and start from the M-step at the next level. We find the former way is better at hiding the seams.) We run 20 EM iterations at the coarsest level, 2 iterations at the finest level, and 5 iterations at the other levels.

The result of the matching-based method is sensitive to the initialization.
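The multi-scale schedule of Sec. 4.3 can be sketched as follows. The iteration counts (20, 5, 2) are the ones stated above; the rule for building the pyramid, halving until the longer side is at most 100 pixels, is an assumption made here for illustration, as are the function names.

```python
def pyramid_sizes(width, height, coarsest=100):
    """Image sizes from coarsest to finest, using a scaling factor of 2.

    Stopping once the longer side is at most `coarsest` is an assumed rule;
    the text only states that the coarsest size is fixed.
    """
    sizes = [(width, height)]
    while max(sizes[-1]) > coarsest:
        w, h = sizes[-1]
        sizes.append((max(1, w // 2), max(1, h // 2)))
    return sizes[::-1]


def em_schedule(num_levels):
    """EM iterations per level, coarsest first: 20 at the coarsest level,
    2 at the finest level, and 5 at every level in between."""
    if num_levels == 1:
        return [20]
    return [20] + [5] * (num_levels - 2) + [2]
```

Under these assumptions an 800×600 image yields four levels, from 100×75 up to 800×600, with 20, 5, 5, and 2 EM iterations respectively.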
We have tested two ways of initializing at the coarsest level: using the Poisson equation [29] to roughly propagate colors and generate a smoothed guess, or using our graph-based solution to generate a structural guess. We find the second way is more robust and we report the results obtained this way.

Unlike the graph-based method, the matching-based method requires setting a patch size in its cost function (7). We fix this size as $w' \times w'$ = 9×9 throughout this paper, although in some cases we find that adjusting this size can give better results.

Fig. 14. Comparisons with Content-Aware Fill. From left to right: input, our graph-based results, our matching-based results, and results of Content-Aware Fill. The artifacts are highlighted by the arrows. Image size (from top to bottom): 0.12Mp, 0.2Mp, 0.26Mp, 0.6Mp, 2Mp, 4Mp, 10Mp. The running time is in Fig. 15.

5 EXPERIMENTAL RESULTS

All experiments are run on a PC with an Intel Core i7 3.0GHz CPU and 8GB RAM. We recommend viewing the supplementary video to experience the user interactions and speed.

Fig. 15. Running time comparisons between our graph-based method and Content-Aware Fill. The bars are sorted in ascending order of Content-Aware Fill's time. [Bar chart of running time in seconds (axis from 0 to 7) for Ours (graph-based) and Content-Aware Fill. Images: Fig. 7, door (300x400); Fig. 14, lady (300x400); Fig. 14, lion (500x400); Fig. 14, cat (1000x600); Fig. 1, sculpture (800x600); Fig. 14, bicycle (650x400); Fig. 14, shark (1600x1200, 2Mp); Fig. 10, temple (3900x2600, 10Mp); Fig. 14, farmer (3900x2600, 10Mp); Fig. 14, duck (2200x1700, 4Mp).]

Fig. 16. Comparisons with state-of-the-art methods. (a) Input (640 × 430). (b) Ours (graph-based, 0.18s). (c) Content-Aware Fill (0.3s). (d) Priority-BP [15] (117s). (e) Shift-map [17] (13s). (f) Criminisi et al.'s [7] (6.9s).

5.1 Comparison with Content-Aware Fill

The Content-Aware Fill tool in Adobe Photoshop is reported [19] to be an implementation partially based on [11], [18]. We believe it is a well-tuned, optimized, and perhaps enhanced implementation. It has shown compelling quality and speed in many practical cases. In this subsection we compare with this tool.³

Some comparisons have been shown in the previous sections (Figs. 1, 7, 9, and 10). In Fig. 14 we show more examples using both of our methods (graph-based and matching-based). In all these examples our methods complete the images using as few as K = 60 pre-selected offsets. Both of our methods generate high-quality results, whereas Content-Aware Fill produces noticeable artifacts in these examples.

Fig. 15 shows the running time of our graph-based solution and of Content-Aware Fill. Both methods use quad cores. (Our graph-based method benefits less from multiple cores than Content-Aware Fill does, mainly because in Content-Aware Fill the EM algorithm and PatchMatch are fully parallelized, but the graph-cuts algorithm we used is not. In our graph-based method, only patch matching and Poisson blending are parallelized.) For small images where the two-scale solution does not take effect (the first six examples in Fig. 15), our graph-based method is slightly faster than Content-Aware Fill. But for mega-pixel images our graph-based method is 2-5 times faster. This is because matching patches in mega-pixel images can be slow at finer scales. Our matching-based method takes about 50-100% more time than our graph-based method.

5.2 Comparisons with Other State-of-the-art Methods

In Fig. 16 we further compare with Priority-BP [15], Shift-map [17], and Criminisi et al.'s method [7]. Our method faithfully recovers the texture edges here, and is much faster than the other three methods (see the caption of Fig. 16). More comparisons are in the supplementary materials.⁴

3. We have also tested an implementation of [11], [18] given by the public code in [35], but we find it is non-trivial to tune universally acceptable parameters.
4. research.microsoft.com/en-us/um/people/kahe/eccv12
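The K = 60 pre-selected offsets used in Section 5.1 are the dominant peaks of the patch-offset statistics. As a rough illustration of how such peaks could be collected, the following Python sketch matches every patch to its most similar patch elsewhere in the image, histograms the resulting offsets, and keeps the K most frequent ones. It is a minimal sketch under stated assumptions, not the implementation evaluated in this paper: the brute-force matching, the function name `dominant_offsets`, the patch size of 8, and the minimum offset length `tau` (used here only to rule out trivial self-matches) are all choices made for this example.

```python
from collections import Counter

import numpy as np


def dominant_offsets(img, patch=8, tau=4, k=60):
    """Return the k most frequent nearest-neighbor patch offsets as ((dy, dx), votes)."""
    h, w = img.shape
    # Top-left corners of every patch that fits inside the image.
    coords = np.array([(y, x) for y in range(h - patch + 1)
                       for x in range(w - patch + 1)])
    feats = np.stack([img[y:y + patch, x:x + patch].ravel()
                      for y, x in coords]).astype(np.float64)

    # Squared L2 distance between every pair of patches (brute force).
    sq = (feats ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T

    # Assumption: forbid matches closer than tau pixels, so a patch cannot
    # match itself or a trivially shifted copy of itself.
    diff = coords[:, None, :] - coords[None, :, :]
    d2[(diff ** 2).sum(axis=-1) < tau ** 2] = np.inf

    nearest = d2.argmin(axis=1)
    offsets = coords[nearest] - coords  # relative position of each match
    votes = Counter(map(tuple, offsets.tolist()))
    return votes.most_common(k)


# Toy input: a random 8x8 tile repeated to 32x32, so matches repeat every 8 pixels.
rng = np.random.default_rng(0)
img = np.tile(rng.integers(0, 256, size=(8, 8)), (4, 4))
for offset, count in dominant_offsets(img, k=5):
    print(offset, count)
```

The pairwise distance matrix grows quadratically with the number of patches, so this sketch is only practical for small images; a fast approximate matcher would be needed at the image sizes reported in Fig. 15.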

Fig. 17. A comparison with Priority-BP [16]. For this 256 × 163 image Priority-BP takes 40s in a well-parallelized quad-core implementation, while our graph-based method takes 0.09s. [Panels: input, ours, Content-Aware Fill, Priority-BP.]

Fig. 18. A comparison with Shift-map [17]. In the zoom-in we show how Shift-map behaves near inconsistent seams. [Panels: input, ours, Shift-map.]

Fig. 19. More results on images from previous papers [8], [13]. These images are around 400 × 300. Our methods complete each image in less than 0.3 seconds. [Panels, for each of two examples: input, ours (graph-based), ours (matching-based).]

Fig. 17 shows a comparison with Priority-BP [15]. This method optimizes an MRF using the BP algorithm with on-the-fly label pruning. Its running time for this 256 × 163-pixel image is 40s using a well-parallelized quad-core implementation, while our graph-based method takes 0.09s. Also note that Priority-BP cannot recover the pattern in the lamp in this example.

Fig. 18 shows a comparison with Shift-map [17]. As in Fig. 11, Shift-map cannot preserve the structure in this case. We can see (in the zoom-in) how this method conceals a seam when the content is not consistent on both sides of the seam. This result is obtained from the authors' on-line demo [32]. We tried various parameter settings but observed similar artifacts.

5.3 More Results and Limitations

In Fig. 19 we show more results on the example images from previous papers [8], [13]. In Fig. 20 we show an example of removing multiple small objects in a large image. In Fig. 21 we show two examples of completing panoramic images.

Limitations. Our methods may fail when the desired offsets do not form dominant statistics. Fig. 22(b) shows a failure case. We can partially solve this problem by manually introducing offsets. For example, we can paint an extra stroke on the image (Fig. 22(d)) and treat this image as a new source for patch statistics. This stroke contributes to the statistics and overcomes the problem (Fig. 22(e)).
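The failure condition and the stroke-based workaround can be pictured in terms of votes in the offset histogram. The Python sketch below is an illustration only: the vote counts are invented, the names `dominant_share` and `top_offsets` are hypothetical, and modelling the stroke as extra votes added to the histogram is a simplification (in the workaround described above, the statistics are computed from the stroke-modified image itself).

```python
from collections import Counter


def dominant_share(votes, k):
    """Fraction of all votes captured by the k most frequent offsets."""
    total = sum(votes.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in votes.most_common(k)) / total


def top_offsets(votes, k):
    """The k most frequent offsets, without their vote counts."""
    return [offset for offset, _ in votes.most_common(k)]


# Hypothetical vote counts: offset (dy, dx) -> number of matched patches.
image_votes = Counter({(0, 40): 30, (0, -40): 28, (25, 0): 5, (18, 18): 2})
desired = (18, 18)  # offset the hole actually needs (assumed for this example)

k = 2
# Failure case: the desired offset is too rare to be among the k dominant ones.
print(desired in top_offsets(image_votes, k))

# Assumed effect of the extra stroke: patches along it match each other and
# add votes for the desired offset.
stroke_votes = Counter({(18, 18): 40})
combined = image_votes + stroke_votes
print(desired in top_offsets(combined, k))
print(dominant_share(combined, k))
```

A low `dominant_share` for a given K is one possible warning sign that the offsets are not sparsely distributed in an image, although it says nothing about whether the particular offset needed by the hole is among the selected ones.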
Some other failure examples are in the supplementary materials.

Fig. 20. Removing multiple small objects from a large image (10Mp). The objects are removed sequentially. The results are obtained by our graph-based solution. [Panels: input, objects to be removed, our result, zoom-in.]

Fig. 21. Our results for completing panoramic images. Our graph-based method takes 1.1s for the top image (3200 × 2000 pixels) and 0.97s for the bottom image (2600 × 1800 pixels). [Panels: input, ours (graph-based), ours (matching-based).]

Fig. 22. Failure of the statistics. (a) Input. (b) Our graph-based result. (c) Result of Content-Aware Fill. (d) An extra stroke is casually painted on the image. (e) Our graph-based result on (d).

6 CONCLUSION

In this paper we have presented novel statistics of patch offsets. We have demonstrated the effects of these statistics for image completion using both graph-based and matching-based methods.

Natural image statistics are essential for many computer vision problems. Gradient-domain statistics have been applied in denoising [36], deconvolution [37], and diffusion-based inpainting [3]. Patch-domain statistics have been shown to be very successful in denoising [36] and super-resolution [38]. We believe our statistics of patch offsets, as a kind of natural image statistics, will find more applications in the future (e.g., [23], [24]).

The use of patch offsets implies that we only consider translations of patches for image completion. Recently, there have been studies [39], [35], [40] on using more complex transforms like scaling, rotation, reflection, and their combinations. It will be interesting to investigate the statistics in these higher-dimensional transformation spaces. We leave this problem for future study.

REFERENCES

[1] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester, "Image inpainting," in ACM Transactions on Graphics, Proceedings of ACM SIGGRAPH, 2000, pp. 417-424.
[2] C. Ballester, M. Bertalmio, V. Caselles, G. Sapiro, and J. Verdera, "Filling-in by joint interpolation of vector fields and gray levels," IEEE Transactions on Image Processing (TIP), pp. 1200-1211, 2001.
[3] A. Levin, A. Zomet, and Y. Weiss, "Learning how to inpaint from global image statistics," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2003, pp. 305-312.
[4] M. Bertalmio, L. Vese, G. Sapiro, and S. Osher, "Simultaneous structure and texture image inpainting," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003.
[5] S. Roth and M. J. Black, "Fields of experts: a framework for learning image priors," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005, pp. 860-867.
[6] A. A. Efros and T. K. Leung, "Texture synthesis by non-parametric sampling," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 1999, pp. 1033-1038.
[7] A. Criminisi, P. Pérez, and K. Toyama, "Object removal by exemplar-based inpainting," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003.

[8] I. Drori, D. Cohen-Or, and H. Yeshurun, "Fragment-based image completion," in ACM Transactions on Graphics, Proceedings of ACM SIGGRAPH, 2003, pp. 303-312.
[9] J. Jia and C.-K. Tang, "Image repairing: Robust image synthesis by adaptive ND tensor voting," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1. IEEE, 2003, pp. I-643.
[10] J. Jia and C.-K. Tang, "Inference of segmented color and texture description by tensor voting," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 26, no. 6, pp. 771-786, 2004.
[11] Y. Wexler, E. Shechtman, and M. Irani, "Space-time video completion," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.
[12] J. Sun, L. Yuan, J. Jia, and H.-Y. Shum, "Image completion with structure propagation," in ACM Transactions on Graphics, Proceedings of ACM SIGGRAPH, 2005, pp. 861-868.
[13] Y. Wexler, E. Shechtman, and M. Irani, "Space-time completion of video," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), pp. 463-476, 2007.
[14] D. Simakov, Y. Caspi, E. Shechtman, and M. Irani, "Summarizing visual data using bidirectional similarity," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
[15] N. Komodakis and G. Tziritas, "Image completion using global optimization," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006, pp. 442-452.
[16] N. Komodakis and G. Tziritas, "Image completion using efficient belief propagation via priority scheduling and dynamic pruning," IEEE Transactions on Image Processing (TIP), pp. 2649-2661, 2007.
[17] Y. Pritch, E. Kav-Venaki, and S. Peleg, "Shift-map image editing," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2009, pp. 151-158.
[18] C. Barnes, E. Shechtman, A. Finkelstein, and D. B. Goldman, "PatchMatch: A randomized correspondence algorithm for structural image editing," in ACM Transactions on Graphics, Proceedings of ACM SIGGRAPH, 2009, pp. 1-8.
[19] Adobe Systems Inc., www.adobe.com/technology/graphics/content_aware_fill.html, 2009.
[20] K. P. Murphy, Y. Weiss, and M. I. Jordan, "Loopy belief propagation for approximate inference: An empirical study," in Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI). Morgan Kaufmann Publishers Inc., 1999, pp. 467-475.
[21] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), pp. 1222-1239, 2001.
[22] K. He and J. Sun, "Statistics of patch offsets for image completion," in Proceedings of the European Conference on Computer Vision (ECCV). Springer-Verlag, 2012, pp. 16-29.
[23] Z. Chen, H. Jin, Z. Lin, S. Cohen, and Y. Wu, "Large displacement optical flow from nearest neighbor fields," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[24] Y. Zhang, J. Xiao, J. Hays, and P. Tan, "Framebreak: Dramatic image extrapolation by guided shift-maps," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[25] S. Korman and S. Avidan, "Coherency sensitive hashing," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2011, pp. 1607-1614.
[26] K. He and J. Sun, "Computing nearest-neighbor fields via propagation-assisted kd-trees," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[27] A. Agarwala, M. Dontcheva, M. Agrawala, S. Drucker, A. Colburn, B. Curless, D. Salesin, and M. Cohen, "Interactive digital photomontage," in ACM Transactions on Graphics, Proceedings of ACM SIGGRAPH, 2004, pp. 294-302.
[28] V. Kwatra, A. Schödl, I. Essa, G. Turk, and A. Bobick, "Graphcut textures: image and video synthesis using graph cuts," in ACM Transactions on Graphics, Proceedings of ACM SIGGRAPH, 2003, pp. 277-286.
[29] P. Pérez, M. Gangnet, and A. Blake, "Poisson image editing," in ACM Transactions on Graphics, Proceedings of ACM SIGGRAPH, 2003, pp. 313-318.
[30] T. Liu, J. Sun, N.-N. Zheng, X. Tang, and H.-Y. Shum, "Learning to detect a salient object," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.
[31] M. Zontak and M. Irani, "Internal statistics of a single natural image," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 977-984.
[32] Shift-map On-line Demo, www.vision.huji.ac.il/shiftmap/.
[33] Multi-label Graph Cuts, http://vision.csd.uwo.ca/code/.
[34] Z. Farbman, R. Fattal, and D. Lischinski, "Convolution pyramids," in ACM Transactions on Graphics, Proceedings of ACM SIGGRAPH Asia, 2011.
[35] A. Mansfield, M. Prasad, C. Rother, T. Sharp, P. Kohli, and L. V. Gool, "Transforming image completion," in British Machine Vision Conference (BMVC), August 2011.
[36] V. Katkovnik, A. Foi, K. Egiazarian, and J. Astola, "From local kernel to nonlocal multiple-model image denoising," International Journal of Computer Vision (IJCV), 2010.
[37] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman, "Removing camera shake from a single photograph," in ACM Transactions on Graphics, Proceedings of ACM SIGGRAPH, 2006, pp. 787-794.
[38] D. Glasner, S. Bagon, and M. Irani, "Super-resolution from a single image," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2009.
[39] C. Barnes, E. Shechtman, D. B. Goldman, and A. Finkelstein, "The generalized patchmatch correspondence algorithm," in Proceedings of the European Conference on Computer Vision (ECCV). Springer, 2010, pp. 29-43.
[40] S. Darabi, E. Shechtman, C. Barnes, D. B. Goldman, and P. Sen, "Image Melding: Combining Inconsistent Images using Patch-based Synthesis," in ACM Transactions on Graphics, Proceedings of ACM SIGGRAPH, 2012.

Kaiming He received the BS degree from the Academic Talent Program, Physics Department, Tsinghua University in 2007, and the PhD degree from the Department of Information Engineering, the Chinese University of Hong Kong in 2011. He joined Microsoft Research Asia in 2011. His research interests include computer vision and computer graphics. He has won the Best Paper Award at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2009.

Jian Sun got the BS degree, MS degree and Ph.D degree from Xi'an Jiaotong University in 1997, 2000 and 2003. He joined Microsoft Research Asia in July, 2003. His current two major research interests are interactive computer vision (user interface + vision) and internet computer vision (large image collection + vision). He is also interested in stereo matching and computational photography. He has won the Best Paper Award at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2009.