Understanding camera trade-offs through a Bayesian analysis of light field projections

Anat Levin 1, William T. Freeman 1,2, Frédo Durand 1
1 MIT CSAIL, 2 Adobe Systems

Abstract. Computer vision has traditionally focused on extracting structure, such as depth, from images acquired using thin-lens or pinhole optics. The development of computational imaging is broadening this scope; a variety of unconventional cameras do not directly capture a traditional image anymore, but instead require the joint reconstruction of structure and image information. For example, recent coded aperture designs have been optimized to facilitate the joint reconstruction of depth and intensity. The breadth of imaging designs requires new tools to understand the tradeoffs implied by different strategies. This paper introduces a unified framework for analyzing computational imaging approaches. Each sensor element is modeled as an inner product over the 4D light field. The imaging task is then posed as Bayesian inference: given the observed noisy light field projections and a prior on light field signals, estimate the original light field. Under common imaging conditions, we compare the performance of various camera designs using 2D light field simulations. This framework allows us to better understand the tradeoffs of each camera type and analyze their limitations.

1 Introduction

The flexibility of computational imaging has led to a range of unconventional camera designs. Cameras with coded apertures [1,2], plenoptic cameras [3,4], phase plates [5,6], and multi-view systems [7] record different combinations of light rays. Reconstruction algorithms then convert the data to viewable images, estimate depth and other quantities. These cameras involve tradeoffs among various quantities: spatial and depth resolution, depth of focus, or noise. This paper describes a theoretical framework that will help to compare computational camera designs and understand their tradeoffs.

Computation is changing imaging in three ways. First, the information recorded at the sensor may not be the final image, and the need for a decoding algorithm must be taken into account to assess camera quality.
Second, beyond 2D images, the new designs enable the extraction of 4D light fields and depth information. Finally, new priors can capture regularities of natural scenes to complement the sensor measurements and amplify decoding algorithms. The traditional evaluation tools based on the image point spread function (PSF) [8,9] are not able to fully model these effects. We seek tools for comparing camera designs, taking into account those three aspects. We want to evaluate the ability to recover a 2D image as well as depth or other information and we want to model the decoding step and use natural-scene priors.
A useful common denominator, across camera designs and scene information, is the lightfield [7], which encodes the atomic entities (lightrays) reaching the camera. Light fields naturally capture some of the more common photography goals such as high spatial image resolution, and are tightly coupled with the targets of mid-level computer vision: surface depth, texture, and illumination information. Therefore, we cast the reconstruction performed in computational imaging as light field inference. We then need to extend prior models, traditionally studied for 2D images, to 4D light fields.

Camera sensors sum over sets of light rays, with the optics specifying the mapping between rays and sensor elements. Thus, a camera provides a linear projection of the 4D light field where each projected coordinate corresponds to the measurement of one pixel. The goal of decoding is to infer from such projections as much information as possible about the 4D light field. Since the number of sensor elements is significantly smaller than the dimensionality of the light field signal, prior knowledge about light fields is essential. We analyze the limitations of traditional signal processing assumptions [10,11,12] and suggest a new prior on light field signals which explicitly accounts for their structure. We then define a new metric of camera performance as follows: Given a light field prior, how well can the light field be reconstructed from the data measured by the camera? The number of sensor elements is of course a critical variable, and we chose to standardize our comparisons by imposing a fixed budget of N sensor elements to all cameras.

We focus on the information captured by each camera, and wish to avoid the confounding effect of camera-specific inference algorithms or the decoding complexity. For clarity and computational efficiency we focus on the 2D version of the problem (1D image/2D light field).
We use simplified optical models and do not model lens aberrations or diffraction (these effects would still follow a linear projection model and can be accounted for with modifications to the light field projection function). Our framework captures the three major elements of the computational imaging pipeline (optical setup, decoding algorithm, and priors) and enables a systematic comparison on a common baseline.

1.1 Related Work

Approaches to lens characterization such as Fourier optics [8,9] analyze an optical element in terms of signal bandwidth and the sharpness of the PSF over the depth of field, but do not address depth information. The growing interest in 4D light field rendering has led to research on reconstruction filters and anti-aliasing in 4D [10,11,12], yet this research relies mostly on classical signal processing assumptions of band limited signals, and does not utilize the rich statistical correlations of light fields. Research on generalized camera families [13,14] mostly concentrates on geometric properties and 3D configurations, but with an assumption that approximately one light ray is mapped to each sensor element and thus decoding is not taken into account.

Reconstructing data from linear projections is a fundamental component in CT and tomography [15]. Fusing multiple image measurements is also used for superresolution, and [16] studies uncertainties in this process.
Fig. 1. (a) Flat-world scene with 3 objects. (b) The light field, and (c)-(i) cameras and the light rays integrated by each sensor element (distinguished by color). Panels: (a) 2D slice through scene (a plane, b plane), (b) Light field, (c) Pinhole, (d) Lens, (e) Lens, focus change, (f) Stereo, (g) Plenoptic camera, (h) Coded aperture lens, (i) Wavefront coding.

2 Light fields and camera configurations

Light fields are usually represented with a two-plane parameterization, where each ray is encoded by its intersections with two parallel planes. Figure 1(a,b) shows a 2D slice through a diffuse scene and the corresponding 2D slice of the 4D light field. The color at position $(a_0, b_0)$ of the light field in fig. 1(b) is that of the reflected ray in fig. 1(a) which intersects the $a$ and $b$ lines at points $a_0, b_0$ respectively. Each row in this light field corresponds to a 1D view when the viewpoint shifts along $a$. Light fields typically have many elongated lines of nearly uniform intensity. For example the green object in fig. 1 is diffuse and the reflected color does not vary along the $a$ dimension. The slope of those lines corresponds to the object depth [10,11].

Each sensor element integrates light from some set of light rays. For example, with a conventional lens, the sensor records an integral of rays over the lens aperture. We review existing cameras and how they project light rays to sensor elements. We assume that the camera aperture is positioned on the $a$ line parameterizing the light field.

Pinhole. Each sensor element collects light from a single ray, and the camera projection just slices a row in the light field (fig 1(c)). Since only a tiny fraction of light is let in, noise is an issue.

Lenses gather more light by focusing all light rays from a point at distance D to a sensor point. In the light field, 1/D is the slope of the integration (projection) stripe (fig 1(d,e)). An object is in focus when its slope matches this slope (e.g. green in fig 1(d)) [10,11,12]. Objects in front of or behind the focus distance will be blurred. Larger apertures gather more light but can cause more defocus.
Stereo [17] facilitates depth inference by recording 2 views (fig 1(f); to keep a constant sensor budget, the resolution of each image is halved).
Plenoptic cameras capture multiple viewpoints using a microlens array [3,4]. If each microlens covers k sensor elements one achieves k different views of the scene, but the spatial resolution is reduced by a factor of k (k = 3 is shown in fig 1(g)).

Coded aperture designs [1,2] place a binary mask in the lens aperture (fig 1(h)). As with conventional lenses, objects deviating from the focus depth are blurred, but according to the aperture code. Since the blur scale is a function of depth, by searching for the code scale which best explains the local image window, depth can be inferred. The blur can also be inverted, increasing the depth of field.

Wavefront coding introduces an optical element with an unconventional shape so that rays from any world point do not converge. Each sensor element thus integrates over a curve in light field space (fig 1(i)), instead of the straight integration of lenses. This is designed to make defocus at different depths almost identical, enabling deconvolution without depth information, thereby extending depth of field. To achieve this, a cubic lens shape (or phase plate) is used. The light field integration curve, which is a function of the lens normal, can be shown to be a parabola (fig 1(i)), which is slope invariant (see [18] for a derivation, also independently shown by M. Levoy and Z. Zhang, personal communication).

3 Bayesian estimation of light field

3.1 Problem statement

We model an imaging process as an integration of light rays by camera sensors, or in an abstract way, as a linear projection of the light field

$$y = Tx + n \qquad (1)$$

where $x$ is the light field, $y$ is the captured image, $n$ is an iid Gaussian noise $n \sim N(0, \eta^2 I)$ and $T$ is the projection matrix, describing how light rays are mapped to sensor elements. Referring to figure 1, $T$ includes one row for each sensor element, and this row has non-zero elements for the light field entries marked by the corresponding color (e.g. a pinhole $T$ matrix has a single non-zero element per row).

The set of realizable $T$ matrices is limited by physical constraints. In particular, the entries of $T$ are all non-negative.
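To make the projection model of eq 1 concrete, here is a minimal sketch (not the authors' code) that builds toy projection matrices for a pinhole and a lens on a small discrete 2D light field and simulates a noisy capture. The row-major layout (views by pixels), the integer `slope` in pixels per view, the 0/1 entries, and all function names are assumptions made for illustration.

```python
import numpy as np

def pinhole_T(n_views, n_pix, view=None):
    """Pinhole: each sensor element reads one ray, i.e. one row of the light field."""
    view = n_views // 2 if view is None else view
    T = np.zeros((n_pix, n_views * n_pix))
    for j in range(n_pix):
        T[j, view * n_pix + j] = 1.0
    return T

def lens_T(n_views, n_pix, slope=0):
    """Lens: each sensor element sums rays across the aperture along a stripe
    whose (integer) slope is set by the focus distance."""
    center = n_views // 2
    T = np.zeros((n_pix, n_views * n_pix))
    for j in range(n_pix):
        for a in range(n_views):
            b = j + slope * (a - center)
            if 0 <= b < n_pix:          # rays outside the light field are dropped
                T[j, a * n_pix + b] = 1.0
    return T

def capture(T, x, eta, rng):
    """Noisy projection y = T x + n with n ~ N(0, eta^2 I)."""
    return T @ x + eta * rng.standard_normal(T.shape[0])

rng = np.random.default_rng(0)
x = rng.random(3 * 4)                   # toy light field: 3 views x 4 pixels, flattened
y = capture(lens_T(3, 4), x, eta=0.01, rng=rng)
```

The row sums of each matrix give the amount of light reaching each sensor element: 1 for the pinhole, and up to `n_views` for the lens.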
To ensure equal noise conditions, we assume a maximal integration time, and the maximal value for each entry of $T$ is 1. The amount of light reaching each sensor element is the sum of the entries in the corresponding $T$ row. It is usually better to collect more light to increase the SNR (a pinhole is noisier because it has a single non-zero entry per row, while a lens has multiple ones).

To simplify notation, most of the following derivation will address a 2D slice in the 4D light field, but the 4D case is similar. While the light field is naturally continuous, for simplicity we use a discrete representation.

Our goal is to understand how well we can recover the light field $x$ from the noisy projection $y$, and which $T$ matrices (among the camera projections described in the previous section) allow better reconstructions. That is, if one is allowed to take N measurements ($T$ can have N rows), which set of projections leads to better light field reconstruction? Our evaluation methodology can be adapted to a weight $w$ which specifies
how much we care about reconstructing different parts of the light field. For example, if the goal is an all-focused, high quality image from a single view point (as in wavefront coding), we can assign zero weight to all but one light field row.

The number of measurements taken by most optical systems is significantly smaller than the light field data, i.e. $T$ contains many fewer rows than columns. As a result, it is impossible to recover the light field without prior knowledge on light fields. We therefore start by modeling a light field prior.

3.2 Classical priors

State of the art light field sampling and reconstruction approaches [10,11,12] apply signal processing techniques, typically assuming band-limited signals. The number of non-zero frequencies in the signal has to be equal to the number of samples, and therefore before samples are taken, one has to apply a low-pass filter to meet the Nyquist limit. Light field reconstruction is then reduced to a convolution with a proper low-pass filter. When the depth range in the scene is bounded, these strategies can further bound the set of active frequencies within a sheared rectangle instead of a standard square of low frequencies and tune the orientation of the low pass filter. However, they do not address inference for a general projection such as the coded aperture.

One way to express the underlying band limited assumptions in a prior terminology is to think of an isotropic Gaussian prior (where by isotropic we mean that no direction in the light field is favored). In the frequency domain, the covariance of such a Gaussian is diagonal (with one variance per Fourier coefficient), allowing zero (or very narrow) variance at high frequencies above the Nyquist limit, and a wider one at the lower frequencies. Similar priors can also be expressed in the spatial domain by penalizing the convolution with a set of high pass filters:

$$P(x) \propto \exp\Big(-\frac{1}{2\sigma_0}\sum_{k,i} |f_{k,i}^T x|^2\Big) = \exp\Big(-\frac{1}{2}\, x^T \Psi_0^{-1} x\Big) \qquad (2)$$

where $f_{k,i}$ denotes the kth high pass filter centered at the ith light field entry.
In sec 5, we will show that band limited assumptions and Gaussian priors indeed lead to equivalent sampling conclusions. More sophisticated prior choices replace the Gaussian prior of eq 2 with a heavy-tailed prior [19]. However, as will be illustrated in section 3.4, such generic priors ignore the very strong elongated structure of light fields, or the fact that the variance along the disparity slope is significantly smaller than the spatial variance.

3.3 Mixture of Gaussians (MOG) Light field prior

To model the strong elongated structure of light fields, we propose using a mixture of oriented Gaussians. If the scene depth (and hence light field slope) is known we can define an anisotropic Gaussian prior that accounts for the oriented structure. For this, we define a slope field $S$ that represents the slope (one over the depth of the visible point) at every light field entry (fig. 2(b) illustrates a sparse sample from a slope field). For a given slope field, our prior assumes that the light field is Gaussian, but has
a variance in the disparity direction that is significantly smaller than the spatial variance. The covariance $\Psi_S$ corresponding to a slope field $S$ is then:

$$x^T \Psi_S^{-1} x = \sum_i \frac{1}{\sigma_s} |g_{S(i),i}^T x|^2 + \frac{1}{\sigma_0} |g_{0,i}^T x|^2 \qquad (3)$$

where $g_{s,i}$ is a derivative filter in orientation $s$ centered at the ith light field entry ($g_{0,i}$ is the derivative in the horizontal/spatial direction), and $\sigma_s \ll \sigma_0$, especially for non-specular objects (in practice, we consider diffuse scenes and set $\sigma_s = 0$). Conditioning on depth we have $P(x|S) \sim N(0, \Psi_S)$.

We also need a prior $P(S)$ on the slope field $S$. Given that depth is usually piecewise smooth, our prior encourages piecewise smooth slope fields (like the regularization of stereo algorithms). Note however that $S$ and its prior are expressed in light-field space, not image or object space. The resulting unconditional light field prior is an infinite mixture of Gaussians (MOG) that sums over slope fields

$$P(x) = \int_S P(S)\,P(x|S) \qquad (4)$$

We note that while each mixture component is a Gaussian which can be evaluated in closed form, marginalizing over the infinite set of slope fields $S$ is intractable, and approximation strategies are described below.

Now that we have modeled the probability of a light field $x$, we turn to the imaging problem: Given a camera $T$ and a noisy projection $y$ we want to find a Bayesian estimate for the light field $x$. For this, we need to define $P(x|y;T)$, the probability that $x$ is the explanation of the measurement $y$. Using Bayes rule:

$$P(x|y;T) = \int_S P(x,S|y;T) = \int_S P(S|y;T)\,P(x|y,S;T) \qquad (5)$$

To express the above equation, we note that $y$ should equal $Tx$ up to measurement noise, that is, $P(y|x;T) \propto \exp(-\frac{1}{2\eta^2}|Tx - y|^2)$. As a result, for a given slope field $S$, $P(x|y,S;T) \propto P(x|S)\,P(y|x;T)$ is also Gaussian with covariance and mean:

$$\Sigma_S^{-1} = \Psi_S^{-1} + \frac{1}{\eta^2} T^T T, \qquad \mu_S = \frac{1}{\eta^2}\,\Sigma_S T^T y \qquad (6)$$

Similarly, $P(y|S;T)$ is also a Gaussian distribution measuring how well we can explain $y$ with the slope component $S$, or, the volume of light fields $x$ which can explain the measurement $y$, if the slope field was $S$. This can be computed by marginalizing over light fields $x$: $P(y|S;T) = \int_x P(x|S)\,P(y|x;T)$.
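As an illustration of eqs 3 and 6, the sketch below assembles an oriented prior precision matrix from two-tap finite-difference filters and then evaluates the conditional Gaussian for a known slope field. It is a sketch under stated assumptions, not the authors' implementation: the finite-difference filters, the integer per-entry slopes, the skipping of filters that leave the grid, and the function names are all hypothetical, and `sigma_s` must be a small positive number standing in for the $\sigma_s = 0$ limit.

```python
import numpy as np

def oriented_precision(slopes, sigma_s, sigma_0):
    """Psi_S^{-1} of eq 3 for a light field shaped like `slopes` (views x pixels).
    slopes[a, b] is an integer shift in pixels per view; filters that would
    leave the grid are skipped (an assumed boundary rule)."""
    n_views, n_pix = slopes.shape
    n = n_views * n_pix
    P = np.zeros((n, n))
    for a in range(n_views):
        for b in range(n_pix):
            i = a * n_pix + b
            # derivative along the local slope direction, weight 1/sigma_s
            b2 = b + int(slopes[a, b])
            if a + 1 < n_views and 0 <= b2 < n_pix:
                g = np.zeros(n)
                g[i], g[(a + 1) * n_pix + b2] = -1.0, 1.0
                P += np.outer(g, g) / sigma_s
            # horizontal/spatial derivative, weight 1/sigma_0
            if b + 1 < n_pix:
                g = np.zeros(n)
                g[i], g[i + 1] = -1.0, 1.0
                P += np.outer(g, g) / sigma_0
    return P

def gaussian_posterior(T, y, Psi_inv, eta):
    """Mean and covariance of P(x | y, S; T) from eq 6."""
    Sigma = np.linalg.inv(Psi_inv + T.T @ T / eta**2)
    mu = Sigma @ T.T @ y / eta**2
    return mu, Sigma
```

A precision matrix built only from derivative filters assigns no penalty to a constant light field, so the posterior covariance exists only when the camera measures the mean intensity, which any $T$ with non-negative, non-zero rows does.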
Finally, $P(S|y;T)$ is obtained from Bayes rule:

$$P(S|y;T) = \frac{P(S)\,P(y|S;T)}{\int_S P(S)\,P(y|S;T)}$$

To recap, the probability $P(x|y;T)$ that a light field $x$ explains a measurement $y$ is also a mixture of Gaussians (MOG). To evaluate it, we measure how well $x$ can explain $y$, conditioning on a particular slope field $S$, and weight it by the probability $P(S|y)$ that $S$ is actually the slope field of the scene. This is integrated over all slope fields $S$.

Inference. Given a camera $T$ and an observation $y$ we seek to recover the light field $x$. In this section we consider MAP estimation, while in section 4 we approximate the variance as well in an attempt to compare cameras. Even MAP estimation for $x$ is hard, as the integral in eq 5 is intractable. We approximate the MAP estimate for the slope field $S$, and conditioning on this estimate, solve for the MAP light field $x$.

The slope field inference is essentially inferring the scene depth. Our inference generalizes MRF stereo algorithms [17] or the depth regularization of the coded aperture [1]. Details regarding slope inference are provided in [18], but as a brief summary, we model slope in local windows as constant or having one single discontinuity, and we then regularize the estimate using an MRF.

Given the estimated slope field $S$, our light field prior is Gaussian, and thus the MAP estimate for the light field is the mean of the conditional Gaussian $\mu_S$ in eq 6. This mean minimizes the projection error up to noise, and regularizes the estimate by minimizing the oriented variance $\Psi_S$. Note that in traditional stereo formulations the multiple views are used only for depth estimation. In contrast, we seek a light field that satisfies the projection in all views. Thus, if each view includes aliasing, we obtain super resolution.

Fig. 2. Light field reconstruction. (a) Test image. (b) Light field and slope field. (c) SSD error in reconstruction for the lens, pinhole, wavefront coding, coded aperture, stereo, and plenoptic cameras, under the isotropic Gaussian prior, the isotropic sparse prior, the light field prior, and the band-pass assumption.

3.4 Empirical illustration of light field inference

Figure 2(a,b) presents an image and a light field slice, involving depth discontinuities. Fig 2(c) presents the numerical SSD estimation errors. Figure 3 presents the estimated light fields and (sparse samples from) the corresponding slope fields. See [18] for more results. Note that slope errors in the 2nd row often accompany ringing in the 1st row. We compare the results of the MOG light field prior with simpler Gaussian priors (extending the conventional band limited signal assumptions [10,11,12]) and with modern sparse (but isotropic) derivative priors [19].
For the plenoptic camera we also explicitly compare with signal processing reconstruction (last bar in fig 2(c)); as explained in sec 3.2, this approach does not apply directly to any of the other cameras.

The prior is critical, and resolution is significantly reduced in the absence of a slope model. For example, if the plenoptic camera includes aliasing, figure 3 (left) demonstrates that with our slope model we can super-resolve the measurements and the actual information encoded by the recorded plenoptic data is higher than that of the direct measurements.

The ranking of cameras also changes as a function of prior: while the plenoptic camera produced best results for the isotropic priors, a stereo camera achieves a higher resolution under an MOG prior. Thus, our goal in the next section is to analytically evaluate the reconstruction accuracy of different cameras, and to understand how it is affected by the choice of prior.

Fig. 3. Reconstructing a light field from projections. Columns: plenoptic camera, stereo, coded aperture. Top row: reconstruction with our MOG light field prior. Middle row: slope field (estimated with MOG prior), plotted over ground truth. Note slope changes at depth discontinuities. Bottom row: reconstruction with isotropic Gaussian prior.

4 Camera Evaluation Metric

We want to assess how well a light field $x^0$ can be recovered from a noisy projection $y = Tx^0 + n$, or, how much the projection $y$ nails down the set of possible light field interpretations. The uncertainty can be measured by the expected reconstruction error:

$$E(|W(x - x^0)|^2; T) = \int_x P(x|y;T)\,|W(x - x^0)|^2 \qquad (7)$$

where $W = \mathrm{diag}(w)$ is a diagonal matrix specifying how much we care about different light field entries, as discussed in sec 3.1.

Uncertainty computation. To simplify eq 7, recall that the average distance between $x^0$ and the elements of a Gaussian is the distance from the center, plus the variance:

$$E(|W(x - x^0)|^2 \,|\, S; T) = |W(\mu_S - x^0)|^2 + \sum \mathrm{diag}(W^2 \Sigma_S) \qquad (8)$$

In a mixture model, the contribution of each component is weighted by its volume:

$$E(|W(x - x^0)|^2; T) = \int_S P(S|y)\,E(|W(x - x^0)|^2 \,|\, S; T) \qquad (9)$$

Since the integral in eq 9 can not be computed explicitly, we evaluate cameras using synthetic light fields whose ground truth slope field is known, and evaluate an approximate uncertainty in the vicinity of the true solution. We use a discrete set of slope field samples $\{S_1,...,S_K\}$ obtained as perturbations around the ground truth slope field. We approximate eq 9 using a discrete average:

$$E(|W(x - x^0)|^2; T) \approx \frac{1}{K}\sum_k P(S_k|y)\,E(|W(x - x^0)|^2 \,|\, S_k; T) \qquad (10)$$

Finally, we use a set of typical light fields $x^0_t$ (generated using ray tracing) and evaluate the quality of a camera $T$ as the expected squared error over these examples

$$E(T) = \sum_t E(|W(x - x^0_t)|^2; T) \qquad (11)$$
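The per-component error of eq 8 and the discrete approximation of eq 10 can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the posterior weights $P(S_k|y)$ and the per-component means and covariances are assumed to be computed elsewhere, and the $1/K$ factor is kept exactly as eq 10 is printed.

```python
import numpy as np

def component_error(mu, Sigma, x0, w):
    """Eq 8: weighted squared distance of the mean from x0, plus the weighted variance."""
    return np.sum((w * (mu - x0)) ** 2) + np.sum(w ** 2 * np.diag(Sigma))

def mixture_error(posteriors, mus, Sigmas, x0, w):
    """Eq 10 as printed: (1/K) * sum_k P(S_k | y) * E(. | S_k)."""
    K = len(posteriors)
    total = sum(p * component_error(m, C, x0, w)
                for p, m, C in zip(posteriors, mus, Sigmas))
    return total / K
```

Setting entries of `w` to zero restricts the score to the light field entries of interest, for example a single row for the all-focused single-view goal.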
Note that this solely measures information captured by the optics together with the prior, and omits the confounding effect of specific inference algorithms (like in sec 3.4).

5 Tradeoffs in projection design

Which designs minimize the reconstruction error?

Gaussian prior. We start by considering the isotropic Gaussian prior in eq 2. If the distribution of light fields $x$ is Gaussian, we can integrate over $x$ in eq 11 analytically to obtain: $E(T) = 2\sum \mathrm{diag}\big((\frac{1}{\eta^2} T^T T + \Psi_0^{-1})^{-1}\big)$. Thus, we reach the classical PCA conclusion: to minimize the residual variance, $T$ should measure the directions of maximal variance in $\Psi_0$. Since the prior is shift invariant, $\Psi_0^{-1}$ is diagonal in the frequency domain, and the principal components are the lowest frequencies. Thus, an isotropic Gaussian prior agrees with the classical signal processing conclusion [10,11,12]: to sample the light field one should convolve with a low pass filter to meet the Nyquist limit and sample both the directional and spatial axis, as a plenoptic camera does (if the depth in the scene is bounded, fewer directional samples can be used [10]). This is also consistent with our empirical prediction, as for the Gaussian prior, the plenoptic camera achieved the lowest error in fig 2(c). However, this sampling conclusion is conservative as the directional axis is clearly more redundant than the spatial one. The second order statistics captured by a Gaussian distribution do not capture the high order dependencies of light fields.

Mixture of Gaussian light field prior. We now turn to the MOG prior. While the optimal projection under this prior cannot be predicted in closed form, it can help us understand the major components influencing the performance of existing cameras. The score in eq 9 reveals two aspects which affect camera quality: first, minimizing the variance $\Sigma_S$ of each of the mixture components (i.e., the ability to reliably recover the light field given the true slope field), and second, the need to identify depth and make $P(S|y)$ peaked at the true slope field. Below, we elaborate on these components.
5.1 Conditional light field estimation: known depth

Fig 4 shows light fields estimated by several cameras, assuming the true depth (and therefore slope field) was successfully estimated. We also display the variance of the estimated light field, the diagonal of $\Sigma_S$ (eq 6).

In the right part of the light field, the lens reconstruction is sharp, since it averages rays emerging from a single object point. On the left, uncertainty is high, since it averages light rays from multiple points. In contrast, integrating over a parabolic curve (wavefront coding) achieves low uncertainties for both slopes, since a parabola covers all slopes (see [18,20] for a derivation). A pinhole also behaves identically at all depths, but it collects only a small amount of light and the uncertainty is high due to the small SNR. Finally, the uncertainty increases in stereo and plenoptic cameras due to the smaller number of spatial samples.

The central region of the light field demonstrates the utility of multiple viewpoints in the presence of occlusion boundaries. Occluded parts which are not measured properly lead to higher variance. The variance in the occluded part is minimized by the plenoptic camera, the only one that spends measurements in this region of the light field.

Since we deal only with spatial resolution, our conclusions correspond to common sense, which is a good sanity check. However, they cannot be derived from a naive Gaussian model, which emphasizes the need for a prior such as our new mixture model.

Fig. 4. Evaluating conditional uncertainty in light field estimate, for the pinhole, lens, wavefront coding, stereo, and plenoptic cameras. Left: projection model. Middle: estimated light field. Right: variance in estimate (equal intensity scale used for all cameras). Note that while for visual clarity we plot perfect square samples, in our implementation samples were convolved with low pass filters to simulate realistic optics blur.

5.2 Depth estimation

Light field reconstruction involves slope (depth) estimation. Indeed, the error in eq 9 also depends on the uncertainty in the slope field $S$. We need to make $P(S|y)$ peaked at the true slope field $S^0$. Since the observation $y$ is $Tx + n$, we want the distributions of projections $Tx$ to be as distinguishable as possible for different slope fields $S$. One way to achieve this is to make the projections corresponding to different slope fields concentrated within different subspaces of the N-dimensional space. For example, a stereo camera yields a linear constraint on the projection: the N/2 samples from the first view should be a shifted version (according to slope) of the other N/2. The coded aperture camera also imposes linear constraints: certain frequencies of the defocused signals are zero, and the location of these zeros shifts with depth [1].

To test this, we measure the probability of the true slope field, $P(S^0|y)$, averaged over a set of test light fields (created with ray tracing). The stereo score is $\langle P(S^0|y) \rangle = 0.95$ (where $\langle P(S^0|y) \rangle = 1$ means perfect depth discrimination) compared to $\langle P(S^0|y) \rangle = 0.84$ for coded aperture.
This suggests that the disparity constraint of stereo better distributes the projections corresponding to different slope fields than the zero frequency subspace in coded aperture.
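One way to compute a depth-discrimination score of this kind is sketched below. It uses the standard linear-Gaussian result that, when $x \sim N(0, \Psi_S)$ and $y = Tx + n$, the measurement is distributed as $y \sim N(0, T \Psi_S T^T + \eta^2 I)$, and then applies Bayes rule over a finite list of candidate slope fields. The finite candidate list, the requirement that each candidate has a finite covariance matrix, the uniform default prior, and the function name are assumptions made for illustration; the paper's own procedure is described in [18].

```python
import numpy as np

def slope_posterior(y, T, Psis, eta, prior=None):
    """P(S_k | y; T) over candidate slope fields, each given by its prior covariance Psi_k."""
    K = len(Psis)
    prior = np.full(K, 1.0 / K) if prior is None else np.asarray(prior, dtype=float)
    log_post = np.empty(K)
    for k, Psi in enumerate(Psis):
        # covariance of the measurement y under slope field S_k
        C = T @ Psi @ T.T + eta**2 * np.eye(T.shape[0])
        _, logdet = np.linalg.slogdet(C)
        # log prior + Gaussian log-likelihood (constant terms dropped)
        log_post[k] = np.log(prior[k]) - 0.5 * (y @ np.linalg.solve(C, y) + logdet)
    log_post -= log_post.max()          # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()
```

Averaging the entry that corresponds to the true slope field over many simulated measurements gives a score between 0 and 1, with 1 meaning perfect discrimination among the candidates.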
We can also quantitatively compare stereo with depth from defocus (DFD): two lenses with the same center of projection, focused at two different depths. As predicted by [21], with the same physical size (the stereo baseline shift doesn't exceed the aperture width) both designs perform similarly, with DFD achieving $\langle P(S^0|y) \rangle = 0.92$.

Our probabilistic treatment of depth estimation goes beyond linear subspace constraints. For example, the average slope estimation score of a lens was $\langle P(S^0|y) \rangle = 0.74$, indicating that, while weaker than stereo, a single monocular image captured with a standard lens contains some depth-from-defocus information as well. This result cannot be derived using a disjoint-subspace argument, but if the full probability is considered, the Occam's razor principle applies and the simpler explanation is preferred.

Finally, a pinhole camera projection just slices a row out of the light field, and this slice is invariant to the light field slope. The parabola filter of a wavefront coding lens is also designed to be invariant to depth. Indeed, for these two cameras, the evaluated distribution $P(S|y)$ in our model is uniform over slopes.

Again, these results are not surprising but they are obtained within a general framework that can qualitatively and quantitatively compare a variety of camera designs. While comparisons such as DFD vs. stereo have been conducted in the past [21], our framework encompasses a much broader family of cameras.

5.3 Light field estimation

In the previous section we gained intuition about the various parts of the expected error in eq 9. We now use the overall formula to evaluate existing cameras, using a set of diffuse light fields generated using ray tracing (described in [18]). Evaluated configurations include a pinhole camera, lens, stereo pair, depth-from-defocus (2 lenses focused at different depths), plenoptic camera, coded aperture cameras and a wavefront coding lens.
Another advantage of our framework is that we can search for optimal parameters within each camera family, and our comparison is based on optimized parameters such as baseline length, aperture size and focus distance of the individual lens in a stereo pair, and various choices of codes for coded aperture cameras (details provided in [18]).

By changing the weights $W$ on light field entries in eq 7, we evaluate cameras for two different goals: (a) Capturing a light field. (b) Achieving an all-focused image from a single view point (capturing a single row in the light field). We consider both a Gaussian and our new MOG prior. We consider different depth complexity as characterized by the amount of discontinuities. We use slopes between $-45^\circ$ and $45^\circ$ and noise with standard deviation $\eta = 0.01$. Additionally, [18] evaluates changes in the depth range and noise. Fig. 5(a-b) plots expected reconstruction error with our MOG prior. Evaluation with a generic Gaussian prior is included in [18]. Source code for these simulations is available on the authors' webpage.

Fig. 5. Camera evaluation: expected reconstruction error for the pinhole, lens, wavefront coding, coded aperture, DFD, stereo, and plenoptic cameras, with no, modest, and many depth discontinuities. (a) Full light field. (b) Single view. See [18] for enlarged plots.

Full light field reconstruction. Fig. 5(a) shows full light field reconstruction with our MOG prior. In the presence of depth discontinuities, the lowest light field reconstruction error is achieved with a stereo camera. While a plenoptic camera improves depth information, our comparison suggests it may not pay for the large spatial resolution loss. Yet, as discussed in sec 5.1, a plenoptic camera offers an advantage in the presence of complex occlusion boundaries. For planar scenes (in which estimating depth is easy) the coded aperture surpasses stereo, since spatial resolution is doubled and the irregular sampling of light rays can avoid high frequency losses due to defocus blur. While the performance of all cameras decreases when the depth complexity increases, a lens and coded aperture are much more sensitive than others. While the depth discrimination of DFD is similar to that of stereo (as discussed in sec 5.2), its overall error is slightly higher since the wide apertures blur high frequencies.

The ranking in fig 5(a) agrees with the empirical prediction in fig 2(c). However, while fig 5(a) measures inherent optics information, fig 2(c) folds in inference errors as well.

Single-image reconstruction. For single row reconstruction (fig 5(b)) one still has to account for issues like defocus, depth of field, signal to noise ratio and spatial resolution. A pinhole camera (recording this single row alone) is not ideal, and there is an advantage for wide apertures collecting more light (recording multiple light field rows) despite not being invariant to depth.

The parabola (wavefront coding) does not capture depth information and thus performs very poorly for light field estimation. However, fig 5(b) suggests that for recovering a single light field row, this filter outperforms all other cameras. The reason is that since the filter is invariant to slope, a single central light field row can be recovered without knowledge of depth. For this central row, it actually achieves high signal to noise ratios for all depths, as demonstrated in figure 4. To validate this observation, we have searched over a large set of lens curvatures, or light field integration curves, parameterized as splines fitted to 6 key points.
This family includes both slope sensitive curves (in the spirit of [6] or a coded aperture), which identify slope and use it in the estimation, and slope invariant curves (like the parabola [5]), which estimate the central row regardless of slope. Our results show that, for the goal of recovering a single light field row, the wavefront-coding parabola outperforms all other configurations. This extends the arguments in previous wavefront coding publications which were derived using optics reasoning and focus on depth-invariant approaches. It also agrees with the motion domain analysis of [20], predicting that a parabolic integration curve provides an optimal signal to noise ratio.
5.4 Number of views for plenoptic sampling

As another way to compare the conclusions derived by classical signal processing approaches with the ones derived from a proper light field prior, we follow [10] and ask: suppose we use a camera with a fixed N pixels resolution, how many different views (N pixels each) do we actually need for a good virtual reality?

Figure 6 plots the expected reconstruction error as a function of the number of views for both MOG and naive Gaussian priors. While a Gaussian prior requires a dense sample, the MOG error is quite low after 2-3 views (such conclusions depend on depth complexity and the range of views we wish to capture). For comparison, we also mark on the graph the significantly larger number of views imposed by an exact Nyquist limit analysis, like [10]. Note that to simulate a realistic camera, our directional axis samples are aliased. This is slightly different from [10], which blurs the directional axis in order to properly eliminate frequencies above the Nyquist limit.

Fig. 6. Reconstruction error as a function of the number of views, for the Nyquist limit, the Gaussian prior, and the MOG prior.

6 Discussion

The growing variety of computational camera designs calls for a unified way to analyze their tradeoffs. We show that all cameras can be analytically modeled by a linear mapping of light rays to sensor elements. Thus, interpreting sensor measurements is the Bayesian inference problem of inverting the ray mapping. We show that a proper prior on light fields is critical for the success of camera decoding. We analyze the limitations of traditional band-pass assumptions and suggest that a prior which explicitly accounts for the elongated light field structure can significantly reduce sampling requirements.

Our Bayesian framework estimates both depth and image information, accounting for noise and decoding uncertainty. This provides a tool to compare computational cameras on a common baseline and provides a foundation for computational imaging.
We conclude that for diffuse scenes, the wavefront coding cubic lens (and the parabola light field curve) is the optimal way to capture a scene from a single view point. For capturing a full light field, a stereo camera outperformed other tested configurations. We have focused on providing a common ground for all designs, at the cost of simplifying optical and decoding aspects. This differs from traditional optics optimization tools such as Zemax that provide fine-grain comparisons between subtly-different designs (e.g. what if this spherical lens element is replaced by an aspherical one?). In contrast, we are interested in the comparison between families of imaging designs (e.g. stereo vs. plenoptic vs. coded aperture). We concentrate on measuring inherent information captured by the optics, and do not evaluate camera-specific decoding algorithms. The conclusions from our analysis are well connected to reality. For example, it can predict the expected tradeoffs (which cannot be derived using more naive light
field models) between aperture size, noise and spatial resolution discussed in sec 5.1. It justifies the exact wavefront coding lens design derived using optics tools, and confirms the prediction of [21] relating stereo to depth from defocus.

Analytic camera evaluation tools may also permit the study of unexplored camera designs. One might develop new cameras by searching for linear projections that yield optimal light field inference, subject to physical implementation constraints. While the camera score is a very non-convex function of its physical characteristics, defining camera evaluation functions opens up these research directions.

Acknowledgments

We thank Royal Dutch/Shell Group, NGA NEGI-1582-04-0004, MURI Grant N00014-06-1-0734, NSF CAREER award 0447561. Fredo Durand acknowledges a Microsoft Research New Faculty Fellowship and a Sloan Fellowship.

References

1. Levin, A., Fergus, R., Durand, F., Freeman, W.: Image and depth from a conventional camera with a coded aperture. SIGGRAPH (2007)
2. Veeraraghavan, A., Raskar, R., Agrawal, A., Mohan, A., Tumblin, J.: Dappled photography: Mask-enhanced cameras for heterodyned light fields and coded aperture refocusing. SIGGRAPH (2007)
3. Adelson, E.H., Wang, J.Y.A.: Single lens stereo with a plenoptic camera. PAMI (1992)
4. Ng, R., Levoy, M., Bredif, M., Duval, G., Horowitz, M., Hanrahan, P.: Light field photography with a hand-held plenoptic camera. Stanford U. Tech Rep CSTR 2005-02 (2005)
5. Bradburn, S., Dowski, E., Cathey, W.: Realizations of focus invariance in optical-digital systems with wavefront coding. Applied Optics 36 (1997) 9157-9166
6. Dowski, E., Cathey, W.: Single-lens single-image incoherent passive-ranging systems. App Opt (1994)
7. Levoy, M., Hanrahan, P.M.: Light field rendering. In: SIGGRAPH. (1996)
8. Goodman, J.W.: Introduction to Fourier Optics. McGraw-Hill Book Company (1968)
9. Zemax: www.zemax.com.
10. Chai, J., Tong, X., Chan, S., Shum, H.: Plenoptic sampling. SIGGRAPH (2000)
11. Isaksen, A., McMillan, L., Gortler, S.J.: Dynamically reparameterized light fields. In: SIGGRAPH. (2000)
12. Ng, R.: Fourier slice photography. SIGGRAPH (2005)
13. Seitz, S., Kim, J.: The space of all stereo images. In: ICCV. (2001)
14. Grossberg, M., Nayar, S.K.: The raxel imaging model and ray-based calibration. IJCV (2005)
15. Kak, A.C., Slaney, M.: Principles of Computerized Tomographic Imaging.
16. Baker, S., Kanade, T.: Limits on super-resolution and how to break them. PAMI (2002)
17. Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Intl. J. Computer Vision 47(1) (April 2002) 7-42
18. Levin, A., Freeman, W., Durand, F.: Understanding camera trade-offs through a Bayesian analysis of light field projections. MIT CSAIL TR 2008-049 (2008)
19. Roth, S., Black, M.J.: Fields of experts: A framework for learning image priors. In: CVPR. (2005)
20. Levin, A., Sand, P., Cho, T.S., Durand, F., Freeman, W.T.: Motion invariant photography. SIGGRAPH (2008)
21. Schechner, Y., Kiryati, N.: Depth from defocus vs. stereo: How different really are they. IJCV (2000)