Diagonal preconditioning for first order primaldual algorithms in convex optimization


 Lambert Bruce
 1 years ago
 Views:
Transcription
1 Diagonal preconditioning for first order primaldual algorithms in conve optimization Thomas Pock Institute for Computer Graphics and Vision Graz University of Technology Antonin Chambolle CMAP & CNRS École Polytechnique Abstract In this paper we study preconditioning techniques for the firstorder primaldual algorithm proposed in [5]. In particular, we propose simple and easy to compute diagonal preconditioners for which convergence of the algorithm is guaranteed without the need to compute any step size parameters. As a byproduct, we show that for a certain instance of the preconditioning, the proposed algorithm is equivalent to the old and widely unknown alternating step method for monotropic programg [7]. We show numerical results on general linear programg problems and a few standard computer vision problems. In all eamples, the preconditioned algorithm significantly outperforms the algorithm of [5].. Introduction In [5, 8, 3] firstorder primaldual algorithms are studied to solve a certain class of conve optimization problems with known saddlepoint structure. ma K, y + G) F y), ) X y Y where X and Y are finitedimensional vector spaces equipped with standard inner products,. K : X Y is a linear operator and G : X R } and F : Y R } are conve functions with known structure. The iterates of the algorithm studied in [5] to solve ) are very simple: k+ =I + τ G) k τk T y k ) y k+ =I + σ F ) y k + σk k+ + θ k+ k ))) ) They basically consist of alternating a gradient ascend in the dual variable and a gradient descend in the primal The first author acknowledges support from the Austrian Science Fund FWF) under the grant P49N3. Figure. On problems with irregular structure, the proposed preconditioned algorithm PPD) converges significantly faster than the algorithm of [5] PD). variable. Additionally, the algorithm performs an overrelaation step in the primal variable. A fundamental assumption of the algorithm is that the functions F and G are of simple structure, meaning that the socalled proimity or resolvent operators I + σ F ) and I + τ G) have closedform solutions or can be efficiently computed with a high precision. Their eact definitions will be given in Section.. The parameters τ, σ > 0 are the primal and dual step sizes and θ [0, ] controls the amount of overrelaation in. It is shown in [5] that the algorithm converges as long as θ = and the primal and dual step sizes τ and σ are chosen such that τσl <, where L = K is the operator norm of K. It is further shown that a suitably defined partial primaldual gap of the average of the sequence 0, y 0 ),..., k, y k )) vanishes with rate O/k) for the complete class of problems covered by ). For problems with more regularity, the authors propose acceleration schemes based on nonempirical choices on τ, σ and θ. In particular they show that they can achieve O/k ) for problems where G of F is uniformly conve and Oω k ), ω < for problems where both G and F are uniformly conve. See [5] for more details. A common feature of all numerical eamples in [5] is that the involved linear operators K have a simple structure which makes it very easy to estimate L. We observed that for problems where the operator K has a more compli
2 cated structure, L cannot be estimated easily or it might be very large such that the convergence of the algorithm significantly slows down. As we will see, linear operators with irregular structure frequently arise in many different vision problems. In this work, we study preconditioning techniques for the primaldual algorithm ). This allows us to overcome the aforementioned shortcogs. The proposed preconditioned algorithm has several advantages. Firstly, it avoids the estimation of the operator norm of K, secondly, it significantly accelerates the convergence on problems with irregular K and thirdly, it leaves the computational compleity of the iterations basically unchanged. Figure shows convergence plots on two LP problems with such an irregular structure. The proposed algorithm can better adapt to the problem structure, leading to faster convergence. The rest of the paper is as follows. In Section. we fi some preliary definitions which will be used throughout the paper. In Section we present the preconditioned primaldual algorithm and give conditions under which convergence of the algorithm is guaranteed. We propose a family of simple and easy to compute diagonal preconditioners, which turn out to be very efficient on many problems. In Section.3 we establish connections to the old and widely unknown alternating step method for monotropic programg [7]. In Section 3 we detail eperimental results of the proposed algorithm. In the last Section we draw some conclusions and show directions for future work... Preliaries We consider finitedimensional vector spaces X and Y, where n = dim X and m = dim Y with inner products, X = T,,, X, y, y Y = Σ y, y, y, y Y, where T and Σ are a symmetric, positive definite preconditioning matrices. We further define the norms in the usual way as X =, X, y Y = y, y Y. We will make frequent use of the socalled resolvent or proimity operator of a function G). Given a point ˆ X, it is defined as the solution of the auiliary imization problem = arg G) + ˆ X The unique imizer to the above problem is characterized by the optimality condition G) + T ˆ) 0, whose optimal solution can be written in operator form as = I + T G) ˆ). 3). Preconditioned primaldual algorithm In this work, we propose the following preconditioned firstorder primaldual algorithm: Choose symmetric and positive definite matrices T, Σ, θ [0, ], 0, y 0 ) X Y. Then for k 0, update k, y k ) as follows: k+ =I + T G) k TK T y k ) y k+ =I + Σ F ) y k + ΣK k+ +θ k+ k ))) 4) Comparing the iterates 4) of the proposed algorithm to ), one can see that the global steps τ and σ have been replaced by the preconditioning matrices T and Σ. It is known that ) converges as long as θ = and τσ K <. Hence, a natural question is now to establish conditions on T and Σ and θ which ensure convergence of the proposed preconditioned algorithm. In very recent work [0], it has been shown that the iterates ) can be written in form of a proimal point algorithm [4], which greatly simplifies the convergence analysis. From the optimality conditions of the iterates 4) and the conveity of G and F it follows that for any, y) X Y the iterates k+ and y k+ satisfy ) ) ) k+ k+ y y k+,f y k+ +M k+ k y k+ y k 0, where ) ) k+ G F y k+ = k+ ) + K T y k+ F y k+ ) K k+, and [ T K M = T θk Σ 5) ]. 6) It is easy to check, that the variational inequality 5) now takes the form of a proimal point algorithm [0, 4, 6]. In the net Section we will establish conditions on θ, T and Σ which ensure convergence of the algorithm... Convergence of the algorithm We can make direct use of the convergence analysis developed in [0, 4, 6]. In fact, convergence of 5) can be guaranteed as long as the matri M is symmetric and positive definite. In the following Lemma we establish conditions on θ, T and Σ which indeed ensure these properties of M. Lemma. Let θ =, T and Σ symmetric positive definite maps satisfying Σ KT <, 7)
3 then the matri M defined in 6) is symmetric and positive definite, more precisely it holds, y) T, M, y) T > 0, 8) for any, y) T X Y and, y) T 0. Proof. Due to the structure of M, symmetry follows directly from the symmetry of T, Σ and the fact that θ =. Epanding the inner product in 8) we have T, + Σ y, y K, y > 0 X + y Y K, y > 0. 9) Since T and Σ are symmetric and positive definite, the above scalar product can be rewritten as K, y = Σ KT T, Σ y Using the CauchySchwarz inequality and the fact that ab ca + b /c for any a, b and c > 0, we obtain K, y Σ KT T Σ y c Σ KT X + ) c y Y In view of 7), it is clear that there eists a ε > 0 such that it holds + ε) Σ KT =. With this relation, we can perform suitable choices of c and /c such that our estimate becomes ) Σ KT X K, y + ε) Σ KT + y Y + ε) ) X + y ) Y. + ε Inserting this estimate into 9), we obtain ε +ε ) X + y Y ) > 0. Since ε > 0, it becomes clear that the condition 7) ensures positive definiteness of M. Remark. Choosing T = τ I and Σ = σi, for some τ, σ > 0, the assertion 7) reduces to τσ K <, which is the original condition on the step widths τ and σ to ensure convergence of the primaldual algorithm [5]. Note that this criterion requires to compute or to estimate) the operator norm of K which can be difficult in some situations. Theorem. Let θ = and T, Σ be symmetric and positive definite matrices such that Σ KT <. Then, the sequence n, y n ) generated by the iterates 4) converges weakly to an optimal solution, y ) of the saddlepoint problem ). Proof. As shown in Lemma, the conditions θ = and Σ KT < ensure that the matri M defined in 6) is symmetric and positive definite. Hence, the variational inequality 5) takes the form of a proimal point algorithm. Convergence of the iterates 4) to an optimal solution, y ) of ) then follows from the weak convergence of the proimal point algorithm Theorem of [4]). In the net Section we study simple diagonal preconditioners which obey 7) and hence ensure the convergence of the algorithm. Furthermore, computing these preconditioners only requires to compute row and column wise norms of K, which can be done very efficiently... A family of diagonal preconditioners In general, Σ and T could be any symmetric, positive definite maps. However, since it is a fundamental requirement of the proposed algorithm, that the resolvent operators are simple, some care has to be taken in choosing T and Σ. In particular, if G and F are separable in j and y i, the resolvent operators remain simple, if Σ and T are restricted to diagonal matrices. In the following we study a family of preconditioners which satisfy all these requirements and additionally guarantee the convergence of the algorithm. Lemma. Let T = diagτ ), where τ = τ,...τ n ) and Σ = diagσ), where σ = σ,..., σ m ). In particular, τ j = m i= K i,j, σ α i = n j= K 0) i,j α then for any α [0, ] Σ KT Σ KT = sup X, 0. ) Proof. In order to prove the inequality, we need to find an upper bound on Σ KT. We have Σ KT = Σ KT, Σ KT m n = σi K i,j τj j = i= m i= m i= σ i σ i j= n K i,j τj j j= n K i,j α Ki,j α τ j j j= Then, by applying the CauchySchwarz inequality we obtain m n n Σ KT σ i K i,j α K i,j α τ j j i= j= j=
4 By definition of σ i and τ j, the above inequality can be simplified to Σ KT = m i= j= n K i,j α τ j j n m ) K i,j α τ j j =. j= i= Substituting back into the definition of the operator norm, we finally obtain Σ KT Σ KT = sup X, 0 = ) Remark. To ensure positive definiteness in 8) inequality ) has to be strict. This requirement can be easily satisfied by multiplying τ and σ by appropriate positive constants µ and ν such that µν <. However, in practice, we did not observe any convergence problems of the algorithm for µ = ν =. Remark 3. Basically, the diagonal preconditioning leads to dimensiondependent times steps τ j and σ i instead of global steps τ and σ of the algorithm in [5] and hence do not change the computational compleity of the algorithm..3. A remark on the alternating step method A monotropic program is defined as a conve optimization problem of the form h), s.t. A = b, 3) where R n, h) = n j= h j j ) with h j ) being conve functions, A R m n and b R m. Let us denote by a j the j th column of A, by r i ) the i th entry of the vector r) = b A and by q i the number of nonzero entries in the i th row of A. In [7], the alternating step method for monotropic programg has been introduced as a special case of the alternating direction method of multipliers. It is defined as the iterative scheme k+ j = arg j h j j ) + a j, π k j + j λ a j [ k j + ]) m a i,jr i k } ) a j i= q i π k+ i = πi k + λ q i r i k+ ) 4) where π R m is the dual variable and λ > 0 is a parameter which weights between the strengths of the primal and dual updates. Let us now cast 3) in the form of ). Indeed, it can be written very easily as a saddlepoint problem by introducing Lagrange multipliers y for the constraint A = b, giving us ma b A, y + h). 5) y The iterates 4) now become k+ = I + T h) k + TA T y k ) y k+ = y k + Σb A k+ k )) 6) Let us now perform the particular choice α = 0 in 0) to compute T, Σ and perform an additional scaling by the factors /λ and λ, respectively see Remark ). We consequently find that τ j = λ a j and σ i = λ q i, where we used the convention that 0 0 = 0. By a change of variables in the duals, π k = y k Σb A k ), it can be checked that in this particular setting, 6) is equivalent to the alternating step method 4). 3. Numerical results In this Section we perform several numerical eperiments using the proposed algorithm. We first show applications to general linear programg problems and show that the proposed algorithm leads to a significantly faster convergence compared to the algorithm in [5]. Furthermore, we show that for large scale problems, a simple Matlab implementation of the proposed algorithm significantly outperforms a highly optimized interior point solver of Matlab. Finally, we show applications of the proposed algorithm to classical computer vision problems such as graph cuts and imal partitioning problems, where we show that a GPU implementation of the proposed algorithm can easily compete with specialized algorithms such as the maflow algorithm of Boykov and Kolmogorov [3]. In our numerical eperiments, we use the following algorithms and parameter settings: IP: Matlab s LP interior point solver linprog with settings optimized for large scale problems. PD: The primaldual algorithm ) where we set τ = L and σ = L and L = K is estimated using the Matlab command normest. PPD: Preconditioned primal dual algorithm 4) with the preconditioners T and Σ defined in 0) and using α =. In general, the solutions of the problems we consider here are not unique. Therefore, we use the value of the objective functions as a criterion to detere the convergence
5 of the algorithms. We define the relative error of the objective function as δ k = E E k / E, where E refers to the optimal value and E k is the value of the current iterate. Depending of the particular problem, E might be the value of either the primal or the dual problem. In order to detere the optimal value of the objective function we run one method with a very high accuracy. During the evaluation, we run the algorithms until δ k is below a fied threshold of tol = 0 4. Note that for practical purposes, δ k can also be bounded by the primaldual gap. All eperiments were eecuted on a.73 GHz i7 CPU running Matlab and a NVidia GTX480 GPU using the CUDA framework. 3.. Linear programg Linear programs LPs) define an important class of optimization problems in computer vision. See for eample the recent work [5] for a huge scale) LP formulation of curvature dependent segmentation and inpainting models. Therefore, there is a strong demand for efficient solver which can handle very large problems. Here, we consider LPs in socalled equality form: c T s.t. A = b, 0, 7) where, c R n, b R m and A R m n. Note that it is straightforward to transform other standard forms to this form []. The above LP can be easily reformulated in terms of a saddlepoint problem by introducing Lagrange multipliers y to account for the equality constraints: ma A b, y + c,, s.t. 0 y Applying PPD to 7) we obtain the following simple iterates k+ = proj [0, ) k TA T y k + c) ) y k+ = y k + ΣA k+ k ) b), 8) where proj [0, ) ) can be implemented by simple truncation operations. In our first eample, we compare IP, PD and PPD on two standard LP test problems that come along with the Matlab package. To gain more insight to the proposed preconditioning technique, we compare the global steps of PD with the dimensiondependent adaptive steps of PPD. sc50b: This problem consists of 48 variables, 30 inequalities and 0 equalities. We find L = and set τ = σ = for PD. For PPD we find that the average primal step is τ = and the average dual step is σ = Therefore, PPD can take in average approimately.8 times larger primal and dual steps than PD. densecolumn: This problem consists of 677 variables and 67 equalities with one dense column in the linear operator. We find L = and consequently set τ = σ = for PD. For PPD, we find the average primal and dual steps to be τ = and σ = Hence PPD can take on average approimately 40 times larger primal and 5 times larger dual steps. Table shows the results of the performance evaluation on the standard LP test problems. The optimal value of the objective function was detered by running IP with a very high accuracy. This eamples clearly demonstrate the advantage of the proposed algorithm. On sc50b P PD is.4 times faster than PD while on densecolumn, PPD is more than 00 times faster than PD. Especially, densecolumn has a very irregular structure which can be handled much better by the preconditioned algorithm. Clearly, on smallscale problems, the simple primaldual algorithms can not compete with the highly optimized IP. However, as we will see in the net section, this is quite different for largescale problems. Figure depicts convergence plots for PD and PPD for the two LP test problems. IP PD PPD sc50b 0.0s.75s 0.49s densecolumn 0.7s 68.5s 0.6s Table. Comparison of of IP, PD and PPD on two standard LP test problems. 3.. TVL image denoising The TVL model [, 6] is a popular image denoising model. It consists of total variation TV) regularization and a L data fidelity term. It is known that the TVL model is very effective in removing strong outliers such as salt and pepper noise in images. We assume discrete images defined on a regular Cartesian grid of size M N with indices i, j). The anisotropic) TVL model in a discrete setting is defined as the l imization problem u Du l + λ u f l, 9) where f = f i,j ) R MN is the noisy input image and u = u i,j ) R MN is the denoised image. The free parameter λ > 0 is used to control the tradeoff between smoothing and data fidelity. The linear operator D R MN MN is the discretized gradient operator defined by ) Du) Du) i,j = i,j Du), 0) i,j where Du) i,j = u i+,j u i,j if i < M 0 if i = M,
6 a) 3 phases b) Output c) 4 phases d) Output e) Natural f) Segmentation 8 phases) Figure 3. Figures ab) and cd) show the result of standard Dirichlet problems. Figures ef) show the segmentation of a natural image into 8 regions. usual isotropic total variation. The only modification is to replace the projections of the duals onto bo constraints by projections onto l balls. Figure shows the result of PD and PPD applied to a binary image segmentation problem. Let I [0, ] 3MN be the input image. The unary terms were computed by first computing the average RGB values µ f R 3 and µ b R 3 in the user specified foreground and background regions and then by setting wi,j u = α I i,j µ f I i,j µ b ). The binary terms were computed by w b, i,j = ep β I i+,j I i,j ) and w b, i,j = ep β I i,j+ I i,j ), where we set α = and β = 0. We used the popular maimum flow algorithm of Boykov and Kolmogorov [3] to compute the optimal value of the primal objective function ). We then ran PD and PPD until the relative error δ k was below tol. Table 3 presents the tigs. Although a Matlab implementation of PPD cannot compete with the highly optimized maimum flow algorithm, a parallel GPU implementation is already significantly faster. MAXFLOW PD PPD PPDGPU 0.60s 5.75s 8.56s 0.045s Table 3. Comparison of the proposed algorithm for graph cut segmentation Minimal partitions In the last eample we quickly eplain the application of the proposed preconditioned primaldual algorithm for computing imal partitions. Following [5] and references therein, a conve relaation of the multilabel Potts model can be computed by solving the saddle point problem where S = ma u S q B k Du l, q l + u l, f l, 3) l= u = u l ) k l= : } k u l =, u l 0, l= 4) is the usual simple constraint and the conve set B = Bm n, 5) m<n k can be written as the intersection of balls of the form Bm n = q = ql ), ql )) k l= : q m q n ) i,j, i, j }, 6) The linear operator D is the discretized gradient operator as defined in 0), u = u l ) k l=, is the labeling function which represents a relaed assignment of the image piels to each region, f = f l ) k l= is some given weight function and q = ql ), q l ))k l= are the dual variables. The original approach taken in [5] is to compute the projections with respect to S and B by solving auiliary optimization problems. This approach has the disadvantage that the inner optimization problems have to be computed with sufficient accuracy such that the overall algorithm converges, which can heavily slow down the algorithm. Here, we take a different approach, which is closely related to the multiple operator splitting method proposed in []. We first introduce slack variables b for the epressions q m q n in 6). Then we introduce Lagrange multipliers λ to account for the equality constraints l u l = in 4) and µ to account for the equality constraints b n m = q m q n. Together, this yields the following augmented saddlepoint problem u,µ ma q,b,λ l k= Du l, q l + u l, f l + P u, λ + b Qq, µ s.t. u l ) i,j 0, b n m) i,j, 7) where the linear operators P and Q are such that P u) i,j = k u l ) i,j, Qq) n m) i,j = q m q n ) i,j. l= Note that 7) now depends only on simple pointwise constraints for which the projections can be implemented very efficiently. To cast the above problem into the form of ), it remains to arrange all primal variables in a vector, all
7 dual variables into a vector y and all linear operators into a global linear operator K. Then, applying the preconditioning techniques proposed in this paper leads to an algorithm that is guaranteed to converge to the optimal solution without the need to solve any inner optimization problems. Figure 3 shows some results of standard imal partitioning and segmentation problems. We compared the original approach solving inner optimization problems and using PD to PPD applied to 7). We first precomputed the optimal solution using a large number of iterations and then recorded the time until the error is below a threshold of tol. The tigs are presented in Table 4. In all cases, the proposed algorithm clearly outperforms the original approach of [5]. PD PPD Speedup Synthetic 3 phases).7s 75.65s.9 Synthetic 4 phases) 39.0s s.6 Natural 8 phases) 59.85s 3.76s 5. Table 4. Comparison of the proposed algorithm on partitioning problems. 4. Conclusion In this paper we have proposed a simple preconditioning technique to improve the performance of the firstorder primaldual algorithm proposed in [3, 5]. The proposed diagonal preconditioners can be computed efficiently and guarantee the convergence of the algorithm without the need to estimate any step size parameters. In several numerical eperiments, we have shown that the proposed algorithm significantly outperforms the algorithm in [5]. Furthermore, on large scale linear programg problems, an unoptimized implementation of the proposed algorithm easily outperforms a highly optimized interior point solver and a GPU implementation of the proposed algorithm can easily compete with specialized combinatorial algorithms for computing imum cuts. We believe that the proposed algorithm can become a standard algorithm in computer vision since it can be applied to a large class of conve optimization problems arising in computer vision and has the potential for parallel computing. Future work will mainly concentrate on the development of more sophisticated preconditioners that are different from diagonal matrices. References [] A. Bhusnurmath and C. Taylor. Graph cuts via l norm imization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30:866 87, 008. [] S. Boyd and L. Vandenberghe. Conve Optimization. Cambridge University Press, 004. [3] Y. Boykov and V. Kolmogorov. An eperimental comparison of cut/maflow algorithms for energy imization in computer vision. In A. J. M. Figueiredo, J. Zerubia, editor, Energy Minimization Methods in Computer Vision and Pattern Recognition, volume 34 of LNCS, pages Springer, 00. [4] A. Chambolle. Total variation imization and a class of binary MRF models. In Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 36 5, 005. [5] A. Chambolle and T. Pock. A firstorder primaldual algorithm for conve problems with applications to imaging. Journal of Mathematical Imaging and Vision, 00. online first. [6] T. Chan and S. Esedoglu. Aspects of total variation regularized L function approimation. SIAM J. Appl. Math., 655):87 837, 004. [7] J. Eckstein. Splitting Methods for Monotone Operators with Application to Parallel Optimization. PhD thesis, MIT, 989. [8] E. Esser, X. Zhang, and T. Chan. A general framework for a class of first order primaldual algorithms for conve optimization in imaging science. SIAM J. Imaging Sci., 3:05 046, 00. [9] L. Ford and D. Fulkerson. Flows in Networks. Princeton University Press, Princeton, New Jersey, 96. [0] B. He and X. Yuan. Convergence analysis of primaldual algorithms for total variation image restoration. Technical report, Nanjing University, China, 00. [] J. Lellmann, D. Breitenreicher, and C. Schnörr. Fast and eact primaldual iterations for variational problems in computer vision. In K. Daniilidis, P. Maragos, and N. Paragios, editors, European Conference on Computer Vision ECCV), volume 63 of LNCS, pages Springer Berlin / Heidelberg, 00. [] M. Nikolova. A variational approach to remove outliers and impulse noise. Journal of Mathematical Imaging and Vision, 0):99 0, 004. [3] T. Pock, D. Cremers, H. Bischof, and A. Chambolle. An algorithm for imizing the MumfordShah functional. In International Conference on Computer Vision ICCV), 009. [4] R. T. Rockafellar. Monotone operators and the proimal point algorithm. SIAM Journal on Control and Optimization, 45): , 976. [5] T. Schoenemann, F. Kahl, and D. Cremers. Curvature regularity for regionbased image segmentation and inpainting: A linear programg relaation. In IEEE International Conference on Computer Vision ICCV), Kyoto, Japan, 009. [6] X. Zhang, M. Burger, and S. Osher. A unified primaldual algorithm framework based onbregman iteration. Journal of Scientific Computing, 46:0 46, 0.
Subspace Pursuit for Compressive Sensing: Closing the Gap Between Performance and Complexity
Subspace Pursuit for Compressive Sensing: Closing the Gap Between Performance and Complexity Wei Dai and Olgica Milenkovic Department of Electrical and Computer Engineering University of Illinois at UrbanaChampaign
More informationRECENTLY, there has been a great deal of interest in
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 47, NO. 1, JANUARY 1999 187 An Affine Scaling Methodology for Best Basis Selection Bhaskar D. Rao, Senior Member, IEEE, Kenneth KreutzDelgado, Senior Member,
More informationAn Experimental Comparison of MinCut/MaxFlow Algorithms for Energy Minimization in Vision
In IEEE Transactions on PAMI, Vol. 26, No. 9, pp. 11241137, Sept. 2004 p.1 An Experimental Comparison of MinCut/MaxFlow Algorithms for Energy Minimization in Vision Yuri Boykov and Vladimir Kolmogorov
More informationOptimization with SparsityInducing Penalties. Contents
Foundations and Trends R in Machine Learning Vol. 4, No. 1 (2011) 1 106 c 2012 F. Bach, R. Jenatton, J. Mairal and G. Obozinski DOI: 10.1561/2200000015 Optimization with SparsityInducing Penalties By
More informationFrom Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images
SIAM REVIEW Vol. 51,No. 1,pp. 34 81 c 2009 Society for Industrial and Applied Mathematics From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images Alfred M. Bruckstein David
More informationTHE PROBLEM OF finding localized energy solutions
600 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 45, NO. 3, MARCH 1997 Sparse Signal Reconstruction from Limited Data Using FOCUSS: A Reweighted Minimum Norm Algorithm Irina F. Gorodnitsky, Member, IEEE,
More informationOrthogonal Bases and the QR Algorithm
Orthogonal Bases and the QR Algorithm Orthogonal Bases by Peter J Olver University of Minnesota Throughout, we work in the Euclidean vector space V = R n, the space of column vectors with n real entries
More informationA RateDistortion OneClass Model and its Applications to Clustering
and its Applications to Clustering Koby Crammer crammer@cis.upenn.edu Partha Pratim Talukdar partha@cis.upenn.edu Department of Computer & Information Science, University of Pennsylvania, Philadelphia,
More informationFast Solution of l 1 norm Minimization Problems When the Solution May be Sparse
Fast Solution of l 1 norm Minimization Problems When the Solution May be Sparse David L. Donoho and Yaakov Tsaig October 6 Abstract The minimum l 1 norm solution to an underdetermined system of linear
More informationProceedings of International Conference on Computer Vision (ICCV), Sydney, Australia, Dec. 2013 p.1. GrabCut in One Cut
Proceedings of International Conference on Computer Vision (ICCV), Sydney, Australia, Dec. 203 p. GrabCut in One Cut Meng Tang Lena Gorelick Olga Veksler Yuri Boykov University of Western Ontario Canada
More informationThe Backpropagation Algorithm
7 The Backpropagation Algorithm 7. Learning as gradient descent We saw in the last chapter that multilayered networks are capable of computing a wider range of Boolean functions than networks with a single
More informationIEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 4, APRIL 2006 1289. Compressed Sensing. David L. Donoho, Member, IEEE
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 4, APRIL 2006 1289 Compressed Sensing David L. Donoho, Member, IEEE Abstract Suppose is an unknown vector in (a digital image or signal); we plan to
More informationNew Results on Hermitian Matrix RankOne Decomposition
New Results on Hermitian Matrix RankOne Decomposition Wenbao Ai Yongwei Huang Shuzhong Zhang June 22, 2009 Abstract In this paper, we present several new rankone decomposition theorems for Hermitian
More informationApproximately Detecting Duplicates for Streaming Data using Stable Bloom Filters
Approximately Detecting Duplicates for Streaming Data using Stable Bloom Filters Fan Deng University of Alberta fandeng@cs.ualberta.ca Davood Rafiei University of Alberta drafiei@cs.ualberta.ca ABSTRACT
More informationA Flexible New Technique for Camera Calibration
A Flexible New Technique for Camera Calibration Zhengyou Zhang December 2, 1998 (updated on December 14, 1998) (updated on March 25, 1999) (updated on Aug. 1, 22; a typo in Appendix B) (updated on Aug.
More informationObject Segmentation by Long Term Analysis of Point Trajectories
Object Segmentation by Long Term Analysis of Point Trajectories Thomas Brox 1,2 and Jitendra Malik 1 1 University of California at Berkeley 2 AlbertLudwigsUniversity of Freiburg, Germany {brox,malik}@eecs.berkeley.edu
More informationDecoding by Linear Programming
Decoding by Linear Programming Emmanuel Candes and Terence Tao Applied and Computational Mathematics, Caltech, Pasadena, CA 91125 Department of Mathematics, University of California, Los Angeles, CA 90095
More informationCOSAMP: ITERATIVE SIGNAL RECOVERY FROM INCOMPLETE AND INACCURATE SAMPLES
COSAMP: ITERATIVE SIGNAL RECOVERY FROM INCOMPLETE AND INACCURATE SAMPLES D NEEDELL AND J A TROPP Abstract Compressive sampling offers a new paradigm for acquiring signals that are compressible with respect
More informationGenerative or Discriminative? Getting the Best of Both Worlds
BAYESIAN STATISTICS 8, pp. 3 24. J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West (Eds.) c Oxford University Press, 2007 Generative or Discriminative?
More informationFrom Few to Many: Illumination Cone Models for Face Recognition Under Variable Lighting and Pose. Abstract
To Appear in the IEEE Trans. on Pattern Analysis and Machine Intelligence From Few to Many: Illumination Cone Models for Face Recognition Under Variable Lighting and Pose Athinodoros S. Georghiades Peter
More informationObject Detection with Discriminatively Trained Part Based Models
1 Object Detection with Discriminatively Trained Part Based Models Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester and Deva Ramanan Abstract We describe an object detection system based on mixtures
More informationCompressed Sensing with Coherent and Redundant Dictionaries
Compressed Sensing with Coherent and Redundant Dictionaries Emmanuel J. Candès 1, Yonina C. Eldar 2, Deanna Needell 1, and Paige Randall 3 1 Departments of Mathematics and Statistics, Stanford University,
More informationHighRate Codes That Are Linear in Space and Time
1804 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 48, NO 7, JULY 2002 HighRate Codes That Are Linear in Space and Time Babak Hassibi and Bertrand M Hochwald Abstract Multipleantenna systems that operate
More informationFeature Sensitive Surface Extraction from Volume Data
Feature Sensitive Surface Extraction from Volume Data Leif P. Kobbelt Mario Botsch Ulrich Schwanecke HansPeter Seidel Computer Graphics Group, RWTHAachen Computer Graphics Group, MPI Saarbrücken Figure
More informationOptimization by Direct Search: New Perspectives on Some Classical and Modern Methods
SIAM REVIEW Vol. 45,No. 3,pp. 385 482 c 2003 Society for Industrial and Applied Mathematics Optimization by Direct Search: New Perspectives on Some Classical and Modern Methods Tamara G. Kolda Robert Michael
More informationON THE DISTRIBUTION OF SPACINGS BETWEEN ZEROS OF THE ZETA FUNCTION. A. M. Odlyzko AT&T Bell Laboratories Murray Hill, New Jersey ABSTRACT
ON THE DISTRIBUTION OF SPACINGS BETWEEN ZEROS OF THE ZETA FUNCTION A. M. Odlyzko AT&T Bell Laboratories Murray Hill, New Jersey ABSTRACT A numerical study of the distribution of spacings between zeros
More informationHow to Use Expert Advice
NICOLÒ CESABIANCHI Università di Milano, Milan, Italy YOAV FREUND AT&T Labs, Florham Park, New Jersey DAVID HAUSSLER AND DAVID P. HELMBOLD University of California, Santa Cruz, Santa Cruz, California
More informationHYBRID systems involve a combination of discrete and continuous
UNPUBLISHED REPORT Formal Semantics and Analysis Methods for Simulink Stateflow Models A. Tiwari Abstract Embedded control systems typically comprise continuous control laws combined with discrete mode
More informationComputing Personalized PageRank Quickly by Exploiting Graph Structures
Computing Personalized PageRank Quickly by Exploiting Graph Structures Takanori Maehara Takuya Akiba Yoichi Iwata Ken ichi Kawarabayashi National Institute of Informatics, The University of Tokyo JST,
More information