Universal Regularizers For Robust Sparse Coding and Modeling


Universal Regularizers For Robust Sparse Coding and Modeling

Ignacio Ramírez and Guillermo Sapiro
Department of Electrical and Computer Engineering, University of Minnesota

arXiv: v2 [cs.IT] 3 Aug 2010

Abstract

Sparse data models, where data is assumed to be well represented as a linear combination of a few elements from a dictionary, have gained considerable attention in recent years, and their use has led to state-of-the-art results in many signal and image processing tasks. It is now well understood that the choice of the sparsity regularization term is critical in the success of such models. Based on a codelength minimization interpretation of sparse coding, and using tools from universal coding theory, we propose a framework for designing sparsity regularization terms which have theoretical and practical advantages when compared to the more standard ℓ0 or ℓ1 ones. The presentation of the framework and theoretical foundations is complemented with examples that show its practical advantages in image denoising, zooming and classification.

I. INTRODUCTION

Sparse modeling calls for constructing a succinct representation of some data as a combination of a few typical patterns (atoms) learned from the data itself. Significant contributions to the theory and practice of learning such collections of atoms (usually called dictionaries or codebooks), e.g., [1], [14], [33], and of representing the actual data in terms of them, e.g., [8], [11], [12], have been developed in recent years, leading to state-of-the-art results in many signal and image processing tasks [24], [26], [27], [34]. We refer the reader for example to [4] for a recent review on the subject. A critical component of sparse modeling is the actual sparsity of the representation, which is controlled by a regularization term (regularizer for short) and its associated parameters. The choice of the functional form of the regularizer and its parameters is a challenging task.
Several solutions to this problem have been proposed in the literature, ranging from the automatic tuning of the parameters [2] to Bayesian models, where these parameters are themselves considered as random variables [17], [2], [51]. In this work we adopt the interpretation of sparse coding as a codelength minimization problem. This is a natural and objective method for assessing the quality of a statistical model for describing given data, and it is based on the Minimum Description Length (MDL) principle [37]. In this framework, the regularization term in the sparse coding formulation is interpreted as the cost in bits of describing the sparse linear coefficients used to reconstruct the data. Several works on image coding using this approach were developed in the 1990s under the name of complexity-based or compression-based coding, following the popularization of MDL as a powerful statistical modeling tool [9], [31], [4]. The focus of these early works was on denoising using wavelet bases, using either generic asymptotic results from MDL or fixed probability models, in order to compute the description length of the coefficients. A later, major breakthrough in MDL theory was the adoption of universal coding tools to compute optimal codelengths. In this work, we improve and extend on previous results in this line of work by designing regularization terms based on such universal codes for image coefficients, meaning that the codelengths obtained when encoding the coefficients of any (natural) image with such codes will be close to the shortest codelengths that can be obtained with any model fitted specifically for that particular instance of coefficients. The resulting framework not only formalizes sparse coding from the MDL and universal coding perspectives but also leads to a family of universal regularizers which we show to consistently improve results in image processing tasks such as denoising and classification.
These models also enjoy several desirable theoretical and practical properties, such as statistical consistency (in certain cases), improved robustness to outliers in the data, and improved sparse signal recovery (e.g., decoding of sparse signals from a compressive sensing point of view [5]) when compared with the traditional ℓ0 and ℓ1-based techniques in practice. These models also lead to a simple and efficient optimization technique for solving the corresponding sparse coding

problems as a series of weighted ℓ1 subproblems, which in turn can be solved with off-the-shelf algorithms such as LARS [12] or IST [11]. Details are given in the sequel. Finally, we apply our universal regularizers not only for coding using fixed dictionaries, but also for learning the dictionaries themselves, leading to further improvements in all the aforementioned tasks.

The remainder of this paper is organized as follows: in Section II we introduce the standard framework of sparse modeling. Section III is dedicated to the derivation of our proposed universal sparse modeling framework, while Section IV deals with its implementation. Section V presents experimental results showing the practical benefits of the proposed framework in image denoising, zooming and classification tasks. Concluding remarks are given in Section VI.

II. SPARSE MODELING AND THE NEED FOR BETTER MODELS

Let X ∈ R^(M×N) be a set of N column data samples x_j ∈ R^M, D ∈ R^(M×K) a dictionary of K column atoms d_k ∈ R^M, and A ∈ R^(K×N), a_j ∈ R^K, a set of reconstruction coefficients such that X = DA. We use a^T_k to denote the k-th row of A, the coefficients associated to the k-th atom in D. For each j = 1,...,N we define the active set of a_j as A_j = {k : a_kj ≠ 0, 1 ≤ k ≤ K}, and ‖a_j‖₀ = |A_j| as its cardinality. The goal of sparse modeling is to design a dictionary D such that for all or most data samples x_j there exists a coefficients vector a_j such that x_j ≈ Da_j and ‖a_j‖₀ is small (usually below some threshold L ≪ K). Formally, we would like to solve the following problem

  min_{D,A} Σ_{j=1}^N ψ(a_j)  s.t.  ‖x_j − Da_j‖²₂ ≤ ε, j = 1,...,N,   (1)

where ψ(·) is a regularization term which induces sparsity in the columns of the solution A. Usually the constraint ‖d_k‖₂ ≤ 1, k = 1,...,K, is added, since otherwise we can always decrease the cost function arbitrarily by multiplying D by a large constant and dividing A by the same constant. When D is fixed, the problem of finding a sparse a_j for each sample x_j is called sparse coding,

  a_j = arg min_a ψ(a)  s.t.  ‖x_j − Da‖²₂ ≤ ε.   (2)

Among possible choices of ψ(·) are the ℓ0 pseudo-norm, ψ(a) = ‖a‖₀, and the ℓ1 norm, ψ(a) = ‖a‖₁.
The former tries to solve directly for the sparsest a_j, but since it is non-convex, it is commonly replaced by the ℓ1 norm, which is its closest convex approximation. Furthermore, under certain conditions on (a fixed) D and the sparsity of a_j, the solutions to the ℓ0 and ℓ1-based sparse coding problems coincide (see for example [5]). The problem (1) is also usually formulated in Lagrangian form,

  min_{D,A} Σ_{j=1}^N ‖x_j − Da_j‖²₂ + λψ(a_j),   (3)

along with its respective sparse coding problem when D is fixed,

  a_j = arg min_a ‖x_j − Da‖²₂ + λψ(a).   (4)

Even when the regularizer ψ(·) is convex, the sparse modeling problem, in any of its forms, is jointly non-convex in (D, A). Therefore, the standard approach to find an approximate solution is to use alternate minimization: starting with an initial dictionary D⁽⁰⁾, we minimize (3) alternately in A via (2) or (4) (sparse coding step), and in D (dictionary update step). The sparse coding step can be solved efficiently when ψ(·) = ‖·‖₁ using for example IST [11] or LARS [12], or with OMP [28] when ψ(·) = ‖·‖₀. The dictionary update step can be done using for example MOD [14] or K-SVD [1].

A. Interpretations of the sparse coding problem

We now turn our attention to the sparse coding problem: given a fixed dictionary D, for each sample vector x_j, compute the sparsest vector of coefficients a_j that yields a good approximation of x_j. The sparse coding problem admits several interpretations. What follows is a summary of these interpretations and the insights that they provide into the properties of the sparse models that are relevant to our derivation.
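As a concrete illustration of the sparse coding step, the following minimal sketch (our own illustration, not the authors' implementation; the dictionary, sparsity pattern and λ are synthetic) solves the Lagrangian ℓ1 problem (4) for a fixed D by iterative soft thresholding (IST):

```python
import numpy as np

def ista(x, D, lam, n_iter=500):
    """Minimize ||x - D a||_2^2 + lam * ||a||_1 by iterative soft thresholding."""
    L = np.linalg.norm(D, 2) ** 2                 # largest eigenvalue of D^T D
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = a - D.T @ (D @ a - x) / L             # gradient step on the data term
        a = np.sign(g) * np.maximum(np.abs(g) - lam / (2 * L), 0.0)  # soft threshold
    return a

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)                    # unit-norm atoms, as required in (1)
a_true = np.zeros(128)
a_true[[3, 40, 90]] = [1.5, -2.0, 1.0]            # a 3-sparse synthetic coefficient vector
x = D @ a_true                                    # noiseless synthetic sample
a_hat = ista(x, D, lam=0.05)
```

The step size 1/(2L) and the per-iteration threshold λ/(2L), with L the squared spectral norm of D, are the standard IST choices; on this synthetic instance the three active atoms are recovered.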

1) Model selection in statistics: Using the ℓ0 norm as ψ(·) in (4) is known in the statistics community as Akaike's Information Criterion (AIC) when λ = 1, or the Bayes Information Criterion (BIC) when λ = (1/2) log M, two popular forms of model selection (see [22, Chapter 7]). In this context, the ℓ1 regularizer was introduced in [43], again as a convex approximation of the above model selection methods, and is commonly known (either in its constrained or Lagrangian forms) as the Lasso. Note however that, in the regression interpretation of (4), the roles of D and X are very different.

2) Maximum a posteriori: Another interpretation of (4) is that of a maximum a posteriori (MAP) estimation of a_j in the logarithmic scale, that is

  a_j = arg max_a {log P(a|x_j)} = arg max_a {log P(x_j|a) + log P(a)} = arg min_a {−log P(x_j|a) − log P(a)},   (5)

where the observed samples x_j are assumed to be contaminated with additive, zero mean, IID Gaussian noise with variance σ², P(x_j|a) ∝ e^{−(1/(2σ²))‖x_j − Da‖²₂}, and a prior probability model on a of the form P(a) ∝ e^{−θψ(a)} is considered. The energy term in Equation (4) follows by plugging the previous two probability models into (5) and factorizing 2σ² into λ = 2σ²θ. According to (5), the ℓ1 regularizer corresponds to an IID Laplacian prior with mean 0 and inverse-scale parameter θ, P(a) = Π_{k=1}^K θe^{−θ|a_k|} = θ^K e^{−θ‖a‖₁}, which has a special meaning in signal processing tasks such as image or audio compression. This is due to the widely accepted fact that representation coefficients derived from predictive coding of continuous-valued signals, and, more generally, responses from zero-mean filters, are well modeled using Laplacian distributions. For example, for the special case of DCT coefficients of image patches, an analytical study of this phenomenon is provided in [25], along with further references on the subject.

3) Codelength minimization: Sparse coding, in all its forms, has yet another important interpretation.
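The MAP reading of (4)-(5) can be checked numerically in the simplest setting: when D is orthonormal, the ℓ1 problem decouples coordinate-wise and its exact solution is soft thresholding of Dᵀx at λ/2, with λ = 2σ²θ. A small sketch (synthetic data; the values of σ and θ are illustrative, not from the paper):

```python
import numpy as np

def soft(u, t):
    """Coordinate-wise soft thresholding, the proximal operator of t*||.||_1."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((16, 16)))  # random orthonormal dictionary
a_true = np.zeros(16)
a_true[[2, 7]] = [3.0, -2.0]
sigma, theta = 0.1, 30.0
x = D @ a_true + sigma * rng.standard_normal(16)    # Gaussian noise model of (5)
lam = 2 * sigma ** 2 * theta                        # lambda = 2*sigma^2*theta
a_map = soft(D.T @ x, lam / 2)                      # exact MAP solution for orthonormal D
```

The recovered a_map keeps the two large coefficients (shrunk by λ/2) and zeroes essentially all noise-only coordinates, which is the ℓ1/Laplacian MAP behavior described above.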
Suppose that we have a fixed dictionary D and that we want to use it to compress an image, either losslessly by encoding the reconstruction coefficients A and the residual X − DA, or in a lossy manner, by obtaining a good approximation X ≈ DA and encoding only A. Consider for example the latter case. Most modern compression schemes consist of two parts: a probability assignment stage, where the data, in this case A, is assigned a probability P(A), and an encoding stage, where a code C(A) of length L(A) bits is assigned to the data given its probability, so that L(A) is as short as possible. The techniques known as Arithmetic and Huffman coding provide the best possible solution for the encoding step, which is to approximate the Shannon ideal codelength L(A) = −log P(A) [1, Chapter 5]. Therefore, modern compression theory deals with finding the coefficients A that maximize P(A), or, equivalently, that minimize −log P(A). Now, to encode X lossily, we obtain coefficients A such that each data sample x_j is approximated up to a certain ℓ2 distortion ε, ‖x_j − Da_j‖²₂ ≤ ε. Therefore, given a model P(a) for a vector of reconstruction coefficients, and assuming that we encode each sample independently, the optimum vector of coefficients a_j for each sample x_j will be the solution to the optimization problem

  a_j = arg min_a −log P(a)  s.t.  ‖x_j − Da‖²₂ ≤ ε,   (6)

which, for the choice P(a) ∝ e^{−ψ(a)}, coincides with the error constrained sparse coding problem (2). Suppose now that we want lossless compression. In this case we also need to encode the reconstruction residual x_j − Da_j. Since P(x, a) = P(x|a)P(a), the combined codelength will be

  L(x_j, a_j) = −log P(x_j, a_j) = −log P(x_j|a_j) − log P(a_j).   (7)

Therefore, obtaining the best coefficients a_j amounts to solving min_{a_j} L(x_j, a_j), which is precisely the MAP formulation of (5), which in turn, for proper choices of P(x|a) and P(a), leads to the Lagrangian form of sparse coding (4).¹

¹ Laplacian models, as well as Gaussian models, are probability distributions over R, characterized by continuous probability density functions, f(a) = F′(a), F(a) = P(x ≤ a).
If the reconstruction coefficients are considered real numbers, under any of these distributions, any instance of A ∈ R^(K×N) will have measure 0, that is, P(A) = 0. In order to use such distributions as our models for the data, we assume that the coefficients in A are quantized to a precision Δ, small enough for the density function f(a) to be approximately constant in any interval [a − Δ/2, a + Δ/2], a ∈ R, so that we can approximate P(a) ≈ Δf(a), a ∈ R. Under these assumptions, −log P(a) ≈ −log f(a) − log Δ, and the effect of Δ on the codelength produced by any model is the same. Therefore, we will omit Δ in the sequel, and treat density functions and probability distributions interchangeably as P(·). Of course, in real compression applications, Δ needs to be tuned.
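The role of the precision Δ can be verified directly: refining Δ changes every model's codelength by the same −log Δ per coefficient, so Δ cancels when comparing models. A short sketch under a Laplacian model (the value of θ and the sample size are illustrative):

```python
import numpy as np

def codelength_bits(a, theta, delta):
    """Codelength (in bits) of coefficients quantized to precision delta under a
    Laplacian model with inverse-scale theta: sum of -log2(f(a) * delta)."""
    f = 0.5 * theta * np.exp(-theta * np.abs(a))  # Laplacian density f(a)
    return float(np.sum(-np.log2(f * delta)))

rng = np.random.default_rng(0)
a = rng.laplace(scale=0.5, size=1000)             # 1000 samples with theta = 2
L_coarse = codelength_bits(a, theta=2.0, delta=1e-2)
L_fine = codelength_bits(a, theta=2.0, delta=1e-3)
extra = L_fine - L_coarse                         # exactly log2(10) bits per coefficient
```

Since the added cost (here 1000·log2(10) bits) is the same for any density f, it plays no role in the minimization and Δ can indeed be omitted.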

Fig. 1. Standard 8×8 DCT dictionary (a), global empirical distribution of the coefficients in A (b, log scale), empirical distributions of the coefficients associated to each of the K = 64 DCT atoms (c, log scale). The distributions in (c) have a similar heavy tailed shape (heavier than Laplacian), but the variance in each case can be significantly different. (d) Histogram of the K = 64 different θ̂_k values obtained by fitting a Laplacian distribution to each row a^T_k of A. Note that there are significant occurrences between θ̂ = 5 to θ̂ = 25. The coefficients A used in (b-d) were obtained from encoding 8×8 patches (after removing their DC component) randomly sampled from the Pascal 2006 dataset of natural images [15]. (e) Histograms showing the spatial variability of the best local estimations of θ̂_k for a few rows of A across different regions of an image. In this case, the coefficients A correspond to the sparse encoding of all 8×8 patches from a single image, in scan-line order. For each k, each value of θ̂_k was computed from a random contiguous block of 250 samples from a^T_k. The procedure was repeated 400 times to obtain an empirical distribution. The wide supports of the empirical distributions indicate that the estimated θ̂ can have very different values, even for the same atom, depending on the region of the data from where the coefficients are taken.

As one can see, the codelength interpretation of sparse coding is able to unify and interpret both the constrained and unconstrained formulations in one consistent framework. Furthermore, this framework offers a natural and objective measure for comparing the quality of different models P(x|a) and P(a) in terms of the codelengths obtained.

4) Remarks on related work: As mentioned in the introduction, the codelength interpretation of signal coding was already studied in the context of orthogonal wavelet-based denoising. An early example of this line of work considers a regularization term which uses the Shannon Entropy function −Σ_i p_i log p_i to give a measure of the sparsity of the solution [9].
However, the Entropy function is not used as a measure of the ideal codelength for describing the coefficients, but as a measure of the sparsity (actually, group sparsity) of the solution. The MDL principle was applied to the signal estimation problem in [4]. In this case, the codelength term includes the description of both the location and the magnitude of the nonzero coefficients. Although a pioneering effort, the model assumed in [4] for the coefficient magnitude is a uniform distribution on [0, 1], which does not exploit a priori knowledge of image coefficient statistics, and the description of the support is slightly wasteful. Furthermore, the codelength expression used is an asymptotic result, actually equivalent to BIC (see Section II-A1), which can be misleading when working with small sample sizes, such as when encoding small image patches, as in current state of the art image processing applications. The uniform distribution was later replaced by the universal code for integers [38] in [31]. However, as in [4], the model is so general that it does not perform well for the specific case of coefficients arising from image decompositions, leading to poor results. In contrast, our models are derived following a careful analysis of image coefficient statistics. Finally, probability models suitable to image coefficient statistics, of the form P(a) ∝ e^{−(|a|/β)^ρ} (known as generalized Gaussians), were applied to the MDL-based signal coding and estimation framework in [31]. The justification for such models is based on the empirical observation that sparse coefficients statistics exhibit heavy tails (see next section). However, the choice is ad hoc and no optimality criterion is available to compare it with other possibilities. Moreover, there is no closed form solution for performing parameter estimation on such a family of models, requiring numerical optimization techniques. In Section III, we derive a number of probability models for which parameter estimation can be computed efficiently in closed form, and which are guaranteed to optimally describe image coefficients.
B. The need for a better model

As explained in the previous subsection, the use of the ℓ1 regularizer implies that all the coefficients in A share the same Laplacian parameter θ. However, as noted in [25] and references therein, the empirical variance of coefficients associated to different atoms, that is, of the different rows a^T_k of A, varies greatly with k = 1,...,K. This is clearly seen in Figures 1(a-c), which show the empirical distribution of DCT coefficients of 8×8 patches.

As the variance of a Laplacian is 2/θ², different variances indicate different underlying θ. The histogram of the set {θ̂_k, k = 1,...,K} of estimated Laplacian parameters for each row k, Figure 1(d), shows that this is indeed the case, with significant occurrences of values of θ̂ in a range of 5 to 25. The straightforward modification suggested by this phenomenon is to use a model where each row of A has its own weight associated to it, leading to a weighted ℓ1 regularizer. However, from a modeling perspective, this results in K parameters to be adjusted instead of just one, which often results in poor generalization properties. For example, in the cases studied in Section V, even with thousands of images for learning these parameters, the results of applying the learned model to new images were always significantly worse (over 1dB in estimation problems) when compared to those obtained using simpler models such as an unweighted ℓ1.² One reason for this failure may be that real images, as well as other types of signals such as audio samples, are far from stationary. In this case, even if each atom k is associated to its own θ_k (λ_k), the optimal value of θ_k can have significant local variations at different positions or times. This effect is shown in Figure 1(e), where, for each k, θ_k was re-estimated several times using samples from different regions of an image, and the histogram of the different estimated values of θ̂_k was computed. Here again we used the DCT basis as the dictionary D. The need for a flexible model which at the same time has a small number of parameters leads naturally to Bayesian formulations where the different possible λ_k are marginalized out by imposing a hyper-prior distribution on λ, sampling λ using its posterior distribution, and then averaging the estimates obtained with the sampled sparse-coding problems. Examples of this recent line of work, and the closely related Bayesian Compressive Sensing, are developed for example in [23], [44], [49], [48].
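The per-atom estimates θ̂_k discussed above come from the Laplacian maximum likelihood estimator, θ̂ = 1/mean(|a|). A small synthetic sketch (the θ values are chosen only to mimic the 5 to 25 range of Figure 1(d)):

```python
import numpy as np

def laplacian_mle(row):
    """ML estimate of the Laplacian inverse-scale parameter: theta_hat = 1/mean(|a|)."""
    return 1.0 / np.mean(np.abs(row))

rng = np.random.default_rng(0)
thetas = np.array([5.0, 10.0, 25.0])               # distinct per-row parameters
A_rows = np.stack([rng.laplace(scale=1.0 / t, size=20000) for t in thetas])
theta_hat = np.array([laplacian_mle(r) for r in A_rows])
```

With enough samples the estimator recovers each row's parameter closely; on short local blocks (as in Figure 1(e)) the same estimator fluctuates widely, which is exactly the non-stationarity issue raised in the text.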
Despite its promising results, the Bayesian approach is often criticized due to the potentially expensive sampling process (something which can be reduced for certain choices of the priors involved [23]), arbitrariness in the choice of the priors, and lack of proper theoretical justification for the proposed models [48]. In this work we pursue the same goal of deriving a more flexible and accurate sparse model than the traditional ones, while avoiding an increase in the number of parameters and the burden of possibly solving several sampled instances of the sparse coding problem. For this, we deploy tools from the very successful information-theoretic field of universal coding, which is an extension of the compression scenario summarized above in Section II-A to the case when the probability model for the data to be described is itself unknown and has to be described as well.

III. UNIVERSAL MODELS FOR SPARSE CODING

Following the discussion in the preceding section, we now have several possible scenarios to deal with. First, we may still want to consider a single value of θ to work well for all the coefficients in A, and try to design a sparse coding scheme that does not depend on prior knowledge of the value of θ. Secondly, we can consider an independent (but not identically distributed) Laplacian model where the underlying parameter θ can be different for each atom d_k, k = 1,...,K. In the most extreme scenario, we can consider each single coefficient a_kj in A to have its own unknown underlying θ_kj and yet, we would like to encode each of these coefficients (almost) as if we knew its hidden parameter. The first two scenarios are the ones which fit the original purpose of universal coding theory [29], which is the design of optimal codes for data whose probability models are unknown, and where the models themselves are to be encoded as well in the compressed representation. We now develop the basic ideas and techniques of universal coding applied to the first scenario, where the problem is to describe A as an IID Laplacian with unknown parameter θ.
Assuming a known parametric form for the prior, with unknown parameter θ, leads to the concept of a model class. In our case, we consider the class M = {P(A|θ) : θ ∈ Θ} of all IID Laplacian models over A ∈ R^(K×N), where

  P(A|θ) = Π_{j=1}^N Π_{k=1}^K P(a_kj|θ),  P(a_kj|θ) = θe^{−θ|a_kj|},

and Θ ⊆ R⁺. The goal of universal coding is to find a probability model Q(A) which can fit A as well as the model in M that best fits A after having observed it. A model Q(A) with this property is called universal (with respect to the model class M).

² Note that this is the case when the weights are found by maximum likelihood. Other applications of weighted ℓ1 regularizers, using other types of weighting strategies, are known to improve over ℓ1-based ones for certain applications (see e.g. [51]).

For simplicity, in the following discussion we consider the coefficient matrix A to be arranged as a single long column vector of length n = K × N, a = (a_1,...,a_n). We also use the letter a without sub-index to denote the value of a random variable representing coefficient values. First we need to define a criterion for comparing the fitting quality of different models. In universal coding theory this is done in terms of the codelengths L(a) required by each model to describe a. If the model consists of a single probability distribution P(·), we know from Section II-A3 that the optimum codelength corresponds to L_P(a) = −log P(a). Moreover, this relationship defines a one-to-one correspondence between distributions and codelengths, so that for any coding scheme L_Q(a), Q(a) = 2^{−L_Q(a)}. Now suppose that we are restricted to a class of models M, and that we need to choose the model P̂ ∈ M that assigns the shortest codelength to a particular instance of a. We then have that P̂ is the model in M that assigns the maximum probability to a. For a class M parametrized by θ, this corresponds to P̂ = P(a|θ̂(a)), where θ̂(a) is the maximum likelihood estimator (MLE) of the model class parameter θ given a (we will usually omit the argument and just write θ̂). Unfortunately, we also need to include the value of θ̂ in the description of a for the decoder to be able to reconstruct it from the code C(a). Thus, we have that any model Q(a) inducing valid codelengths L_Q(a) will have L_Q(a) > −log P(a|θ̂). The overhead of L_Q(a) with respect to −log P(a|θ̂) is known as the codelength regret,

  R(a, Q) := L_Q(a) − (−log P(a|θ̂(a))) = −log Q(a) + log P(a|θ̂(a)).

A model Q(a) (or, more precisely, a sequence of models, one for each data length n) is called universal if R(a, Q) grows sublinearly in n for all possible realizations of a, that is, (1/n)R(a, Q) → 0, ∀a ∈ Rⁿ, so that the codelength regret with respect to the MLE becomes asymptotically negligible. There are a number of ways to construct universal probability models. The simplest one is the so called two-part code, where the data is described in two parts.
The first part describes the optimal parameter θ̂(a) and the second part describes the data according to the model with the value of the estimated parameter, P(a|θ̂(a)). For uncountable parameter spaces Θ, such as a compact subset of R, the value of θ̂ has to be quantized in order to be described with a finite number of bits d. We call the quantized parameter θ̂_d. The regret for this model is thus

  R(a, Q) = L(θ̂_d) + L(a|θ̂_d) − L(a|θ̂) = L(θ̂_d) − log P(a|θ̂_d) − (−log P(a|θ̂)).

The key for this model to be universal is in the choice of the quantization step for the parameter θ̂, so that both its description L(θ̂_d) and the difference −log P(a|θ̂_d) − (−log P(a|θ̂)) grow sublinearly. This can be achieved by letting the quantization step shrink as O(1/√n) [37], thus requiring d = O(0.5 log n) bits to describe each dimension of θ̂_d. This gives a total regret for two-part codes which grows as (dim(Θ)/2) log n, where dim(Θ) is the dimension of the parameter space Θ. Another important universal code is the so called Normalized Maximum Likelihood (NML) [42]. In this case the universal model Q*(a) corresponds to the model that minimizes the worst case regret,

  Q*(a) = arg min_Q max_a {−log Q(a) + log P(a|θ̂(a))},

which can be written in closed form as Q*(a) = P(a|θ̂(a))/C(M, n), where the normalization constant C(M, n) := ∫_{Rⁿ} P(a|θ̂(a)) da determines the value of the minimax regret, log C(M, n), and depends only on M and the length of the data n.³ Note that the NML model requires C(M, n) to be finite, something which is often not the case. The two previous examples are good for assigning a probability to coefficients that have already been computed, but they cannot be used as a model for computing the coefficients themselves since they depend on having observed them in the first place. For this and other reasons that will become clearer later, we concentrate our work on a third important family of universal codes derived from the so called mixture models (also called Bayesian mixtures).
In

³ The minimax optimality of Q*(a) derives from the fact that it defines a complete uniquely decodable code for all data a of length n, that is, it satisfies the Kraft inequality with equality, Σ_{a ∈ Rⁿ} 2^{−L_{Q*}(a)} = 1. Since every uniquely decodable code with lengths {L_Q(a) : a ∈ Rⁿ} must satisfy the Kraft inequality (see [1, Chapter 5]), if there exists a value of a such that L_Q(a) < L_{Q*}(a) (that is, 2^{−L_Q(a)} > 2^{−L_{Q*}(a)}), then there must exist a vector a′ for which L_Q(a′) > L_{Q*}(a′) for the Kraft inequality to hold. Therefore the regret of Q for a′ is necessarily greater than log C(M, n), which shows that Q* is minimax optimal.

a mixture model, Q(a) is a convex mixture of all the models P(a|θ) in M, indexed by the model parameter θ, Q(a) = ∫_Θ P(a|θ)w(θ)dθ, where w(θ) specifies the weight of each model. Being a convex mixture implies that w(θ) ≥ 0 and ∫_Θ w(θ)dθ = 1, thus w(θ) is itself a probability measure over Θ. We will restrict ourselves to the particular case when a is considered a sequence of independent random variables,⁴

  Q(a) = Π_{j=1}^n Q_j(a_j),  Q_j(a_j) = ∫_Θ P(a_j|θ)w_j(θ)dθ,   (8)

where the mixing function w_j(θ) can be different for each sample j. An important particular case of this scheme is the so called Sequential Bayes code, in which w_j(θ) is computed sequentially as a posterior distribution based on previously observed samples, that is, w_j(θ) = P(θ|a_1, a_2,...,a_{j−1}) [21, Chapter 6]. In this work, for simplicity, we restrict ourselves to the case where w_j(θ) = w(θ) is the same for all j. The result is an IID model where the probability of each sample a_j is a mixture of some probability measure over R,

  Q_j(a_j) = Q(a_j) = ∫_Θ P(a_j|θ)w(θ)dθ, j = 1,...,N.   (9)

A well known result for IID mixture (Bayesian) codes states that their asymptotic regret is O((dim(Θ)/2) log n), thus establishing their universality, as long as the weighting function w(θ) is positive, continuous and unimodal over Θ (see for example [21, Theorem 8.1], [41]). This gives us great flexibility in the choice of a weighting function w(θ) that guarantees universality. Of course, the results are asymptotic and the o(log n) terms can be large, so that the choice of w(θ) can have practical impact for small sample sizes. In the following discussion we derive several IID mixture models for the Laplacian model class M. For this purpose, it will be convenient to consider the corresponding one-sided counterpart of the Laplacian, which is the exponential distribution over the absolute value of the coefficients, |a|, and then symmetrize back to obtain the final distribution over the signed coefficients.

A. The conjugate prior

In general, (9) can be computed in closed form if w(θ) is the conjugate prior of P(a|θ).
When P(a|θ) is an exponential (one-sided Laplacian), the conjugate prior is the Gamma distribution,

  w(θ|κ, β) = Γ(κ)⁻¹ θ^{κ−1} β^κ e^{−βθ}, θ ∈ R⁺,

where κ and β are its shape and scale parameters respectively. Plugging this into (9) we obtain the Mixture of exponentials model (MOE), which has the following form (see Appendix A for the full derivation),

  Q_MOE(a|β, κ) = κβ^κ (a + β)^{−(κ+1)}, a ∈ R⁺.   (10)

With some abuse of notation, we will also denote the symmetric distribution on a as MOE,

  Q_MOE(a|β, κ) = (1/2) κβ^κ (|a| + β)^{−(κ+1)}, a ∈ R.   (11)

Although the resulting prior has two parameters to deal with instead of one, we know from universal coding theory that, in principle, any choice of κ and β will give us a model whose codelength regret is asymptotically small. Furthermore, being IID models, each coefficient of a itself is modeled as a mixture of exponentials, which makes the resulting model over a very well suited to the most flexible scenario where the underlying θ can be different for each a_j. In Section V-B we will show that a single MOE distribution can fit each of the K rows of A better than K separate Laplacian distributions fine-tuned to these rows, with a total of K parameters to be estimated. Thus, not only can we deal with one single unknown θ, but we can actually achieve maximum flexibility with only two parameters (κ and β). This property is particular to the mixture models, and does not apply to the other universal models presented.

⁴ More sophisticated models which include dependencies between the elements of a are out of the scope of this work.
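The closed form (10) can be sanity-checked by integrating the exponential-Gamma mixture of (9) numerically (a sketch; κ = 2.5 and β = 0.5 are the illustrative values used for Figure 2, and the integration grid is an arbitrary choice):

```python
import math
import numpy as np

def moe_density(a, kappa, beta):
    """Closed-form one-sided MOE density, Eq. (10): kappa*beta^kappa*(a+beta)^-(kappa+1)."""
    return kappa * beta ** kappa * (a + beta) ** (-(kappa + 1))

def gamma_mixture_numeric(a, kappa, beta, n=400000, tmax=200.0):
    """Riemann-sum approximation of the mixture integral (9):
    integral of theta*exp(-theta*a) * Gamma(theta | kappa, beta) over theta."""
    theta = np.linspace(1e-9, tmax, n)
    w = theta ** (kappa - 1) * beta ** kappa * np.exp(-beta * theta) / math.gamma(kappa)
    integrand = theta * np.exp(-theta * a) * w
    return float(np.sum(integrand) * (theta[1] - theta[0]))

kappa, beta = 2.5, 0.5
vals = [(gamma_mixture_numeric(a, kappa, beta), moe_density(a, kappa, beta))
        for a in (0.1, 1.0, 5.0)]
```

The numerical mixture and the closed form agree at every tested point, confirming the derivation behind (10).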

Finally, if desired, both κ and β can be easily estimated using the method of moments (see Appendix A). Given sample estimates of the first and second non-central moments, μ̂₁ = (1/n) Σ_{j=1}^n |a_j| and μ̂₂ = (1/n) Σ_{j=1}^n a_j², we have that

  κ̂ = 2(μ̂₂ − μ̂₁²)/(μ̂₂ − 2μ̂₁²) and β̂ = (κ̂ − 1)μ̂₁.   (12)

When the MOE prior is plugged into (5) instead of the standard Laplacian, the following new sparse coding formulation is obtained,

  a_j = arg min_a ‖x_j − Da‖²₂ + λ_MOE Σ_{k=1}^K log(|a_k| + β),   (13)

where λ_MOE = 2σ²(κ + 1). An example of the MOE regularizer, and the thresholding function it induces, is shown in Figure 2 (center column) for κ = 2.5, β = 0.5. Smooth, differentiable non-convex regularizers such as the one in (13) have become a mainstream robust alternative to the ℓ1 norm in statistics [16], [51]. Furthermore, it has been shown that the use of such regularizers in regression leads to consistent estimators which are able to identify the relevant variables in a regression model (oracle property) [16]. This is not always the case for the ℓ1 regularizer, as was proved in [51]. The MOE regularizer has also been recently proposed in the context of compressive sensing [6], where it is conjectured to be better than the ℓ1-term at recovering sparse signals in compressive sensing applications.⁵ This conjecture was partially confirmed recently for non-convex regularizers of the form ψ(a) = ‖a‖_r^r with 0 < r < 1 in [39], [18], and for a more general family of non-convex regularizers including the one in (13) in [47]. In all cases, it was shown that the conditions on the sensing matrix (here D) can be significantly relaxed to guarantee exact recovery if non-convex regularizers are used instead of the ℓ1 norm, provided that the exact solution to the non-convex optimization problem can be computed. In practice, this regularizer is being used with success in a number of applications here and in [7], [46].⁶
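Equation (12) can be sketched and verified on data drawn from the generative MOE model itself, sampling θ from the Gamma prior and then a from the corresponding exponential (the values of κ, β and the sample size here are illustrative):

```python
import numpy as np

def moe_moment_estimates(a):
    """Method-of-moments estimators of Eq. (12):
    kappa_hat = 2(mu2 - mu1^2)/(mu2 - 2*mu1^2),  beta_hat = (kappa_hat - 1)*mu1,
    with mu1 = mean(|a|) and mu2 = mean(a^2)."""
    m = np.abs(np.asarray(a, dtype=float))
    mu1 = m.mean()
    mu2 = (m ** 2).mean()
    kappa = 2.0 * (mu2 - mu1 ** 2) / (mu2 - 2.0 * mu1 ** 2)
    beta = (kappa - 1.0) * mu1
    return kappa, beta

rng = np.random.default_rng(0)
kappa_true, beta_true = 6.0, 1.0
# Generative MOE sampling: theta ~ Gamma(shape=kappa, rate=beta), then a ~ Exp(theta)
theta = rng.gamma(shape=kappa_true, scale=1.0 / beta_true, size=500000)
a = rng.exponential(scale=1.0 / theta)
kappa_hat, beta_hat = moe_moment_estimates(a)
```

On this synthetic draw the estimators recover (κ, β) closely, in closed form and without any numerical optimization, which is the practical advantage claimed over generalized Gaussian fitting.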
Our experimental results in Section V provide further evidence of the benefits of the use of non-convex regularizers, leading to much improved recovery accuracy of sparse coefficients compared to ℓ1 and ℓ0. We also show in Section V that the MOE prior is much more accurate than the standard Laplacian in modeling the distribution of reconstruction coefficients drawn from a large database of image patches, and how these improvements lead to better results in applications such as image estimation and classification.

B. The Jeffreys prior

The Jeffreys prior for a parametric model class M = {P(a|θ), θ ∈ Θ} is defined as

  w(θ) = √|I(θ)| / ∫_Θ √|I(ξ)| dξ, θ ∈ Θ,   (14)

where |I(θ)| is the determinant of the Fisher information matrix

  I(θ) = E_{P(a|θ)}[ −∂² log P(a|θ)/∂θ² ].   (15)

The Jeffreys prior is well known in Bayesian theory due to three important properties: it virtually eliminates the hyper-parameters of the model, it is invariant to the original parametrization of the distribution, and it is a non-informative prior, meaning that it represents well the lack of prior information on the unknown parameter θ [3]. It turns out that, for quite different reasons, the Jeffreys prior is also of paramount importance in the theory of universal coding. For instance, it has been shown in [2] that the worst case regret of the mixture code obtained using the Jeffreys prior approaches that of the NML as the number of samples n grows. Thus, by using Jeffreys, one can attain the minimum worst case regret asymptotically, while retaining the advantages of a mixture (not needing hindsight of a), which in our case means being able to use it as a model for computing a via sparse coding. For the exponential distribution we have that I(θ) = 1/θ². Clearly, if we let Θ = (0, ∞), the integral in (14) evaluates to ∞. Therefore, in order to obtain a proper integral, we need to exclude 0 and ∞ from Θ (note that

⁵ In [6], the logarithmic regularizer arises from approximating the ℓ0 pseudo-norm as an ℓ1-normalized element-wise sum, without the insight and theoretical foundation here reported.
⁶ While these works support the use of such non-convex regularizers, none of them formally derives them using the universal coding framework as in this paper.

Fig. 2. Left to right: l1 (green), MOE (red) and JOE (blue) regularizers, and their corresponding thresholding functions thres(x) := arg min_a {(x − a)² + λψ(|a|)}. The unbiasedness of MOE is due to the fact that large coefficients are not shrunk by the thresholding function. Also, although the JOE regularizer is biased, the shrinkage of large coefficients can be much smaller than the one applied to small coefficients.

this was not needed for the conjugate prior). We choose to define Θ = [θ1, θ2], 0 < θ1 < θ2 < ∞, leading to w(θ) = (1/ln(θ2/θ1))(1/θ), θ ∈ [θ1, θ2]. The resulting mixture, after being symmetrized around 0, has the following form (see Appendix A):

Q_JOE(a|θ1, θ2) = (1/(2|a| ln(θ2/θ1))) (e^{−θ1|a|} − e^{−θ2|a|}), a ≠ 0. (16)

We refer to this prior as a Jeffreys mixture of exponentials (JOE), and again overload this acronym to refer to the symmetric case as well. Note that although Q_JOE is not defined at a = 0, its limit when a → 0 is finite and evaluates to (θ2 − θ1)/(2 ln(θ2/θ1)). Thus, by defining Q_JOE(0) := (θ2 − θ1)/(2 ln(θ2/θ1)), we obtain a prior that is well defined and continuous for all a ∈ R. When plugged into (5), we get the JOE-based sparse coding formulation,

min_a ‖x_j − Da‖² + λ_JOE Σ_{k=1}^K {log |a_k| − log(e^{−θ1|a_k|} − e^{−θ2|a_k|})}, (17)

where, according to the convention just defined for Q_JOE(0), we define ψ_JOE(0) := −log((θ2 − θ1)/(2 ln(θ2/θ1))). According to the MAP interpretation we have that λ_JOE = 2σ², coming from the Gaussian assumption on the approximation error as explained in Section II-A. As with MOE, the JOE-based regularizer, ψ_JOE(|a|) = −log Q_JOE(|a|), is continuous and differentiable in R+, and its derivative converges to a finite value at zero, lim_{a→0} ψ'_JOE(a) = (θ2² − θ1²)/(2(θ2 − θ1)). As we will see later in Section IV, these properties are important to guarantee the convergence of sparse coding algorithms using non-convex priors. Note from (17) that we can rewrite the JOE regularizer as ψ_JOE(|a_k|) = log |a_k| − log(e^{−θ1|a_k|}(1 − e^{−(θ2−θ1)|a_k|})) = θ1|a_k| + log |a_k| − log(1 − e^{−(θ2−θ1)|a_k|}), so that for sufficiently large |a_k|, log(1 − e^{−(θ2−θ1)|a_k|}) ≈ 0 and θ1|a_k| ≫ log |a_k|, and we have that ψ_JOE(|a_k|) ≈ θ1|a_k|.
Thus, for large |a_k|, the JOE regularizer behaves like l1 with λ = 2σ²θ1. In terms of the probability model, this means that the tails of the JOE mixture behave like a Laplacian with θ = θ1, with the region where this happens determined by the value of θ2 − θ1. The fact that the non-convex region of ψ_JOE(|a|) is confined to a neighborhood around 0 could help to avoid falling into bad local minima during the optimization (see Section IV for more details on the optimization aspects). Finally, although having Laplacian tails means that the estimated a will be biased [16], the sharper peak at 0 allows us to perform more aggressive thresholding of small values without excessively clipping large coefficients, which is what leads to the typical over-smoothing of signals recovered using an l1 regularizer. See Figure 2 (rightmost column) for an example regularizer based on JOE with parameters θ1 = 2, θ2 = 10, and the thresholding function it induces. The JOE regularizer has two hyper-parameters (θ1, θ2), which define Θ and, in principle, need to be tuned. One possibility is to choose θ1 and θ2 based on the physical properties of the data to be modeled, so that the possible values of θ never fall outside of the range [θ1, θ2]. For example, when modeling patches from grayscale images with a limited dynamic range of [0, 255] in a DCT basis, the maximum variance of the coefficients can never exceed a bound determined by that range. The same is true for the minimum variance, which is defined by the quantization noise. Having said this, in practice it is advantageous to adjust [θ1, θ2] to the data at hand. In this case, although no closed form solutions exist for estimating [θ1, θ2] using MLE or the method of moments, standard optimization techniques can be easily applied to obtain them. See Appendix A for details.
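To make the regularizers and thresholding functions of Figure 2 concrete, here is a minimal numerical sketch (ours, not from the paper) that evaluates ψ_MOE and ψ_JOE up to additive constants and computes thres(x) = arg min_a {(x − a)² + λψ(|a|)} by brute-force grid search; note how MOE barely shrinks large inputs while setting small ones exactly to zero:

```python
import numpy as np

def psi_moe(a, beta):
    # MOE regularizer, up to an additive constant: log(|a| + beta)
    return np.log(np.abs(a) + beta)

def psi_joe(a, t1, t2):
    # JOE regularizer, up to an additive constant:
    # log|a| - log(exp(-t1|a|) - exp(-t2|a|)); continuous limit at 0.
    a = np.maximum(np.abs(a), 1e-12)  # the limit at 0 equals -log(t2 - t1)
    return np.log(a) - np.log(np.exp(-t1 * a) - np.exp(-t2 * a))

def thres(x, lam, psi):
    # Brute-force thresholding function: argmin_a (x - a)^2 + lam * psi(a)
    a = np.linspace(-abs(x) - 1.0, abs(x) + 1.0, 200001)
    return a[np.argmin((x - a) ** 2 + lam * psi(a))]

print(thres(5.0, 1.0, lambda a: psi_moe(a, 0.5)))  # large input, barely shrunk
print(thres(0.3, 1.0, lambda a: psi_moe(a, 0.5)))  # small input, thresholded to 0
```

The grid search stands in for the closed-form thresholding rules plotted in Figure 2; it is only meant to reproduce their qualitative behavior.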

C. The conditional Jeffreys

A recent approach to deal with the case when the integral over Θ in the Jeffreys prior is improper is the conditional Jeffreys [21, Chapter 11]. The idea is to construct a proper prior based on the improper Jeffreys prior and the first few n0 samples of a, (a1, a2, ..., a_{n0}), and then use it for the remaining data. The key observation is that although the normalizing integral ∫_Θ √I(θ) dθ in the Jeffreys prior is improper, the unnormalized prior w(θ) = √I(θ) can be used as a measure to weight P(a1, a2, ..., a_{n0}|θ),

w(θ) = P(a1, a2, ..., a_{n0}|θ) √I(θ) / ∫_Θ P(a1, a2, ..., a_{n0}|ξ) √I(ξ) dξ. (18)

It turns out that the integral in (18) usually becomes proper for a small n0, on the order of dim(Θ). In our case we have that for any n0 ≥ 1, the resulting prior is a Gamma(κ0, β0) distribution with κ0 := n0 and β0 := Σ_{j=1}^{n0} |a_j| (see Appendix A for details). Therefore, using the conditional Jeffreys prior in the mixture leads to a particular instance of MOE, which we denote by CMOE (although the functional form is identical to MOE), where the Gamma parameters κ and β are automatically selected from the data. This may explain in part why the Gamma prior performs so well in practice, as we will see in Section V. Furthermore, we observe that the value of β obtained with this approach (β0) coincides with the one estimated using the method of moments for MOE if the κ in MOE is fixed to κ = κ0 + 1 = n0 + 1. Indeed, if computed from n0 samples, the method of moments for MOE gives β = (κ − 1)μ̂1, with μ̂1 = (1/n0)Σ|a_j|, which gives us β = n0 · (1/n0)Σ|a_j| = β0. It turns out in practice that the value of κ estimated using the method of moments lies between 2 and 3 for the type of data that we deal with (see Section V), which is just above the minimum acceptable value for the CMOE prior to be defined, n0 = 1. This justifies our choice of n0 = 2 when applying CMOE in practice. As n0 becomes large, so does κ0 = n0, and the Gamma prior w(θ) obtained with this method converges to a Kronecker delta at the mean value of the Gamma distribution, δ_{κ0/β0}(θ).
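In code, the CMOE hyper-parameters are trivial to obtain; the following sketch (ours, with hypothetical coefficient values) also checks the coincidence with the method-of-moments β when κ is fixed to n0 + 1:

```python
import numpy as np

def cmoe_params(a_first):
    """Conditional-Jeffreys (CMOE) Gamma hyper-parameters from the first
    n0 observed coefficients: kappa0 = n0, beta0 = sum of |a_j|."""
    a_first = np.asarray(a_first, dtype=float)
    return len(a_first), float(np.sum(np.abs(a_first)))

a0 = [0.12, -0.05]            # hypothetical first n0 = 2 coefficients
kappa0, beta0 = cmoe_params(a0)

# Method of moments with kappa fixed to n0 + 1 gives the same beta:
# beta = (kappa - 1) * mean(|a_j|) = n0 * mean(|a_j|) = sum(|a_j|) = beta0
beta_mom = (kappa0 + 1 - 1) * np.mean(np.abs(a0))
print(kappa0, beta0)  # kappa0 = 2, beta0 = 0.17
```

This mirrors the identity in the text: fixing κ = n0 + 1 makes the moments-based β equal to the conditional-Jeffreys β0.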
Consequently, when w(θ) ≈ δ_{κ0/β0}(θ), the mixture ∫_Θ P(a|θ)w(θ)dθ will be close to P(a|κ0/β0). Moreover, from the definitions of κ0 and β0 we have that κ0/β0 is exactly the MLE of θ for the Laplacian distribution. Thus, for large n0, the conditional Jeffreys method approaches the MLE Laplacian model. Although from a universal coding point of view this is not a problem, for large n0 the conditional Jeffreys model loses its flexibility to deal with the case when different coefficients in A have different underlying θ. On the other hand, a small n0 can lead to a prior w(θ) that is overfitted to the local properties of the first samples, which for non-stationary data such as image patches can be problematic. Ultimately, n0 defines a trade-off between the degree of flexibility and the accuracy of the resulting model.

IV. OPTIMIZATION AND IMPLEMENTATION DETAILS

All of the mixture models discussed so far yield non-convex regularizers, rendering the sparse coding problem non-convex in a. It turns out, however, that these regularizers satisfy certain conditions which make the resulting sparse coding optimization well suited to be approximated using a sequence of successive convex sparse coding problems, a technique known as Local Linear Approximation (LLA) [52] (see also [46], [19] for alternative optimization techniques for such non-convex sparse coding problems). In a nutshell, suppose we need to obtain an approximate solution to

a_j = arg min_a ‖x_j − Da‖² + λ Σ_{k=1}^K ψ(|a_k|), (19)

where ψ(|a|) is a non-convex function over R+. At each LLA iteration, we compute a_j^{(t+1)} by doing a first order expansion of ψ(|a|) around the K elements of the current estimate a_{kj}^{(t)},

ψ̃_k^{(t)}(|a|) = ψ(|a_{kj}^{(t)}|) + ψ'(|a_{kj}^{(t)}|)(|a| − |a_{kj}^{(t)}|) = ψ'(|a_{kj}^{(t)}|)|a| + c_k,

and solving the convex weighted l1 problem that results after discarding the constant terms c_k,

a_j^{(t+1)} = arg min_a ‖x_j − Da‖² + λ Σ_{k=1}^K ψ̃_k^{(t)}(|a_k|) = arg min_a ‖x_j − Da‖² + λ Σ_{k=1}^K ψ'(|a_{kj}^{(t)}|)|a_k| = arg min_a ‖x_j − Da‖² + Σ_{k=1}^K λ_k^{(t)}|a_k|, (20)

where we have defined λ_k^{(t)} := λψ'(|a_{kj}^{(t)}|). If ψ'(|a|) is continuous in (0, +∞), and right-continuous and finite at 0, then the LLA algorithm converges to a stationary point of (19) [51]. These conditions are met for both the MOE and JOE regularizers. Although for the JOE prior the derivative ψ'(|a|) is not defined at 0, it converges to the limit (θ2² − θ1²)/(2(θ2 − θ1)) when a → 0, which is well defined for θ2 ≠ θ1. If θ2 = θ1, the JOE mixing function is a Kronecker delta and the prior becomes a Laplacian with parameter θ = θ1 = θ2. Therefore we have that, for all of the mixture models studied, the LLA method converges to a stationary point. In practice, we have observed that 5 iterations are enough to converge. Thus, the cost of sparse coding with the proposed non-convex regularizers is at most 5 times that of a single l1 sparse coding, and could be less in practice if warm restarts are used to begin each iteration. Of course we need a starting point a_j^{(0)}, and, this being a non-convex problem, this choice will influence the approximation that we obtain. One reasonable choice, used in this work, is to define a_{kj}^{(0)} = a0, k = 1,...,K, j = 1,...,N, where a0 is a scalar such that ψ'(a0) = E_w[θ], that is, so that the first sparse coding iteration corresponds to a Laplacian regularizer whose parameter is the average value of θ under the mixing prior w(θ). Finally, note that although the discussion here has revolved around the Lagrangian formulation of sparse coding (4), this technique is also applicable to the constrained formulation of sparse coding given by Equation (1) for a fixed dictionary D.

Expected approximation error: Since we are solving a convex approximation to the actual target optimization problem, it is of interest to know how good this approximation is in terms of the original cost function.
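As an illustration of the LLA iteration (20), the following sketch (ours, not the paper's implementation) runs LLA with the MOE regularizer in the special case D = I, where each weighted l1 subproblem reduces to coordinate-wise soft-thresholding:

```python
import numpy as np

def soft(x, w):
    # Solution of argmin_a (x - a)^2 + w*|a| (soft-thresholding by w/2)
    return np.sign(x) * np.maximum(np.abs(x) - w / 2.0, 0.0)

def lla_moe(x, lam, beta, n_iter=5):
    """LLA for min_a ||x - a||^2 + lam * sum_k log(|a_k| + beta),
    assuming D = I for illustration. Each iteration solves the weighted
    l1 problem with weights lam * psi'(|a_k|) = lam / (|a_k| + beta)."""
    a = x.copy()                        # simple warm start at a = x
    for _ in range(n_iter):
        w = lam / (np.abs(a) + beta)    # first-order expansion of psi
        a = soft(x, w)                  # convex weighted l1 subproblem
    return a

x = np.array([5.0, 0.3, -4.0, 0.1])
a = lla_moe(x, lam=1.0, beta=0.5)
print(a)  # large entries barely shrunk, small ones exactly zero
```

With a general dictionary D, the soft-thresholding step would simply be replaced by any weighted l1 sparse coding solver, as in the text.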
To give an idea of this, after an approximate solution is obtained, we compute the expected value of the difference between the true and approximate regularization term values. The expectation is taken, naturally, with respect to the assumed distribution of the coefficients in a. Since the regularizers are separable, we can compute the error in a separable way as an expectation over each k-th coefficient, ζ_q(a_k) = E_{ν∼q}[ψ̃_k(|ν|) − ψ(|ν|)], where ψ̃_k(·) is the approximation of ψ(·) around the final estimate of a_k. For the case q = MOE, the expression obtained is (see Appendix)

ζ_MOE(a_k, κ, β) = E_{ν∼MOE(κ,β)}[ψ̃_k(|ν|) − ψ(|ν|)] = log(a_k + β) + (1/(a_k + β))[β/(κ − 1) − a_k] − log β − 1/κ.

In the MOE case, for κ and β fixed, the minimum of ζ_MOE occurs at a_k = β/(κ − 1) = μ(β, κ). We also have ζ_MOE(0) = (κ − 1)^{−1} − κ^{−1}. The function ζ_q(·) can be evaluated at each coefficient of A to give an idea of its quality. For example, in the experiments of Section V, we obtained an average value of 0.16, which lies between ζ_MOE(0) = 0.19 and min_a ζ_MOE(a) = 0.09. Depending on the experiment, this represents 6% to 7% of the total sparse coding cost function value, showing the efficiency of the proposed optimization.

Comments on parameter estimation: All the universal models presented so far, with the exception of the conditional Jeffreys, depend on hyper-parameters which in principle should be tuned for optimal performance (remember that they do not influence the universality of the model). If tuning is needed, it is important to remember that the proposed universal models are intended for the reconstruction coefficients of clean data, and thus their hyper-parameters should be computed from statistics of clean data, or by compensating for the distortion in the statistics caused by noise (see for example [3]). Finally, note that when D is linearly dependent and rank(D) = R ≤ M, the coefficients matrix A resulting from an exact reconstruction of X will have many zeros, which are not properly explained by any continuous distribution such as a Laplacian.
We sidestep this issue by computing the statistics only from the non-zero coefficients in A. Dealing properly with the case P(a = 0) > 0 is beyond the scope of this work.
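The expected approximation error ζ_MOE given above is easy to check numerically; this sketch (ours) verifies that its minimum is attained at a_k = β/(κ − 1) and that ζ_MOE(0) = (κ − 1)^{−1} − κ^{−1}:

```python
import numpy as np

def zeta_moe(a0, kappa, beta):
    # Expected gap between the LLA linearization around a0 and the true
    # MOE regularizer, E[psi~(a) - psi(a)] under a ~ MOE(kappa, beta)
    return (np.log(a0 + beta) + (beta / (kappa - 1.0) - a0) / (a0 + beta)
            - np.log(beta) - 1.0 / kappa)

kappa, beta = 2.8, 0.07          # values of the order fitted in Section V
grid = np.linspace(0.0, 1.0, 100001)
amin = grid[np.argmin(zeta_moe(grid, kappa, beta))]
print(zeta_moe(0.0, kappa, beta))  # equals 1/(kappa-1) - 1/kappa
print(amin)                        # close to beta/(kappa-1)
```

Evaluating ζ_MOE on the final coefficients, as done in the text, gives a cheap per-coefficient certificate of how tight the convex surrogate was.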

V. EXPERIMENTAL RESULTS

In the following experiments, the testing data X are 8×8 patches drawn from the Pascal VOC2006 testing subset,⁷ which consists of high quality RGB images with 8 bits per channel. For the experiments, we converted the images to grayscale by averaging the channels, and scaled the dynamic range to lie in the [0, 1] interval. Similar results to those shown here are also obtained for other patch sizes.

A. Dictionary learning

For the experiments that follow, unless otherwise stated, we use a global overcomplete dictionary D with K = 4M = 256 atoms, trained on the full VOC2006 training subset using the method described in [35], [36], which seeks to minimize the following cost during training,⁸

min_{D,A} (1/N) Σ_{j=1}^N {‖x_j − Da_j‖² + λψ(a_j)} + μ‖DᵀD‖²_F, (21)

where ‖·‖_F denotes the Frobenius norm. The additional term, μ‖DᵀD‖²_F, encourages incoherence in the learned dictionary; that is, it forces the atoms to be as orthogonal as possible. Dictionaries with lower coherence are well known to have several theoretical advantages, such as an improved ability to recover sparse signals [11], [45], and faster and better convergence to the solution of the sparse coding problems (1) and (3) [13]. Furthermore, in [35] it was shown that adding incoherence leads to improvements in a variety of sparse modeling applications, including the ones discussed below. We used MOE as the regularizer in (21), with λ = 0.1 and μ = 1, both chosen empirically. See [1], [26], [35] for details on the optimization of (3) and (21).

B. MOE as a prior for sparse coding coefficients

We begin by comparing the performance of the Laplacian and MOE priors for fitting a single global distribution to the whole matrix A. We compute A using (1) with ε ≈ 0 and then, following the discussion in Section IV, restrict our study to the nonzero elements of A. The empirical distribution of A is plotted in Figure 3(a), along with the best fitting Laplacian, MOE, JOE, and a particularly good example of the conditional Jeffreys (CMOE) distributions.⁹
The MLE for the Laplacian fit is θ̂ = N1/‖A‖1 = 27.2 (here N1 is the number of nonzero elements in A). For MOE, using (12), we obtained κ = 2.8 and β = 0.07. For JOE, the estimated θ1 was 2.4, while the estimated θ2 was excessively large (see below). Following the discussion in Section III-C, we used the value κ = 2.8 obtained with the method of moments for MOE as a hint for choosing n0 = 2 (κ = n0 + 1 = 3 ≈ 2.8), yielding β0 = 0.07, which coincides with the β obtained using the method of moments. As observed in Figure 3(a), in all cases the proposed mixture models fit the data better: significantly better for both Gamma-based mixtures, MOE and CMOE, and slightly better for JOE. This is further confirmed by the Kullback-Leibler divergence (KLD) obtained in each case. Note that JOE fails to significantly improve on the Laplacian model due to the excessively large estimated range [θ1, θ2]. In this sense, it is clear that the JOE model is very sensitive to its hyper-parameters, and a better and more robust estimation would be needed for it to be useful in practice. Given these results, hereafter we concentrate on the best case, which is the MOE prior (which, as detailed above, can be derived from the conditional Jeffreys as well, thus representing both approaches). From Figure 1(e) we know that the optimal θ̂ varies locally across different regions; thus, we expect the mixture models to perform well also on a per-atom basis. This is confirmed in Figure 3(b), where we show, for each row k, k = 1,...,K, the difference in KLD between the globally fitted MOE distribution and the best per-atom fitted MOE, the globally fitted Laplacian, and the per-atom fitted Laplacians, respectively. As can be observed, the KLD obtained with the global MOE is significantly smaller than that of the global Laplacian in all cases, and even than the per-atom Laplacians in most cases.
This shows that MOE, with only two parameters (which can be easily estimated, as detailed in the text), is a much better model than K Laplacians (requiring K critical parameters) fitted specifically to the coefficients associated with each atom. Whether these modeling improvements have practical impact is explored in the next experiments.

⁸ While we could have used off-the-shelf dictionaries such as DCT in order to test our universal sparse coding framework, it is important to use dictionaries that lead to state-of-the-art results in order to show the additional potential improvement of our proposed regularizers.
⁹ To compute the empirical distribution, we quantized the elements of A uniformly in steps of 2⁻⁸, which, for the amount of data available, gives us enough detail and at the same time reliable statistics for all the quantized values.

Fig. 3. (a) Empirical distribution of the coefficients in A for image patches (blue dots), and best fitting Laplacian (green), MOE (red), CMOE (orange) and JOE (yellow) distributions. The Laplacian (KLD = 0.17 bits) is clearly not fitting the tails properly, and is not sufficiently peaked at zero either. The two models based on a Gamma prior, MOE (KLD = 0.1 bits) and CMOE (KLD = 0.1 bits), provide an almost perfect fit. The fitted JOE (KLD = 0.14 bits) is the most sharply peaked at 0, but does not fit the tails as tightly as desired. As a reference, the entropy of the empirical distribution is H = 3.0 bits. (b) KLD for the best fitting global Laplacian (dark green), per-atom Laplacian (light green), global MOE (dark red) and per-atom MOE (light red), relative to the KLD between the globally fitted MOE distribution and the empirical distribution. The horizontal axis represents the index of each atom, k = 1,...,K, ordered according to the difference in KLD between the global MOE and the per-atom Laplacian model. Note how the global MOE outperforms both the global and per-atom Laplacian models in all but the first 4 cases. (c) Active set recovery accuracy of l1 and MOE, as defined in Section V-C, for L = 5 and L = 10, as a function of σ. The improvement of MOE over l1 is a factor of 5 to 9. (d) PSNR of the recovered sparse signals with respect to the true signals. In this case significant improvements can be observed in the high SNR range, especially for highly sparse (L = 5) signals. The performance of both methods is practically the same for σ ≥ 10.

C. Recovery of noisy sparse signals

Here we compare the active set recovery properties of the MOE prior with those of the l1-based one, on data for which the sparsity assumption |A_j| ≤ L holds exactly for all j.
To this end, we obtain sparse approximations to each sample x_j using the l0-based Orthogonal Matching Pursuit algorithm (OMP) on D [28], and record the resulting active sets A_j as ground truth. The data is then contaminated with additive Gaussian noise of variance σ² and the recovery is performed by solving (1) for A with ε = CMσ² and either the l1 or the MOE-based regularizer for ψ(·). We use C = 1.32, which is a standard value in denoising applications (see for example [27]). For each sample j, we measure the error of each method in recovering the active set as the Hamming distance between the true and estimated supports of the corresponding reconstruction coefficients. The accuracy of the method is then given as the percentage of samples for which this error falls below a certain threshold T. Results are shown in Figure 3(c) for L = (5, 10) and T = (2, 4), respectively, for various values of σ. Note the very significant improvement obtained with the proposed model. Given the estimated active set A_j, the estimated clean patch is obtained by projecting x_j onto the subspace defined by the atoms that are active according to A_j, using least squares (which is the standard procedure for denoising once the active set is determined). We then measure the PSNR of the estimated patches with respect to the true ones. The results are shown in Figure 3(d), again for various values of σ. As can be observed, the MOE-based recovery is significantly better, especially in the high SNR range. Notably, the more accurate active set recovery of MOE does not seem to improve the denoising performance in this case. However, as we will see next, it does make a difference when denoising real-life signals, as well as in classification tasks.

D. Recovery of real signals with simulated noise

This experiment is analogous to the previous one, with the data being the original natural image patches (without forcing exact sparsity).
Since in this case the sparsity assumption is only approximate, and no ground truth is available for the active sets, we compare the different methods in terms of their denoising performance. A critical strategy in image denoising is the use of overlapping patches, where for each pixel in the image a patch is extracted with that pixel as its center. The patches are denoised independently as M-dimensional signals

and then recombined into the final denoised images by simple averaging. Although this consistently improves the final result in all cases, the improvement is very different depending on the method used to denoise the individual patches. Therefore, we now compare the denoising performance of each method at two levels: individual patches and final image. To denoise each image, the global dictionary described in Section V-A is further adapted to the noisy image patches using (21) for a few iterations, and used to encode the noisy patches via (2) with ε = CMσ².

Fig. 4. Sample image denoising results. Top: Barbara, σ = 30. Bottom: Boats, σ = 40. From left to right: noisy, l0/OMP, l1/l1, MOE/MOE. The reconstruction obtained with the proposed model is more accurate, as evidenced by the better reconstruction of the texture in Barbara and of the sharp edges in Boats, and does not produce the artifacts seen in both the l1 and l0 reconstructions, which appear as black/white speckles all over Barbara, and as ringing on the edges in Boats.

TABLE I. Denoising results for σ = 10, 20, 30, 40: each column shows the denoising performance of a learning+coding combination (l0, l1 and MOE) on the images Barbara, Boats, Lena, Peppers and Man. Results are shown in pairs, where the left number is the PSNR between the clean and recovered individual patches, and the right number is the PSNR between the clean and recovered images. Best results are in bold. The proposed MOE produces better final results than both the l0 and l1 ones in all cases, and at the patch level for all σ > 10. Note that the average values reported are the PSNR of the average MSE, and not the average of the PSNRs.
We repeated the experiment for two learning variants (l1 and MOE regularizers) and two coding variants ((2) with the regularizer used for learning, and l0 via OMP). The four variants were applied to the standard images Barbara, Boats, Lena, Man and Peppers, and the results are summarized in Table I. Sample results are shown in Figure 4. Although the quantitative improvements in Table I are small compared to l1, there is a significant improvement at the visual level, as can be seen in Figure 4. In all cases the PSNR obtained matches or surpasses the ones reported in [1].¹⁰

E. Zooming

As an example of signal recovery in the absence of noise, we took the previous set of images, plus a particularly challenging one (Tools), and subsampled them to half their size on each side. We then simulated a zooming effect by upsampling

¹⁰ Note that in [1], the denoised image is finally blended with the noisy image using an empirical weight, providing an extra improvement to the final PSNR in some cases. The results in Table I are already better without this extra step.
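The overlapping-patch strategy used in the denoising experiments above can be sketched as follows (our illustrative code, with a placeholder patch denoiser standing in for the sparse-coding step); the overlapping reconstructions are recombined by simple averaging:

```python
import numpy as np

def denoise_image(img, denoise_patch, p=8):
    """Denoise every overlapping p x p patch independently, then
    recombine the results by averaging the overlapping estimates
    (one patch per position where a full patch fits)."""
    h, w = img.shape
    acc = np.zeros((h, w))   # sum of patch estimates per pixel
    cnt = np.zeros((h, w))   # number of patches covering each pixel
    for i in range(h - p + 1):
        for j in range(w - p + 1):
            acc[i:i+p, j:j+p] += denoise_patch(img[i:i+p, j:j+p])
            cnt[i:i+p, j:j+p] += 1.0
    return acc / cnt

# With an identity "denoiser", averaging recovers the input exactly.
img = np.random.default_rng(0).random((16, 16))
out = denoise_image(img, lambda patch: patch)
print(np.allclose(out, img))  # -> True
```

In the actual experiments, `denoise_patch` would be the sparse-coding estimate of the clean patch under the chosen regularizer.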

Fig. 5. Zooming results. Left to right: summary table (PSNR of cubic, l0, l1 and MOE interpolation on Barbara, Boats, Lena, Peppers, Man and Tools, with averages), the Tools image, and details of the zooming results for the framed region (top to bottom, left to right: cubic, l0, l1, MOE). As can be seen, the MOE result is as sharp as l0 but produces fewer artifacts. This is reflected in the 0.1 dB overall improvement obtained with MOE, as seen in the summary table.

them and estimating each of the 75% missing pixels (see e.g. [5] and references therein). We use a technique similar to the one used in [32]. The image is first interpolated and then deconvolved using a Wiener filter. The deconvolved image has artifacts that we treat as noise in the reconstruction. However, since there is no real noise, we do not perform averaging of the patches, using only the center pixel of x̂_j to fill in the missing pixel at j. The results are summarized in Figure 5, where we again observe that using MOE instead of l0 and l1 improves the results.

F. Classification with universal sparse models

In this section we apply our proposed universal models to a classification problem where each sample x_j is to be assigned a class label y_j = 1,...,c, which serves as an index into the set of possible classes, {C1, C2,...,Cc}. We follow the procedure of [36], where the classifier assigns each sample x_j by means of the maximum a posteriori criterion (5), with the term −log P(a) corresponding to the assumed prior, and where the dictionaries representing each class are learned from training samples using (21) with the corresponding regularizer ψ(a) = −log P(a). Each experiment is repeated for the baseline Laplacian model, implied by the l1 regularizer, and for the universal model MOE, and the results are then compared. In this case we expect that a more accurate prior model for the coefficients will result in an improved likelihood estimation, which in turn should improve the accuracy of the system. We begin with a classic texture classification problem, where patches have to be identified as belonging to one out of a number of possible textures.
In this case we experimented with samples of c = 2 and c = 3 textures drawn at random from the Brodatz database,¹¹ the ones actually used being shown in Figure 6. In each case the experiment was repeated 10 times. In each repetition, a dictionary of K = 300 atoms was learned from all patches of the leftmost half of each sample texture. We then classified the patches from the rightmost halves of the texture samples. For c = 2 we obtained an average error rate of 5.13% using l1 against 4.12% when using MOE, which represents a reduction of 20% in classification error. For c = 3 the average error rates were 13.54% using l1 and 11.48% using MOE, which is 15% lower. Thus, using the universal model instead of l1 yields a significant improvement in this case (see for example [26] for other results on classification of Brodatz textures). The second sample problem presented is the Graz 02 bike detection problem,¹² where each pixel of each testing image has to be classified as either background or as part of a bike. In the Graz 02 dataset, each pixel can belong to one of two classes: bike or background. For each of the training images (which by convention are the first 150 even-numbered images), we are given a mask that tells us whether each pixel belongs to a bike or to the background. We then train a dictionary for bike patches and another for background patches. Patches that contain pixels from both classes are assigned to the class corresponding to the majority of their pixels.

¹¹ ...tranden/brodatz.html

Fig. 6. Textures used in the texture classification example.

Fig. 7. Classification results. Left to right: precision vs. recall curve, sample image from the Graz 02 dataset, its ground truth, and the corresponding estimated maps obtained with l1 and MOE for a fixed threshold. The precision vs. recall curve shows that the mixture model gives better precision in all cases. In the example, the classification obtained with MOE yields fewer false positives and more true positives than the one obtained with l1.

In Figure 7 we show the precision vs. recall curves obtained with the detection framework when either the l1 or the MOE regularizer was used in the system. As can be seen, the MOE-based model outperforms l1 in this classification task as well, giving better precision for all recall values. In the above experiments, the parameters for the l1 prior (λ), the MOE model (λ_MOE) and the incoherence term (μ) were all adjusted by cross-validation. The only exception is the MOE parameter β, which was chosen based on the fitting experiment as β = 0.07.

VI. CONCLUDING REMARKS

A framework for designing sparse modeling priors was introduced in this work, using tools from universal coding, which formalizes sparse coding and modeling from an MDL perspective. The priors obtained lead to models with both theoretical and practical advantages over the traditional l0- and l1-based ones. In all derived cases, the designed non-convex problems are suitable to be efficiently (approximately) solved via a few iterations of (weighted) l1 subproblems. We also showed that these priors are able to fit the empirical distribution of sparse codes of image patches significantly better than the traditional IID Laplacian model, and even than the non-identically distributed independent Laplacian model where a different Laplacian parameter is adjusted to the coefficients associated with each atom, thus showing the flexibility and accuracy of the proposed models.
The additional flexibility, furthermore, comes at the small cost of only 2 parameters that can be easily and efficiently tuned (either (κ, β) in the MOE model, or (θ1, θ2) in the JOE model), instead of K (the dictionary size), as in weighted l1 models. The additional accuracy of the proposed models was shown to have significant practical impact in active set recovery of sparse signals, image denoising, and classification applications. Compared to the Bayesian approach, we avoid the potential burden of solving several sampled sparse problems, or being forced to use a conjugate prior for computational reasons (although in our case, a fortiori, the conjugate prior does provide us with a good model). Overall, as demonstrated in this paper, the introduction of information theory tools can lead to formally addressing critical aspects of sparse modeling.

Future work in this direction includes the design of priors that take into account the nonzero mass at a = 0 that appears in overcomplete models, and the online learning of the model parameters from noisy data, following for example the technique in [3].

ACKNOWLEDGMENTS

Work partially supported by NGA, ONR, ARO, NSF, NSSEFF, and FUNDACIBA-ANTEL. We wish to thank Julien Mairal for providing us with his fast sparse modeling toolbox, SPAMS.¹³ We also thank Federico Lecumberry for his participation in the incoherent dictionary learning method, and for helpful comments.

APPENDIX

DERIVATION OF THE MOE MODEL

In this case we have P(a|θ) = θe^{−θa} and w(θ|κ, β) = (1/Γ(κ)) θ^{κ−1} β^κ e^{−βθ}, which, when plugged into (9), gives

Q(a|β, κ) = ∫_{θ=0}^∞ θe^{−θa} (1/Γ(κ)) θ^{κ−1} β^κ e^{−βθ} dθ = (β^κ/Γ(κ)) ∫_{θ=0}^∞ e^{−θ(a+β)} θ^κ dθ.

After the change of variables u := (a + β)θ (u(0) = 0, u(∞) = ∞), the integral can be written as

Q(a|β, κ) = (β^κ/Γ(κ)) ∫_{u=0}^∞ (u/(a+β))^κ e^{−u} du/(a+β) = (β^κ/Γ(κ)) (a+β)^{−(κ+1)} ∫_{u=0}^∞ u^κ e^{−u} du = (β^κ/Γ(κ)) (a+β)^{−(κ+1)} Γ(κ+1) = (β^κ/Γ(κ)) (a+β)^{−(κ+1)} κΓ(κ),

obtaining Q(a|β, κ) = κβ^κ(a + β)^{−(κ+1)}, since the integral on the second line is precisely the definition of Γ(κ + 1). The symmetrization is obtained by substituting a by |a| and dividing the normalization constant by two, Q(a|β, κ) = 0.5κβ^κ(|a| + β)^{−(κ+1)}. The mean of the one-sided MOE distribution (which is defined only for κ > 1) can be easily computed using integration by parts,

μ(β, κ) = ∫_0^∞ u κβ^κ (u + β)^{−(κ+1)} du = β^κ [−u(u + β)^{−κ}]_0^∞ + β^κ ∫_0^∞ (u + β)^{−κ} du = β/(κ − 1).

In the same way, it is easy to see that the non-central moment of order i is μ_i = β^i / C(κ−1, i), where C(κ−1, i) := (κ−1)(κ−2)···(κ−i)/i!. The MLE estimates of κ and β can be obtained using any nonlinear optimization technique such as Newton's method, using for example the estimates obtained with the method of moments as a starting point. In practice, however, we have not observed any significant improvement of the MLE estimates over the moments-based ones.

Expected approximation error in the cost function

As mentioned in the optimization section, the LLA approximates the MOE regularizer as a weighted l1.
Here we develop an expression for the expected error between the true regularizer and its convex approximation, where the expectation is taken (naturally) with respect to the MOE distribution. Given the value of the current iterate a^{(t)} = a0 (assumed positive, since the function and its approximation are symmetric), the approximated regularizer is ψ̃^{(t)}(a) = log(a0 + β) + (a − a0)/(a0 + β). We have

E_{a∼MOE(κ,β)}[ψ̃^{(t)}(a) − ψ(a)] = ∫_0^∞ [log(a0 + β) + (a − a0)/(a0 + β) − log(a + β)] κβ^κ (a + β)^{−(κ+1)} da = log(a0 + β) + (β/(κ−1) − a0)/(a0 + β) − ∫_0^∞ log(a + β) κβ^κ (a + β)^{−(κ+1)} da = log(a0 + β) + (β/(κ−1) − a0)/(a0 + β) − log β − 1/κ,

where we used E[a] = β/(κ − 1) and the fact that the last integral evaluates to log β + 1/κ.
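The closed form Q(a|β, κ) = κβ^κ(a + β)^{−(κ+1)} derived in this appendix can be verified numerically by integrating the exponential-Gamma mixture directly (our sketch, using a simple midpoint rule):

```python
import numpy as np
from math import gamma

def q_moe_closed(a, beta, kappa):
    # One-sided MOE density: kappa * beta^kappa * (a + beta)^(-(kappa+1))
    return kappa * beta**kappa * (a + beta) ** (-(kappa + 1.0))

def q_moe_numeric(a, beta, kappa, tmax=60.0, n=600000):
    # Midpoint-rule approximation of the mixture integral
    # int_0^inf theta * exp(-theta*a) * Gamma(theta; kappa, beta) dtheta
    dt = tmax / n
    theta = (np.arange(n) + 0.5) * dt
    prior = theta**(kappa - 1.0) * beta**kappa * np.exp(-beta * theta) / gamma(kappa)
    return float(np.sum(theta * np.exp(-theta * a) * prior) * dt)

print(q_moe_closed(2.0, 1.0, 3.0))   # 3/81 ~= 0.037037
print(q_moe_numeric(2.0, 1.0, 3.0))  # matches closely
```

Agreement between the two confirms, for a sample point, the change-of-variables computation carried out above.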

DERIVATION OF THE CONSTRAINED JEFFREYS (JOE) MODEL

In the case of the exponential distribution, the Fisher information (15) evaluates to

I(θ) = E_{P(a|θ)}[−∂²/∂θ² (log θ − θa)] = E_{P(a|θ)}[1/θ²] = 1/θ².

By plugging this result into (14) with Θ = [θ1, θ2], 0 < θ1 < θ2 < ∞, we obtain w(θ) = (1/ln(θ2/θ1))(1/θ). We now derive the (one-sided) JOE probability density function by plugging this w(θ) into (9),

Q(a) = ∫_{θ1}^{θ2} θe^{−θa} (1/(θ ln(θ2/θ1))) dθ = (1/ln(θ2/θ1)) ∫_{θ=θ1}^{θ2} e^{−θa} dθ = (1/(a ln(θ2/θ1))) (e^{−θ1a} − e^{−θ2a}).

Although Q(a) cannot be evaluated at a = 0, its limit as a → 0 exists and is finite, so we can simply define Q(0) as this limit, which is

lim_{a→0} Q(a) = lim_{a→0} (1/(a ln(θ2/θ1))) [1 − θ1a + o(a²) − (1 − θ2a + o(a²))] = (θ2 − θ1)/ln(θ2/θ1).

Again, if desired, parameter estimation can be done using, for example, maximum likelihood (via nonlinear optimization), or the method of moments. However, in this case the method of moments does not provide a closed form solution for (θ1, θ2). The non-central moment of order i is

μ_i = ∫_0^∞ a^i (1/(a ln(θ2/θ1))) [e^{−θ1a} − e^{−θ2a}] da = (1/ln(θ2/θ1)) {∫_0^∞ a^{i−1} e^{−θ1a} da − ∫_0^∞ a^{i−1} e^{−θ2a} da}. (22)

For i = 1, both integrals in (22) are trivially evaluated, yielding μ1 = (1/ln(θ2/θ1))(1/θ1 − 1/θ2). For i > 1, these integrals can be solved using integration by parts:

μ_i^+ = ∫_0^∞ a^{i−1} e^{−θ1a} da = [−a^{i−1} e^{−θ1a}/θ1]_0^∞ + ((i−1)/θ1) ∫_0^∞ a^{i−2} e^{−θ1a} da,
μ_i^− = ∫_0^∞ a^{i−1} e^{−θ2a} da = [−a^{i−1} e^{−θ2a}/θ2]_0^∞ + ((i−1)/θ2) ∫_0^∞ a^{i−2} e^{−θ2a} da,

where the first term on the right hand side of both equations evaluates to 0 for i > 1. Therefore, for i > 1 we obtain the recursions μ_i^+ = ((i−1)/θ1)μ_{i−1}^+ and μ_i^− = ((i−1)/θ2)μ_{i−1}^−, which, combined with the result for i = 1, give the final expression for all the moments of order i ≥ 1,

μ_i = ((i−1)!/ln(θ2/θ1)) (1/θ1^i − 1/θ2^i), i = 1, 2,....

In particular, for i = 1 and i = 2 we have 1/θ1 − 1/θ2 = ln(θ2/θ1)μ1 and 1/θ1 + 1/θ2 = μ2/μ1, which, when combined, give us
(23)

One possibility is to solve the nonlinear equation $\theta_2/\theta_1 = \frac{\mu_2 + \ln(\theta_2/\theta_1)\mu_1^2}{\mu_2 - \ln(\theta_2/\theta_1)\mu_1^2}$ for $u = \theta_2/\theta_1$ by finding the roots of the nonlinear equation $u = \frac{\mu_2 + \mu_1^2 \ln u}{\mu_2 - \mu_1^2 \ln u}$ and choosing one of them based on some side information. Another possibility is to simply fix the ratio $\theta_2/\theta_1$ beforehand and solve for $\theta_1$ and $\theta_2$ using (23).

DERIVATION OF THE CONDITIONAL JEFFREYS (CMOE) MODEL

The conditional Jeffreys method defines a proper prior $w(\theta)$ by assuming that $n$ samples from the data to be modeled, $a^n = (a_1, \ldots, a_n)$, were already observed. Plugging the Fisher information of the exponential distribution, $I(\theta) = \theta^{-2}$, into (18) we obtain
$$
w(\theta) = \frac{P(a^n|\theta)\sqrt{I(\theta)}}{\int_{\Theta} P(a^n|\xi)\sqrt{I(\xi)}\, d\xi}
= \frac{\left( \prod_{j=1}^{n} \theta e^{-\theta a_j} \right) \theta^{-1}}{\int_0^{\infty} \left( \prod_{j=1}^{n} \xi e^{-\xi a_j} \right) \xi^{-1}\, d\xi}
= \frac{\theta^{n-1} e^{-\theta \sum_{j=1}^{n} a_j}}{\int_0^{\infty} \xi^{n-1} e^{-\xi \sum_{j=1}^{n} a_j}\, d\xi}.
$$
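The root-finding route can be sketched as follows (a hypothetical moment-matching routine, not from the paper; bisection is just one of several ways to locate the root). Clearing the denominator turns the fixed-point equation into f(u) = (u−1)μ₂ − (u+1)μ₁² ln u = 0, which has a trivial root at u = 1 and the desired root at u = θ₂/θ₁ > 1:

```python
import math

def joe_moments(t1, t2):
    """Closed-form mu_1 and mu_2 of the one-sided JOE model."""
    L = math.log(t2 / t1)
    return (1 / t1 - 1 / t2) / L, (1 / t1**2 - 1 / t2**2) / L

def recover_joe_params(mu1, mu2, hi=1e6, iters=200):
    """Recover (t1, t2) from (mu1, mu2): bisect f(u) = (u-1)mu2 - (u+1)mu1^2 ln u
    for the root u = t2/t1 > 1, then back out t1 and t2 via (23)."""
    c = mu1 * mu1
    f = lambda u: (u - 1.0) * mu2 - (u + 1.0) * c * math.log(u)
    lo = 1.0 + 1e-9  # f > 0 just above the trivial root u = 1 (mu2 > 2 mu1^2 by Jensen)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    t1 = 2.0 * mu1 / (mu2 + c * math.log(u))
    t2 = 2.0 * mu1 / (mu2 - c * math.log(u))
    return t1, t2

mu1, mu2 = joe_moments(0.5, 4.0)
t1, t2 = recover_joe_params(mu1, mu2)
print(t1, t2)  # should recover 0.5 and 4.0
```

A plain fixed-point iteration on u = (μ₂ + μ₁² ln u)/(μ₂ − μ₁² ln u) can diverge (the map's derivative exceeds one at the root for some parameter values), which is why a bracketing method is used here.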

Denoting $S = \sum_{j=1}^{n} a_j$ and performing the change of variables $u := S\xi$ in the denominator, we obtain
$$
w(\theta) = \frac{S^{n} \theta^{n-1} e^{-S\theta}}{\int_0^{\infty} u^{n-1} e^{-u}\, du} = \frac{S^{n} \theta^{n-1} e^{-S\theta}}{\Gamma(n)},
$$
where the last equality derives from the definition of the Gamma function, $\Gamma(n)$. We see that the resulting prior $w(\theta)$ is a Gamma distribution $Gamma(\bar\kappa, \bar\beta)$ with $\bar\kappa = n$ and $\bar\beta = S = \sum_{j=1}^{n} a_j$.

REFERENCES

[1] M. Aharon, M. Elad, and A. Bruckstein. The K-SVD: An algorithm for designing overcomplete dictionaries for sparse representations. IEEE Trans. SP, 54(11), Nov. 2006.
[2] A. Barron, J. Rissanen, and B. Yu. The minimum description length principle in coding and modeling. IEEE Trans. IT, 44(6), 1998.
[3] J. Bernardo and A. Smith. Bayesian Theory. Wiley, 1994.
[4] A. Bruckstein, D. Donoho, and M. Elad. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review, 51(1):34-81, Feb. 2009.
[5] E. J. Candès. Compressive sampling. Proc. of the International Congress of Mathematicians, 3, Aug. 2006.
[6] E. J. Candès, M. Wakin, and S. Boyd. Enhancing sparsity by reweighted l1 minimization. J. Fourier Anal. Appl., 14(5):877-905, Dec. 2008.
[7] R. Chartrand. Fast algorithms for nonconvex compressive sensing: MRI reconstruction from very few data. In IEEE ISBI, June 2009.
[8] S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33-61, 1998.
[9] R. Coifman and M. Wickerhauser. Entropy-based algorithms for best basis selection. IEEE Trans. IT, 38, 1992.
[10] T. Cover and J. Thomas. Elements of Information Theory. John Wiley and Sons, Inc., 2nd edition, 2006.
[11] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. on Pure and Applied Mathematics, 57, 2004.
[12] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32(2):407-499, 2004.
[13] M. Elad. Optimized projections for compressed-sensing. IEEE Trans. SP, 55(12), Dec. 2007.
[14] K. Engan, S. Aase, and J. Husoy. Multi-frame compression: Theory and design. Signal Processing, 80(10), Oct. 2000.
[15] M. Everingham, A. Zisserman, C. Williams, and L. Van Gool. The PASCAL Visual Object Classes Challenge 2006 (VOC2006) Results.
[16] J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal Am. Stat. Assoc., 96(456), Dec. 2001.
[17] M. Figueiredo. Adaptive sparseness using Jeffreys prior. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Adv. NIPS. MIT Press, Dec. 2001.
[18] S. Foucart and M. Lai. Sparsest solutions of underdetermined linear systems via lq-minimization for 0 < q <= 1. Applied and Computational Harmonic Analysis, 26(3):395-407, 2009.
[19] G. Gasso, A. Rakotomamonjy, and S. Canu. Recovering sparse signals with non-convex penalties and DC programming. IEEE Trans. SP, 57(12), 2009.
[20] R. Giryes, Y. Eldar, and M. Elad. Automatic parameter setting for iterative shrinkage methods. In IEEE 25th Convention of Electronics and Electrical Engineers in Israel (IEEEI'08), Dec. 2008.
[21] P. Grünwald. The Minimum Description Length Principle. MIT Press, June 2007.
[22] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, 2nd edition, Feb. 2009.
[23] S. Ji, Y. Xue, and L. Carin. Bayesian compressive sensing. IEEE Trans. SP, 56(6), 2008.
[24] B. Krishnapuram, L. Carin, M. Figueiredo, and A. Hartemink. Sparse multinomial logistic regression: Fast algorithms and generalization bounds. IEEE Trans. PAMI, 27(6), 2005.
[25] E. Lam and J. Goodman. A mathematical analysis of the DCT coefficient distributions for images. IEEE Trans. IP, 9(10), 2000.
[26] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Supervised dictionary learning. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Adv. NIPS, volume 21, Dec. 2009.
[27] J. Mairal, G. Sapiro, and M. Elad. Learning multiscale sparse representations for image and video restoration. SIAM MMS, 7(1), April 2008.
[28] S. Mallat and Z. Zhang. Matching pursuit in a time-frequency dictionary. IEEE Trans. SP, 41(12), 1993.
[29] N. Merhav and M. Feder. Universal prediction. IEEE Trans. IT, 44(6), Oct. 1998.
[30] G. Motta, E. Ordentlich, I. Ramirez, G. Seroussi, and M. Weinberger. The DUDE framework for grayscale image denoising. Technical report, HP Laboratories, 2009.
[31] P. Moulin and J. Liu. Analysis of multiresolution image denoising schemes using generalized-Gaussian and complexity priors. IEEE Trans. IT, April 1999.
[32] R. Neelamani, H. Choi, and R. Baraniuk. ForWaRD: Fourier-wavelet regularized deconvolution for ill-conditioned systems. IEEE Trans. SP, 52(2), 2004.
[33] B. Olshausen and D. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37, 1997.
[34] R. Raina, A. Battle, H. Lee, B. Packer, and A. Ng. Self-taught learning: transfer learning from unlabeled data. In ICML, June 2007.
[35] I. Ramirez, F. Lecumberry, and G. Sapiro. Universal priors for sparse modeling. In CAMSAP, Dec. 2009.
[36] I. Ramírez, P. Sprechmann, and G. Sapiro. Classification and clustering via dictionary learning with structured incoherence and shared features. In CVPR, June 2010.
[37] J. Rissanen. Universal coding, information, prediction and estimation. IEEE Trans. IT, 30(4), July 1984.
[38] J. Rissanen. Stochastic Complexity in Statistical Inquiry. Singapore: World Scientific.
[39] R. Saab, R. Chartrand, and O. Yilmaz. Stable sparse approximation via nonconvex optimization. In ICASSP, April 2008.
[40] N. Saito. Simultaneous noise suppression and signal compression using a library of orthonormal bases and the MDL criterion. In E. Foufoula-Georgiou and P. Kumar, editors, Wavelets in Geophysics. New York: Academic, 1994.
[41] G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6(2), 1978.
[42] Y. Shtarkov. Universal sequential coding of single messages. Probl. Inform. Transm., 23(3):3-17, July 1987.
[43] R. Tibshirani. Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society: Series B, 58(1), 1996.
[44] M. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1, 2001.
[45] J. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. IT, 50(10), Oct. 2004.
[46] J. Trzasko and A. Manduca. Highly undersampled magnetic resonance image reconstruction via homotopic l0-minimization. IEEE Trans. MI, 28(1):106-121, Jan. 2009.
[47] J. Trzasko and A. Manduca. Relaxed conditions for sparse signal recovery with general concave priors. IEEE Trans. SP, 57(11), 2009.
[48] D. Wipf, J. Palmer, and B. Rao. Perspectives on sparse Bayesian learning. In Adv. NIPS, Dec. 2003.
[49] D. Wipf and B. Rao. An empirical Bayesian strategy for solving the simultaneous sparse approximation problem. IEEE Trans. SP, 55(7), 2007.
[50] G. Yu, G. Sapiro, and S. Mallat. Solving inverse problems with piecewise linear estimators: From Gaussian mixture models to structured sparsity. arXiv preprint.
[51] H. Zou. The adaptive LASSO and its oracle properties. Journal Am. Stat. Assoc., 101, 2006.
[52] H. Zou and R. Li. One-step sparse estimates in nonconcave penalized likelihood models. Annals of Statistics, 36(4), 2008.


More information

Physics 43 Homework Set 9 Chapter 40 Key

Physics 43 Homework Set 9 Chapter 40 Key Physics 43 Homework Set 9 Chpter 4 Key. The wve function for n electron tht is confined to x nm is. Find the normliztion constnt. b. Wht is the probbility of finding the electron in. nm-wide region t x

More information

VoIP for the Small Business

VoIP for the Small Business Reducing your telecommunictions costs VoIP (Voice over Internet Protocol) offers low cost lterntive to expensive trditionl phone services nd is rpidly becoming the communictions system of choice for smll

More information

belief Propgtion Lgorithm in Nd Pent Penta

belief Propgtion Lgorithm in Nd Pent Penta IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 9, NO. 3, MAY/JUNE 2012 375 Itertive Trust nd Reputtion Mngement Using Belief Propgtion Ermn Aydy, Student Member, IEEE, nd Frmrz Feri, Senior

More information

How To Study The Effects Of Music Composition On Children

How To Study The Effects Of Music Composition On Children C-crcs Cognitive - Counselling Reserch & Conference Services (eissn: 2301-2358) Volume I Effects of Music Composition Intervention on Elementry School Children b M. Hogenes, B. Vn Oers, R. F. W. Diekstr,

More information

P.3 Polynomials and Factoring. P.3 an 1. Polynomial STUDY TIP. Example 1 Writing Polynomials in Standard Form. What you should learn

P.3 Polynomials and Factoring. P.3 an 1. Polynomial STUDY TIP. Example 1 Writing Polynomials in Standard Form. What you should learn 33337_0P03.qp 2/27/06 24 9:3 AM Chpter P Pge 24 Prerequisites P.3 Polynomils nd Fctoring Wht you should lern Polynomils An lgeric epression is collection of vriles nd rel numers. The most common type of

More information

Module 2. Analysis of Statically Indeterminate Structures by the Matrix Force Method. Version 2 CE IIT, Kharagpur

Module 2. Analysis of Statically Indeterminate Structures by the Matrix Force Method. Version 2 CE IIT, Kharagpur Module Anlysis of Stticlly Indeterminte Structures by the Mtrix Force Method Version CE IIT, Khrgpur esson 9 The Force Method of Anlysis: Bems (Continued) Version CE IIT, Khrgpur Instructionl Objectives

More information

MATH 150 HOMEWORK 4 SOLUTIONS

MATH 150 HOMEWORK 4 SOLUTIONS MATH 150 HOMEWORK 4 SOLUTIONS Section 1.8 Show tht the product of two of the numbers 65 1000 8 2001 + 3 177, 79 1212 9 2399 + 2 2001, nd 24 4493 5 8192 + 7 1777 is nonnegtive. Is your proof constructive

More information

Small Businesses Decisions to Offer Health Insurance to Employees

Small Businesses Decisions to Offer Health Insurance to Employees Smll Businesses Decisions to Offer Helth Insurnce to Employees Ctherine McLughlin nd Adm Swinurn, June 2014 Employer-sponsored helth insurnce (ESI) is the dominnt source of coverge for nonelderly dults

More information

Contextualizing NSSE Effect Sizes: Empirical Analysis and Interpretation of Benchmark Comparisons

Contextualizing NSSE Effect Sizes: Empirical Analysis and Interpretation of Benchmark Comparisons Contextulizing NSSE Effect Sizes: Empiricl Anlysis nd Interprettion of Benchmrk Comprisons NSSE stff re frequently sked to help interpret effect sizes. Is.3 smll effect size? Is.5 relly lrge effect size?

More information

DlNBVRGH + Sickness Absence Monitoring Report. Executive of the Council. Purpose of report

DlNBVRGH + Sickness Absence Monitoring Report. Executive of the Council. Purpose of report DlNBVRGH + + THE CITY OF EDINBURGH COUNCIL Sickness Absence Monitoring Report Executive of the Council 8fh My 4 I.I...3 Purpose of report This report quntifies the mount of working time lost s result of

More information

Space Vector Pulse Width Modulation Based Induction Motor with V/F Control

Space Vector Pulse Width Modulation Based Induction Motor with V/F Control Interntionl Journl of Science nd Reserch (IJSR) Spce Vector Pulse Width Modultion Bsed Induction Motor with V/F Control Vikrmrjn Jmbulingm Electricl nd Electronics Engineering, VIT University, Indi Abstrct:

More information

How To Set Up A Network For Your Business

How To Set Up A Network For Your Business Why Network is n Essentil Productivity Tool for Any Smll Business TechAdvisory.org SME Reports sponsored by Effective technology is essentil for smll businesses looking to increse their productivity. Computer

More information

INTERCHANGING TWO LIMITS. Zoran Kadelburg and Milosav M. Marjanović

INTERCHANGING TWO LIMITS. Zoran Kadelburg and Milosav M. Marjanović THE TEACHING OF MATHEMATICS 2005, Vol. VIII, 1, pp. 15 29 INTERCHANGING TWO LIMITS Zorn Kdelburg nd Milosv M. Mrjnović This pper is dedicted to the memory of our illustrious professor of nlysis Slobodn

More information

Data replication in mobile computing

Data replication in mobile computing Technicl Report, My 2010 Dt repliction in mobile computing Bchelor s Thesis in Electricl Engineering Rodrigo Christovm Pmplon HALMSTAD UNIVERSITY, IDE SCHOOL OF INFORMATION SCIENCE, COMPUTER AND ELECTRICAL

More information

A Decision Theoretic Framework for Ranking using Implicit Feedback

A Decision Theoretic Framework for Ranking using Implicit Feedback A Decision Theoretic Frmework for Rnking using Implicit Feedbck Onno Zoeter Michel Tylor Ed Snelson John Guiver Nick Crswell Mrtin Szummer Microsoft Reserch Cmbridge 7 J J Thomson Avenue Cmbridge, United

More information

2 DIODE CLIPPING and CLAMPING CIRCUITS

2 DIODE CLIPPING and CLAMPING CIRCUITS 2 DIODE CLIPPING nd CLAMPING CIRCUITS 2.1 Ojectives Understnding the operting principle of diode clipping circuit Understnding the operting principle of clmping circuit Understnding the wveform chnge of

More information

piecewise Liner SLAs and Performance Timetagment

piecewise Liner SLAs and Performance Timetagment i: Incrementl Cost bsed Scheduling under Piecewise Liner SLAs Yun Chi NEC Lbortories Americ 18 N. Wolfe Rd., SW3 35 Cupertino, CA 9514, USA ychi@sv.nec lbs.com Hyun Jin Moon NEC Lbortories Americ 18 N.

More information

Algebra Review. How well do you remember your algebra?

Algebra Review. How well do you remember your algebra? Algebr Review How well do you remember your lgebr? 1 The Order of Opertions Wht do we men when we write + 4? If we multiply we get 6 nd dding 4 gives 10. But, if we dd + 4 = 7 first, then multiply by then

More information

Solving BAMO Problems

Solving BAMO Problems Solving BAMO Problems Tom Dvis tomrdvis@erthlink.net http://www.geometer.org/mthcircles Februry 20, 2000 Abstrct Strtegies for solving problems in the BAMO contest (the By Are Mthemticl Olympid). Only

More information

QUADRATURE METHODS. July 19, 2011. Kenneth L. Judd. Hoover Institution

QUADRATURE METHODS. July 19, 2011. Kenneth L. Judd. Hoover Institution QUADRATURE METHODS Kenneth L. Judd Hoover Institution July 19, 2011 1 Integrtion Most integrls cnnot be evluted nlyticlly Integrls frequently rise in economics Expected utility Discounted utility nd profits

More information

Performance analysis model for big data applications in cloud computing

Performance analysis model for big data applications in cloud computing Butist Villlpndo et l. Journl of Cloud Computing: Advnces, Systems nd Applictions 2014, 3:19 RESEARCH Performnce nlysis model for big dt pplictions in cloud computing Luis Edurdo Butist Villlpndo 1,2,

More information

Clipping & Scan Conversion. CSE167: Computer Graphics Instructor: Steve Rotenberg UCSD, Fall 2005

Clipping & Scan Conversion. CSE167: Computer Graphics Instructor: Steve Rotenberg UCSD, Fall 2005 Clipping & Scn Conersion CSE167: Computer Grphics Instructor: Stee Rotenberg UCSD, Fll 2005 Project 2 Render 3D hnd (mde up of indiidul boxes) using hierrchicl trnsformtions (push/pop) The hnd should perform

More information

Review Problems for the Final of Math 121, Fall 2014

Review Problems for the Final of Math 121, Fall 2014 Review Problems for the Finl of Mth, Fll The following is collection of vrious types of smple problems covering sections.,.5, nd.7 6.6 of the text which constitute only prt of the common Mth Finl. Since

More information

Fast Demand Learning for Display Advertising Revenue Management

Fast Demand Learning for Display Advertising Revenue Management Fst Demnd Lerning for Disply Advertising Revenue Mngement Drgos Florin Ciocn Vivek F Fris April 30, 2014 Abstrct The present pper is motivted by the network revenue mngement problems tht occur in online

More information