AIMS Big data. Lecture 5: Structured-output learning. January 7, 2015. Andrea Vedaldi
Course lectures: 1-2. Discriminative learning. 3. Hashing and kernel maps. 4. Learning representations. 5. Structured-output learning.

Outline

Beyond classification: structured output SVMs
Learning formulations
Optimisation
A complete example
Further insights on optimisation
Beyond classification

Consider now the general problem of learning a function $f : \mathcal{X} \to \mathcal{Y}$, $x \mapsto y$, where both the input and output spaces are general. Examples:
Ranking: given a set of objects $(o_1, \dots, o_k)$ as input $x$, return an order as output $y$.
Pose estimation: given an image of a human as input $x$, return the parameters $(p_1, \dots, p_k)$ of his/her pose as output $y$.
Image segmentation: given an image from Flickr as input $x$, return a mask highlighting the foreground object as output $y$.

Support Vector Regression (1/2)

A real function $\mathbb{R}^d \to \mathbb{R}$ can be approximated directly by the SVM score: $f(x) \approx \langle w, \Phi(x) \rangle$.
Think of the feature map $\Phi(x)$ as a collection of basis functions. For instance, if $x \in \mathbb{R}$, one can use the basis of second order polynomials:
$\Phi(x) = [1 \;\; x \;\; x^2]^\top$, so that $\langle w, \Phi(x) \rangle = w_1 + w_2 x + w_3 x^2$.
The goal is to find $w$ (e.g. polynomial coefficients) such that the score fits the example data, $\langle w, \Phi(x_i) \rangle \approx y_i$, by minimising the $L^1$ error
$L_i(w) = |y_i - \langle w, \Phi(x_i) \rangle|$.

Support Vector Regression (2/2)

SVR is just a variant of regularised regression:
SVR: loss $\ell^1$, regulariser $\ell^2$, objective $\frac{1}{n} \sum_{i=1}^n |y_i - w^\top \Phi(x_i)| + \lambda \|w\|_2^2$.
Least squares: loss $\ell^2$, no regulariser, objective $\frac{1}{n} \sum_{i=1}^n (y_i - w^\top \Phi(x_i))^2$.
Ridge regression: loss $\ell^2$, regulariser $\ell^2$, objective $\frac{1}{n} \sum_{i=1}^n (y_i - w^\top \Phi(x_i))^2 + \lambda \|w\|_2^2$.
Lasso: loss $\ell^2$, regulariser $\ell^1$, objective $\frac{1}{n} \sum_{i=1}^n (y_i - w^\top \Phi(x_i))^2 + \lambda \|w\|_1$.

An aside: $\epsilon$-insensitive $L^1$ loss

Actually, SVR makes use of a slightly more general loss
$L_i(w) = \max\{0, |y_i - \langle w, x_i \rangle| - \epsilon\}$,
which is insensitive to errors below a threshold $\epsilon$. One can set $\epsilon = 0$ though [Smola and Schölkopf, 2004].

A general approach: learning the graph

Limitation: only real functions!
Use a binary SVM to classify which pairs $(x, y) \in \mathcal{X} \times \mathcal{Y}$ belong to the graph of the function (treat the output as an input!):
$y = f(x) \;\Leftrightarrow\; \langle w, \Psi(x, y) \rangle > 0$.

Joint feature map

In order to classify pairs $(x, y)$, these must be encoded as vectors. To this end, we need a joint feature map:
$\Psi : (x, y) \mapsto \Psi(x, y) \in \mathbb{R}^d$.
As long as this feature map can be designed, the nature of $x$ and $y$ is irrelevant.
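To make the $\epsilon$-insensitive loss and the SVR objective concrete, here is a minimal Python sketch (the slides themselves use MATLAB). The function names, the second-order polynomial basis and the weighting `lam * ||w||^2` of the regulariser are assumptions chosen for illustration, not part of the lecture.

```python
def poly2_features(x):
    """Second-order polynomial basis Phi(x) = [1, x, x^2]."""
    return [1.0, x, x * x]


def score(w, phi):
    """Inner product <w, Phi(x)>."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def eps_insensitive_loss(y, s, eps=0.0):
    """max{0, |y - s| - eps}; reduces to the L1 error when eps = 0."""
    return max(0.0, abs(y - s) - eps)


def svr_objective(w, xs, ys, lam, eps=0.0):
    """Average eps-insensitive loss plus lam * ||w||^2 (assumed weighting)."""
    n = len(xs)
    data_term = sum(
        eps_insensitive_loss(y, score(w, poly2_features(x)), eps)
        for x, y in zip(xs, ys)
    ) / n
    return data_term + lam * sum(wi * wi for wi in w)
```

With `eps = 0` the data term is exactly the mean $L^1$ error, so the same function covers both the plain and the $\epsilon$-insensitive variants.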
Example: learning the graph of a real function (1/2)

[Figure: two panels over the (x, y) plane, "learned function f" and "scoring function".]

Algorithm:
1. Start from the true pairs $(x_i, y_i)$ (green squares) where the graph should pass.
2. Add many false pairs $(x_i, y_i)$ (red dots) where the graph should not pass.
3. Learn a scoring function $\langle w, \Psi(x, y) \rangle$ to fit these points.
4. Define the learned function graph to be the collection of points such that $\langle w, \Psi(x, y) \rangle > 0$ (green areas).

Example: learning the graph of a real function (2/2)

In this example the joint feature map is a Fourier basis (note the ringing!):
$\Psi(x, y) = [\cos(f_{1x} x + f_{1y} y + \phi_1), \; \cos(f_{2x} x + f_{2y} y + \phi_2), \; \dots, \; \cos(f_{dx} x + f_{dy} y + \phi_d)]^\top$
for appropriate $(f_{ix}, f_{iy}, \phi_i)$.

The good and the bad

The good: works for any type of inputs and outputs! (Not just real functions.)
The bad:
1. Not one-to-one. For each $x$, there are multiple outputs $y$ with positive score.
2. Not complete. There are $x$ for which all the outputs have negative score.
3. Very large negative example set.

Structured output SVMs

Structured output SVM. Issues 1 and 2 can be fixed by choosing the highest scoring output for each input:
$\hat{y}(x; w) = \operatorname{argmax}_{y \in \mathcal{Y}} \langle w, \Psi(x, y) \rangle$.
Intuition. The scoring function $\langle w, \Psi(x, y) \rangle$ is somewhat analogous to a posterior probability density function $P(y \mid x)$, but it does not have any probabilistic meaning.
Example: real function

[Figure: three panels, "learned function f", "scoring function" and "column rescaled".]

$f(x)$ = the $y$ that maximises the score along column $x$. $f(x)$ is now uniquely and completely defined.
Note: only the relative values of the score along a column really matter (see the rescaled version on the right).

Inference problem

Inference problem. Evaluating a structured SVM requires solving the problem
$\operatorname{argmax}_{y \in \mathcal{Y}} \langle w, \Psi(x, y) \rangle$.
The efficiency of using a structured SVM (after learning) depends on how quickly the inference problem can be solved.

Example: binary linear SVM

Standard SVMs can be easily interpreted as structured SVMs:
Output space: $y \in \mathcal{Y} = \{-1, +1\}$.
Feature map: $\Psi(x, y) = \frac{1}{2} y x$.
Inference: $\hat{y}(x; w) = \operatorname{argmax}_{y \in \{-1, +1\}} \frac{y}{2} \langle w, x \rangle = \operatorname{sign} \langle w, x \rangle$.

Example: object localisation

Let $x$ be an image and $y \in \mathcal{Y} \subset \mathbb{R}^4$ a rectangular window. The goal is to find the window containing a given object.
Let $x|_y$ denote an image window (crop), obtained by restricting $x$ to $y$; its visual features define the joint feature map $\Psi(x, y) \in \mathbb{R}^d$.
Standard SVM: score one window. $\Phi(x|_y)$ = histogram of SIFT features, $\langle w, \Phi(x|_y) \rangle$ = window score.
Structured SVM: try all windows and pick the best one:
$\hat{y}(x; w) = \operatorname{argmax}_{y \in \mathcal{Y}} \langle w, \Psi(x, y) \rangle = \operatorname{argmax}_{y \in \mathcal{Y}} \langle w, \Phi(x|_y) \rangle$.
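The binary linear SVM example can be checked numerically. The following Python sketch (the slides use MATLAB) performs inference by exhaustive search over the output space and compares it with the usual sign rule; `infer`, `psi_binary` and the test vectors are hypothetical names and values chosen for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def psi_binary(x, y):
    """Joint feature map Psi(x, y) = y * x / 2 for y in {-1, +1}."""
    return [0.5 * y * xi for xi in x]


def infer(w, x, outputs, psi):
    """Structured inference by exhaustive search: argmax_y <w, Psi(x, y)>."""
    return max(outputs, key=lambda y: dot(w, psi(x, y)))


def sign(v):
    """Sign with the convention sign(0) = -1 (ties are not exercised here)."""
    return 1 if v > 0 else -1
```

Exhaustive search is only viable because the output space has two elements; for windows, poses or rankings the inference step needs a problem-specific algorithm.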
Example: pose estimation

Let $x$ be an image and $y = (p_1, p_2, p_3, p_4, p_5)$ the pose of a human, expressed as the 2D location of five parts.
The joint feature map $\Psi(x, y)$ stacks the appearance features $\Phi(x|_{p_1}), \dots, \Phi(x|_{p_5})$ of the image crops at the five parts together with deformation features of pairs of parts, such as $\Phi(p_1, p_2)$.
Intuition. The score $\langle w, \Psi(x, y) \rangle$ reflects how well the five image parts match their appearance models and whether the deformation is reasonable or not.

Example: ranking (1/2)

Consider the problem of ranking a list of objects $x = (o_1, \dots, o_n)$ (input).
The output $y$ is a ranking (total order). This can be represented as a matrix $y$ such that
$y_{ij} = +1$ if $o_i$ has higher rank than $o_j$, and $y_{ij} = -1$ otherwise.
A joint feature map for ranking:
$\Psi(x, y) = \sum_{ij} y_{ij} (\Phi(o_i) - \Phi(o_j))$, so that $\langle w, \Psi(x, y) \rangle = \sum_{ij} y_{ij} \langle \Phi(o_i) - \Phi(o_j), w \rangle$.

Example: ranking (2/2)

This structured SVM ranks the objects by decreasing score $\langle \Phi(o_i), w \rangle$:
$\hat{y}_{ij}(x; w) = \operatorname{sign}(\langle \Phi(o_i), w \rangle - \langle \Phi(o_j), w \rangle)$.
In fact the score of this output,
$\langle w, \Psi(x, \hat{y}(x; w)) \rangle = \sum_{ij} \hat{y}_{ij} \langle \Phi(o_i) - \Phi(o_j), w \rangle = \sum_{ij} \operatorname{sign}\langle \Phi(o_i) - \Phi(o_j), w \rangle \, \langle \Phi(o_i) - \Phi(o_j), w \rangle = \sum_{ij} |\langle \Phi(o_i) - \Phi(o_j), w \rangle|$,
is maximum.
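The claim that sorting by score maximises the joint ranking score can be verified by brute force on a tiny example. This Python sketch (the slides use MATLAB) works directly with the per-object scores $s_i = \langle \Phi(o_i), w \rangle$; all function names and the example scores are assumptions made for illustration.

```python
from itertools import permutations


def ranking_matrix(order):
    """order lists object indices from best to worst.

    Returns y with y[i][j] = +1 if object i outranks object j, else -1.
    """
    pos = {obj: r for r, obj in enumerate(order)}
    n = len(order)
    return [[1 if pos[i] < pos[j] else -1 for j in range(n)] for i in range(n)]


def joint_score(s, y):
    """<w, Psi(x, y)> = sum_ij y_ij * (s_i - s_j), with s_i = <Phi(o_i), w>."""
    n = len(s)
    return sum(y[i][j] * (s[i] - s[j]) for i in range(n) for j in range(n))


def rank_by_score(s):
    """Inference: order the objects by decreasing score."""
    return sorted(range(len(s)), key=lambda i: -s[i])


def brute_force_inference(s):
    """Exhaustive argmax over all total orders (feasible only for tiny n)."""
    return list(max(permutations(range(len(s))),
                    key=lambda o: joint_score(s, ranking_matrix(o))))
```

For `s = [0.3, -1.2, 2.0, 0.7]` both routines return the same order, while the brute-force version already visits 24 permutations; sorting is what makes inference tractable here.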
Outline: Learning formulations

Summary so far and what remains to be done

Input-output relation. The SVM defines an input-output relation based on maximising the joint score:
$\hat{y}(x; w) = \operatorname{argmax}_{y \in \mathcal{Y}} \langle w, \Psi(x, y) \rangle$.
Next: how to fit the input-output relation to data.

Learning formulation (1/2)

Given $n$ example input-output pairs $(x_1, y_1), \dots, (x_n, y_n)$, find $w$ such that the structured SVM approximately fits them,
$\hat{y}(x_i; w) \approx y_i, \quad i = 1, \dots, n$,
while controlling the complexity of the estimated function.
Objective function (non-convex):
$E_1(w) = \frac{\lambda}{2} \|w\|^2 + \frac{1}{n} \sum_{i=1}^n \Delta(y_i, \hat{y}(x_i; w))$.
Notation reminder: $\Delta$ is the loss function, $\hat{y}$ the output estimated by the SVM, $y_i$ the ground truth output, and $x_i$ the ground truth input.

Loss function

The loss function measures the fit quality: $\Delta(y, \hat{y})$ is such that $\Delta(y, \hat{y}) \ge 0$ and $\Delta(y, \hat{y}) = 0$ if, and only if, $y = \hat{y}$.
Examples:
For a binary SVM the loss is $\Delta(y, \hat{y}) = 1$ if $y \ne \hat{y}$, and $0$ otherwise.
In object localisation the loss could be one minus the ratio of the areas of the intersection and union of the rectangles $y$ and $\hat{y}$:
$\Delta(y, \hat{y}) = 1 - \frac{|y \cap \hat{y}|}{|y \cup \hat{y}|}$.
In ranking: see the next example.

Example: a ranking loss

In ranking, suitable losses include the ROC-AUC, the precision-recall AUC, measures at rank $k$, ...
The ROC curve plots the true positive rate against the true negative rate.
[Figure: ROC curve, true positive rate against true negative rate, with the area under the curve shaded.]
Given the true ranking $y$ and the estimated $\hat{y}$, we can define
$\Delta(y, \hat{y}) = 1 - \mathrm{ROCAUC}(y, \hat{y})$.
One can show that this is simply (proportional to) the number of incorrectly ranked pairs, i.e.
$\Delta(y, \hat{y}) \propto \sum_{i,j=1}^n [y_{ij} \ne \hat{y}_{ij}]$.
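As an illustration of the loss functions listed above, here is a minimal Python sketch (the slides use MATLAB) of the zero-one loss and of the intersection-over-union loss for object localisation. Representing a window as a tuple `(x0, y0, x1, y1)` is an assumption made for this example.

```python
def zero_one_loss(y, y_hat):
    """Binary SVM loss: 1 if the outputs differ, 0 otherwise."""
    return 0.0 if y == y_hat else 1.0


def box_area(b):
    """Area of a box (x0, y0, x1, y1); empty boxes have area 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou_loss(y, y_hat):
    """1 - |y ∩ y_hat| / |y ∪ y_hat| for axis-aligned rectangles."""
    inter = box_area((max(y[0], y_hat[0]), max(y[1], y_hat[1]),
                      min(y[2], y_hat[2]), min(y[3], y_hat[3])))
    union = box_area(y) + box_area(y_hat) - inter
    return 1.0 - inter / union if union > 0 else 0.0
```

Both functions are non-negative and vanish only when the two outputs coincide, as required of a loss $\Delta$ (for degenerate zero-area boxes the sketch returns 0 by convention).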
Learning formulation (2/2)

The goal of learning is to find the minimiser $w^*$ of
$E_1(w) = \frac{\lambda}{2} \|w\|^2 + \frac{1}{n} \sum_{i=1}^n \Delta(y_i, \hat{y}(x_i; w))$, where $\hat{y}(x_i; w) = \operatorname{argmax}_{y \in \mathcal{Y}} \langle w, \Psi(x_i, y) \rangle$.
The dependency of the loss on $w$ is very complex: $\Delta$ is non-convex and is composed with the argmax!
Objective function (convex). Given a convex surrogate loss $L_i(w) \approx \Delta(y_i, \hat{y}(x_i; w))$, we consider the objective
$E(w) = \frac{\lambda}{2} \|w\|^2 + \frac{1}{n} \sum_{i=1}^n L_i(w)$.

The surrogate loss

The key to the success of structured SVMs is the existence of good surrogates. There are standard constructions that work well in a variety of cases (but not always!).
The aim is to make minimising $L_i(w)$ have the same effect as minimising $\Delta(y_i, \hat{y}(x_i; w))$.
Bounding property: $\Delta(y_i, \hat{y}(x_i; w)) \le L_i(w)$.
Tightness. If we can find $w^*$ s.t. $L_i(w^*) = 0$, then $\Delta(y_i, \hat{y}(x_i; w^*)) = 0$. But can we? Not always! Consider setting $L_i(w)$ = very large constant. We need a tight bound. E.g.:
$\Delta(y_i, \hat{y}(x_i; w^*)) = 0 \;\Rightarrow\; L_i(w^*) = 0$.

Margin rescaling surrogate

Margin rescaling is the first standard surrogate construction:
$L_i(w) = \sup_{y \in \mathcal{Y}} \Delta(y_i, y) + \langle \Psi(x_i, y), w \rangle - \langle \Psi(x_i, y_i), w \rangle$.
This surrogate bounds the loss:
$\Delta(y_i, \hat{y}(x_i; w)) \le \Delta(y_i, \hat{y}(x_i; w)) + \langle \Psi(x_i, \hat{y}(x_i; w)), w \rangle - \langle \Psi(x_i, y_i), w \rangle \le \sup_{y \in \mathcal{Y}} \Delta(y_i, y) + \langle \Psi(x_i, y), w \rangle - \langle \Psi(x_i, y_i), w \rangle = L_i(w)$.
The first inequality holds because the added difference of scores is $\ge 0$: $\hat{y}(x_i; w)$ maximises the score by definition.

Margin condition

Is margin rescaling a tight approximation? The following margin condition holds:
$L_i(w^*) = 0 \;\Leftrightarrow\; \forall y \in \mathcal{Y} : \langle \Psi(x_i, y_i), w^* \rangle \ge \langle \Psi(x_i, y), w^* \rangle + \Delta(y_i, y)$,
i.e. the score of the ground truth output exceeds the score of any other output by a margin $\Delta(y_i, y)$.
Tightness. The surrogate is not tight in the sense above:
$\Delta(y_i, \hat{y}(x_i; w^*)) = 0 \;\not\Rightarrow\; L_i(w^*) = 0$.
In order to minimise the surrogate, the more stringent margin condition has to be satisfied! But this is usually good enough, and in fact beneficial (it implies robustness).
Slack rescaling surrogate

Slack rescaling is the second standard surrogate construction:
$L_i(w) = \sup_{y \in \mathcal{Y}} \Delta(y_i, y) \left[ 1 + \langle \Psi(x_i, y), w \rangle - \langle \Psi(x_i, y_i), w \rangle \right]$.
It may give better results than margin rescaling. However, it is often significantly harder to treat in calculations.
The margin condition is
$L_i(w^*) = 0 \;\Leftrightarrow\; \forall y \ne y_i : \langle \Psi(x_i, y_i), w^* \rangle \ge \langle \Psi(x_i, y), w^* \rangle + 1$,
i.e. the score of the ground truth output exceeds the score of any other output by a margin of 1.

Augmented inference

Evaluating the objective $E(w)$ requires computing the supremum in the augmented loss
$\sup_{y \in \mathcal{Y}} \Delta(y_i, y) + \langle \Psi(x_i, y), w \rangle - \langle \Psi(x_i, y_i), w \rangle$.
Maximising this quantity is the augmented inference problem, so called due to its similarity with the inference problem
$\max_{y \in \mathcal{Y}} \langle \Psi(x_i, y), w \rangle$.
Augmented inference can be significantly harder than inference, especially for slack rescaling.

Example: binary linear SVM

Recall that for a binary linear SVM:
$\mathcal{Y} = \{-1, +1\}$, $\Psi(x, y) = \frac{1}{2} y x$, $\Delta(y_i, \hat{y}) = [y_i \ne \hat{y}]$.
Then in the margin rescaling construction, solving the augmented inference problem yields
$L_i(w) = \sup_{y \in \{-1, 1\}} [y_i \ne y] + \frac{y}{2} \langle x_i, w \rangle - \frac{y_i}{2} \langle x_i, w \rangle = \max_{y \in \{-y_i, y_i\}} [y_i \ne y] + \frac{y - y_i}{2} \langle x_i, w \rangle = \max\{0, 1 - y_i \langle x_i, w \rangle\}$,
i.e. the same loss as a standard SVM. In this case, slack rescaling yields the same result.

The good and the bad of convex surrogates

Good: Convex surrogates separate the ground truth outputs $y_i$ from other outputs $y$ by a margin modulated by the loss.
Bad: Despite their construction, they can be poor approximations of the original loss.
They are unimodal, and therefore cannot model situations in which different outputs are equally acceptable.
If the ground truth $y_i$ is not separable, they may be incapable of identifying the best output that can actually be achieved instead: there is no graceful fallback.
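The binary linear SVM example can be reproduced in a few lines. This Python sketch (the slides use MATLAB) evaluates the margin-rescaling and slack-rescaling surrogates by exhaustive augmented inference over a finite output set and compares them with the hinge loss; the function names and test points are assumptions chosen for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def margin_rescaling_loss(w, x, y_true, outputs, psi, delta):
    """sup_y Delta(y_i, y) + <Psi(x, y), w> - <Psi(x, y_i), w>."""
    s_true = dot(w, psi(x, y_true))
    return max(delta(y_true, y) + dot(w, psi(x, y)) - s_true for y in outputs)


def slack_rescaling_loss(w, x, y_true, outputs, psi, delta):
    """sup_y Delta(y_i, y) * [1 + <Psi(x, y), w> - <Psi(x, y_i), w>]."""
    s_true = dot(w, psi(x, y_true))
    return max(delta(y_true, y) * (1.0 + dot(w, psi(x, y)) - s_true)
               for y in outputs)


def psi_binary(x, y):
    """Psi(x, y) = y * x / 2 for y in {-1, +1}."""
    return [0.5 * y * xi for xi in x]


def zero_one(y, y_hat):
    """Delta(y, y_hat) = [y != y_hat]."""
    return 0.0 if y == y_hat else 1.0


def hinge(w, x, y):
    """Standard SVM hinge loss max{0, 1 - y <x, w>}."""
    return max(0.0, 1.0 - y * dot(w, x))
```

The two generic surrogates accept any finite `outputs`, `psi` and `delta`, so the same code could be reused for a small multiclass problem; only the binary case is exercised here.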
Outline: Optimisation

Summary so far and what remains to be done

Input-output relation. The SVM defines an input-output relation based on maximising the joint score:
$\hat{y}(x; w) = \operatorname{argmax}_{y \in \mathcal{Y}} \langle w, \Psi(x, y) \rangle$.
Convex surrogate objective. The joint score can be designed to fit the data $(x_1, y_1), \dots, (x_n, y_n)$ by optimising
$E(w) = \frac{\lambda}{2} \|w\|^2 + \frac{1}{n} \sum_{i=1}^n L_i(w)$.
Next: how to solve this optimisation problem.

A (naive) direct approach (1/2)

Learning a structured SVM requires solving an optimisation problem of the type:
$E(w) = \frac{\lambda}{2} \|w\|^2 + \frac{1}{n} \sum_{i=1}^n L_i(w)$, $\quad L_i(w) = \sup_{y \in \mathcal{Y}} \Delta(y_i, y) + \langle \Psi(x_i, y), w \rangle - \langle \Psi(x_i, y_i), w \rangle$.
More generally, this can be rewritten as
$E(w) = \frac{\lambda}{2} \|w\|^2 + \frac{1}{n} \sum_{i=1}^n L_i(w)$, $\quad L_i(w) = \sup_{y \in \mathcal{Y}} b_{iy} - \langle a_{iy}, w \rangle$,
with $b_{iy} = \Delta(y_i, y)$ and $a_{iy} = \Psi(x_i, y_i) - \Psi(x_i, y)$.

A (naive) direct approach (2/2)

This problem can be rewritten as a constrained quadratic program in the parameters $w$ and the slack variables $\xi$:
$E(w, \xi) = \frac{\lambda}{2} \|w\|^2 + \frac{1}{n} \sum_{i=1}^n \xi_i$, $\quad \xi_i \ge b_{iy} - \langle a_{iy}, w \rangle \quad \forall i = 1, \dots, n, \; y \in \mathcal{Y}$.
Can we use a standard quadratic solver (e.g. quadprog in MATLAB)?
The size of this problem. There is one set of constraints for each data point $(x_i, y_i)$. Each set of constraints contains one linear constraint for each output $y$. This is way too large (even infinite!) to be directly fed to a quadratic solver.
A second look

Let's look again at the original problem in a slightly different form:
$E(w) = \frac{\lambda}{2} \|w\|^2 + L(w)$, $\quad L(w) = \frac{1}{n} \sum_{i=1}^n \sup_{y \in \mathcal{Y}} \Delta(y_i, y) + \langle \Psi(x_i, y), w \rangle - \langle \Psi(x_i, y_i), w \rangle$.
$L(w)$ is a convex, non-smooth function, with bounded Lipschitz constant (i.e., it does not vary too fast).
Optimisation of such functions is extensively studied in operational research.
We are going to discuss the Bundle Method for Regularized Risk Minimization (BMRM), a special case of the bundle method for regularised loss functions, which in turn is a stabilised variant of cutting planes.

Subgradient and subdifferential

[Figure: a convex function $L(w)$ with a supporting line of slope $g$ at $w$.]
Assumption: $L(w)$ convex, not necessarily smooth, with bounded Lipschitz constant $G$.
A subgradient of $L(w)$ at $w$ is any vector $g$ such that
$\forall w' : L(w') \ge L(w) + \langle g, w' - w \rangle$; under the assumption above, $\|g\| \le G$.
The subdifferential $\partial L(w)$ is the set of all subgradients and contains only the gradient $\nabla L(w)$ if the function is differentiable.

Cutting planes

[Figure: $L(w)$ and its piecewise-linear lower approximation $L^{(t)}(w)$.]
Given a point $w_0$, we approximate the convex $L(w)$ from below by a tangent plane:
$L(w) \ge b - \langle a, w \rangle$, $\quad -a \in \partial L(w_0)$, $\quad b = L(w_0) + \langle a, w_0 \rangle$.
$(a, b)$ is the cutting plane at $w_0$.
Given the cutting planes at $w_1, \dots, w_t$, we define the lower approximation
$L^{(t)}(w) = \max_{i = 1, \dots, t} b_i - \langle a_i, w \rangle$.

Cutting plane algorithm

Goal: minimise a convex, not necessarily smooth function $L(w)$.
Method: incrementally construct a lower approximation $L^{(t)}(w)$. At each iteration, minimise the latter to obtain $w_t$ and add a cutting plane at that point.
Cutting plane algorithm. Start with $w_0 = 0$ and $t = 0$. Then repeat:
1. $t \leftarrow t + 1$.
2. Get a cutting plane $(a_t, b_t)$ by computing the subgradient of $L(w)$ at $w_{t-1}$.
3. Add the plane to the current approximation $L^{(t)}(w)$.
4. Set $w_t = \operatorname{argmin}_w L^{(t)}(w)$.
5. If $L(w_t) - L^{(t)}(w_t) < \epsilon$, stop as converged.
[Kiwiel, 1990, Lemaréchal et al., 1995, Joachims et al., 2009]
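The cutting plane algorithm above can be sketched in one dimension, where minimising the piecewise-linear lower bound needs no LP solver. This Python sketch (the slides use MATLAB) is an illustration under stated assumptions: the domain is restricted to an interval `[lo, hi]` so that the lower bound has a minimiser, the starting point is inside that interval rather than at 0, and the function $w \log w$ with the interval `[0.05, 2]` is chosen only as an example.

```python
import math


def lower_approx(planes, w):
    """L^(t)(w) = max_i (b_i - a_i * w)."""
    return max(b - a * w for a, b in planes)


def argmin_lower_approx(planes, lo, hi):
    """Minimise the piecewise-linear lower bound on [lo, hi].

    In 1D the minimum is attained at an endpoint or where two planes cross.
    """
    candidates = [lo, hi]
    for i, (ai, bi) in enumerate(planes):
        for aj, bj in planes[i + 1:]:
            if abs(ai - aj) > 1e-15:
                w = (bi - bj) / (ai - aj)
                if lo <= w <= hi:
                    candidates.append(w)
    return min(candidates, key=lambda w: lower_approx(planes, w))


def cutting_plane(L, dL, lo, hi, w0, eps=1e-6, max_iter=200):
    """Minimise a convex function L on [lo, hi]; dL returns a subgradient."""
    planes, w = [], w0
    for t in range(1, max_iter + 1):
        a = -dL(w)                 # so that -a is a subgradient at w
        b = L(w) + a * w           # plane: L(w') >= b - a * w'
        planes.append((a, b))
        w = argmin_lower_approx(planes, lo, hi)
        if L(w) - lower_approx(planes, w) < eps:
            break
    return w, t


# Example function w log w (convex for w > 0); the interval is an assumption.
L = lambda w: w * math.log(w)
dL = lambda w: math.log(w) + 1.0
w_star, iterations = cutting_plane(L, dL, lo=0.05, hi=2.0, w0=2.0)
```

The exact minimiser of $w \log w$ is $w = 1/e$, so `w_star` should land close to 0.368, with `L(w_star)` within `eps` of the optimum by the stopping rule.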
Guarantees at convergence

[Figure: $L(w)$, the lower approximation $L^{(t)}(w)$, and the points $w_t$ and $w^*$.]
The algorithm stops when $L(w_t) - L^{(t)}(w_t) < \epsilon$.
The true optimum $L(w^*)$ is sandwiched:
$L^{(t)}(w_t) \le L^{(t)}(w^*) \le L(w^*) \le L(w_t)$,
where the first inequality holds because $w_t$ minimises $L^{(t)}$, the second because $L^{(t)} \le L$, and the third because $w^*$ minimises $L$.
Hence when the algorithm converges one has the guarantee:
$L(w_t) \le L(w^*) + \epsilon$.

Cutting plane example

[Figure: cutting plane iterations when optimising the function $L(w) = w \log w$ on a bounded interval.]

BMRM: cutting planes with a regulariser

The standard cutting plane algorithm takes forever to converge (it is not the one used for SVM...) as it can take wild steps.
Bundle methods try to regularise the steps but are generally difficult to tune.
BMRM notes that one already has a regulariser in the SVM objective function:
$E(w) = \frac{\lambda}{2} \|w\|^2 + L(w)$.
BMRM algorithm. Start with $w_0 = 0$ and $t = 0$. Then repeat:
1. $t \leftarrow t + 1$.
2. Get a cutting plane $(a_t, b_t)$ by computing the subgradient of $L(w)$ at $w_{t-1}$.
3. Add the plane to the current approximation $L^{(t)}(w)$.
4. Set $E_t(w) = \frac{\lambda}{2} \|w\|^2 + L^{(t)}(w)$.
5. Set $w_t = \operatorname{argmin}_w E_t(w)$.
6. If $E(w_t) - E_t(w_t) < \epsilon$, stop as converged.
[Teo et al., 2009] but also [Kiwiel, 1990, Lemaréchal et al., 1995, Joachims et al., 2009]

BMRM example

[Figure: BMRM iterations when optimising a regularised objective $E(w)$ built from $w \log w$ on a bounded interval.]
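The BMRM steps above can be sketched for a scalar parameter, where the regularised subproblem has a closed-form set of candidate minimisers. This Python sketch (the slides use MATLAB) assumes the objective $E(w) = \frac{\lambda}{2} w^2 + L(w)$; the toy 1D data set, the averaged hinge loss and all function names are hypothetical choices made for illustration.

```python
def bmrm_1d(loss, subgrad, lam, w0=0.0, eps=1e-6, max_iter=100):
    """BMRM for E(w) = lam/2 * w^2 + loss(w) with a scalar w."""
    planes, w = [], w0

    def lower(v):                    # L^(t)(v) = max_i (b_i - a_i * v)
        return max(b - a * v for a, b in planes)

    def E_t(v):                      # regularised lower bound
        return 0.5 * lam * v * v + lower(v)

    for t in range(1, max_iter + 1):
        a = -subgrad(w)              # cutting plane at the current point
        planes.append((a, loss(w) + a * w))
        # The minimiser of E_t is either a stationary point of one piece
        # (w = a_i / lam) or a kink where two planes cross.
        candidates = [a_i / lam for a_i, _ in planes]
        for i, (ai, bi) in enumerate(planes):
            for aj, bj in planes[i + 1:]:
                if abs(ai - aj) > 1e-15:
                    candidates.append((bi - bj) / (ai - aj))
        w = min(candidates, key=E_t)
        if 0.5 * lam * w * w + loss(w) - E_t(w) < eps:
            break
    return w, t


# Toy 1D binary SVM data (hypothetical): hinge loss averaged over examples.
xs, ys = [1.0, 2.0, -1.0, -3.0], [1, 1, -1, -1]
hinge = lambda w: sum(max(0.0, 1.0 - y * x * w) for x, y in zip(xs, ys)) / len(xs)
hinge_subgrad = lambda w: sum(-y * x for x, y in zip(xs, ys)
                              if 1.0 - y * x * w > 0) / len(xs)
w_bmrm, n_iter = bmrm_1d(hinge, hinge_subgrad, lam=1.0)
```

Unlike the plain cutting plane method, no bounding interval is needed here: the quadratic term keeps every subproblem bounded, which is the stabilising effect described above. For vector-valued `w` the candidate enumeration would be replaced by a small quadratic program.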
Application of BMRM to structured SVMs

In this case:
$L(w) = \frac{1}{n} \sum_{i=1}^n \sup_{y \in \mathcal{Y}} \Delta(y_i, y) + \langle \Psi(x_i, y), w \rangle - \langle \Psi(x_i, y_i), w \rangle$.
A subgradient in $\partial L(w)$ is just the average of the subgradients of the terms.
The subgradient $g_i$ at $w$ of a term is computed by determining the maximally violated output
$\bar{y}_i = \operatorname{argmax}_{y \in \mathcal{Y}} \Delta(y_i, y) + \langle \Psi(x_i, y), w \rangle - \langle \Psi(x_i, y_i), w \rangle$.
Remark 1. This is the augmented inference problem.
Remark 2. Once $\bar{y}_i$ is obtained, the subgradient is given by
$g_i = \Psi(x_i, \bar{y}_i) - \Psi(x_i, y_i)$.
Thus BMRM can be applied provided that the augmented inference problem can be solved (even when $\mathcal{Y}$ is infinite!).

Outline: A complete example

Structured SVM: fitting a real function

Consider the problem of learning a real function $f : \mathbb{R} \to [-1, 1]$ by fitting points $(x_1, y_1), \dots, (x_n, y_n)$.
Loss: $\Delta(y, \hat{y}) = |\hat{y} - y|$.
Joint feature map: $\Psi(x, y) = [y, \; yx, \; yx^2, \; yx^3, \; -\frac{1}{2} y^2]^\top$.
To see why this works we will look at the resulting inference problem.

MATLAB implementation (1/3)

First, program a callback for the loss.

    function delta = losscb(param, y, ybar)
      % Loss callback: absolute difference between prediction and ground truth.
      delta = abs(ybar - y) ;
    end

Then a callback for the feature map.

    function psi = featurecb(param, x, y)
      % Joint feature map Psi(x, y), returned as a sparse column vector.
      psi = [y ;
             y * x ;
             y * x^2 ;
             y * x^3 ;
             -0.5 * y^2] ;
      psi = sparse(psi) ;
    end
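The subgradient computation described under "Application of BMRM to structured SVMs" can be sketched for a finite output set, where augmented inference is a simple enumeration. This Python sketch (the slides use MATLAB) is an illustration only: the multiclass joint feature map, the class set and all function names are assumptions, not taken from the lecture.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def loss_and_subgradient(w, data, outputs, psi, delta):
    """Value and one subgradient of L(w) = (1/n) sum_i L_i(w) (margin rescaling)."""
    n = len(data)
    value, g = 0.0, [0.0] * len(w)
    for x, y_true in data:
        psi_true = psi(x, y_true)

        def augmented(y):
            return delta(y_true, y) + dot(w, psi(x, y)) - dot(w, psi_true)

        y_bar = max(outputs, key=augmented)      # augmented inference
        value += augmented(y_bar) / n
        psi_bar = psi(x, y_bar)
        # g_i = Psi(x_i, y_bar_i) - Psi(x_i, y_i), averaged over the examples
        g = [gk + (pb - pt) / n for gk, pb, pt in zip(g, psi_bar, psi_true)]
    return value, g


# Hypothetical multiclass setup: Psi(x, y) copies x into the block of class y.
CLASSES = (0, 1, 2)


def psi_multiclass(x, y):
    out = [0.0] * (len(x) * len(CLASSES))
    out[y * len(x):(y + 1) * len(x)] = x
    return out


def zero_one(y, y_hat):
    return 0.0 if y == y_hat else 1.0
```

The returned pair gives a cutting plane directly: with `a = [-gk for gk in g]` and `b = value + dot(a, w)`, the plane satisfies $L(w') \ge b - \langle a, w' \rangle$ for every $w'$.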
Inference

The inference problem is
$\hat{y}(x; w) = \operatorname{argmax}_{y \in [-1, 1]} \langle w, \Psi(x, y) \rangle = \operatorname{argmax}_{y \in [-1, 1]} y (w_1 + w_2 x + w_3 x^2 + w_4 x^3) - \frac{1}{2} y^2 w_5$.
Differentiate w.r.t. $y$ and set to zero to obtain:
$\hat{y}(x; w) = \frac{w_1}{w_5} + \frac{w_2}{w_5} x + \frac{w_3}{w_5} x^2 + \frac{w_4}{w_5} x^3$.
Note: there are some other special cases due to the fact that $y \in [-1, +1]$ and $w_5$ may be negative.

Augmented inference

Solving the augmented inference problem is needed to compute the value and sub-gradient of the margin-rescaling loss
$L_i(w) = \max_{\hat{y} \in [-1, 1]} \Delta(y_i, \hat{y}) + \langle w, \Psi(x_i, \hat{y}) \rangle - \langle w, \Psi(x_i, y_i) \rangle = \max_{\hat{y} \in [-1, 1]} |\hat{y} - y_i| + \hat{y} (w_1 + w_2 x_i + w_3 x_i^2 + w_4 x_i^3) - \frac{1}{2} \hat{y}^2 w_5 - \text{const}$.
The maximiser is one of at most four possibilities:
$\hat{y} \in \left\{ -1, \; 1, \; \frac{z - 1}{w_5}, \; \frac{z + 1}{w_5} \right\} \cap [-1, 1]$, $\quad z = w_1 + w_2 x_i + w_3 x_i^2 + w_4 x_i^3$.
Try the four cases and pick the one with the largest augmented loss.

MATLAB implementation (2/3)

Finally program the augmented inference.

    function yhat = constraintcb(param, model, x, y)
      w = model.w ;
      z = w(1) + w(2) * x + w(3) * x.^2 + w(4) * x.^3 ;
      yhat = [] ;
      if w(5) > 0
        % stationary points of the two branches of |y_ - y|, clipped to [-1, 1]
        yhat = [z - 1, z + 1] / w(5) ;
        yhat = max(min(yhat, 1), -1) ;
      end
      yhat = [yhat, -1, 1] ;
      % augmented loss, up to a constant that does not depend on y_
      aloss = @(y_) abs(y_ - y) + z * y_ - 0.5 * y_.^2 * w(5) ;
      [drop, worse] = max(aloss(yhat)) ;
      yhat = yhat(worse) ;
    end

MATLAB implementation (3/3)

Once the callbacks are coded, we use an off-the-shelf solver.

    % training examples
    parm.patterns = {-2, -1, 0, 1, 2} ;
    parm.labels = {0.5, -0.5, 0.5, -0.5, 0.5} ;

    % callbacks & other parameters
    parm.lossFn = @losscb ;
    parm.constraintFn = @constraintcb ;
    parm.featureFn = @featurecb ;
    parm.dimension = 5 ;

    % call the solver and print the model
    % (the numeric values of the -c and -o options are missing from the transcript)
    model = svm_struct_learn(' -c ... -o ... ', parm) ;
    model.w
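The closed-form inference and augmented inference for the real-function example can be cross-checked against a grid search. This Python sketch (a port written for illustration, not the lecture's MATLAB code) assumes the feature map $\Psi(x, y) = [y, yx, yx^2, yx^3, -\frac{1}{2} y^2]$ and handles the case $w_5 \le 0$ by testing the two endpoints; the function names and test weights are hypothetical.

```python
def z_of(w, x):
    """z = w1 + w2 x + w3 x^2 + w4 x^3 (w is zero-indexed here)."""
    return w[0] + w[1] * x + w[2] * x ** 2 + w[3] * x ** 3


def clip(v):
    """Clamp to the output range [-1, 1]."""
    return max(-1.0, min(1.0, v))


def score(w, x, y):
    """<w, Psi(x, y)> = y * z - y^2 * w5 / 2."""
    return y * z_of(w, x) - 0.5 * y * y * w[4]


def infer(w, x):
    """argmax over y in [-1, 1] of the score."""
    if w[4] > 0:                       # concave in y: clipped stationary point
        return clip(z_of(w, x) / w[4])
    return max((-1.0, 1.0), key=lambda y: score(w, x, y))   # convex: endpoint


def augmented_value(w, x, y_true, y):
    """|y - y_true| + <w, Psi(x, y)>, dropping the term constant in y."""
    return abs(y - y_true) + score(w, x, y)


def augmented_infer(w, x, y_true):
    """Most violated output: try the (at most four) candidate maximisers."""
    z = z_of(w, x)
    candidates = [-1.0, 1.0]
    if w[4] > 0:
        candidates += [clip((z - 1.0) / w[4]), clip((z + 1.0) / w[4])]
    return max(candidates, key=lambda y: augmented_value(w, x, y_true, y))
```

A dense grid over $[-1, 1]$ never finds an output with a higher (augmented) score than the closed-form answer, which is a useful sanity check before plugging such callbacks into a solver.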
Learning the scoring function

[Figure: for successive cutting plane iterations, two panels each, "scoring function" and "column rescaled".]
After each cutting plane iteration the scoring function
$F(x, y) = \langle \Psi(x, y), w \rangle$
is updated.
Remember: the output function is obtained by maximising the score along the columns. The relative scaling of the columns is irrelevant, and rescaling them reveals the structure better.

Outline: Further insights on optimisation

How fast is BMRM?

Provably convergent to a desired approximation $\epsilon$.
The convergence rates with respect to the accuracy $\epsilon$ are not bad:
Non-smooth loss $L(w)$: $O(1/\epsilon)$ iterations; accounting for $\lambda$, $O(\frac{1}{\lambda \epsilon})$.
Smooth loss $L(w)$: $O(\log(1/\epsilon))$ iterations; accounting for $\lambda$, $O(\frac{1}{\lambda} \log(1/\epsilon))$.
Note: the convergence rate depends also on the amount of regularisation $\lambda$.
Difficult learning problems (e.g. object detection) typically have large $n$, small $\lambda$ and small $\epsilon$, so fast convergence is not so obvious.

BMRM for structured SVMs: problem size

BMRM decouples the data from the approximation of $L(w)$.
The number of data points $n$ affects the cost of evaluating $L(w)$ and its subgradient.
However, the cost of optimising $L^{(t)}(w)$ depends only on the iteration number $t$!
In practice $t$ is small and $L^{(t)}(w)$ may be minimised very efficiently in the dual.
BMRM subproblem in the primal

The problem
$\min_w \frac{\lambda}{2} \|w\|^2 + L^{(t)}(w)$, $\quad L^{(t)}(w) = \max_{i = 1, \dots, t} b_i - \langle a_i, w \rangle$
reduces to the constrained quadratic program
$\min_{w, \xi} \frac{\lambda}{2} \|w\|^2 + \xi$, $\quad \xi \ge b_i - \langle a_i, w \rangle, \quad \forall i = 1, \dots, t$.
Note that there is a single (scalar) slack variable. This is known as the one-slack formulation.

BMRM subproblem in the dual

Let $b = [b_1, \dots, b_t]^\top$, $A = [a_1, \dots, a_t]$ and $K = A^\top A / \lambda$. The corresponding dual problem is
$\max_{\alpha} \langle \alpha, b \rangle - \frac{1}{2} \alpha^\top K \alpha$, $\quad \alpha \ge 0, \quad \mathbf{1}^\top \alpha = 1$,
where at the optimum $w^* = \frac{1}{\lambda} A \alpha^*$.

Intuition: why it is efficient

The original infinite constraints are approximated by just $t$ constraints in $L^{(t)}(w)$. This is possible because:
1. The approximation needs to be good only around the optimum.
2. The effective dimensionality and redundancy of the data are exploited.
Solving the corresponding quadratic problem is easy because $t$ is small.
Remark. BMRM is a primal solver. Switching to the dual for the subproblems is convenient but completely optional.

Implementation

An attractive aspect is the ease of implementation.

    w = zeros(d, 1) ;   % start from w_0 = 0 (d is the feature dimension)
    A = [] ;            % cutting plane normals, one column per iteration
    B = [] ;            % cutting plane offsets
    minimum = -inf ;    % current lower bound on the objective
    while getobjective(w) - minimum > epsilon
      [a,b] = getcuttingplane(w) ;
      A = [A, a] ;
      B = [B, b] ;
      [w, minimum] = quadraticsolver(lambda, A, B) ;
    end

A simple quadratic solver may do as the problem is small (e.g. MATLAB quadprog).
getcuttingplane computes an average of subgradients, in turn obtained by solving the augmented inference problems.

Tricks of the trade: caching (1/3)

              w_1             w_2             w_3             ...
    L_1(w)    (a_11, b_11)    (a_12, b_12)    (a_13, b_13)    ...
    L_2(w)    (a_21, b_21)    (a_22, b_22)    (a_23, b_23)    ...
    ...
    L_n(w)    (a_n1, b_n1)    (a_n2, b_n2)    (a_n3, b_n3)    ...
    L(w)      (a_1, b_1)      (a_2, b_2)      (a_3, b_3)      ...

For each novel $w_t$ a new constraint per example is generated by running augmented inference.
The overall loss is an average of per-example losses:
$L(w) = \frac{1}{n} \sum_{i=1}^n L_i(w)$.
And so for each cutting plane:
$a_t = \frac{1}{n} \sum_{i=1}^n a_{it}$, $\quad b_t = \frac{1}{n} \sum_{i=1}^n b_{it}$.
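The dual of the BMRM subproblem follows from the Lagrangian of the one-slack primal. The derivation below is a sketch under the assumptions that the regulariser is $\frac{\lambda}{2} \|w\|^2$ and that the only constraints are $\xi \ge b_i - \langle a_i, w \rangle$ for $i = 1, \dots, t$; the multipliers $\alpha_i$ are the dual variables.

```latex
\begin{align*}
\mathcal{L}(w, \xi, \alpha)
  &= \frac{\lambda}{2}\|w\|^2 + \xi
     + \sum_{i=1}^{t} \alpha_i \bigl( b_i - \langle a_i, w \rangle - \xi \bigr),
     \qquad \alpha_i \ge 0, \\
\frac{\partial \mathcal{L}}{\partial \xi} = 0
  &\;\Rightarrow\; \sum_{i=1}^{t} \alpha_i = 1, \\
\nabla_w \mathcal{L} = 0
  &\;\Rightarrow\; w = \frac{1}{\lambda} \sum_{i=1}^{t} \alpha_i a_i
                     = \frac{1}{\lambda} A \alpha, \\
\min_{w, \xi} \mathcal{L}(w, \xi, \alpha)
  &= \langle \alpha, b \rangle - \frac{1}{2\lambda}\, \alpha^\top A^\top A\, \alpha
   = \langle \alpha, b \rangle - \frac{1}{2}\, \alpha^\top K \alpha .
\end{align*}
```

The dual has only $t$ variables, one per cutting plane, regardless of the feature dimension or the number of training examples, which is why it stays cheap while $t$ is small.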
Tricks of the trade: caching (2/3)

[Table as in (1/3), with an extra column holding the selected index $t_i^*$ for each row $L_i(w)$ and a new combined plane $(a_{t + \delta t}, b_{t + \delta t})$ in the row of $L(w)$.]
Caching recombines constraints generated so far to obtain a novel cutting plane without running augmented inference (expensive) [Joachims, 2006, Felzenszwalb et al., 2008].
1. For each example $i = 1, \dots, n$ pick the most violated constraint in the cache:
$t_i^* = \operatorname{argmax}_{t = 1, \dots, T} b_{it} - \langle a_{it}, w \rangle$.
2. Now form a novel cutting plane by recombining the existing constraints:
$a_{t + \delta t} = \frac{1}{n} \sum_{i=1}^n a_{i t_i^*}$, $\quad b_{t + \delta t} = \frac{1}{n} \sum_{i=1}^n b_{i t_i^*}$.

Tricks of the trade: caching (3/3)

Caching is very important for problems like object detection in which inference is very expensive (seconds or minutes per image).
Consider for example the object detector of [Felzenszwalb et al., 2008]. At five seconds per image for inference, one round of augmented inference over several hundred training images takes about an hour!
Thus the solver should be iterated until the examples in the cache are correctly separated. Because of the huge cost, it is pointless to fetch more constraints before the solution has stabilised.
Preventive caching. During a round of inference it is also possible to return and store in the cache a small set of highly violated constraints. They may become the most violated at a later iteration.

Tricks of the trade: incremental training

Another speedup is to train the model gradually, by adding progressively more training samples. The intuition is that a lot of samples are only needed to refine the model.

Bibliography

P. F. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In Proc. CVPR, 2008.
T. Joachims. Training linear SVMs in linear time. In Proc. KDD, 2006.
T. Joachims, T. Finley, and C.-N. J. Yu. Cutting-plane training of structural SVMs. Machine Learning, 77(1), 2009.
K. C. Kiwiel. Proximity control in bundle methods for convex nondifferentiable minimization. Mathematical Programming, 46, 1990.
C. Lemaréchal, A. Nemirovskii, and Y. Nesterov. New variants of bundle methods. Mathematical Programming, 69, 1995.
A. J. Smola and B. Schölkopf. A tutorial on support vector regression. Statistics and Computing, 14(3), 2004.
C. H. Teo, S. V. N. Vishwanathan, A. Smola, and Q. V. Le. Bundle methods for regularized risk minimization. Journal of Machine Learning Research, (55), 2009.
More informationMACHINE LEARNING IN HIGH ENERGY PHYSICS
MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!
More informationData Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationMVA ENS Cachan. Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos Iasonas.kokkinos@ecp.fr
Machine Learning for Computer Vision 1 MVA ENS Cachan Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Department of Applied Mathematics Ecole Centrale Paris Galen
More informationAdaptive Online Gradient Descent
Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650
More informationDirect Loss Minimization for Structured Prediction
Direct Loss Minimization for Structured Prediction David McAllester TTI-Chicago mcallester@ttic.edu Tamir Hazan TTI-Chicago tamir@ttic.edu Joseph Keshet TTI-Chicago jkeshet@ttic.edu Abstract In discriminative
More information! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm.
Approximation Algorithms Chapter Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of
More informationRecognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang
Recognizing Cats and Dogs with Shape and Appearance based Models Group Member: Chu Wang, Landu Jiang Abstract Recognizing cats and dogs from images is a challenging competition raised by Kaggle platform
More informationApplied Algorithm Design Lecture 5
Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design
More informationCSC 411: Lecture 07: Multiclass Classification
CSC 411: Lecture 07: Multiclass Classification Class based on Raquel Urtasun & Rich Zemel s lectures Sanja Fidler University of Toronto Feb 1, 2016 Urtasun, Zemel, Fidler (UofT) CSC 411: 07-Multiclass
More informationFoundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu
Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.
More informationArtificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence
Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support
More informationLinear Models for Classification
Linear Models for Classification Sumeet Agarwal, EEL709 (Most figures from Bishop, PRML) Approaches to classification Discriminant function: Directly assigns each data point x to a particular class Ci
More informationOPRE 6201 : 2. Simplex Method
OPRE 6201 : 2. Simplex Method 1 The Graphical Method: An Example Consider the following linear program: Max 4x 1 +3x 2 Subject to: 2x 1 +3x 2 6 (1) 3x 1 +2x 2 3 (2) 2x 2 5 (3) 2x 1 +x 2 4 (4) x 1, x 2
More information1 Solving LPs: The Simplex Algorithm of George Dantzig
Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationIncreasing for all. Convex for all. ( ) Increasing for all (remember that the log function is only defined for ). ( ) Concave for all.
1. Differentiation The first derivative of a function measures by how much changes in reaction to an infinitesimal shift in its argument. The largest the derivative (in absolute value), the faster is evolving.
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher
More informationConvex analysis and profit/cost/support functions
CALIFORNIA INSTITUTE OF TECHNOLOGY Division of the Humanities and Social Sciences Convex analysis and profit/cost/support functions KC Border October 2004 Revised January 2009 Let A be a subset of R m
More informationLCs for Binary Classification
Linear Classifiers A linear classifier is a classifier such that classification is performed by a dot product beteen the to vectors representing the document and the category, respectively. Therefore it
More informationConsistent Binary Classification with Generalized Performance Metrics
Consistent Binary Classification with Generalized Performance Metrics Nagarajan Natarajan Joint work with Oluwasanmi Koyejo, Pradeep Ravikumar and Inderjit Dhillon UT Austin Nov 4, 2014 Problem and Motivation
More informationNonlinear Optimization: Algorithms 3: Interior-point methods
Nonlinear Optimization: Algorithms 3: Interior-point methods INSEAD, Spring 2006 Jean-Philippe Vert Ecole des Mines de Paris Jean-Philippe.Vert@mines.org Nonlinear optimization c 2006 Jean-Philippe Vert,
More informationLecture 2. Marginal Functions, Average Functions, Elasticity, the Marginal Principle, and Constrained Optimization
Lecture 2. Marginal Functions, Average Functions, Elasticity, the Marginal Principle, and Constrained Optimization 2.1. Introduction Suppose that an economic relationship can be described by a real-valued
More informationBANACH AND HILBERT SPACE REVIEW
BANACH AND HILBET SPACE EVIEW CHISTOPHE HEIL These notes will briefly review some basic concepts related to the theory of Banach and Hilbert spaces. We are not trying to give a complete development, but
More informationOnline learning of multi-class Support Vector Machines
IT 12 061 Examensarbete 30 hp November 2012 Online learning of multi-class Support Vector Machines Xuan Tuan Trinh Institutionen för informationsteknologi Department of Information Technology Abstract
More informationLinear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S
Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard
More informationMachine Learning Big Data using Map Reduce
Machine Learning Big Data using Map Reduce By Michael Bowles, PhD Where Does Big Data Come From? -Web data (web logs, click histories) -e-commerce applications (purchase histories) -Retail purchase histories
More informationSupport Vector Machines
CS229 Lecture notes Andrew Ng Part V Support Vector Machines This set of notes presents the Support Vector Machine (SVM) learning algorithm. SVMs are among the best (and many believe are indeed the best)
More informationPractical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
More informationDiscrete Optimization
Discrete Optimization [Chen, Batson, Dang: Applied integer Programming] Chapter 3 and 4.1-4.3 by Johan Högdahl and Victoria Svedberg Seminar 2, 2015-03-31 Todays presentation Chapter 3 Transforms using
More informationCCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York
BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not
More informationMachine Learning in Spam Filtering
Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov kt@ut.ee Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.
More informationModern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
More informationNotes from Week 1: Algorithms for sequential prediction
CS 683 Learning, Games, and Electronic Markets Spring 2007 Notes from Week 1: Algorithms for sequential prediction Instructor: Robert Kleinberg 22-26 Jan 2007 1 Introduction In this course we will be looking
More informationThe Goldberg Rao Algorithm for the Maximum Flow Problem
The Goldberg Rao Algorithm for the Maximum Flow Problem COS 528 class notes October 18, 2006 Scribe: Dávid Papp Main idea: use of the blocking flow paradigm to achieve essentially O(min{m 2/3, n 1/2 }
More informationContinued Fractions and the Euclidean Algorithm
Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationProbabilistic Linear Classification: Logistic Regression. Piyush Rai IIT Kanpur
Probabilistic Linear Classification: Logistic Regression Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 18, 2016 Probabilistic Machine Learning (CS772A) Probabilistic Linear Classification:
More informationSupport Vector Machines for Classification and Regression
UNIVERSITY OF SOUTHAMPTON Support Vector Machines for Classification and Regression by Steve R. Gunn Technical Report Faculty of Engineering, Science and Mathematics School of Electronics and Computer
More information24. The Branch and Bound Method
24. The Branch and Bound Method It has serious practical consequences if it is known that a combinatorial problem is NP-complete. Then one can conclude according to the present state of science that no
More informationSimple and efficient online algorithms for real world applications
Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,
More information! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm.
Approximation Algorithms 11 Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of three
More informationAutomatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report
Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 69 Class Project Report Junhua Mao and Lunbo Xu University of California, Los Angeles mjhustc@ucla.edu and lunbo
More informationLecture Topic: Low-Rank Approximations
Lecture Topic: Low-Rank Approximations Low-Rank Approximations We have seen principal component analysis. The extraction of the first principle eigenvalue could be seen as an approximation of the original
More informationOn the Path to an Ideal ROC Curve: Considering Cost Asymmetry in Learning Classifiers
On the Path to an Ideal ROC Curve: Considering Cost Asymmetry in Learning Classifiers Francis R. Bach Computer Science Division University of California Berkeley, CA 9472 fbach@cs.berkeley.edu Abstract
More informationANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING
ANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING BY OMID ROUHANI-KALLEH THESIS Submitted as partial fulfillment of the requirements for the degree of
More informationProbabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
More informationServer Load Prediction
Server Load Prediction Suthee Chaidaroon (unsuthee@stanford.edu) Joon Yeong Kim (kim64@stanford.edu) Jonghan Seo (jonghan@stanford.edu) Abstract Estimating server load average is one of the methods that
More informationReview of Fundamental Mathematics
Review of Fundamental Mathematics As explained in the Preface and in Chapter 1 of your textbook, managerial economics applies microeconomic theory to business decision making. The decision-making tools
More informationThe Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy
BMI Paper The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy Faculty of Sciences VU University Amsterdam De Boelelaan 1081 1081 HV Amsterdam Netherlands Author: R.D.R.
More informationthe points are called control points approximating curve
Chapter 4 Spline Curves A spline curve is a mathematical representation for which it is easy to build an interface that will allow a user to design and control the shape of complex curves and surfaces.
More informationLeast-Squares Intersection of Lines
Least-Squares Intersection of Lines Johannes Traa - UIUC 2013 This write-up derives the least-squares solution for the intersection of lines. In the general case, a set of lines will not intersect at a
More informationSupport Vector Machine (SVM)
Support Vector Machine (SVM) CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationPredict the Popularity of YouTube Videos Using Early View Data
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationMAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS
MAXIMIZING RETURN ON DIRET MARKETING AMPAIGNS IN OMMERIAL BANKING S 229 Project: Final Report Oleksandra Onosova INTRODUTION Recent innovations in cloud computing and unified communications have made a
More informationAn interval linear programming contractor
An interval linear programming contractor Introduction Milan Hladík Abstract. We consider linear programming with interval data. One of the most challenging problems in this topic is to determine or tight
More informationRecovery of primal solutions from dual subgradient methods for mixed binary linear programming; a branch-and-bound approach
MASTER S THESIS Recovery of primal solutions from dual subgradient methods for mixed binary linear programming; a branch-and-bound approach PAULINE ALDENVIK MIRJAM SCHIERSCHER Department of Mathematical
More informationGautam Appa and H. Paul Williams A formula for the solution of DEA models
Gautam Appa and H. Paul Williams A formula for the solution of DEA models Working paper Original citation: Appa, Gautam and Williams, H. Paul (2002) A formula for the solution of DEA models. Operational
More informationSmoothing Multivariate Performance Measures
Journal of Machine Learning Research 13 (2012) 3589 3646 Submitted 11/11; Revised 9/12; Published 12/12 Smoothing Multivariate Performance Measures Xinhua Zhang Department of Computing Science University
More informationConvex Programming Tools for Disjunctive Programs
Convex Programming Tools for Disjunctive Programs João Soares, Departamento de Matemática, Universidade de Coimbra, Portugal Abstract A Disjunctive Program (DP) is a mathematical program whose feasible
More informationWhat is Linear Programming?
Chapter 1 What is Linear Programming? An optimization problem usually has three essential ingredients: a variable vector x consisting of a set of unknowns to be determined, an objective function of x to
More informationLargest Fixed-Aspect, Axis-Aligned Rectangle
Largest Fixed-Aspect, Axis-Aligned Rectangle David Eberly Geometric Tools, LLC http://www.geometrictools.com/ Copyright c 1998-2016. All Rights Reserved. Created: February 21, 2004 Last Modified: February
More informationBasics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar
More informationChapter 6: Sensitivity Analysis
Chapter 6: Sensitivity Analysis Suppose that you have just completed a linear programming solution which will have a major impact on your company, such as determining how much to increase the overall production
More informationLinear Programming I
Linear Programming I November 30, 2003 1 Introduction In the VCR/guns/nuclear bombs/napkins/star wars/professors/butter/mice problem, the benevolent dictator, Bigus Piguinus, of south Antarctica penguins
More information