On the k-support and Related Norms


1 On the k-support and Related Norms
Massimiliano Pontil
Department of Computer Science, Centre for Computational Statistics and Machine Learning, University College London
(Joint work with Andrew McDonald and Dimitris Stamos)
Massimiliano Pontil (UCL) On the k-support and Related Norms Sestri Levante, Sept / 14
2 Plan
Problem
Spectral regularization
k-support norm
Box norm
Link to cluster norm
3 Problem
Learn a matrix from a set of linear measurements:
$$y_i = \langle W, X_i \rangle + \text{noise}_i, \quad i = 1, \dots, n$$
Method:
$$\min_{W \in \mathbb{R}^{d \times m}} \sum_{i=1}^{n} \big(y_i - \langle W, X_i \rangle\big)^2 + \lambda\, \Omega(W)$$
Matrix completion: $X_i = e_r e_c^\top$
Multitask learning: $X_i = e_r x_i^\top$
Regularizer $\Omega$ encourages matrix structure
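As a rough illustration of the objective on this slide, the sketch below evaluates the squared-error term plus $\lambda\,\Omega(W)$ in the matrix completion setting, where $X_i = e_r e_c^\top$ and so $\langle W, X_i \rangle$ is simply the entry $W_{rc}$. The names `completion_objective` and `trace_norm`, the argument layout, and the use of the trace norm as $\Omega$ are assumptions made for the example; they are not taken from the slides.

```python
import numpy as np

def completion_objective(W, rows, cols, y, lam, omega):
    """Squared residuals on the observed entries plus lam * omega(W).

    rows[i], cols[i] give the entry observed by measurement i, i.e.
    X_i = e_r e_c^T, so that <W, X_i> = W[rows[i], cols[i]].
    """
    residuals = y - W[rows, cols]
    return float(np.sum(residuals ** 2) + lam * omega(W))

def trace_norm(W):
    # Sum of singular values; one possible choice of the regularizer omega.
    return float(np.sum(np.linalg.svd(W, compute_uv=False)))

# Tiny example: two observed entries of a 2 x 2 matrix.
W = np.eye(2)
rows = np.array([0, 1])
cols = np.array([0, 0])
y = np.array([1.0, 2.0])
value = completion_objective(W, rows, cols, y, lam=0.5, omega=trace_norm)
```

Passing the regularizer as a callable keeps the data-fit term separate from the choice of $\Omega$, so other norms from the talk could be swapped in without touching the loss.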
4 Spectral Regularization
$$\min_{W \in \mathbb{R}^{d \times m}} \sum_{i=1}^{n} \big(y_i - \langle W, X_i \rangle\big)^2 + \lambda\, \Omega(W)$$
$\Omega$ favors matrix structure (low rank, low variance, clustering, etc.)
Choose an OI-norm (orthogonally invariant norm): $\Omega(W) \equiv \|W\| = \|UWV\|$ for all orthogonal $U$, $V$
von Neumann (1937): $\|W\| = g(\sigma(W))$, where $g$ is an SG-function (symmetric gauge function)
A well-studied example is the trace norm: $g(\cdot) = \|\cdot\|_1$
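The orthogonal invariance property can be checked numerically. The following is a minimal sketch, assuming NumPy and using the trace norm as the example; the helper names `oi_norm` and `l1` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def oi_norm(W, g):
    """Evaluate ||W|| = g(sigma(W)) for a symmetric gauge function g."""
    sigma = np.linalg.svd(W, compute_uv=False)
    return float(g(sigma))

def l1(v):
    # g = l1 norm of the singular values, which gives the trace norm.
    return np.sum(np.abs(v))

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
# Random orthogonal matrices from QR factorizations of Gaussian matrices.
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))

trace_W = oi_norm(W, l1)
trace_UWV = oi_norm(U @ W @ V, l1)  # agrees with trace_W up to rounding
```

Because multiplying by orthogonal matrices leaves the singular values unchanged, any function of $\sigma(W)$ alone is automatically orthogonally invariant; the SG-function condition is what makes it a norm.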
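5 k-support Norm [Argyriou et al. 2012]
Special case of group lasso with overlap [Jacob et al., 2009]:
$$\|w\|_{(k)} = \inf\Big\{ \sum_{|J| \le k} \|v_J\|_2 \;:\; \sum_{|J| \le k} v_J = w,\ \operatorname{supp}(v_J) \subseteq J \Big\}$$
where $J$ ranges over the subsets of $\{1, \dots, d\}$ with at most $k$ elements
Includes the $\ell_1$ norm ($k = 1$) and the $\ell_2$ norm ($k = d$)
Unit ball of $\|\cdot\|_{(k)}$ is the convex hull of $\{w : \operatorname{card}(w) \le k,\ \|w\|_2 \le 1\}$
Dual norm: $\|u\|_{*,(k)} = \Big( \sum_{i=1}^{k} \big(|u|_i^{\downarrow}\big)^2 \Big)^{1/2}$, where $|u|^{\downarrow}$ denotes the entries of $|u|$ sorted in nonincreasing order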
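The dual norm on this slide is straightforward to compute: it is the $\ell_2$ norm of the $k$ largest entries of $|u|$. A small sketch, assuming NumPy; the function name `k_support_dual_norm` is chosen here for illustration only.

```python
import numpy as np

def k_support_dual_norm(u, k):
    """l2 norm of the k largest entries of |u| (dual of the k-support norm)."""
    magnitudes = np.sort(np.abs(np.asarray(u, dtype=float)))[::-1]
    if not 1 <= k <= magnitudes.size:
        raise ValueError("k must satisfy 1 <= k <= len(u)")
    return float(np.sqrt(np.sum(magnitudes[:k] ** 2)))

u = [3.0, -4.0, 1.0]
dual_k1 = k_support_dual_norm(u, 1)  # largest magnitude: the l-infinity norm
dual_k2 = k_support_dual_norm(u, 2)  # l2 norm of the two largest entries
dual_k3 = k_support_dual_norm(u, 3)  # all entries: the l2 norm
```

The two extreme cases agree with the primal statement: $k = 1$ gives the $\ell_\infty$ norm, the dual of $\ell_1$, and $k = d$ gives the $\ell_2$ norm, which is its own dual.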
6 Spectral k-support Norm
The k-support norm is an SG-function, inducing the OI-norm
$$\|W\|_{(k)} := \|\sigma(W)\|_{(k)}$$
Proposition. The unit ball of $\|\sigma(\cdot)\|_{(k)}$ is the convex hull of $\{W : \operatorname{rank}(W) \le k,\ \|W\|_F \le 1\}$
Includes the trace norm ($k = 1$) and the Frobenius norm ($k = d$)
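7 Matrix Completion Experiment
Table columns: dataset, norm, test error, r, k, a. The numerical entries did not survive transcription; only the row labels remain.
dataset               norms
ML 100k (ρ = 50%)     tr, en, ks, box
ML 1M (ρ = 50%)       tr, en, ks, box
Jester1 (per line)    tr, en, ks, box
Each box row ends with a fragment of a value in scientific notation (e5, e6 and e5 respectively).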
8 MTL Experiment
Table: Multitask learning clustering on Lenk dataset, with simple thresholding. Only the parenthesised part of each test error entry survives; the box and cbox rows end with the fragment e3.
dataset            norm    test error    k    a
Lenk (per task)    fr      (0.07)
                   tr      (0.04)
                   en      (0.04)
                   ks      (0.04)
                   box     (0.04)             e3
                   cfr     (0.08)
                   ctr     (0.03)
                   cen     (0.03)
                   cks     (0.03)
                   cbox    (0.03)             e3
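9 Box Norm
Let $\Theta \subset \mathbb{R}^d_{++}$ be bounded and convex, and consider the norm and its dual:
$$\|w\|_\Theta^2 = \inf_{\theta \in \Theta} \sum_{i=1}^{d} \frac{w_i^2}{\theta_i}, \qquad \|u\|_{*,\Theta}^2 = \sup_{\theta \in \Theta} \sum_{i=1}^{d} \theta_i u_i^2$$
Box norm: $\Theta = \big\{ \theta : a < \theta_i \le b,\ \sum_{i=1}^{d} \theta_i \le c \big\}$
Includes the k-support norm for $a = 0$, $b = 1$, $c = k$
Unit ball is the convex hull of
$$\bigcup_{|J| \le k} \Big\{ w \in \mathbb{R}^d : \sum_{i \in J} \frac{w_i^2}{b} + \sum_{i \notin J} \frac{w_i^2}{a} \le 1 \Big\}$$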
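For the box set $\Theta$, the supremum defining the dual norm is a linear program in $\theta$, so it can be solved greedily: start every $\theta_i$ at $a$ and spend the remaining budget $c - da$ on the coordinates with the largest $u_i^2$, up to $b$ each. The sketch below follows that reasoning; it works on the closure of $\Theta$ (allowing $\theta_i = a$), which gives the same supremum, and it assumes $da \le c \le db$. The function name `box_dual_norm` is hypothetical.

```python
import numpy as np

def box_dual_norm(u, a, b, c):
    """Dual box norm: sqrt of max over theta of sum_i theta_i * u_i^2.

    Assumes 0 <= a < b and d*a <= c <= d*b. The maximum of this linear
    objective over the box is found greedily (fractional knapsack).
    """
    u2 = np.sort(np.asarray(u, dtype=float) ** 2)[::-1]  # largest first
    d = u2.size
    if not (0 <= a < b and d * a <= c <= d * b):
        raise ValueError("need 0 <= a < b and d*a <= c <= d*b")
    theta = np.full(d, float(a))
    budget = c - d * a
    for i in range(d):
        step = min(b - a, budget)
        theta[i] += step
        budget -= step
        if budget <= 0:
            break
    return float(np.sqrt(np.dot(theta, u2)))

u = [3.0, -4.0, 1.0]
dual_ks = box_dual_norm(u, a=0.0, b=1.0, c=2.0)     # k-support dual, k = 2
dual_box = box_dual_norm(u, a=0.25, b=1.0, c=1.5)   # c = d*a + k*(b - a), k = 1
```

With $a = 0$, $b = 1$, $c = k$ the greedy allocation puts weight 1 on the $k$ largest entries, which recovers the dual of the k-support norm.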
10 Unit Balls
Figure: Unit balls of the box norm in $\mathbb{R}^2$ for $k = 1$, $a \in \{0.01, 0.25, 0.50\}$.
Figure: Unit balls of the dual box norm in $\mathbb{R}^2$ for $k = 1$, $a \in \{0.01, 0.25, 0.50\}$.
11 Cluster Norm
The box norm is an SG-function, inducing the OI-norm
$$\|W\|_\Theta^2 = \|\sigma(W)\|_\Theta^2 = \inf\Big\{ \sum_{i=1}^{d} \frac{\sigma_i(W)^2}{\theta_i} \;:\; \theta \in (a, b]^d,\ \sum_{i=1}^{d} \theta_i \le c \Big\}$$
The associated OI-norm has been used to favour task clustering [Jacob et al. 2008]. It can be written as
$$\|W\|_\Theta^2 = \inf\big\{ \operatorname{tr}(W \Sigma^{-1} W^\top) \;:\; aI \preceq \Sigma \preceq bI,\ \operatorname{tr} \Sigma \le c \big\}$$
Includes the spectral k-support norm for $a = 0$, $b = 1$, $c = k$
12 Interpretation of a
Proposition. If $c = da + k(b - a)$, the solution of the regularization problem is given by $\hat{W} = \hat{V} + \hat{Z}$, where
$$(\hat{V}, \hat{Z}) = \arg\min_{V, Z} \sum_{i=1}^{n} \big(y_i - \langle V + Z, X_i \rangle\big)^2 + \lambda \Big( \frac{1}{a} \|V\|_F^2 + \frac{1}{b - a} \|Z\|_{(k)}^2 \Big)$$
The parameter $a$ balances the relative importance of the two components
The cluster norm is the Moreau envelope of the spectral k-support norm:
$$\|W\|_\Theta^2 = \min_{Z \in \mathbb{R}^{d \times m}} \Big\{ \frac{1}{a} \|W - Z\|_F^2 + \frac{1}{b - a} \|Z\|_{(k)}^2 \Big\}$$
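A quick sanity check of the Moreau-envelope identity can be worked by hand in the simplest case $k = d$, where $c = db$ and the spectral k-support norm reduces to the Frobenius norm. This is an illustrative calculation under that assumption (with $0 < a < b$), not the general proof.

```latex
% Case k = d: \|Z\|_{(d)} = \|Z\|_F, so the minimisation separates over entries.
\min_{z \in \mathbb{R}} \; \frac{(w - z)^2}{a} + \frac{z^2}{b - a}
\quad\Longrightarrow\quad
-\frac{2(w - z)}{a} + \frac{2z}{b - a} = 0
\quad\Longrightarrow\quad
z^\star = \frac{b - a}{b}\, w .

% Substituting z^\star back into the objective:
\frac{(w - z^\star)^2}{a} + \frac{(z^\star)^2}{b - a}
= \frac{a\, w^2}{b^2} + \frac{(b - a)\, w^2}{b^2}
= \frac{w^2}{b} .

% Summing over entries gives \|W\|_F^2 / b. On the box-norm side, c = db makes
% the constraint \sum_i \theta_i \le c inactive, so \theta_i = b is optimal and
% \|W\|_\Theta^2 = \sum_i \sigma_i(W)^2 / b = \|W\|_F^2 / b, matching the envelope.
```

In this case the optimal split is $\hat{Z} = \frac{b-a}{b} W$ and $\hat{V} = \frac{a}{b} W$, which shows concretely how $a$ shifts weight between the two components.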
13 Computation of the Θ norm
Assume w.l.o.g. $w \ge 0$ with nonincreasing components. Then
$$\|w\|_\Theta^2 = \frac{1}{b} \|w_{[1:q]}\|_2^2 + \frac{1}{c - qb - \ell a} \|w_{[q+1:d-\ell]}\|_1^2 + \frac{1}{a} \|w_{[d-\ell+1:d]}\|_2^2,$$
where $q, \ell \in \{0, \dots, d\}$ are uniquely determined
In particular:
$$\|w\|_{(k)}^2 = \|w_{[1:q]}\|_2^2 + \frac{1}{k - q} \|w_{[q+1:d]}\|_1^2,$$
where $q \in \{0, \dots, k - 1\}$ is determined by
$$w_q \ge \frac{1}{k - q} \sum_{j=q+1}^{d} w_j > w_{q+1}$$
Computation of the norm is $O(d \log d)$
For the k-support norm this improves on the previous $O(kd)$ method
Efficient optimization using proximal-gradient methods
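The k-support formula on this slide can be turned into a short routine: sort the magnitudes, precompute tail sums, and locate $q$. The sketch below is one possible reading, assuming NumPy. It picks the smallest $q$ whose tail average is at least $w_{q+1}$, a choice made here so that ties are handled without special cases (tied candidates give the same value of the formula); the function names are hypothetical.

```python
import numpy as np

def k_support_norm(w, k):
    """k-support norm via sorting: O(d log d) for the sort plus an O(k) scan."""
    s = np.sort(np.abs(np.asarray(w, dtype=float)))[::-1]  # nonincreasing
    d = s.size
    if not 1 <= k <= d:
        raise ValueError("k must satisfy 1 <= k <= len(w)")
    tail = np.cumsum(s[::-1])[::-1]  # tail[q] = s[q] + ... + s[d-1]
    # Smallest q in {0, ..., k-1} with tail[q] / (k - q) >= s[q], where s[q]
    # is w_{q+1} in the slide's 1-based notation. q = k-1 always qualifies.
    for q in range(k):
        if tail[q] / (k - q) >= s[q]:
            break
    head_sq = np.sum(s[:q] ** 2)
    return float(np.sqrt(head_sq + tail[q] ** 2 / (k - q)))

def spectral_k_support_norm(W, k):
    # Apply the vector norm to the singular values of W.
    return k_support_norm(np.linalg.svd(W, compute_uv=False), k)

norm_l1 = k_support_norm([1.0, -2.0, 3.0], 1)   # k = 1: l1 norm
norm_l2 = k_support_norm([3.0, -4.0], 2)        # k = d: l2 norm
norm_mid = k_support_norm([3.0, 1.0, 1.0], 2)   # q = 1: sqrt(9 + 4)
```

Applied to singular values, the same routine gives the spectral k-support norm, so `spectral_k_support_norm(W, 1)` is the trace norm and taking $k$ equal to the number of singular values gives the Frobenius norm.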
14 Extensions/Open Problems
Other sets $\Theta$ allow for exact prox, e.g. $\Theta = \{\theta_1 \ge \dots \ge \theta_d > 0\}$. Can we give a general characterization?
Online learning / stochastic optimization
Kernel extensions