Big Data Optimization: Randomized lock-free methods for minimizing partially separable convex functions
1 Big Data Optimization: Randomized lock-free methods for minimizing partially separable convex functions
Peter Richtárik, School of Mathematics, The University of Edinburgh
Joint work with Martin Takáč (Edinburgh)
Les Houches, January 11
2 Lock-Free (Asynchronous) Updates
Between the time when x is read by any given processor and the moment the update computed from that read is applied to x by it, other processors apply their own updates.
[Figure: timeline from the viewpoint of a single processor: read the current x (here x_3), compute the update, write into the then-current iterate (x_6 = x_5 + update(x_3)); meanwhile the other processors advance x through x_2, x_4, x_5, x_6, x_7.]
3 Generic Parallel Lock-Free Algorithm
In general:
    x_{j+1} = x_j + update(x_{r(j)})
where r(j) = index of the iterate current at reading time, and j = index of the iterate current at writing time.
Assumption: j − r(j) ≤ τ, where τ + 1 ≈ # processors.
4 The Problem and Its Structure
    minimize_{x ∈ R^V}  f(x) ≡ Σ_{e ∈ E} f_e(x)        (OPT)
Set of vertices/coordinates: V (x = (x_v, v ∈ V), dim x = |V|)
Set of edges: E ⊆ 2^V
Set of blocks: B (a collection of sets forming a partition of V)
Assumption: f_e depends on x_v, v ∈ e, only.
Example (convex f : R^5 → R):
    f(x) = 7(x_1 + x_3)^2 + 5(x_2 − x_3 + x_4)^2 + (x_4 − x_5)^2,
with the three terms being f_{e_1}(x), f_{e_2}(x), f_{e_3}(x);
V = {1, 2, 3, 4, 5}, |V| = 5, e_1 = {1, 3}, e_2 = {2, 3, 4}, e_3 = {4, 5}.
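As a concrete illustration, the example above can be written in code as a list of (edge, f_e) pairs; the representation below is only a sketch, not notation from the talk:

```python
import numpy as np

# Partially separable example from the slide: f(x) = sum_e f_e(x),
# where each f_e touches only the coordinates in its edge e (0-indexed here).
edges = [
    ({0, 2}, lambda x: 7 * (x[0] + x[2]) ** 2),           # e1 = {1, 3}
    ({1, 2, 3}, lambda x: 5 * (x[1] - x[2] + x[3]) ** 2),  # e2 = {2, 3, 4}
    ({3, 4}, lambda x: (x[3] - x[4]) ** 2),                # e3 = {4, 5}
]

def f(x):
    """Evaluate f(x) = sum over edges of f_e(x)."""
    return sum(fe(x) for _, fe in edges)

x = np.array([1.0, 2.0, -1.0, 0.5, 0.0])
print(f(x))  # each f_e only reads the coordinates in its edge
```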
5 Applications
- structured stochastic optimization (via Sample Average Approximation)
- learning: sparse least-squares, sparse SVMs, matrix completion, graph cuts (see Niu-Recht-Ré-Wright (2011))
- truss topology design
- optimal statistical designs
6 PART 1: LOCK-FREE HYBRID SGD/RCD METHODS
Based on: P. R. and M. Takáč, Lock-free randomized first order methods, manuscript.
7 Problem-Specific Constants
For each quantity we track its average (bar) and maximum (star) over the relevant index set:
- Edge-Vertex Degree (# vertices incident with an edge; relevant if B = V): ω_e = |e| = |{v ∈ V : v ∈ e}|, with average ω̄ and maximum ω*
- Edge-Block Degree (# blocks incident with an edge; relevant if |B| > 1): σ_e = |{b ∈ B : b ∩ e ≠ ∅}|, with average σ̄ and maximum σ*
- Vertex-Edge Degree (# edges incident with a vertex; not needed!): δ_v = |{e ∈ E : v ∈ e}|, with average δ̄ and maximum δ*
- Edge-Edge Degree (# edges incident with an edge; relevant if |E| > 1): ρ_e = |{e' ∈ E : e ∩ e' ≠ ∅}|, with average ρ̄ and maximum ρ*
Remarks:
- Our results depend on σ̄ (average Edge-Block degree) and ρ̄ (average Edge-Edge degree).
- The first and second rows are identical if B = V (blocks correspond to vertices/coordinates).
8-9 Example
A = [A_1^T; A_2^T; A_3^T; A_4^T] ∈ R^{4×3},   f(x) = (1/2)‖Ax‖_2^2 = (1/2) Σ_{i=1}^{4} (A_i^T x)^2,   |E| = 4, |V| = 3.
Computation of ω̄ and ρ̄ (each row A_i^T defines an edge e_i given by its nonzero pattern, each column a vertex v_j):
    ω_e = |e|,   ρ_e = |{e' ∈ E : e ∩ e' ≠ ∅}|,   δ_v = |{e ∈ E : v ∈ e}|
    ω̄ = 1.5,   ρ̄ = 3
[The per-edge values ω_{e_i}, ρ_{e_i} and per-vertex values δ_{v_j} were tabulated on the slide.]
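A small sketch of how these degree statistics could be computed from the edge (row-support) sets. The actual entries of A are not shown in the transcription, so the sparsity pattern below is a hypothetical one, chosen only so that it reproduces the averages ω̄ = 1.5 and ρ̄ = 3 reported on the slide:

```python
def degree_stats(edges, vertices):
    """Compute omega_e = |e|, rho_e (# edges intersecting e, incl. itself), delta_v, and the averages."""
    omega = [len(e) for e in edges]
    rho = [sum(1 for e2 in edges if e & e2) for e in edges]
    delta = {v: sum(1 for e in edges if v in e) for v in vertices}
    m = len(edges)
    return omega, rho, delta, sum(omega) / m, sum(rho) / m

# Hypothetical 4x3 sparsity pattern (rows = edges, columns = vertices 0, 1, 2).
edges = [frozenset({0, 1}), frozenset({1, 2}), frozenset({0}), frozenset({1})]
omega, rho, delta, omega_bar, rho_bar = degree_stats(edges, vertices=range(3))
print(omega_bar, rho_bar)  # 1.5, 3.0 -- matching the averages on the slide
```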
10 Algorithm
Iteration j + 1 looks as follows:
    x_{j+1} = x_j − γ|E|σ_e ∇_b f_e(x_{r(j)})
Viewpoint of the processor performing this iteration:
- Pick edge e ∈ E, uniformly at random
- Pick block b intersecting edge e, uniformly at random
- Read current x (it is enough to read x_v for v ∈ e)
- Compute ∇_b f_e(x)
- Apply update: x ← x − α ∇_b f_e(x), with α = γ|E|σ_e and γ > 0
- Do not wait (no synchronization!) and start again!
It is easy to show that E[|E|σ_e ∇_b f_e(x)] = ∇f(x).
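Below is a minimal serial sketch of this sampling and update rule; in the actual lock-free method each of the τ + 1 processors would run this loop concurrently on a shared x without synchronization. The data structures (edges as index sets paired with gradient callables) and the toy instance at the bottom are illustrative assumptions, not taken from the talk:

```python
import random
import numpy as np

def grad_block_fe(x, e_vars, grad_fe, block):
    """Gradient of f_e restricted to the coordinates in `block` (zero elsewhere)."""
    g = np.zeros_like(x)
    full = grad_fe(x)                 # grad_fe only depends on x_v, v in e_vars
    for v in block & e_vars:
        g[v] = full[v]
    return g

def hybrid_sgd_rcd_step(x, edges, blocks, gamma):
    """One iteration: sample an edge, then a block intersecting it, then update."""
    e_vars, grad_fe = random.choice(edges)          # edge e, uniform over E
    hitting = [b for b in blocks if b & e_vars]     # blocks intersecting e (sigma_e of them)
    b = random.choice(hitting)                      # block b, uniform over the sigma_e blocks
    alpha = gamma * len(edges) * len(hitting)       # alpha = gamma * |E| * sigma_e
    x -= alpha * grad_block_fe(x, e_vars, grad_fe, b)
    return x

# Toy instance: f(x) = (x0 + x2)^2 + (x1 - x2)^2 with blocks {0,1} and {2}.
edges = [
    (frozenset({0, 2}), lambda x: np.array([2 * (x[0] + x[2]), 0.0, 2 * (x[0] + x[2])])),
    (frozenset({1, 2}), lambda x: np.array([0.0, 2 * (x[1] - x[2]), -2 * (x[1] - x[2])])),
]
blocks = [frozenset({0, 1}), frozenset({2})]
x = np.ones(3)
for _ in range(1000):
    x = hybrid_sgd_rcd_step(x, edges, blocks, gamma=0.05)
print(x)
```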
11-12 Main Result
Setup:
- c = strong convexity parameter of f
- L = Lipschitz constant of ∇f
- ‖∇f(x)‖_2 ≤ M for all x visited by the method
- starting point x_0 ∈ R^V; accuracy 0 < ε < (L/2)‖x_0 − x*‖_2^2
- constant stepsize γ := cε / ((σ̄ + 2τρ̄/|E|) L^2 M^2)
Result: Under the above assumptions, for
    k ≥ (σ̄ + 2τρ̄/|E|) · (LM^2/(c^2 ε)) · log(L‖x_0 − x*‖_2^2 / ε) − 1,
we have min_{0 ≤ j ≤ k} E[f(x_j) − f*] ≤ ε.
13 Special Cases
General result:
    k ≥ Λ · (LM^2/(c^2 ε)) · log(2L‖x_0 − x*‖^2 / ε) − 1,   where Λ = σ̄ + 2τρ̄/|E|
(the factor multiplying Λ is common to all special cases below).
Special cases (lock-free parallel versions of known methods):
- |E| = 1: Randomized Block Coordinate Descent, Λ = |B| + 2τ
- |B| = 1: Incremental Gradient Descent (Hogwild! as implemented), Λ = 1 + 2τρ̄/|E|
- B = V: RAINCODE: RAndomized INcremental COordinate DEscent (Hogwild! as analyzed), Λ = ω̄ + 2τρ̄/|E|
- |E| = |B| = 1: Gradient Descent, Λ = 1 + 2τ
14 Analysis via a New Recurrence
Let a_j = (1/2) E[‖x_j − x*‖^2].
Nemirovski-Juditsky-Lan-Shapiro:
    a_{j+1} ≤ (1 − 2cγ_j) a_j + γ_j^2 M^2
Niu-Recht-Ré-Wright (Hogwild!):
    a_{j+1} ≤ (1 − cγ) a_j + γ^2 (2cω* M τ (δ*)^{1/2}) a_j^{1/2} + γ^2 M^2 Q,
    where Q = ω* + 2τρ*/|E| + 4ω*ρ*τ/|E| + 2τ^2 (ω*)^2 (δ*)^{1/2}
R.-Takáč:
    a_{j+1} ≤ (1 − 2cγ) a_j + γ^2 (σ̄ + 2τρ̄/|E|) M^2
15-16 Parallelization Speedup Factor
    PSF = (Λ of serial version) / ((Λ of parallel version)/τ) = σ̄ / ((σ̄ + 2τρ̄/|E|)/τ) = 1 / (1/τ + 2ρ̄/(σ̄|E|))
Three modes:
1. Brute force (many processors; τ very large): PSF ≈ σ̄|E|/(2ρ̄)
2. Favorable structure (ρ̄/(σ̄|E|) ≪ 1/τ; fixed τ): PSF ≈ τ
3. Special τ (τ = |E|/ρ̄): PSF = (|E|/ρ̄) · σ̄/(σ̄ + 2)
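The formula is easy to evaluate; the numbers plugged in below are illustrative, not from the talk:

```python
def psf(sigma_bar, rho_bar, num_edges, tau):
    """Parallelization speedup factor PSF = 1 / (1/tau + 2*rho_bar/(sigma_bar*|E|))."""
    return 1.0 / (1.0 / tau + 2.0 * rho_bar / (sigma_bar * num_edges))

# Illustrative numbers: 500k edges, sigma_bar = 20, rho_bar = 1000, tau = 16 processors.
print(psf(sigma_bar=20, rho_bar=1000, num_edges=500_000, tau=16))  # close to tau (favorable structure)
```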
17-19 Improvements vs Hogwild!
If B = V (blocks = coordinates), then our method coincides with Hogwild! (as analyzed in Niu et al.), up to the stepsize choice:
    x_{j+1} = x_j − γ|E|ω_e ∇_v f_e(x_{r(j)})
Niu-Recht-Ré-Wright (Hogwild!, 2011):   Λ = 4ω* + 24τρ*/|E| + 24τ^2 ω* (δ*)^{1/2}
R.-Takáč:                               Λ = ω̄ + 2τρ̄/|E|
Advantages of our approach:
- Dependence on averages and not maxima! (ω̄ ≤ ω*, ρ̄ ≤ ρ*)
- Better constants (4 → 1, 24 → 2)
- The third large term is not present (no dependence on τ^2 and δ*)
- Introduction of blocks (⇒ we also cover block coordinate descent, gradient descent, SGD)
- Simpler analysis
20 Modified Algorithm: Global Reads and Local Writes
Partition the vertices (coordinates) into τ + 1 blocks, V = b_1 ∪ b_2 ∪ ... ∪ b_{τ+1}, and assign block b_i to processor i, i = 1, 2, ..., τ + 1.
Processor i will (asynchronously) do:
- Pick edge e ∈ {e ∈ E : e ∩ b_i ≠ ∅}, uniformly at random (an edge intersecting the block owned by processor i)
- Update: x_{j+1} = x_j − α ∇_{b_i} f_e(x_{r(j)})
Pros and cons:
+ good if global reads and local writes are cheap but global writes are expensive (NUMA = Non-Uniform Memory Access)
− we do not have an analysis
Idea proposed by Ben Recht.
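A rough sketch of the ownership idea, assuming Python threads and a toy instance (both hypothetical, not from the talk): every worker reads the shared x globally but writes only the coordinates in the block it owns, so no writes to foreign coordinates are ever issued.

```python
import random
import threading
import numpy as np

def worker(x, owned, incident_edges, alpha, num_steps):
    """Each worker reads the shared x globally but writes only its own coordinates."""
    for _ in range(num_steps):
        e_vars, grad_fe = random.choice(incident_edges)   # edge intersecting the owned block
        g = grad_fe(x)                                    # global (possibly stale) read of x
        for v in owned & e_vars:                          # local write: only owned coordinates
            x[v] -= alpha * g[v]

# Hypothetical setup: coordinates {0,1,2,3} split between two workers.
edges = [
    (frozenset({0, 2}), lambda x: np.array([2 * (x[0] - x[2]), 0.0, -2 * (x[0] - x[2]), 0.0])),
    (frozenset({1, 3}), lambda x: np.array([0.0, 2 * (x[1] + x[3]), 0.0, 2 * (x[1] + x[3])])),
    (frozenset({2, 3}), lambda x: np.array([0.0, 0.0, 2 * x[2], 2 * x[3]])),
]
x = np.ones(4)
blocks = [frozenset({0, 1}), frozenset({2, 3})]
threads = [
    threading.Thread(target=worker,
                     args=(x, b, [e for e in edges if e[0] & b], 0.05, 1000))
    for b in blocks
]
for t in threads: t.start()
for t in threads: t.join()
print(x)
```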
21 Experiment 1: rcv1
size = 1.2 GB, features: |V| = 47,236, training examples: |E| = 677,399, testing: 20,…
[Figure: train error vs. epoch for 1, 4 and 16 CPUs, each with synchronous and asynchronous updates.]
22 Experiment 2
Artificial problem instance:
    minimize f(x) = (1/2)‖Ax‖^2 = Σ_{i=1}^{m} (1/2)(A_i^T x)^2,   A ∈ R^{m×n}; m = |E| = 500,000; n = |V| = 50,000
Three methods:
- Synchronous, all = parallel synchronous method with |B| = 1
- Asynchronous, all = parallel asynchronous method with |B| = 1
- Asynchronous, block = parallel asynchronous method with |B| = τ (no need for atomic operations ⇒ additional speedup)
We measure the elapsed time needed to perform 20m iterations (20 epochs).
23 Uniform instance: |e| = 10 for all edges
[Figure: elapsed time of the three methods on the uniform instance.]
24 PART 2: PARALLEL BLOCK COORDINATE DESCENT
Based on: P. R. and M. Takáč, Parallel coordinate descent methods for big data optimization, arXiv:1212.0873.
25-27 Overview
A rich family of synchronous parallel block coordinate descent methods.
Theory and algorithms work for convex composite functions with a block-separable regularizer:
    minimize F(x) ≡ f(x) + λΨ(x),   where f(x) = Σ_{e ∈ E} f_e(x) and Ψ(x) = Σ_{b ∈ B} Ψ_b(x)
- The decomposition f = Σ_{e ∈ E} f_e does not need to be known!
- f: convex or strongly convex (complexity results for both)
- All parameters needed to run the method according to the theory are easy to compute: the block Lipschitz constants L_1, ..., L_{|B|} and ω.
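A schematic single iteration of such a method, assuming coordinates as blocks and an ℓ1 regularizer; beta stands for the sampling-dependent factor from the theory (e.g. beta = 1 + (ω − 1)(τ − 1)/(|B| − 1) for τ-nice sampling), and all names below are illustrative rather than the authors' implementation:

```python
import numpy as np

def prox_l1(z, t):
    """Prox of t*|.| (soft-thresholding); the block-separable regularizer here is the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def pcdm_iteration(x, grad_f, L_blocks, lam, beta, tau, rng):
    """One synchronous step: sample tau coordinates, update each via a prox step with constant beta*L_j."""
    S = rng.choice(x.size, size=tau, replace=False)   # tau-nice sampling of coordinates
    g = grad_f(x)                                     # in practice each worker computes only its own partial derivative
    for j in S:                                       # conceptually done in parallel, applied synchronously
        step = beta * L_blocks[j]
        x[j] = prox_l1(x[j] - g[j] / step, lam / step)
    return x

# Tiny usage: F(x) = 0.5*||x - c||^2 + lam*||x||_1, coordinatewise Lipschitz constants = 1.
c = np.array([3.0, -0.5, 0.0, 2.0, -4.0])
x = np.zeros(5)
rng = np.random.default_rng(0)
for _ in range(200):
    x = pcdm_iteration(x, lambda z: z - c, np.ones(5), lam=1.0, beta=1.0, tau=2, rng=rng)
print(x)  # approaches prox_l1(c, 1) = [2, 0, 0, 1, -3]
```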
28 ACDC: Lock-Free Parallel Coordinate Descent
C++ code. Can solve a LASSO problem with |V| = 10^9, |E| = …, ω = 35, on a machine with τ = 24 processors, to ε = … accuracy, in 2 hours, starting with an initial gap of ….
29 Complexity Results
First complexity analysis of parallel coordinate descent: bounds on k that guarantee P(F(x_k) − F* ≤ ε) ≥ 1 − p.
Convex functions:
    k ≥ (2β/α) · (‖x_0 − x*‖_L^2 / ε) · log((F(x_0) − F*)/(εp))
Strongly convex functions (with parameters µ_f and µ_Ψ):
    k ≥ ((β + µ_Ψ)/(α(µ_f + µ_Ψ))) · log((F(x_0) − F*)/(εp))
Leading constants matter!
30 Parallelization Speedup Factors
Closed-form formulas for parallelization speedup factors (PSFs):
- PSFs are functions of ω, τ and |B|, and depend on the sampling.
- Example 1: fully parallel sampling (all blocks are updated, i.e., τ = |B|): PSF = |B|/ω.
- Example 2: τ-nice sampling (all subsets of τ blocks are chosen with the same probability): PSF = τ / (1 + (ω − 1)(τ − 1)/(|B| − 1)).
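A quick check of these formulas in code; the |B| = 10^9 and ω = 35 figures come from the billion-variable instance on the next slide, and τ = 16 from the experiments reported later:

```python
def psf_fully_parallel(num_blocks, omega):
    """PSF for fully parallel sampling (tau = |B|)."""
    return num_blocks / omega

def psf_tau_nice(tau, omega, num_blocks):
    """PSF for tau-nice sampling: tau / (1 + (omega - 1)(tau - 1)/(|B| - 1))."""
    return tau / (1.0 + (omega - 1) * (tau - 1) / (num_blocks - 1))

print(psf_tau_nice(tau=16, omega=35, num_blocks=10**9))  # essentially 16: near-linear speedup
```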
31 A Problem with a Billion Variables
LASSO problem: F(x) = (1/2)‖Ax − b‖^2 + λ‖x‖_1
The instance:
- A has |E| = m = … rows and |V| = n = 10^9 columns (= # of variables)
- exactly 20 nonzeros in each column
- on average 10 and at most 35 nonzeros in each row (ω = 35)
- the optimal solution x* has 10^5 nonzeros
- λ = 1
Solver: asynchronous parallel coordinate descent method with independent nice sampling and τ = 1, 8, 16 cores.
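For concreteness, here is a generic serial randomized coordinate descent sketch for the LASSO objective with the usual soft-thresholding step; this is not the ACDC code, and the tiny dense instance at the bottom is purely illustrative (the instance above is sparse and far too large for dense storage):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the prox of t*|.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_rcd(A, b, lam, num_iters, seed=0):
    """Serial randomized coordinate descent for F(x) = 0.5*||Ax - b||^2 + lam*||x||_1."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    r = A @ x - b                          # maintained residual Ax - b
    col_norms = (A ** 2).sum(axis=0)       # coordinate Lipschitz constants L_j = ||A_j||^2
    for _ in range(num_iters):
        j = rng.integers(n)
        g = A[:, j] @ r                    # partial derivative of the smooth part
        x_new = soft_threshold(x[j] - g / col_norms[j], lam / col_norms[j])
        r += A[:, j] * (x_new - x[j])      # update the residual incrementally
        x[j] = x_new
    return x

# Tiny dense example.
A = np.random.default_rng(1).standard_normal((50, 20))
b = A @ np.r_[np.ones(3), np.zeros(17)] + 0.01 * np.random.default_rng(2).standard_normal(50)
print(np.round(lasso_rcd(A, b, lam=1.0, num_iters=5000), 2))
```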
32 # Coordinate Updates / n
[Figure]
33 # Iterations / n
[Figure]
34 Wall Time
[Figure]
35 Billion Variables: 1 Core
[Table: progress of the method on the billion-variable LASSO instance on a single core, reporting k/n, F(x_k) − F*, ‖x_k‖_0 and time in hours; the numerical entries did not survive the transcription.]
36 Billion Variables: 1, 8 and 16 Cores
[Table: F(x_k) − F* versus the number of coordinate updates (kτ)/n and the elapsed time for 1, 8 and 16 cores; the numerical entries did not survive the transcription.]
37 References
- P. R. and M. Takáč, Lock-free randomized first order methods, arXiv:1301.xxxx.
- P. R. and M. Takáč, Parallel coordinate descent methods for big data optimization, arXiv:1212.0873, 2012.
- P. R. and M. Takáč, Iteration complexity of block coordinate descent methods for minimizing a composite function, Mathematical Programming, Series A.
- P. R. and M. Takáč, Efficient serial and parallel coordinate descent methods for huge-scale truss topology design, Operations Research Proceedings.
- F. Niu, B. Recht, C. Ré, and S. Wright, Hogwild!: A lock-free approach to parallelizing stochastic gradient descent, NIPS 2011.
- A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM J. Opt., 19(4):1574–1609, 2009.
- M. Zinkevich, M. Weimer, A. Smola, and L. Li, Parallelized stochastic gradient descent, NIPS 2010.
- Yu. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM J. Opt., 22(2):341–362, 2012.