Big Data Optimization: Randomized lock-free methods for minimizing partially separable convex functions

1 Big Data Optimization: Randomized lock-free methods for minimizing partially separable convex functions

Peter Richtárik, School of Mathematics, The University of Edinburgh
Joint work with Martin Takáč (Edinburgh)
Les Houches, January 11, 2013

2 Lock-Free (Asynchronous) Updates

Between the time when x is read by any given processor and the update computed from that read is applied to x, other processors apply their own updates.

[Figure: timeline from the viewpoint of a single processor. The processor reads the current iterate (x_3) and computes its update; by the time it writes, the current iterate is x_5 (other processors have produced x_4 and x_5 in the meantime), so the write yields x_6 = x_5 + update(x_3).]

3 Generic Parallel Lock-Free Algorithm

In general:

x_{j+1} = x_j + update(x_{r(j)})

where r(j) is the index of the iterate current at reading time and j is the index of the iterate current at writing time.

Assumption: j − r(j) ≤ τ, where τ + 1 ≈ # processors.
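The following minimal Python sketch (an illustration, not the authors' code) makes the read/compute/write pattern concrete: several threads share one iterate and apply updates computed from possibly stale reads, with no locks. The update rule, a contraction -0.1*x plus small noise, is an arbitrary stand-in for update(·).

```python
import threading
import numpy as np

def worker(x, num_updates, seed):
    rng = np.random.default_rng(seed)
    for _ in range(num_updates):
        x_read = x.copy()          # x_{r(j)}: iterate current at reading time
        delta = -0.1 * x_read + rng.normal(scale=1e-3, size=x.shape)
        x += delta                 # write: x_{j+1} = x_j + update(x_{r(j)});
                                   # x may have moved on since the read (no locks)

x = np.ones(4)                     # shared iterate
threads = [threading.Thread(target=worker, args=(x, 1000, s)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(x)                           # driven near 0 despite stale reads
```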

4 The Problem and Its Structure

minimize_{x ∈ R^V}  f(x) = Σ_{e ∈ E} f_e(x)        (OPT)

Set of vertices/coordinates: V (x = (x_v, v ∈ V), dim x = |V|)
Set of edges: E ⊆ 2^V
Set of blocks: B (a collection of sets forming a partition of V)
Assumption: f_e depends on x_v, v ∈ e, only

Example (convex f : R^5 → R):

f(x) = 7(x_1 + x_3)² + 5(x_2 − x_3 + x_4)² + (x_4 − x_5)²

with the three terms being f_{e_1}(x), f_{e_2}(x), f_{e_3}(x), and

V = {1, 2, 3, 4, 5}, |V| = 5, e_1 = {1, 3}, e_2 = {2, 3, 4}, e_3 = {4, 5}
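As a concrete check of the structure, here is a small Python sketch encoding the example above as edge functions; each f_e reads only the coordinates x_v with v ∈ e:

```python
# Sketch: the 5-dimensional example as a sum of edge functions. Each f_e sees
# only the coordinates in its edge e, which is all partial separability asks.
import numpy as np

edges = [
    ((0, 2),    lambda z: 7.0 * (z[0] + z[1]) ** 2),         # e1 = {1,3}
    ((1, 2, 3), lambda z: 5.0 * (z[0] - z[1] + z[2]) ** 2),  # e2 = {2,3,4}
    ((3, 4),    lambda z: (z[0] - z[1]) ** 2),               # e3 = {4,5}
]

def f(x):
    # f(x) = sum over edges of f_e(x), each term reading only x_v for v in e
    return sum(fe(x[list(e)]) for e, fe in edges)

x = np.array([1.0, 2.0, -1.0, 0.5, 0.0])
print(f(x))   # 0 + 5*3.5**2 + 0.25 = 61.5
```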

5 Applications

- structured stochastic optimization (via Sample Average Approximation)
- learning: sparse least-squares, sparse SVMs, matrix completion, graph cuts (see Niu-Recht-Ré-Wright (2011))
- truss topology design
- optimal statistical designs

6 PART 1: LOCK-FREE HYBRID SGD/RCD METHODS

based on: P. R. and M. Takáč, Lock-free randomized first order methods, manuscript.

7 Problem-Specific Constants

For each quantity we track its definition, its average, and its maximum:

- Edge-Vertex Degree (# vertices incident with an edge; relevant if B = V): ω_e = |e| = |{v ∈ V : v ∈ e}|; average ω̄, maximum ω*
- Edge-Block Degree (# blocks incident with an edge; relevant if |B| > 1): σ_e = |{b ∈ B : b ∩ e ≠ ∅}|; average σ̄, maximum σ*
- Vertex-Edge Degree (# edges incident with a vertex; not needed!): δ_v = |{e ∈ E : v ∈ e}|; average δ̄, maximum δ*
- Edge-Edge Degree (# edges incident with an edge; relevant if |E| > 1): ρ_e = |{e' ∈ E : e' ∩ e ≠ ∅}|; average ρ̄, maximum ρ*

Remarks:
- Our results depend on σ̄ (average Edge-Block degree) and ρ̄ (average Edge-Edge degree)
- The first and second rows are identical if B = V (blocks correspond to vertices/coordinates)

8 Example

A = [A_1^T; A_2^T; A_3^T; A_4^T] ∈ R^{4×3},   f(x) = (1/2)‖Ax‖₂² = (1/2) Σ_{i=1}^{4} (A_i^T x)²,   |E| = 4, |V| = 3

9 Example

A = [A_1^T; A_2^T; A_3^T; A_4^T] ∈ R^{4×3},   f(x) = (1/2)‖Ax‖₂² = (1/2) Σ_{i=1}^{4} (A_i^T x)²,   |E| = 4, |V| = 3

Computation of ω̄ and ρ̄:

[Table: incidence of the edges e_1, ..., e_4 with the vertices v_1, v_2, v_3, together with ω_{e_i} and ρ_{e_i} for each edge and δ_{v_j} for each vertex.]

ω̄ = 1.5,   ρ̄ = 3

where ω_e = |e|, ρ_e = |{e' ∈ E : e' ∩ e ≠ ∅}|, δ_v = |{e ∈ E : v ∈ e}|
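The degree constants are cheap to compute from the edge list (the row supports of A). The 4×3 sparsity pattern below is hypothetical, chosen so that it reproduces the slide's averages ω̄ = 1.5 and ρ̄ = 3:

```python
# Sketch: computing the degree constants from the row supports of A.
# The pattern of edges below is an illustrative assumption.
edges = [{0, 1}, {1, 2}, {1}, {0}]   # e_i = support of row A_i
E, V = len(edges), 3

omega = [len(e) for e in edges]                              # omega_e = |e|
rho   = [sum(1 for e2 in edges if e & e2) for e in edges]    # rho_e = |{e': e' ∩ e ≠ ∅}|
delta = [sum(1 for e in edges if v in e) for v in range(V)]  # delta_v = |{e: v ∈ e}|

print(sum(omega) / E)   # omega_bar = 6/4 = 1.5
print(sum(rho) / E)     # rho_bar = 12/4 = 3.0
print(delta)            # [2, 3, 1]
```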

10 Algorithm

Iteration j + 1 looks as follows:

x_{j+1} = x_j − γ |E| σ_e ∇_b f_e(x_{r(j)})

Viewpoint of the processor performing this iteration:
- Pick edge e ∈ E, uniformly at random
- Pick block b intersecting edge e, uniformly at random
- Read current x (enough to read x_v for v ∈ e)
- Compute ∇_b f_e(x)
- Apply update: x ← x − α ∇_b f_e(x), with α = γ |E| σ_e and γ > 0
- Do not wait (no synchronization!) and start again!

It is easy to show that E[|E| σ_e ∇_b f_e(x)] = ∇f(x), so the update direction is an unbiased estimate of the gradient.
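A serial Python simulation of these steps (a sketch only: asynchrony and the r(j) delays are not modeled, and the least-squares edge functions f_e(x) = ½(A_e^T x − b_e)², the sizes, and the stepsize are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 50
A = rng.standard_normal((m, n)) * (rng.random((m, n)) < 0.1)  # sparse rows = edges
b = rng.standard_normal(m)
blocks = np.array_split(np.arange(n), 5)       # B: a partition of the coordinates
gamma = 1e-4

x = np.zeros(n)
for _ in range(20 * m):
    e = rng.integers(m)                        # pick edge e uniformly at random
    supp = np.flatnonzero(A[e])                # vertices of edge e
    hit = [blk for blk in blocks if np.intersect1d(blk, supp).size > 0]
    if not hit:                                # skip a (rare) all-zero row
        continue
    sigma_e = len(hit)                         # edge-block degree sigma_e
    blk = hit[rng.integers(sigma_e)]           # pick an intersecting block uniformly
    g = (A[e] @ x - b[e]) * A[e, blk]          # grad_b f_e(x)
    x[blk] -= gamma * m * sigma_e * g          # alpha = gamma * |E| * sigma_e

print(0.5 * np.linalg.norm(A @ x - b) ** 2)    # f(x) after 20 epochs
```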

12 Main Result

Setup:
- c = strong convexity parameter of f
- L = Lipschitz constant of ∇f
- ‖∇f(x)‖₂ ≤ M for all x visited by the method
- starting point x_0 ∈ R^V; accuracy 0 < ε < (L/2)‖x_0 − x*‖₂²
- constant stepsize: γ := cε / ((σ̄ + 2τρ̄/|E|) L² M²)

Result: Under the above assumptions, for

k ≥ (σ̄ + 2τρ̄/|E|) · (LM²/(c²ε)) · log(L‖x_0 − x*‖₂²/ε) − 1,

we have min_{0 ≤ j ≤ k} E[f(x_j) − f*] ≤ ε.
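Plugging hypothetical constants into these formulas shows how the stepsize and the iteration bound scale with Λ = σ̄ + 2τρ̄/|E| (all numbers below are made up for illustration):

```python
import math

c, L, M = 0.1, 1.0, 10.0          # strong convexity, Lipschitz, gradient bound
sigma_bar, rho_bar = 2.0, 50.0    # average edge-block / edge-edge degrees
E, tau = 10_000, 16               # |E| and the delay bound (~ # processors)
R2, eps = 100.0, 1e-3             # ||x_0 - x*||_2^2 and target accuracy

Lam = sigma_bar + 2 * tau * rho_bar / E
gamma = c * eps / (Lam * L**2 * M**2)                    # constant stepsize
k = Lam * L * M**2 / (c**2 * eps) * math.log(L * R2 / eps) - 1
print(gamma, k)   # k grows linearly with Lam = sigma_bar + 2*tau*rho_bar/|E|
```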

13 Special Cases

General result:

k ≥ Λ · (LM²/(c²ε)) · log(2L‖x_0 − x*‖₂²/ε) − 1,

where Λ = σ̄ + 2τρ̄/|E| depends on the special case and the remaining factor is common to all special cases.

special case / lock-free parallel version of... / Λ:
- |E| = 1: Randomized Block Coordinate Descent; Λ = |B| + 2τ
- |B| = 1: Incremental Gradient Descent (Hogwild! as implemented); Λ = 1 + 2τρ̄/|E|
- B = V: RAINCODE: RAndomized INcremental COordinate DEscent (Hogwild! as analyzed); Λ = ω̄ + 2τρ̄/|E|
- |E| = |B| = 1: Gradient Descent; Λ = 1 + 2τ

14 Analysis via a New Recurrence

Let a_j = (1/2) E[‖x_j − x*‖²].

Nemirovski-Juditsky-Lan-Shapiro:

a_{j+1} ≤ (1 − 2cγ_j) a_j + γ_j² M²

Niu-Recht-Ré-Wright (Hogwild!):

a_{j+1} ≤ (1 − cγ) a_j + γ² (2cω* M τ (δ*)^{1/2}) a_j^{1/2} + γ² M² Q,
where Q = ω* + 2τρ*/|E| + 4ω*ρ*τ/|E| + 2τ²(ω*)²(δ*)^{1/2}

R.-Takáč:

a_{j+1} ≤ (1 − 2cγ) a_j + γ² (σ̄ + 2τρ̄/|E|) M²
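A quick numerical check of the R.-Takáč recurrence with hypothetical constants: iterating it exhibits geometric contraction of a_j down to the noise floor γ(σ̄ + 2τρ̄/|E|)M²/(2c), which is what drives the choice of γ above.

```python
c, gamma, M = 0.1, 1e-3, 10.0
Lam = 2.0 + 2 * 16 * 50.0 / 10_000   # sigma_bar + 2*tau*rho_bar/|E|

a = 100.0                            # a_0 = 0.5 * E||x_0 - x*||^2
for _ in range(50_000):
    a = (1 - 2 * c * gamma) * a + gamma**2 * Lam * M**2
print(a)                             # approaches the fixed point
print(gamma * Lam * M**2 / (2 * c))  # noise floor: gamma*Lam*M^2/(2c) = 1.08
```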

16 Parallelization Speedup Factor

PSF = (Λ of serial version) / ((Λ of parallel version)/τ) = σ̄ / ((σ̄ + 2τρ̄/|E|)/τ) = 1 / (1/τ + 2ρ̄/(σ̄|E|))

Three modes:
1. Brute force (many processors; τ very large): PSF ≈ σ̄|E|/(2ρ̄)
2. Favorable structure (ρ̄/(σ̄|E|) ≤ 1/τ; fixed τ): PSF ≥ τ/3, i.e., nearly linear speedup
3. Special τ (τ = |E|/ρ̄): PSF = (σ̄/(σ̄ + 2)) τ = σ̄|E|/((σ̄ + 2)ρ̄)
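The three regimes can be read off numerically from the PSF formula; the degree statistics below are hypothetical:

```python
sigma_bar, rho_bar, E = 2.0, 50.0, 10_000   # made-up degree statistics

def psf(tau):
    return 1.0 / (1.0 / tau + 2.0 * rho_bar / (sigma_bar * E))

print(psf(16))                        # fixed small tau: ~14.8, close to tau
print(sigma_bar * E / (2 * rho_bar))  # brute-force ceiling (tau -> inf): 200.0
print(psf(E / rho_bar))               # tau = |E|/rho_bar = 200: PSF = 100.0,
                                      # i.e. sigma_bar/(sigma_bar+2) * tau
```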

19 Improvements vs Hogwild!

If B = V (blocks = coordinates), then our method coincides with Hogwild! (as analyzed in Niu et al), up to the stepsize choice:

x_{j+1} = x_j − γ |E| ω_e ∇_v f_e(x_{r(j)})

Niu-Recht-Ré-Wright (Hogwild!, 2011): Λ = 4ω* + 24τρ*/|E| + 24τ²ω*(δ*)^{1/2}

R.-Takáč: Λ = ω̄ + 2τρ̄/|E|

Advantages of our approach:
- Dependence on averages and not maxima! (ω̄ ≤ ω*, ρ̄ ≤ ρ*)
- Better constants (4 → 1, 24 → 2)
- The third large term is not present (no dependence on τ² and δ*)
- Introduction of blocks (⇒ covers also block coordinate descent, gradient descent, SGD)
- Simpler analysis

20 Modified Algorithm: Global Reads and Local Writes

Partition the vertices (coordinates) into τ + 1 blocks, V = b_1 ∪ b_2 ∪ ... ∪ b_{τ+1}, and assign block b_i to processor i, i = 1, 2, ..., τ + 1. Processor i will (asynchronously) do:
- Pick edge e ∈ {e ∈ E : e ∩ b_i ≠ ∅}, uniformly at random (an edge intersecting the block owned by processor i)
- Update: x_{j+1} = x_j − α ∇_{b_i} f_e(x_{r(j)})

Pros and cons:
+ good if global reads and local writes are cheap but global writes are expensive (NUMA = Non-Uniform Memory Access)
- we do not have an analysis

Idea proposed by Ben Recht.

21 Experiment 1: rcv1

size = 1.2 GB, features = |V| = 47,236, training: |E| = 677,399, testing: 20,242

[Figure: train error vs. epoch for asynchronous and synchronous variants on 1, 4 and 16 CPUs.]

22 Experiment 2

Artificial problem instance:

minimize f(x) = (1/2)‖Ax‖² = Σ_{i=1}^{m} (1/2)(A_i^T x)²

A ∈ R^{m×n}; m = |E| = 500,000; n = |V| = 50,000

Three methods:
- Synchronous, all = parallel synchronous method with |B| = 1
- Asynchronous, all = parallel asynchronous method with |B| = 1
- Asynchronous, block = parallel asynchronous method with |B| = τ (no need for atomic operations ⇒ additional speedup)

We measure the elapsed time needed to perform 20m iterations (20 epochs).

23 Uniform instance: |e| = 10 for all edges

24 PART 2: PARALLEL BLOCK COORDINATE DESCENT

based on: P. R. and M. Takáč, Parallel coordinate descent methods for big data optimization, arXiv:1212.0873, 2012.

27 Overview

A rich family of synchronous parallel block coordinate descent methods.

Theory and algorithms work for convex composite functions with a block-separable regularizer:

minimize: F(x) ≡ f(x) + λΨ(x), where f(x) = Σ_{e∈E} f_e(x) and Ψ(x) = Σ_{b∈B} Ψ_b(x)

- The decomposition f = Σ_{e∈E} f_e does not need to be known!
- f: convex or strongly convex (complexity results for both)
- All parameters needed to run the method according to the theory are easy to compute: the block Lipschitz constants L_1, ..., L_{|B|}, and ω

28 ACDC: Lock-Free Parallel Coordinate Descent

C++ code. Can solve a LASSO problem with |V| = 10^9, |E| = 2 × 10^9, ω = 35, on a machine with τ = 24 processors, to high accuracy, in 2 hours, starting from a very large initial gap.

29 Complexity Results

First complexity analysis of parallel coordinate descent:

P(F(x_k) − F* ≤ ε) ≥ 1 − p

Convex functions:

k ≥ (2β/α) · (‖x_0 − x*‖²_L / ε) · log((F(x_0) − F*)/(εp))

Strongly convex functions (with parameters µ_f and µ_Ψ):

k ≥ ((β + µ_Ψ)/(α(µ_f + µ_Ψ))) · log((F(x_0) − F*)/(εp))

Leading constants matter!

30 Parallelization Speedup Factors

Closed-form formulas for parallelization speedup factors (PSFs):
- PSFs are functions of ω, τ and |B|, and depend on the sampling

Example 1: fully parallel sampling (all blocks are updated, i.e., τ = |B|):

PSF = |B|/ω

Example 2: τ-nice sampling (all subsets of τ blocks are chosen with the same probability):

PSF = τ / (1 + (ω − 1)(τ − 1)/(|B| − 1))
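For instance, the τ-nice formula evaluates as follows (the values of ω and |B| below are hypothetical); speedup is nearly linear when ω ≪ |B| and degrades as the problem gets denser:

```python
def psf_tau_nice(tau, omega, B):
    # PSF for tau-nice sampling: tau / (1 + (omega-1)(tau-1)/(|B|-1))
    return tau / (1 + (omega - 1) * (tau - 1) / (B - 1))

print(psf_tau_nice(tau=24, omega=35, B=10**9))     # very sparse: ~24 (linear)
print(psf_tau_nice(tau=24, omega=10**8, B=10**9))  # denser: drops to ~7.3
```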

31 A Problem with a Billion Variables

LASSO problem: F(x) = (1/2)‖Ax − b‖² + λ‖x‖_1

The instance:
- A has |E| = m = 2 × 10^9 rows and |V| = n = 10^9 columns (= # of variables)
- exactly 20 nonzeros in each column
- on average 10 and at most 35 nonzeros in each row (ω = 35)
- the optimal solution x* has 10^5 nonzeros
- λ = 1

Solver: asynchronous parallel coordinate descent method with independent nice sampling and τ = 1, 8, 16 cores
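For reference, here is a minimal serial sketch (not the ACDC code) of the coordinate update that coordinate descent applies to this objective: each step exactly minimizes F along one coordinate, which for the LASSO reduces to soft-thresholding. The data below is a small synthetic stand-in.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def cd_lasso(A, b, lam, epochs=50):
    m, n = A.shape
    x, r = np.zeros(n), b.copy()           # maintain the residual r = b - A x
    col_sq = (A ** 2).sum(axis=0)          # per-coordinate Lipschitz constants
    for _ in range(epochs):
        for i in np.random.permutation(n):
            z = x[i] + A[:, i] @ r / col_sq[i]
            x_new = soft_threshold(z, lam / col_sq[i])
            r += A[:, i] * (x[i] - x_new)  # update residual, O(nnz of column)
            x[i] = x_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
x_true = np.zeros(20); x_true[:3] = [3.0, -2.0, 1.5]
b = A @ x_true + 0.01 * rng.standard_normal(100)
print(np.round(cd_lasso(A, b, lam=1.0), 2))  # recovers a sparse x near x_true
```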

32 [Figure: # coordinate updates / n]

33 [Figure: # iterations / n]

34 [Figure: wall time]

35 Billion Variables: 1 Core

[Table: progress of the 1-core run, reporting k/n, F(x_k) − F*, ‖x_k‖_0, and time in hours.]

36 Billion Variables: 1, 8 and 16 Cores

[Table: F(x_k) − F* and elapsed time against (kτ)/n for 1, 8 and 16 cores.]

37 References

- P. R. and M. Takáč, Lock-free randomized first order methods, arXiv:1301.xxxx.
- P. R. and M. Takáč, Parallel coordinate descent methods for big data optimization, arXiv:1212.0873, 2012.
- P. R. and M. Takáč, Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, Mathematical Programming, Series A.
- P. R. and M. Takáč, Efficient serial and parallel coordinate descent methods for huge-scale truss topology design, Operations Research Proceedings.
- F. Niu, B. Recht, C. Ré, and S. Wright, Hogwild!: A lock-free approach to parallelizing stochastic gradient descent, NIPS 2011.
- A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM Journal on Optimization, 19(4):1574-1609, 2009.
- M. Zinkevich, M. Weimer, A. Smola, and L. Li, Parallelized stochastic gradient descent, NIPS 2010.
- Yu. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM Journal on Optimization, 22(2):341-362, 2012.
