Distributed Structured Prediction for Big Data


A. G. Schwing (ETH Zurich), T. Hazan (TTI Chicago), M. Pollefeys (ETH Zurich), R. Urtasun (TTI Chicago)

Abstract

The biggest limitations of learning structured predictors from big data are the computation time and the memory demands. In this paper, we propose to handle those big data problems efficiently by distributing and parallelizing the resource requirements. We present a distributed structured prediction learning algorithm for large-scale models that cannot be effectively handled by a single cluster node. Importantly, convergence and optimality guarantees of recently developed algorithms are preserved while keeping between-node communication low.

1 Introduction

In the past few years, structured models have become an important tool in domains such as natural language processing, computer vision and computational biology. The growing variability within data sets requires increasing expressiveness, which is achieved by modeling the influence of more and more variables. Hence memory and computational limits of desktop computers are reached quickly. In computer vision, for example, uncompressed full HD video streams produce 150 megabytes of data per second.

Several structured prediction frameworks have been developed in the past. Notable examples are Conditional Random Fields (CRFs) [2], structured support vector machines (SSVMs) [7, 8] and their generalizations [1]. All three frameworks aim at minimizing a regularized surrogate loss. While CRFs and SSVMs are the method of choice for tree-structured or submodular models, approximations, e.g., [1], are in general required. Note that all three approaches are inherently parallel in the training data. But none of the aforementioned frameworks addresses the underlying memory limitations of large-scale models arising from real-world problems. This is important since nowadays big data tasks of increasing volume, variety and velocity call for large models.
Hence we are interested in making structured prediction algorithms practical for large-scale scenarios. We present an algorithm which distributes and parallelizes the computation and memory requirements while reducing communication between cluster nodes and preserving convergence and optimality guarantees. Our approach is based on the principle of dual decomposition, i.e., computation is done in parallel by partitioning the model and imposing agreement on independent variables that are required to be consistent. Thus, we split the graph-based optimization program into several local optimization problems solved in parallel, and cluster nodes exchange information occasionally to enforce consistency.

2 A Review on Structured Prediction

Let us first consider a setting where $\mathcal{X}$ denotes the input space (e.g., a video or a document) and $\mathcal{S}$ is a structured label space (e.g., a video segmentation or a set of parse trees). Further, let $\phi : \mathcal{X} \times \mathcal{S} \to \mathbb{R}^F$ denote a mapping from the input and label space to an $F$-dimensional feature space. When using structured prediction approaches, we are commonly interested in finding the parameters $w \in \mathbb{R}^F$ of a log-linear model $p_w(s \mid x) \propto \exp\left(w^\top \phi(x, s)/\epsilon\right)$ with covariance $\epsilon$, which best describes the possible labeling $s \in \mathcal{S}$ of $x \in \mathcal{X}$.

For training, we are given a data set $D = \{(x_i, s_i)\}_{i=1}^N$ containing $N$ pairs, each composed of an input space object $x \in \mathcal{X}$ and a label space object $s \in \mathcal{S}$. In order to find the model parameters $w$ that best describe the annotations, we are often able to construct a task loss $\ell_{(x,s)}(\hat{s})$ which measures the fitness of any labeling $\hat{s} \in \mathcal{S}$. The vector $v = \sum_{(x,s) \in D} \phi(x, s)$ denotes the empirical mean, and we commonly assume independent and identically distributed data in addition to a prior $p(w) \propto \exp\left(-\|w\|_p^p\right)$. During learning we minimize the negative loss-augmented data log-posterior, i.e.,

$$
\min_w \sum_{(x,s) \in D} \epsilon \ln \sum_{\hat{s} \in \mathcal{S}} \exp\left(\frac{\ell_{(x,s)}(\hat{s}) + w^\top \phi(x, \hat{s})}{\epsilon}\right) - v^\top w + \frac{C}{p} \|w\|_p^p. \tag{1}
$$

Note that the covariance $\epsilon = 1$ recovers the CRF objective [2], while $\epsilon \to 0$ smoothly approximates the max-function, hence recovering the SSVM formulation [7, 8]. Because the sum over all label space configurations $\hat{s} \in \mathcal{S}$ is generally exponential in size, the unconstrained minimization problem given in Eq. (1) is NP-hard in general.

Elements $\phi_r$ of the feature vector $\phi$ often describe interactions between subsets of random variables, i.e., $\phi_r(x, s) = \sum_{i \in V_{r,x}} \phi_{r,i}(x, s_i) + \sum_{\alpha \in E_{r,x}} \phi_{r,\alpha}(x, s_\alpha)$. Note that a labeling $s = (s_i)_{i \in V} \in \mathcal{S}$ is a tuple subsuming $|V|$ variables, each having $|S_i|$ discrete states. The sparse interactions induced by the feature functions $\phi_r(x, s)$ are visually depicted by a factor graph $G_{r,x}$, whose vertices are the individual variables $i \in V_{r,x}$ of sample $(x, s)$. A vertex $i$ is connected to a factor $\alpha \in E_{r,x}$ iff variable $s_i$ is part of the variable set $s_\alpha$. The union graph $G_x = \bigcup_r G_{r,x}$ describes the relationship over all features $r$, and we say that vertex $i \in V_x = \bigcup_r V_{r,x}$ is a neighbor of factor $\alpha \in E_x = \bigcup_r E_{r,x}$ if variable $s_i$ is part of the variable set $s_\alpha$ in any of the features of sample $(x, s)$, i.e., $i \in N(\alpha)$. Conversely, all factors that variable $i$ participates in are referred to by $\alpha \in N(i)$.

Approximations [1] are one way to deal with the previously outlined intractability. The dual to the program given in Eq. (1) is described by means of joint distributions ranging, for each data sample $(x, s)$, over the label space $\mathcal{S}$.
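To illustrate the role of $\epsilon$ in Eq. (1), the following minimal Python sketch evaluates the smoothed term $\epsilon \ln \sum_{\hat{s}} \exp(\cdot/\epsilon)$ by brute-force enumeration. The toy model (two binary variables, two features, Hamming loss, the weights `w`) and all function names are assumptions made for illustration only; they are not taken from the paper, and enumeration is feasible only because the label space is tiny.

```python
import math
from itertools import product


def smoothed_loss_augmented_score(scores, eps):
    """Return eps * ln(sum_s exp(score(s) / eps)), computed stably."""
    m = max(scores)  # subtract the maximum to avoid overflow
    return m + eps * math.log(sum(math.exp((v - m) / eps) for v in scores))


# Hypothetical toy problem: two binary variables, F = 2 features.
labels = list(product([0, 1], repeat=2))
true_s = (0, 1)
w = [0.5, -1.0]


def phi(s):
    # Feature 1: number of variables set to 1. Feature 2: 1 if both agree.
    return [float(s[0] + s[1]), 1.0 if s[0] == s[1] else 0.0]


def loss(s_hat):
    # Hamming task loss with respect to the annotated labeling.
    return sum(a != b for a, b in zip(s_hat, true_s))


# Loss-augmented score of every labeling in the (tiny) label space.
scores = [loss(s) + sum(wr * fr for wr, fr in zip(w, phi(s))) for s in labels]

for eps in (1.0, 0.1, 0.01):
    print(eps, smoothed_loss_augmented_score(scores, eps))
print("max", max(scores))
```

With $\epsilon = 1$ the value is the log-partition function of the loss-augmented model (CRF case); as $\epsilon$ shrinks, the value approaches the maximum loss-augmented score from above (SSVM case).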
We describe the joint distribution of each sample by its variable and factor marginals $b_{(x,s),i}(s_i)$ and $b_{(x,s),\alpha}(s_\alpha)$, and approximate the entropy of the joint distribution by the marginal entropies $H(b_{(x,s),i})$ and $H(b_{(x,s),\alpha})$, weighted by counting numbers $c_i$ and $c_\alpha$ that are chosen for better approximation accuracy. To ensure consistency, we require the beliefs to fulfill marginalization constraints corresponding to the structure of the graph $G_x$ while maximizing the approximated dual cost function

$$
\sum_{(x,s)} \left( \sum_i \epsilon c_i H(b_{(x,s),i}) + \sum_\alpha \epsilon c_\alpha H(b_{(x,s),\alpha}) + \sum_{i, \hat{s}_i} b_{(x,s),i}(\hat{s}_i)\, \ell_{(x,s),i}(\hat{s}_i) + \sum_{\alpha, \hat{s}_\alpha} b_{(x,s),\alpha}(\hat{s}_\alpha)\, \ell_{(x,s),\alpha}(\hat{s}_\alpha) \right) - \frac{C^{1-q}}{q} \sum_r \left| \sum_{(x,s),\, i \in V_{r,x},\, \hat{s}_i} b_{(x,s),i}(\hat{s}_i)\, \phi_{r,i}(x, \hat{s}_i) + \sum_{(x,s),\, \alpha \in E_{r,x},\, \hat{s}_\alpha} b_{(x,s),\alpha}(\hat{s}_\alpha)\, \phi_{r,\alpha}(x, \hat{s}_\alpha) - v_r \right|^q, \tag{2}
$$

with $1/p + 1/q = 1$. The sum over the training samples is the first term in both the original primal (Eq. (1)) and the approximated dual (Eq. (2)), which suggests that computation of the gradient is inherently parallel in the data set elements. Since real-world models $G_x$ are often too large for the resources provided by a single cluster node, we next discuss a way to partition the optimization task while preserving the original convergence properties.

3 Distributed Structured Prediction

To cope with current model size needs, we are interested in an algorithm that maximizes Eq. (2) while leveraging the sparsity given by the graph structure $G_x$. In addition, we partition the vertices of the model such that each of the distributed cluster nodes solves an independent program defined on a subgraph induced by the variables of each partition (Fig. 1(a)). To ensure consistency for the global model, the distributed solutions are combined by exchanging information between connected subgraphs. The distributed structured prediction algorithm extends existing frameworks by introducing a high-level factor graph (Fig. 1(b)) describing the cluster node interactions. Occasional exchange of information corresponds to messages being sent on this factor graph.
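The partitioning described above can be sketched in a few lines of Python. The snippet below is only an illustration under stated assumptions: the function name `cluster_node_factor_graph`, the dictionary-based graph representation, and the 2x2 grid example are hypothetical and not part of the paper. Given an assignment of variables to cluster nodes, it collects the factors of each induced subgraph and identifies the shared factors, which form the high-level cluster-node factor graph.

```python
from collections import defaultdict


def cluster_node_factor_graph(factors, assignment):
    """Split a factor graph according to a vertex partition.

    factors:    dict mapping a factor name to the tuple of variables it touches
    assignment: dict mapping each variable to its cluster node
    Returns (subgraphs, shared), where subgraphs[n] is the set of factors in
    the subgraph induced by node n, and shared maps every factor that touches
    more than one cluster node to the set of those nodes.
    """
    subgraphs = defaultdict(set)
    nodes_of = {}
    for alpha, variables in factors.items():
        nodes = {assignment[i] for i in variables}
        nodes_of[alpha] = nodes
        for n in nodes:
            subgraphs[n].add(alpha)
    shared = {a: ns for a, ns in nodes_of.items() if len(ns) > 1}
    return dict(subgraphs), shared


# Hypothetical 2x2 grid with pairwise factors:
#   0 - 1
#   |   |
#   2 - 3
factors = {"a01": (0, 1), "a23": (2, 3), "a02": (0, 2), "a13": (1, 3)}
# Left column on cluster node 0, right column on cluster node 1.
assignment = {0: 0, 2: 0, 1: 1, 3: 1}

subgraphs, shared = cluster_node_factor_graph(factors, assignment)
print(subgraphs)
print(shared)
```

In this example only the two horizontal factors are shared, so consistency information has to be exchanged for these two factors only; the vertical factors are handled locally by a single cluster node.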
It is important to note that we do not require an exchange of information at every iteration.

More concretely, let $P_x$ be a partition of all the vertices $i \in V_x$ for sample $(x, s)$ into disjoint subsets $n_x \in P_x$, each containing the variables $i \in n_x$ that are assigned to the cluster node $n_x$. The vertices assigned to node $n_x \in P_x$ induce a subgraph $G_{x,n_x}$. As before, this subgraph describes the marginalization constraints required to be enforced on cluster node $n_x$ for its assigned variable beliefs $b^{n_x}_{(x,s),i}(\hat{s}_i)$ $\forall (x, s), i \in n_x, \hat{s}_i$ and the factor beliefs $b^{n_x}_{(x,s),\alpha}(\hat{s}_\alpha)$ $\forall (x, s), i \in n_x, \alpha \in N(i), \hat{s}_\alpha$, i.e., $\sum_{\hat{s}_\alpha \setminus \hat{s}_i} b^{n_x}_{(x,s),\alpha}(\hat{s}_\alpha) = b^{n_x}_{(x,s),i}(\hat{s}_i)$.

[Figure 1: (a) Two samples $(x_1, s_1)$ and $(x_2, s_2)$, each distributed on 2 cluster nodes (color). (b) The cluster node factor graph for consistency messages. (c), (d) Convergence of the inference task (dual energy) w.r.t. iterations and time in seconds.]

A factor $\alpha$ that is assigned to multiple subgraphs $G_{x,n_x}$ corresponds to a set of beliefs $b^{n_x}_{(x,s),\alpha}$, each of them optimized independently on the cluster nodes $n_x \in N_{P_x}(\alpha)$. Since these distributed beliefs originate from a single $b_{(x,s),\alpha}$ in Eq. (2), we are required to ensure consistency. Formally, we construct a factor graph $G_{P_x}$ with cluster nodes $n_x$ being the vertices that are connected to shared factors $\alpha$ iff $n_x \in N_{P_x}(\alpha)$. Conversely, we denote by $N_{P_x}(n_x)$ all factors $\alpha$ that are shared between multiple nodes, one of them being $n_x$. To keep the shared beliefs consistent, we add the constraints $b^{n_x}_{(x,s),\alpha}(\hat{s}_\alpha) = b_{(x,s),\alpha}(\hat{s}_\alpha)$ $\forall (x, s), \alpha, n_x \in N_{P_x}(\alpha), \hat{s}_\alpha$.

To ensure optimization of the cost function given in Eq. (2), we further need to balance the entropy $H(b_{(x,s),\alpha})$, the loss $\ell_{(x,s),\alpha}$ and the features $\phi_{r,\alpha}$ for those factors $\alpha$ that are distributed onto different cluster nodes. To this end, we let $\hat{c}_\alpha = c_\alpha / |N_{P_x}(\alpha)|$, $\hat{\ell}_{(x,s),\alpha} = \ell_{(x,s),\alpha} / |N_{P_x}(\alpha)|$ and $\hat{\phi}_{r,\alpha} = \phi_{r,\alpha} / |N_{P_x}(\alpha)|$ for all shared factors. For the remaining factors, the variables augmented by the hat symbol correspond to the original variables. Consequently, we obtain the following maximization, equivalent to Eq. (2):

$$
\max \sum_{(x,s),\, n_x \in P_x} \left( \sum_{i \in G_{x,n_x}} \epsilon c_i H(b^{n_x}_{(x,s),i}) + \sum_{\alpha \in G_{x,n_x}} \epsilon \hat{c}_\alpha H(b^{n_x}_{(x,s),\alpha}) + \sum_{i \in G_{x,n_x},\, \hat{s}_i} b^{n_x}_{(x,s),i}(\hat{s}_i)\, \ell_{(x,s),i}(\hat{s}_i) + \sum_{\alpha \in G_{x,n_x},\, \hat{s}_\alpha} b^{n_x}_{(x,s),\alpha}(\hat{s}_\alpha)\, \hat{\ell}_{(x,s),\alpha}(\hat{s}_\alpha) \right) - \frac{C^{1-q}}{q} \|z - v\|_q^q, \tag{3}
$$

with marginalization constraints $\sum_{\hat{s}_\alpha \setminus \hat{s}_i} b^{n_x}_{(x,s),\alpha}(\hat{s}_\alpha) = b^{n_x}_{(x,s),i}(\hat{s}_i)$ $\forall (x, s), n_x, i, \hat{s}_i, \alpha \in N(i)$, consistency constraints $b^{n_x}_{(x,s),\alpha}(\hat{s}_\alpha) = b_{(x,s),\alpha}(\hat{s}_\alpha)$ $\forall (x, s), n_x, \alpha \in N_{P_x}(n_x), \hat{s}_\alpha$, and the variable

$$
z_r = \sum_{(x,s),\, n_x,\, i,\, \hat{s}_i} b^{n_x}_{(x,s),i}(\hat{s}_i)\, \phi_{r,i}(x, \hat{s}_i) + \sum_{(x,s),\, n_x,\, \alpha,\, \hat{s}_\alpha} b^{n_x}_{(x,s),\alpha}(\hat{s}_\alpha)\, \hat{\phi}_{r,\alpha}(x, \hat{s}_\alpha) \quad \forall r \in \{1, \ldots, F\}.
$$

We would like to utilize the structure of the graph to obtain memory-efficient and fast algorithms. Since the structure is employed to express the marginalization constraints, the dual program of Eq. (3), with its Lagrange multipliers $\lambda_{(x,s),i \to \alpha}(\hat{s}_i)$ corresponding to the marginalization constraints and $\nu_{(x,s),n_x \to \alpha}(\hat{s}_\alpha)$ originating from the consistency constraints between different cluster nodes, is our preferred task. The dual program to Eq. (3) is given by the following claim.

Claim 1. Set $\nu_{(x,s),n_x \to \alpha} = 0$ for every $\alpha \notin G_{P_x}$ and enforce $\sum_{n_x \in N_{P_x}(\alpha)} \nu_{(x,s),n_x \to \alpha}(\hat{s}_\alpha) = 0$ $\forall (x, s), \alpha, \hat{s}_\alpha$. With $\hat{\phi}_{(x,s),i}(\hat{s}_i) = \ell_{(x,s),i}(\hat{s}_i) + \sum_{r : i \in V_{r,x,n_x}} w_r \phi_{r,i}(x, \hat{s}_i)$ and $\hat{\phi}_{(x,s),\alpha}(\hat{s}_\alpha) = \hat{\ell}_{(x,s),\alpha}(\hat{s}_\alpha) + \sum_{r : \alpha \in E_{r,x,n_x}} w_r \hat{\phi}_{r,\alpha}(x, \hat{s}_\alpha)$, the dual program of the approximated structured prediction dual in Eq. (3) reads as

$$
g = \sum_{(x,s),\, n_x,\, i \in G_{x,n_x}} \epsilon c_i \ln \sum_{\hat{s}_i} \exp\left( \frac{\hat{\phi}_{(x,s),i}(\hat{s}_i) - \sum_{\alpha \in N(i)} \lambda_{(x,s),i \to \alpha}(\hat{s}_i)}{\epsilon c_i} \right) - v^\top w + \frac{C}{p} \|w\|_p^p + \sum_{(x,s),\, n_x,\, \alpha \in G_{x,n_x}} \epsilon \hat{c}_\alpha \ln \sum_{\hat{s}_\alpha} \exp\left( \frac{\hat{\phi}_{(x,s),\alpha}(\hat{s}_\alpha) + \sum_{i \in N(\alpha) \cap n_x} \lambda_{(x,s),i \to \alpha}(\hat{s}_i) + \nu_{(x,s),n_x \to \alpha}(\hat{s}_\alpha)}{\epsilon \hat{c}_\alpha} \right). \tag{4}
$$

Proof: Follows [1, 5].

Looking at the distributed approximated primal given in Eq. (4) more closely, we note that both terms involving the two types of Lagrange multipliers are now preceded by sums ranging over the samples as well as the compute nodes $n_x$.
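The rescaling of shared factors by $|N_{P_x}(\alpha)|$ can be checked numerically. The following Python sketch is an illustration under assumptions: the belief, loss and counting-number values are made up, and `factor_term` is a hypothetical helper. It shows that when all copies of a shared factor carry the same (consistent) belief, the sum of the rescaled per-node terms reproduces the entropy and loss contribution of the original factor.

```python
import math


def entropy(b):
    """Shannon entropy of a discrete distribution given as a list."""
    return -sum(p * math.log(p) for p in b if p > 0)


def factor_term(b, loss, c, eps):
    """eps * c * H(b) + sum_s b(s) * loss(s) for a single factor."""
    return eps * c * entropy(b) + sum(p * l for p, l in zip(b, loss))


# Hypothetical shared factor with 4 joint states, split over 2 cluster nodes.
eps, c_alpha = 0.5, 1.0
b_alpha = [0.1, 0.2, 0.3, 0.4]
loss_alpha = [0.0, 1.0, 1.0, 2.0]
num_nodes = 2  # plays the role of |N_{P_x}(alpha)|

# Contribution of the factor to the undistributed objective.
original = factor_term(b_alpha, loss_alpha, c_alpha, eps)

# Each node holds a copy with counting number and loss divided by num_nodes.
c_hat = c_alpha / num_nodes
loss_hat = [l / num_nodes for l in loss_alpha]
distributed = sum(
    factor_term(b_alpha, loss_hat, c_hat, eps) for _ in range(num_nodes)
)

print(original, distributed)
```

The same argument applies to the rescaled features $\hat{\phi}_{r,\alpha}$ entering $z_r$, since they are also linear in the factor copies.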
To derive an efficient algorithm we perform block-coordinate descent on this approximated primal. Fixing the consistency messages $\nu_{(x,s),n_x \to \alpha}(\hat{s}_\alpha)$, the optimal $\lambda_{(x,s),i \to \alpha}(\hat{s}_i)$ is computed $\forall i \in G_{x,n_x}$ without considering current information from other cluster nodes. A status update in the form of consistency messages $\nu_{(x,s),n_x \to \alpha}(\hat{s}_\alpha)$ is computed analytically by synchronizing messages between the different machines. The Armijo iterations performed to optimize $w_r$ require computation of the beliefs as well as the primal cost function value, which is done on the distributed nodes before another synchronization. The resulting block-coordinate descent and gradient steps are given by the following claim.

Claim 2. With

$$
\mu_{(x,s),\alpha \to i}(\hat{s}_i) = \epsilon \hat{c}_\alpha \ln \sum_{\hat{s}_\alpha \setminus \hat{s}_i} \exp\left( \frac{\hat{\phi}_{(x,s),\alpha}(\hat{s}_\alpha) + \sum_{j \in N(\alpha) \cap n_x \setminus i} \lambda_{(x,s),j \to \alpha}(\hat{s}_j) + \nu_{(x,s),n_x \to \alpha}(\hat{s}_\alpha)}{\epsilon \hat{c}_\alpha} \right),
$$

the block-coordinate descent steps in $\lambda$ and $\nu$ and the gradient in $w_r$ are:

$$
\lambda_{(x,s),i \to \alpha}(\hat{s}_i) \leftarrow \frac{\hat{c}_\alpha}{c_i + \sum_{\alpha \in N(i)} \hat{c}_\alpha} \left( \hat{\phi}_{(x,s),i}(\hat{s}_i) + \sum_{\beta \in N(i)} \mu_{(x,s),\beta \to i}(\hat{s}_i) \right) - \mu_{(x,s),\alpha \to i}(\hat{s}_i),
$$

$$
\nu_{(x,s),n_x \to \alpha}(\hat{s}_\alpha) \leftarrow \frac{1}{|N_{P_x}(\alpha)|} \sum_{i \in N(\alpha)} \lambda_{(x,s),i \to \alpha}(\hat{s}_i) - \sum_{i \in N(\alpha) \cap n_x} \lambda_{(x,s),i \to \alpha}(\hat{s}_i),
$$

$$
\frac{\partial g}{\partial w_r} = \sum_{(x,s),\, n_x,\, i,\, \hat{s}_i} b^{n_x}_{(x,s),i}(\hat{s}_i)\, \phi_{r,i}(x, \hat{s}_i) + \sum_{(x,s),\, n_x,\, \alpha,\, \hat{s}_\alpha} b^{n_x}_{(x,s),\alpha}(\hat{s}_\alpha)\, \hat{\phi}_{r,\alpha}(x, \hat{s}_\alpha) - v_r + C |w_r|^{p-1} \operatorname{sgn}(w_r).
$$

Proof: Follows [1, 5].

Since the order of the block-coordinate descent steps does not impact convergence guarantees, we iteratively update the $\lambda$ messages within a cluster node and the model parameters $w_r$ before exchanging information between machines in the form of consistency messages. Note that updating the model parameters requires cluster nodes to exchange only scalar values, whereas the size of a consistency message depends on the size of the shared factor and is commonly larger than a single real value.

4 Related Work and Discussion

Data-parallel frameworks, like MapReduce, simplify implementation of large-scale data processing but do not naturally support development of efficient learning algorithms.
One of the most notable publicly available engines working towards efficient distributed algorithms is GraphLab, which originally supported only shared-memory environments [3] and was recently extended to distributed environments [4]. However, minimization of communication overhead between cluster nodes is not considered, which potentially reduces computational performance. Our recent work on parallel inference tasks that explicitly minimizes the communication overhead was presented in [5]. Fig. 1(c) and Fig. 1(d), taken from [5], show the convergence of an inference task w.r.t. iterations and time when communicating between machines every 1, 5, 10, ..., 100 iterations. Although convergence in terms of iterations is best when transmitting information frequently, communication overhead reduces wall-clock performance when exchanging variables often. The drop in performance depends on the graph connectivity and clique size (e.g., a common pairwise 4-connected grid in our case) and on the cluster infrastructure (LAN or InfiniBand connection). Since learning involves inference, a similar time dependence is expected.

5 Conclusion

We have presented a distributed structured prediction algorithm that is able to process models that exceed the resource restrictions of a single cluster node. Our approach divides computation and memory requirements onto multiple machines, while convergence and optimality guarantees are preserved by introducing a new type of consistency message. Our algorithm benefits particularly from the availability of multiple cluster nodes, but it is also useful on a single machine since we derive explicit rules for swapping parts of the model between memory and hard disk. Extensions towards latent variable models [6] and towards automatically finding an effective partitioning of graphical models are subject to future research.
References

[1] T. Hazan and R. Urtasun. A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction. In Proc. NIPS, 2010.
[2] J. Lafferty, A. McCallum, and F. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proc. ICML, 2001.
[3] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A New Parallel Framework for Machine Learning. In Proc. UAI, 2010.
[4] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed GraphLab: A Framework for Machine Learning in the Cloud. In Proc. Very Large Data Bases, 2012.
[5] A. G. Schwing, T. Hazan, M. Pollefeys, and R. Urtasun. Distributed Message-Passing for Large-Scale Graphical Models. In Proc. CVPR, 2011.
[6] A. G. Schwing, T. Hazan, M. Pollefeys, and R. Urtasun. Efficient Structured Prediction with Latent Variables for General Graphical Models. In Proc. ICML, 2012.
[7] B. Taskar, C. Guestrin, and D. Koller. Max-Margin Markov Networks. In Proc. NIPS, 2003.
[8] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces. In Proc. ICML, 2004.
More informationHow Conditional Random Fields Learn Dynamics: An ExampleBased Study
Computer Communication & Collaboration (2013) Submitted on 27/May/2013 How Conditional Random Fields Learn Dynamics: An ExampleBased Study Mohammad Javad Shafiee School of Electrical & Computer Engineering,
More informationTekniker för storskalig parsning
Tekniker för storskalig parsning Diskriminativa modeller Joakim Nivre Uppsala Universitet Institutionen för lingvistik och filologi joakim.nivre@lingfil.uu.se Tekniker för storskalig parsning 1(19) Generative
More informationMachine Learning Big Data using Map Reduce
Machine Learning Big Data using Map Reduce By Michael Bowles, PhD Where Does Big Data Come From? Web data (web logs, click histories) ecommerce applications (purchase histories) Retail purchase histories
More informationLecture 7: Approximation via Randomized Rounding
Lecture 7: Approximation via Randomized Rounding Often LPs return a fractional solution where the solution x, which is supposed to be in {0, } n, is in [0, ] n instead. There is a generic way of obtaining
More informationLearning. CS461 Artificial Intelligence Pinar Duygulu. Bilkent University, Spring 2007. Slides are mostly adapted from AIMA and MIT Open Courseware
1 Learning CS 461 Artificial Intelligence Pinar Duygulu Bilkent University, Slides are mostly adapted from AIMA and MIT Open Courseware 2 Learning What is learning? 3 Induction David Hume Bertrand Russell
More informationWhy graph clustering is useful?
Graph Clustering Why graph clustering is useful? Distance matrices are graphs as useful as any other clustering Identification of communities in social networks Webpage clustering for better data management
More informationExponential time algorithms for graph coloring
Exponential time algorithms for graph coloring Uriel Feige Lecture notes, March 14, 2011 1 Introduction Let [n] denote the set {1,..., k}. A klabeling of vertices of a graph G(V, E) is a function V [k].
More informationA Serial Partitioning Approach to Scaling GraphBased Knowledge Discovery
A Serial Partitioning Approach to Scaling GraphBased Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington
More informationPredicting Program Properties from Big Code
Predicting Program Properties from Big Code * POPL * Artifact Consistent * Complete * Well Documented * Easy to Reuse * Evaluated * AEC * Veselin Raychev Department of Computer Science ETH Zürich veselin.raychev@inf.ethz.ch
More informationPart 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection  Social networks 
More informationMapReduce for Bayesian Network Parameter Learning using the EM Algorithm
apreduce for Bayesian Network Parameter Learning using the E Algorithm Aniruddha Basak Carnegie ellon University Silicon Valley Campus NASA Research Park, offett Field, CA 94035 abasak@cmu.edu Irina Brinster
More informationCURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NONLINEAR PROGRAMMING
Journal homepage: http://www.journalijar.com INTERNATIONAL JOURNAL OF ADVANCED RESEARCH RESEARCH ARTICLE CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NONLINEAR PROGRAMMING R.Kohila
More informationObject Separation in Xray Image Sets. Geremy Heitz, Gal Chechik Qylur Security Systems CVPR 2010 June 16, 2010
Object Separation in Xray Image Sets SATISϕ Geremy Heitz Gal Chechik Qylur Security Systems CVPR 2010 June 16 2010 Motivation Security checkpoints based on xray screening are increasingly important Our
More informationJubatus: An Open Source Platform for Distributed Online Machine Learning
Jubatus: An Open Source Platform for Distributed Online Machine Learning Shohei Hido Seiya Tokui Preferred Infrastructure Inc. Tokyo, Japan {hido, tokui}@preferred.jp Satoshi Oda NTT Software Innovation
More informationMultiClass and Structured Classification
MultiClass and Structured Classification [slides prises du cours cs29410 UC Berkeley (2006 / 2009)] [ p y( )] http://www.cs.berkeley.edu/~jordan/courses/294fall09 Basic Classification in ML Input Output
More informationParallel Data Mining. Team 2 Flash Coders Team Research Investigation Presentation 2. Foundations of Parallel Computing Oct 2014
Parallel Data Mining Team 2 Flash Coders Team Research Investigation Presentation 2 Foundations of Parallel Computing Oct 2014 Agenda Overview of topic Analysis of research papers Software design Overview
More informationGraph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
More informationBig Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany MapReduce II MapReduce II 1 / 33 Outline 1. Introduction
More informationSelfPaced Learning for Latent Variable Models
SelfPaced Learning for Latent Variable Models M. Pawan Kumar Benjamin Packer Daphne Koller Computer Science Department Stanford University {pawan,bpacker,koller}@cs.stanford.edu Abstract Latent variable
More informationBig Data: Big N. V.C. 14.387 Note. December 2, 2014
Big Data: Big N V.C. 14.387 Note December 2, 2014 Examples of Very Big Data Congressional record text, in 100 GBs Nielsen s scanner data, 5TBs Medicare claims data are in 100 TBs Facebook 200,000 TBs See
More informationCSE 4351/5351 Notes 7: Task Scheduling & Load Balancing
CSE / Notes : Task Scheduling & Load Balancing Task Scheduling A task is a (sequential) activity that uses a set of inputs to produce a set of outputs. A task (precedence) graph is an acyclic, directed
More informationKEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS
ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,
More informationLearning and Inference over Constrained Output
IJCAI 05 Learning and Inference over Constrained Output Vasin Punyakanok Dan Roth Wentau Yih Dav Zimak Department of Computer Science University of Illinois at UrbanaChampaign {punyakan, danr, yih, davzimak}@uiuc.edu
More informationA Review on Load Balancing Algorithms in Cloud
A Review on Load Balancing Algorithms in Cloud Hareesh M J Dept. of CSE, RSET, Kochi hareeshmjoseph@ gmail.com John P Martin Dept. of CSE, RSET, Kochi johnpm12@gmail.com Yedhu Sastri Dept. of IT, RSET,
More informationProbabilistic Graphical Models Homework 1: Due January 29, 2014 at 4 pm
Probabilistic Graphical Models 10708 Homework 1: Due January 29, 2014 at 4 pm Directions. This homework assignment covers the material presented in Lectures 13. You must complete all four problems to
More informationKeywords: Big Data, HDFS, Map Reduce, Hadoop
Volume 5, Issue 7, July 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Configuration Tuning
More informationDynamic Programming and Graph Algorithms in Computer Vision
Dynamic Programming and Graph Algorithms in Computer Vision Pedro F. Felzenszwalb and Ramin Zabih Abstract Optimization is a powerful paradigm for expressing and solving problems in a wide range of areas,
More informationScheduling Home Health Care with Separating Benders Cuts in Decision Diagrams
Scheduling Home Health Care with Separating Benders Cuts in Decision Diagrams André Ciré University of Toronto John Hooker Carnegie Mellon University INFORMS 2014 Home Health Care Home health care delivery
More informationMapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
More informationSolving NP Hard problems in practice lessons from Computer Vision and Computational Biology
Solving NP Hard problems in practice lessons from Computer Vision and Computational Biology Yair Weiss School of Computer Science and Engineering The Hebrew University of Jerusalem www.cs.huji.ac.il/ yweiss
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationOPTIMAL DESIGN OF DISTRIBUTED SENSOR NETWORKS FOR FIELD RECONSTRUCTION
OPTIMAL DESIGN OF DISTRIBUTED SENSOR NETWORKS FOR FIELD RECONSTRUCTION Sérgio Pequito, Stephen Kruzick, Soummya Kar, José M. F. Moura, A. Pedro Aguiar Department of Electrical and Computer Engineering
More informationlargescale machine learning revisited Léon Bottou Microsoft Research (NYC)
largescale machine learning revisited Léon Bottou Microsoft Research (NYC) 1 three frequent ideas in machine learning. independent and identically distributed data This experimental paradigm has driven
More informationRecovering the Shape of Objects in 3D Point Clouds with Partial Occlusions
Recovering the Shape of Objects in 3D Point Clouds with Partial Occlusions Rudolph Triebel 1,2 and Wolfram Burgard 1 1 Department of Computer Science, University of Freiburg, GeorgesKöhlerAllee 79, 79108
More informationSpark and the Big Data Library
Spark and the Big Data Library Reza Zadeh Thanks to Matei Zaharia Problem Data growing faster than processing speeds Only solution is to parallelize on large clusters» Wide use in both enterprises and
More informationApproximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs
Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs Yong Zhang 1.2, Francis Y.L. Chin 2, and HingFung Ting 2 1 College of Mathematics and Computer Science, Hebei University,
More information