Machine Learning over Big Data


 Meghan Knight
 2 years ago
 Views:
Transcription
1 Machine Learning over Big Presented by Fuhao Zou Jue 16, 2014 Huazhong University of Science and Technology
2 Contents Role of Machine learning Challenge of Big Analysis Distributed Machine learning Conclusion
3 Big already Happened 1 Billion Tweets Per Week 750 Million Facebook Users 6 Billion Flickr Photos 48 Hours of Video Uploaded every Minute
4 4V in Big Characteristics 1.Volume 2.Variety 3.Velocity 57% 4.Value Description of the contents
5 Knowledge "If a tree falls in a forest and no one is around to hear it, does it make a sound?"  George Berkeley Nobody Knows What Is In All These Unless Having Them Processed and Analyzed
6 Why do Machine Learning on Big? Machine learning incorporates data analyses covering predictive analytics, data mining, pattern recognition, and multivariate statistics. Basic Analytics Counts and Averages Preprocessing SQL Queries Human Scale Hand Crafted Advanced Analytics (Machine Learning) Discovering Patterns Making Predictions Multivariate Queries Machine Scale Driven Unfortunately traditional analytics tools are not well suited to capturing the value hidden in Big. 6
7 Contents Role of Machine learning Challenge of Big Analysis Distributed Machine learning Conclusion
8 Challenge #1 Massive Scale ~1B node, do not fitting into the main memory of a single machine, a familiar problem!
9 Challenge #2 Gigantic Model Size > parameters, do not fitting into the main memory of a single machine either!
10 Challenge #3 Huge Cognitive Space 1M ~ 1B categories are seen in modern extreme classification problems, knn? SVM?
11 Distributed ML is a necessity Feature Engineering Parameter Tuning Problem Definition Large Scale Application Learning Methods & Theory Learning Methods Efficient Algorithms Learning Theory Distributed Machine Learning Programming Models Parallel Computing Distributed Systems Resource Mgmt Distributed Storage and Computing
12 Contents Role of Machine learning Challenge of Big Analysis Distributed Machine learning Conclusion
13 Categories of Distributed ML Distributed ML 01.Parallel ML 02.GraphParallel ML
14 Parallel ML Linear regression with gradient descent Repeat { (for every ) }
15 Mapreduce and data parallelism Batch gradient descent: Machine 1: Use 100 temp j (1) (h (x (i) ) y (i) ) i 1 x j (i) Machine 2: Use Machine 3: Use Combine: j : j (temp (1) (2) j temp j temp (3) j temp (4) j ) Machine 4: Use
16 Mapreduce Computer 1 Training set Computer 2 Combine results Computer 3 Computer 4
17 Graph Parallel ML
18 Social Arithmetic: Label Propagation Sue Ann 50% What I list on my profile 40% Sue Ann Likes + 10% Carlos Like 40% I Like: 60% Cameras, 40% Biking Profile 80% Cameras 20% Biking Recurrence Algorithm: j Friends[i] Likes[i] W ij Likes[ j] iterate until convergence Parallelism: Compute all Likes[i] in parallel Me 50% 10% Carlos 50% Cameras 50% Biking 30% Cameras 70% Biking
19 PageRank (Centrality Measures) Iterate: Where: α is the random reset probability L[j] is the number of links on page j
20 Matrix Factorization Alternating Least Squares (ALS) m 1 Users Netflix Movi es x Users u i Movies m j User Factors (U) u 1 u 2 r 11 r 12 r 23 r 24 m 2 m 3 Movie Factors (M) Update Function computes:
21 Other Examples Statistical Inference in Relational Models Belief Propagation Gibbs Sampling Network Analysis Centrality Measures Triangle Counting Natural Language Processing CoEM Topic Modeling
22 Bulk Synchronous Parallel Iterations CPU 1 CPU 1 CPU 1 CPU 2 CPU 2 CPU 2 CPU 3 CPU 3 CPU 3 Barrier Barrier Barrier
23 Pregel: Bulk Synchronous Parallel Compute Communicate Barrier
24 Open Source Implementations PREGEL : https://plus.google.com/+pregel/ Giraph: Golden Orb:
25 Curse of the Slow Job Iterations CPU 1 CPU 1 CPU 1 CPU 2 CPU 2 CPU 2 CPU 3 CPU 3 CPU 3 Barrier Barrier Barrier
26 Tradeoffs of the BSP Model Pros: Graph Parallel Relatively easy to build Deterministic execution Cons: Doesn t exploit the graph structure Can lead to inefficient systems
27 Asynchronous Computation Model Threads proceed to next iteration without waiting. An asynchronous variant: GraphLab:
28 What is GraphLab?
29 The GraphLab Framework Graph Based Representation Update Functions User Computation Scheduler Consistency Model
30 Graph A graph with arbitrary data (C++ Objects) associated with each vertex and edge. Graph: Social Network Vertex : User profile text Current interests estimates Edge : Similarity weights
31 New Perspective on Partitioning Natural graphs have poor edge separators Classic graph partitioning tools (e.g., ParMetis, Zoltan ) fail Y Y CPU 1 CPU 2 Must synchronize many edges Natural graphs have good vertex separators Y Y Must synchronize a single vertex CPU 1 CPU 2
32 Update Functions An update function is a user defined program which when applied to a vertex transforms the data in the scope of the vertex pagerank(i, scope){ // Get Neighborhood data (R[i], W ij, R[j]) scope; // Update the vertex data R[ i] (1 ) j N[ i] W R[ j]; // Reschedule Neighbors if needed if R[i] changes then reschedule_neighbors_of(i); } ji
33 The Scheduler The scheduler determines the order that vertices are updated Scheduler ba hi CPU 1 CPU 2 a h b c e f g i j d k The process repeats until the scheduler is empty
34 Consistency Through Scheduling Edge Consistency Model: Two vertices can be Updated simultaneously if they do not share an edge. Graph Coloring: Two vertices can be assigned the same color if they do not share an edge. Phase 1 Phase 2 Phase 3 Execute in Parallel Synchronously Barrier Barrier Barrier
35 Contents Role of Machine learning Challenge of Big Analysis Distributed Machine learning Conclusion
36 Dynamically Changing Graphs Example: Social Networks New users New Vertices New Friends New Edges How do you adaptively maintain computation: Trigger computation with changes in the graph Update interest estimates only where needed Exploit asynchrony Preserve consistency
37 Graph Partitioning How can you quickly place a large datagraph in a distributed environment: Edge separators fail on large powerlaw graphs Social networks, Recommender Systems, NLP Constructing vertex separators at scale: No largescale tools! How can you adapt the placement in changing graphs?
38 New Computation model is needed???? BSP: safe by slow Async: fast but risky
39
Big Graph Processing: Some Background
Big Graph Processing: Some Background Bo Wu Colorado School of Mines Part of slides from: Paul Burkhardt (National Security Agency) and Carlos Guestrin (Washington University) Mines CSCI580, Bo Wu Graphs
More informationBig Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany MapReduce II MapReduce II 1 / 33 Outline 1. Introduction
More informationLargeScale Data Processing
LargeScale Data Processing Eiko Yoneki eiko.yoneki@cl.cam.ac.uk http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase
More informationApache Hama Design Document v0.6
Apache Hama Design Document v0.6 Introduction Hama Architecture BSPMaster GroomServer Zookeeper BSP Task Execution Job Submission Job and Task Scheduling Task Execution Lifecycle Synchronization Fault
More informationSpark and the Big Data Library
Spark and the Big Data Library Reza Zadeh Thanks to Matei Zaharia Problem Data growing faster than processing speeds Only solution is to parallelize on large clusters» Wide use in both enterprises and
More informationOutline. Motivation. Motivation. MapReduce & GraphLab: Programming Models for LargeScale Parallel/Distributed Computing 2/28/2013
MapReduce & GraphLab: Programming Models for LargeScale Parallel/Distributed Computing Iftekhar Naim Outline Motivation MapReduce Overview Design Issues & Abstractions Examples and Results Pros and Cons
More informationOverview on Graph Datastores and Graph Computing Systems.  Litao Deng (Cloud Computing Group) 06082012
Overview on Graph Datastores and Graph Computing Systems  Litao Deng (Cloud Computing Group) 06082012 Graph  Everywhere 1: Friendship Graph 2: Food Graph 3: Internet Graph Most of the relationships
More informationLARGESCALE GRAPH PROCESSING IN THE BIG DATA WORLD. Dr. Buğra Gedik, Ph.D.
LARGESCALE GRAPH PROCESSING IN THE BIG DATA WORLD Dr. Buğra Gedik, Ph.D. MOTIVATION Graph data is everywhere Relationships between people, systems, and the nature Interactions between people, systems,
More informationGraph Processing and Social Networks
Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph
More informationBayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com
Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian
More informationAnalysis of Web Archives. Vinay Goel Senior Data Engineer
Analysis of Web Archives Vinay Goel Senior Data Engineer Internet Archive Established in 1996 501(c)(3) non profit organization 20+ PB (compressed) of publicly accessible archival material Technology partner
More informationMapReduce Algorithms. Sergei Vassilvitskii. Saturday, August 25, 12
MapReduce Algorithms A Sense of Scale At web scales... Mail: Billions of messages per day Search: Billions of searches per day Social: Billions of relationships 2 A Sense of Scale At web scales... Mail:
More informationAdapting scientific computing problems to cloud computing frameworks Ph.D. Thesis. Pelle Jakovits
Adapting scientific computing problems to cloud computing frameworks Ph.D. Thesis Pelle Jakovits Outline Problem statement State of the art Approach Solutions and contributions Current work Conclusions
More informationParallel & Distributed Optimization. Based on Mark Schmidt s slides
Parallel & Distributed Optimization Based on Mark Schmidt s slides Motivation behind using parallel & Distributed optimization Performance Computational throughput have increased exponentially in linear
More informationThe Power of Relationships
The Power of Relationships Opportunities and Challenges in Big Data Intel Labs Cluster Computing Architecture Legal Notices INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO
More informationAdvanced Data Science on Spark
Advanced Data Science on Spark Reza Zadeh @Reza_Zadeh http://rezazadeh.com Data Science Problem Data growing faster than processing speeds Only solution is to parallelize on large clusters» Wide use in
More informationBig Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 33 Outline
More informationAsking Hard Graph Questions. Paul Burkhardt. February 3, 2014
Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate  R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)
More informationMizan: A System for Dynamic Load Balancing in Largescale Graph Processing
/35 Mizan: A System for Dynamic Load Balancing in Largescale Graph Processing Zuhair Khayyat 1 Karim Awara 1 Amani Alonazi 1 Hani Jamjoom 2 Dan Williams 2 Panos Kalnis 1 1 King Abdullah University of
More informationBig Data at Spotify. Anders Arpteg, Ph D Analytics Machine Learning, Spotify
Big Data at Spotify Anders Arpteg, Ph D Analytics Machine Learning, Spotify Quickly about me Quickly about Spotify What is all the data used for? Quickly about Spark Hadoop MR vs Spark Need for (distributed)
More informationChapter 4: Artificial Neural Networks
Chapter 4: Artificial Neural Networks CS 536: Machine Learning Littman (Wu, TA) Administration icml03: instructional Conference on Machine Learning http://www.cs.rutgers.edu/~mlittman/courses/ml03/icml03/
More informationBig Data Systems CS 5965/6965 FALL 2015
Big Data Systems CS 5965/6965 FALL 2015 Today General course overview Expectations from this course Q&A Introduction to Big Data Assignment #1 General Course Information Course Web Page http://www.cs.utah.edu/~hari/teaching/fall2015.html
More informationEvaluating partitioning of big graphs
Evaluating partitioning of big graphs Fredrik Hallberg, Joakim Candefors, Micke Soderqvist fhallb@kth.se, candef@kth.se, mickeso@kth.se Royal Institute of Technology, Stockholm, Sweden Abstract. Distributed
More informationBig Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 36 Outline
More informationSoftware tools for Complex Networks Analysis. Fabrice Huet, University of Nice Sophia Antipolis SCALE (exoasis) Team
Software tools for Complex Networks Analysis Fabrice Huet, University of Nice Sophia Antipolis SCALE (exoasis) Team MOTIVATION Why do we need tools? Source : nature.com Visualization Properties extraction
More informationMapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
More informationDistributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud Yucheng Low Carnegie Mellon University ylow@cs.cmu.edu Danny Bickson Carnegie Mellon University bickson@cs.cmu.edu Joseph
More informationA Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader
A Performance Evaluation of Open Source Graph Databases Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader Overview Motivation Options Evaluation Results Lessons Learned Moving Forward
More informationANALYTICS IN BIG DATA ERA
ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut
More informationPresto/Blockus: Towards Scalable R Data Analysis
/Blockus: Towards Scalable R Data Analysis Andrew A. Chien University of Chicago and Argonne ational Laboratory IRIAUIUCAL Joint Institute Potential Collaboration ovember 19, 2012 ovember 19, 2012 Andrew
More informationBIG DATA Giraph. Felipe Caicedo December2012. Cloud Computing & Big Data. FIBUPC Master MEI
BIG DATA Giraph Cloud Computing & Big Data Felipe Caicedo December2012 FIBUPC Master MEI Content What is Apache Giraph? Motivation Existing solutions Features How it works Components and responsibilities
More informationHIGH PERFORMANCE BIG DATA ANALYTICS
HIGH PERFORMANCE BIG DATA ANALYTICS Kunle Olukotun Electrical Engineering and Computer Science Stanford University June 2, 2014 Explosion of Data Sources Sensors DoD is swimming in sensors and drowning
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationParallelism and Cloud Computing
Parallelism and Cloud Computing Kai Shen Parallel Computing Parallel computing: Process sub tasks simultaneously so that work can be completed faster. For instances: divide the work of matrix multiplication
More informationMapReduce Approach to Collective Classification for Networks
MapReduce Approach to Collective Classification for Networks Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1 Wroclaw University of Technology, Wroclaw, Poland Faculty
More informationMining Large Datasets: Case of Mining Graph Data in the Cloud
Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large
More informationThis exam contains 13 pages (including this cover page) and 18 questions. Check to see if any pages are missing.
Big Data Processing 20132014 Q2 April 7, 2014 (Resit) Lecturer: Claudia Hauff Time Limit: 180 Minutes Name: Answer the questions in the spaces provided on this exam. If you run out of room for an answer,
More informationHadoop MapReduce and Spark. Giorgio Pedrazzi, CINECASCAI School of Data Analytics and Visualisation Milan, 10/06/2015
Hadoop MapReduce and Spark Giorgio Pedrazzi, CINECASCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Outline Hadoop Hadoop Import data on Hadoop Spark Spark features Scala MLlib MLlib
More informationSteven C.H. Hoi. School of Computer Engineering Nanyang Technological University Singapore
Steven C.H. Hoi School of Computer Engineering Nanyang Technological University Singapore Acknowledgments: Peilin Zhao, Jialei Wang, Hao Xia, Jing Lu, Rong Jin, Pengcheng Wu, Dayong Wang, etc. 2 Agenda
More informationUnified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia
Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing
More informationMizan: A System for Dynamic Load Balancing in Largescale Graph Processing
: A System for Dynamic Load Balancing in Largescale Graph Processing Zuhair Khayyat Karim Awara Amani Alonazi Hani Jamjoom Dan Williams Panos Kalnis King Abdullah University of Science and Technology,
More informationBig Data looks Tiny from the Stratosphere
Volker Markl http://www.user.tuberlin.de/marklv volker.markl@tuberlin.de Big Data looks Tiny from the Stratosphere Data and analyses are becoming increasingly complex! Size Freshness Format/Media Type
More informationTowards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems
Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems Volker Markl volker.markl@tuberlin.de dima.tuberlin.de dfki.de/web/research/iam/ bbdc.berlin Based on my 2014 Vision Paper On
More informationLet the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data
CS535 Big Data W1.A.1 CS535 BIG DATA W1.A.2 Let the data speak to you Medication Adherence Score How likely people are to take their medication, based on: How long people have lived at the same address
More informationBSPCloud: A Hybrid Programming Library for Cloud Computing *
BSPCloud: A Hybrid Programming Library for Cloud Computing * Xiaodong Liu, Weiqin Tong and Yan Hou Department of Computer Engineering and Science Shanghai University, Shanghai, China liuxiaodongxht@qq.com,
More informationLarge Scale Graph Processing with Apache Giraph
Large Scale Graph Processing with Apache Giraph Sebastian Schelter Invited talk at GameDuell Berlin 29th May 2012 the mandatory about me slide PhD student at the Database Systems and Information Management
More informationBig Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014
Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions
More informationGraph Algorithms using MapReduce
Graph Algorithms using MapReduce Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web 1/7 Graph Algorithms using MapReduce Graphs are ubiquitous in modern society.
More informationAn Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
More informationAnalytics Drives Big Data Drives Infrastructure Confessions of Storage turned Analytics Geeks
Analytics Drives Big Data Drives Infrastructure Confessions of Storage turned Analytics Geeks Dr. Aloke Guha 29th IEEE Conference on Massive Data Storage May 8 th, 2013 aloke@cruxly.com What s Common Between
More informationScalable Machine Learning  or what to do with all that Big Data infrastructure
 or what to do with all that Big Data infrastructure TU Berlin blog.mikiobraun.de Strata+Hadoop World London, 2015 1 Complex Data Analysis at Scale Clickthrough prediction Personalized Spam Detection
More informationLecture 6. Artificial Neural Networks
Lecture 6 Artificial Neural Networks 1 1 Artificial Neural Networks In this note we provide an overview of the key concepts that have led to the emergence of Artificial Neural Networks as a major paradigm
More informationSystems and Algorithms for Big Data Analytics
Systems and Algorithms for Big Data Analytics YAN, Da Email: yanda@cse.cuhk.edu.hk My Research Graph Data Distributed Graph Processing Spatial Data Spatial Query Processing Uncertain Data Querying & Mining
More informationBig Data and Scripting Systems build on top of Hadoop
Big Data and Scripting Systems build on top of Hadoop 1, 2, Pig/Latin highlevel map reduce programming platform Pig is the name of the system Pig Latin is the provided programming language Pig Latin is
More informationUp Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata
Up Your R Game James Taylor, Decision Management Solutions Bill Franks, Teradata Today s Speakers James Taylor Bill Franks CEO Chief Analytics Officer Decision Management Solutions Teradata 7/28/14 3 Polling
More informationMLlib: Scalable Machine Learning on Spark
MLlib: Scalable Machine Learning on Spark Xiangrui Meng Collaborators: Ameet Talwalkar, Evan Sparks, Virginia Smith, Xinghao Pan, Shivaram Venkataraman, Matei Zaharia, Rean Griffith, John Duchi, Joseph
More informationBig Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
More informationFastForward I/O and Storage: ACG 6.6 Demonstration
FastForward I/O and Storage: ACG 6.6 Demonstration NOTICE: THIS MANUSCRIPT HAS BEEN AUTHORED BY INTEL UNDER ITS SUBCONTRACT WITH LAWRENCE LIVERMORE NATIONAL SECURITY, LLC WHO IS THE OPERATOR AND MANAGER
More informationDistributed Structured Prediction for Big Data
Distributed Structured Prediction for Big Data A. G. Schwing ETH Zurich aschwing@inf.ethz.ch T. Hazan TTI Chicago M. Pollefeys ETH Zurich R. Urtasun TTI Chicago Abstract The biggest limitations of learning
More informationDistributed R for Big Data
Distributed R for Big Data Indrajit Roy, HP Labs November 2013 Team: Shivara m Erik Kyungyon g Alvin Rob Vanish A Big Data story Once upon a time, a customer in distress had. 2+ billion rows of financial
More informationChallenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
More informationLarge Scale Learning
Large Scale Learning Data hypergrowth: an example Reuters 21578: about 10K docs (ModApte) Bekkerman et al, SIGIR 2001 RCV1: about 807K docs Bekkerman & Scholz, CIKM 2008 LinkedIn job Mtle data: about
More informationBig Data Optimization: Randomized lockfree methods for minimizing partially separable convex functions
Big Data Optimization: Randomized lockfree methods for minimizing partially separable convex functions Peter Richtárik School of Mathematics The University of Edinburgh Joint work with Martin Takáč (Edinburgh)
More informationDATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
More informationConjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect
Matteo Migliavacca (mm53@kent) School of Computing Conjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect Simple past  Traditional
More informationMachine Learning Capacity and Performance Analysis and R
Machine Learning and R May 3, 11 30 25 15 10 5 25 15 10 5 30 25 15 10 5 0 2 4 6 8 101214161822 0 2 4 6 8 101214161822 0 2 4 6 8 101214161822 100 80 60 40 100 80 60 40 100 80 60 40 30 25 15 10 5 25 15 10
More informationIntroduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011
Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning
More informationMaking Big Data Processing Simple with Spark. Matei Zaharia
Making Big Data Processing Simple with Spark Matei Zaharia December 17, 2015 What is Apache Spark? Fast and general cluster computing engine that generalizes the MapReduce model Makes it easy and fast
More informationMapReduce/Bigtable for Distributed Optimization
MapReduce/Bigtable for Distributed Optimization Keith B. Hall Google Inc. kbhall@google.com Scott Gilpin Google Inc. sgilpin@google.com Gideon Mann Google Inc. gmann@google.com Abstract With large data
More informationengineering pipelines for learning at scale Benjamin Recht University of California, Berkeley
engineering pipelines for learning at scale Benjamin Recht University of California, Berkeley If you can pose your problem as a simple optimization problem, you re mostly done + + + + +   + +    
More informationGraph Mining on Big Data System. Presented by Hefu Chai, Rui Zhang, Jian Fang
Graph Mining on Big Data System Presented by Hefu Chai, Rui Zhang, Jian Fang Outline * Overview * Approaches & Environment * Results * Observations * Notes * Conclusion Overview * What we have done? *
More informationBig Data and Scripting Systems beyond Hadoop
Big Data and Scripting Systems beyond Hadoop 1, 2, ZooKeeper distributed coordination service many problems are shared among distributed systems ZooKeeper provides an implementation that solves these avoid
More informationNeural Networks. CAP5610 Machine Learning Instructor: GuoJun Qi
Neural Networks CAP5610 Machine Learning Instructor: GuoJun Qi Recap: linear classifier Logistic regression Maximizing the posterior distribution of class Y conditional on the input vector X Support vector
More informationBig Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料
Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料 美 國 13 歲 學 生 用 Big Data 找 出 霸 淩 熱 點 Puri 架 設 網 站 Bullyvention, 藉 由 分 析 Twitter 上 找 出 提 到 跟 霸 凌 相 關 的 詞, 搭 配 地 理 位 置
More informationAnalyzing Big Data with AWS
Analyzing Big Data with AWS Peter Sirota, General Manager, Amazon Elastic MapReduce @petersirota What is Big Data? Computer generated data Application server logs (web sites, games) Sensor data (weather,
More informationProbabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
More informationHow to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
More informationFast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
More informationInformation Processing, Big Data, and the Cloud
Information Processing, Big Data, and the Cloud James Horey Computational Sciences & Engineering Oak Ridge National Laboratory Fall Creek Falls 2010 Information Processing Systems Model Parameters Dataintensive
More informationTowards running complex models on big data
Towards running complex models on big data Working with all the genomes in the world without changing the model (too much) Daniel Lawson Heilbronn Institute, University of Bristol 2013 1 / 17 Motivation
More informationCSCI567 Machine Learning (Fall 2014)
CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /
More informationReplicationbased Faulttolerance for Largescale Graph Processing
Replicationbased Faulttolerance for Largescale Graph Processing Peng Wang Kaiyuan Zhang Rong Chen Haibo Chen Haibing Guan Shanghai Key Laboratory of Scalable Computing and Systems Institute of Parallel
More informationLoad balancing. David Bindel. 12 Nov 2015
Load balancing David Bindel 12 Nov 2015 Inefficiencies in parallel code Poor single processor performance Typically in the memory system Saw this in matrix multiply assignment Overhead for parallelism
More informationBIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
More informationInfiniteGraph: The Distributed Graph Database
A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086
More informationBig Data Analytics. Genoveva VargasSolar http://www.vargassolar.com/bigdataanalytics French Council of Scientific Research, LIG & LAFMIA Labs
1 Big Data Analytics Genoveva VargasSolar http://www.vargassolar.com/bigdataanalytics French Council of Scientific Research, LIG & LAFMIA Labs Montevideo, 22 nd November 4 th December, 2015 INFORMATIQUE
More informationISSN: 23201363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS
CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant
More informationManaging large clusters resources
Managing large clusters resources ID2210 Gautier Berthou (SICS) Big Processing with No Locality Job( /crawler/bot/jd.io/1 ) submi t Workflow Manager Compute Grid Node Job This doesn t scale. Bandwidth
More informationSIAM PP 2014! MapReduce in Scientific Computing! February 19, 2014
SIAM PP 2014! MapReduce in Scientific Computing! February 19, 2014 Paul G. Constantine! Applied Math & Stats! Colorado School of Mines David F. Gleich! Computer Science! Purdue University Hans De Sterck!
More informationA Novel Cloud Based Elastic Framework for Big Data Preprocessing
School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview
More informationHow Companies are! Using Spark
How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made
More informationUnsupervised Data Mining (Clustering)
Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in
More informationCS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University
CS 555: DISTRIBUTED SYSTEMS [SPARK] Shrideep Pallickara Computer Science Colorado State University Frequently asked questions from the previous class survey Streaming Significance of minimum delays? Interleaving
More informationMapReduce for Machine Learning on Multicore
MapReduce for Machine Learning on Multicore Chu, et al. Problem The world is going multicore New computers  dual core to 12+core Shift to more concurrent programming paradigms and languages Erlang,
More informationBig Data Analytics with Spark and Oscar BAO. Tamas Jambor, Lead Data Scientist at Massive Analytic
Big Data Analytics with Spark and Oscar BAO Tamas Jambor, Lead Data Scientist at Massive Analytic About me Building a scalable Machine Learning platform at MA Worked in Big Data and Data Science in the
More informationSpark. Fast, Interactive, Language Integrated Cluster Computing
Spark Fast, Interactive, Language Integrated Cluster Computing Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, Ion Stoica UC
More informationReference Architecture, Requirements, Gaps, Roles
Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture
More informationNeural Networks. Neural network is a network or circuit of neurons. Neurons can be. Biological neurons Artificial neurons
Neural Networks Neural network is a network or circuit of neurons Neurons can be Biological neurons Artificial neurons Biological neurons Building block of the brain Human brain contains over 10 billion
More informationExtreme Computing. Big Data. Stratis Viglas. School of Informatics University of Edinburgh sviglas@inf.ed.ac.uk. Stratis Viglas Extreme Computing 1
Extreme Computing Big Data Stratis Viglas School of Informatics University of Edinburgh sviglas@inf.ed.ac.uk Stratis Viglas Extreme Computing 1 Petabyte Age Big Data Challenges Stratis Viglas Extreme Computing
More informationBig Data Paradigms in Python
Big Data Paradigms in Python San Diego Data Science and R Users Group January 2014 Kevin Davenport! http://kldavenport.com kldavenportjr@gmail.com @KevinLDavenport Thank you to our sponsors: Setting up
More information