Load Balancing for Distributed Stream Processing Engines. Muhammad Anis Uddin Nasir EMDC
|
|
- Natalie Holt
- 8 years ago
- Views:
Transcription
1 Load Balancing for Distributed Stream Processing Engines Muhammad Anis Uddin Nasir EMDC
2 About me Ex EMDC from Batch 011 (the party batch) Currently PhD Student at KTH Royal Institute of Technology Under supervision of Šarūnas Girdzijauskas Internship Ex-intern at Yahoo Labs Barcelona Intern at Aalto University
3 About me 3
4 About me 4
5 About me 5
6 About me 6
7 About me 7
8 About me 8
9 Why PhD? Freedom Work timings Set of problems Flexibility Conferences Internships 9
10 Research Interest Stream Processing Social Network Analysis Distributed Systems Decentralized Online Social Networks
11 Stream Processing Engines Streaming Application Online Machine Learning Real Time Query Processing Continuous Computation Streaming Frameworks Storm, Borealis, S4, Samza, Spark Streaming 11
12 Stream Processing Model Streaming Applications are represented by Directed Acyclic Graphs (DAGs) Source Data Stream Source Operators Data Channels 1
13 Stream Grouping Key or Fields Grouping Hash-based assignment Stateful operations, e.g., page rank, degree count Shuffle Grouping Round-robin assignment Stateless operations, e.g., data logging, OLTP 13
14 Key Grouping Key Grouping Scalable Low Memory Load Imbalance 14
15 Shuffle Grouping Shuffle Grouping Load Balance Memory O(W) Aggregation O(W) 15
16 Problem Formulation Input is a unbounded sequence of messages from a key distribution Each message is assigned to a worker for processing Load balance properties Memory Load Balance Network Load Balance Processing Load Balance Metric: Load Imbalance 16
17 Partial Key Grouping (PKG) Key Splitting Split each key into two server Assign each instance using power of two choices Local Load Estimation each source estimates load on workers using the local routing history 17
18 Partial Key Grouping (PKG) Key Splitting Local Load Estimation 00 1 Source 1 0 Source 18
19 Partial Key Grouping (PKG) Key Splitting Distributed Stateless Handle Skew Local load estimation No coordination among sources No communication with workers 19
20 Partial Key Grouping PKG Load Balance Memory O(1) Aggregation O(1) 0
21 Streaming Applications Most algorithms that use Shuffle Grouping can be expressed using Partial Key Grouping to reduce: Memory footprint Aggregation overhead Algorithms that use Key Grouping can be rewritten to achieve load balance 1
22 Stream Grouping: Summary Grouping Pros Cons Key Grouping Scalable Memory Load Balance Load Imbalance Shuffle Grouping Partial Key Grouping Memory O(W) Aggregation O(W) Scalable Load Balance Memory O(1) Aggregation O(1)
23 Experiments What is the effect of key splitting on POTC? How does local estimation compare to a global oracle? How does PKG perform on a real deployment on Apache Storm? 3
24 Experimental Setup Metric the difference of maximum and the average load of the workers at time t Datasets Twitter, 1.G tweets (crawled July 01) Wikipedia, M access logs Twitter, 690K cashtags (crawled Nov 013) Social Networks, 69M edges Synthetic, M keys 4
25 Effect of Key Splitting 5
26 Local Load Estimation 6
27 Throughput (keys/s) Real deployment: Apache Storm PKG SG 00 KG s 300s 60s s 30s 60s s PKG SG KG 30s s (a) CPU delay (ms) 600s (b) Memory (keys)
28 Load Balancing for Distributed Stream Processing Engines Muhammad Anis Uddin Nasir EMDC
29 Does PKG work for highly ed data? 9
30 Skewness Source Source 30
31 Skewness Source Source 31
32 Skewness Source Source 3
33 High level ideas Splitting the high frequency keys on more than two workers Challenges How to decide which are the high frequency keys? How many workers to split the high frequency keys? 33
34 Solution Divide stream into two components Heavy Hitters Tail Distribution Split Heavy Hitters on all the workers Use Greedy W-Choices or Round Robin for heavy hitters Use vanilla PKG for tail 34
35 Skewness Source Source 35
36 Experiments- Threshold 5 workers I(t) (messages) workers 0 W-Choice / W 1/ W 1/ W 1/4 W 1/8 W 1. W-Choice workers I(t) (messages) W-Choice RR workers 0 RR RR workers 0-1 W-Choice workers 0 0 workers 0 50 workers 0-7 RR 1. 36
37 Experiments- Load Imbalance 5 workers I(t) (messages) workers k unique items PKG W-Choice RR -1 - k unique items workers workers 0 0k unique items 0k unique items M unique items -7 1M unique items 1M unique items M unique items workers workers k unique items workers workers workers 0-1 k unique items workers 0 0k unique items k unique items workers 0-3 I(t) (messages) -6 I(t) (messages) 50 workers
38 Conclusion Partial Key Grouping (PKG) reduces the load imbalance by up to seven orders of magnitude compared to Key Grouping PKG imposes constant memory and aggregation overhead, i.e., O(1), compared to Shuffle Grouping that is O(W) W-Choices improves PKG to achieve nearly perfect load balance for highly ed input 38
39 Load Balancing for Distributed Stream Processing Engines Muhammad Anis Uddin Nasir EMDC
Load Balancing in Stream Processing Engines. Anis Nasir Coauthors: Gianmarco, Nicolas, David, Marco
Engines Anis Nasir Coauthors: Gianmarco, Nicolas, David, Marco Stream Processing Engines Online Machine Learning Real Time Query Processing ConCnuous ComputaCon Distributed RPC 2 Stream Processing Engines
More informationDeveloping Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University
More informationReal Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA
Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,
More informationArchitectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
More information5 Performance Management for Web Services. Rolf Stadler School of Electrical Engineering KTH Royal Institute of Technology. stadler@ee.kth.
5 Performance Management for Web Services Rolf Stadler School of Electrical Engineering KTH Royal Institute of Technology stadler@ee.kth.se April 2008 Overview Service Management Performance Mgt QoS Mgt
More informationCS 558 Internet Systems and Technologies
CS 558 Internet Systems and Technologies Dimitris Deyannis deyannis@csd.uoc.gr 881 Heat seeking Honeypots: Design and Experience Abstract Compromised Web servers are used to perform many malicious activities.
More informationBENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next
More informationPutting Apache Kafka to Use!
Putting Apache Kafka to Use! Building a Real-time Data Platform for Event Streams! JAY KREPS, CONFLUENT! A Couple of Themes! Theme 1: Rise of Events! Theme 2: Immutability Everywhere! Level! Example! Immutable
More informationFast Data in the Era of Big Data: Twitter s Real-
Fast Data in the Era of Big Data: Twitter s Real- Time Related Query Suggestion Architecture Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, Jimmy Lin Presented by: Rania Ibrahim 1 AGENDA Motivation
More informationOnline and Scalable Data Validation in Advanced Metering Infrastructures
Online and Scalable Data Validation in Advanced Metering Infrastructures Chalmers University of technology Agenda 1. Problem statement 2. Preliminaries Data Streaming 3. Streaming-based Data Validation
More informationConjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect
Matteo Migliavacca (mm53@kent) School of Computing Conjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect Simple past - Traditional
More informationGraphalytics: A Big Data Benchmark for Graph-Processing Platforms
Graphalytics: A Big Data Benchmark for Graph-Processing Platforms Mihai Capotă, Tim Hegeman, Alexandru Iosup, Arnau Prat-Pérez, Orri Erling, Peter Boncz Delft University of Technology Universitat Politècnica
More informationA Tour of the Zoo the Hadoop Ecosystem Prafulla Wani
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to
More informationBig Data Analytics - Accelerated. stream-horizon.com
Big Data Analytics - Accelerated stream-horizon.com StreamHorizon & Big Data Integrates into your Data Processing Pipeline Seamlessly integrates at any point of your your data processing pipeline Implements
More informationExploring Oracle E-Business Suite Load Balancing Options. Venkat Perumal IT Convergence
Exploring Oracle E-Business Suite Load Balancing Options Venkat Perumal IT Convergence Objectives Overview of 11i load balancing techniques Load balancing architecture Scenarios to implement Load Balancing
More informationF1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013
F1: A Distributed SQL Database That Scales Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013 What is F1? Distributed relational database Built to replace sharded MySQL back-end of AdWords
More informationPulsar Realtime Analytics At Scale. Tony Ng April 14, 2015
Pulsar Realtime Analytics At Scale Tony Ng April 14, 2015 Big Data Trends Bigger data volumes More data sources DBs, logs, behavioral & business event streams, sensors Faster analysis Next day to hours
More informationGossip Based Partitioning and Replication Middleware in Network Infrastructure
Gossip-based Partitioning and Replication Middleware for Online Social Networks Muhammad Anis Uddin Nasir Master of Science Thesis Stockholm, Sweden 2013 TRITA-ICT-EX-2013:181 School of Information and
More informationTolerating Late Timing Faults in Real Time Stream Processing Systems. Stuart Perks
School of Computing FACULTY OF ENGINEERING Tolerating Late Timing Faults in Real Time Stream Processing Systems Stuart Perks Submitted in accordance with the requirements for the degree of Advanced Computer
More informationMap-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on Multicore Chu, et al. Problem The world is going multicore New computers - dual core to 12+-core Shift to more concurrent programming paradigms and languages Erlang,
More informationSTREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE SAMZA. Processing billions of events every day
STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE SAMZA Processing billions of events every day Neha Narkhede Co-founder and Head of Engineering @ Stealth Startup Prior to this Lead, Streams Infrastructure
More informationCluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.
Architectures Cluster Computing Job Parallelism Request Parallelism 2 2010 VMware Inc. All rights reserved Replication Stateless vs. Stateful! Fault tolerance High availability despite failures If one
More informationArchitectures for massive data management
Architectures for massive data management Apache Kafka, Samza, Storm Albert Bifet albert.bifet@telecom-paristech.fr October 20, 2015 Stream Engine Motivation Digital Universe EMC Digital Universe with
More informationMapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
More informationHiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group
HiBench Introduction Carson Wang (carson.wang@intel.com) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is
More informationLambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
More informationInfrastructures for big data
Infrastructures for big data Rasmus Pagh 1 Today s lecture Three technologies for handling big data: MapReduce (Hadoop) BigTable (and descendants) Data stream algorithms Alternatives to (some uses of)
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationYARN, the Apache Hadoop Platform for Streaming, Realtime and Batch Processing
YARN, the Apache Hadoop Platform for Streaming, Realtime and Batch Processing Eric Charles [http://echarles.net] @echarles Datalayer [http://datalayer.io] @datalayerio FOSDEM 02 Feb 2014 NoSQL DevRoom
More informationOn demand synchronization and load distribution for database grid-based Web applications
Data & Knowledge Engineering 51 (24) 295 323 www.elsevier.com/locate/datak On demand synchronization and load distribution for database grid-based Web applications Wen-Syan Li *,1, Kemal Altintas, Murat
More informationB4: Experience with a Globally-Deployed Software Defined WAN TO APPEAR IN SIGCOMM 13
B4: Experience with a Globally-Deployed Software Defined WAN TO APPEAR IN SIGCOMM 13 Google s Software Defined WAN Traditional WAN Routing Treat all bits the same 30% ~ 40% average utilization Cost of
More informationTesting Big data is one of the biggest
Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing
More informationManagement by Network Search
Management by Network Search Misbah Uddin, Prof. Rolf Stadler KTH Royal Institute of Technology, Sweden Dr. Alex Clemm Cisco Systems, CA, USA November 11, 2014 ANRP Award Talks Session IETF 91 Honolulu,
More informationI Logs. Apache Kafka, Stream Processing, and Real-time Data Jay Kreps
I Logs Apache Kafka, Stream Processing, and Real-time Data Jay Kreps The Plan 1. What is Data Integration? 2. What is Apache Kafka? 3. Logs and Distributed Systems 4. Logs and Data Integration 5. Logs
More informationStreaming items through a cluster with Spark Streaming
Streaming items through a cluster with Spark Streaming Tathagata TD Das @tathadas CME 323: Distributed Algorithms and Optimization Stanford, May 6, 2015 Who am I? > Project Management Committee (PMC) member
More informationGroup Based Load Balancing Algorithm in Cloud Computing Virtualization
Group Based Load Balancing Algorithm in Cloud Computing Virtualization Rishi Bhardwaj, 2 Sangeeta Mittal, Student, 2 Assistant Professor, Department of Computer Science, Jaypee Institute of Information
More informationCan the Elephants Handle the NoSQL Onslaught?
Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented
More informationHow Companies are! Using Spark
How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made
More informationEnergy Efficient MapReduce
Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing
More informationBig Data Analysis: Apache Storm Perspective
Big Data Analysis: Apache Storm Perspective Muhammad Hussain Iqbal 1, Tariq Rahim Soomro 2 Faculty of Computing, SZABIST Dubai Abstract the boom in the technology has resulted in emergence of new concepts
More informationIntroduction to Big Data! with Apache Spark" UC#BERKELEY#
Introduction to Big Data! with Apache Spark" UC#BERKELEY# This Lecture" The Big Data Problem" Hardware for Big Data" Distributing Work" Handling Failures and Slow Machines" Map Reduce and Complex Jobs"
More informationMizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
/35 Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing Zuhair Khayyat 1 Karim Awara 1 Amani Alonazi 1 Hani Jamjoom 2 Dan Williams 2 Panos Kalnis 1 1 King Abdullah University of
More informationCase Study - I. Industry: Social Networking Website Technology : J2EE AJAX, Spring, MySQL, Weblogic, Windows Server 2008.
Case Study - I Industry: Social Networking Website Technology : J2EE AJAX, Spring, MySQL, Weblogic, Windows Server 2008 Challenges The scalability of the database servers to execute batch processes under
More informationMeasuring CDN Performance. Hooman Beheshti, VP Technology
Measuring CDN Performance Hooman Beheshti, VP Technology Why this matters Performance is one of the main reasons we use a CDN Seems easy to measure, but isn t Performance is an easy way to comparison shop
More informationRecommendations for Performance Benchmarking
Recommendations for Performance Benchmarking Shikhar Puri Abstract Performance benchmarking of applications is increasingly becoming essential before deployment. This paper covers recommendations and best
More informationFour Orders of Magnitude: Running Large Scale Accumulo Clusters. Aaron Cordova Accumulo Summit, June 2014
Four Orders of Magnitude: Running Large Scale Accumulo Clusters Aaron Cordova Accumulo Summit, June 2014 Scale, Security, Schema Scale to scale 1 - (vt) to change the size of something let s scale the
More informationQoS & Traffic Management
QoS & Traffic Management Advanced Features for Managing Application Performance and Achieving End-to-End Quality of Service in Data Center and Cloud Computing Environments using Chelsio T4 Adapters Chelsio
More informationCost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:
CSE341T 08/31/2015 Lecture 3 Cost Model: Work, Span and Parallelism In this lecture, we will look at how one analyze a parallel program written using Cilk Plus. When we analyze the cost of an algorithm
More informationBIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
More informationJava Performance. Adrian Dozsa TM-JUG 18.09.2014
Java Performance Adrian Dozsa TM-JUG 18.09.2014 Agenda Requirements Performance Testing Micro-benchmarks Concurrency GC Tools Why is performance important? We hate slow web pages/apps We hate timeouts
More informationBig Graph Analytics on Neo4j with Apache Spark. Michael Hunger Original work by Kenny Bastani Berlin Buzzwords, Open Stage
Big Graph Analytics on Neo4j with Apache Spark Michael Hunger Original work by Kenny Bastani Berlin Buzzwords, Open Stage My background I only make it to the Open Stages :) Probably because Apache Neo4j
More informationPlay with Big Data on the Shoulders of Open Source
OW2 Open Source Corporate Network Meeting Play with Big Data on the Shoulders of Open Source Liu Jie Technology Center of Software Engineering Institute of Software, Chinese Academy of Sciences 2012-10-19
More informationBayesian networks - Time-series models - Apache Spark & Scala
Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly
More informationJVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra
JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra January 2014 Legal Notices Apache Cassandra, Spark and Solr and their respective logos are trademarks or registered trademarks
More informationMining Large Datasets: Case of Mining Graph Data in the Cloud
Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large
More informationDRIVING INNOVATION THROUGH DATA ACCELERATING BIG DATA APPLICATION DEVELOPMENT WITH CASCADING
DRIVING INNOVATION THROUGH DATA ACCELERATING BIG DATA APPLICATION DEVELOPMENT WITH CASCADING Supreet Oberoi VP Field Engineering, Concurrent Inc GET TO KNOW CONCURRENT Leader in Application Infrastructure
More informationCognos8 Deployment Best Practices for Performance/Scalability. Barnaby Cole Practice Lead, Technical Services
Cognos8 Deployment Best Practices for Performance/Scalability Barnaby Cole Practice Lead, Technical Services Agenda > Cognos 8 Architecture Overview > Cognos 8 Components > Load Balancing > Deployment
More informationEnhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications
Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Ahmed Abdulhakim Al-Absi, Dae-Ki Kang and Myong-Jong Kim Abstract In Hadoop MapReduce distributed file system, as the input
More informationOptimization of QoS for Cloud-Based Services through Elasticity and Network Awareness
Master Thesis: Optimization of QoS for Cloud-Based Services through Elasticity and Network Awareness Alexander Fedulov 1 Agenda BonFIRE Project overview Motivation General System Architecture Monitoring
More informationData Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com
Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,
More informationHadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
More informationSPARK USE CASE IN TELCO. Apache Spark Night 9-2-2014! Chance Coble!
SPARK USE CASE IN TELCO Apache Spark Night 9-2-2014! Chance Coble! Use Case Profile Telecommunications company Shared business problems/pain Scalable analytics infrastructure is a problem Pushing infrastructure
More informationJournée Thématique Big Data 13/03/2015
Journée Thématique Big Data 13/03/2015 1 Agenda About Flaminem What Do We Want To Predict? What Is The Machine Learning Theory Behind It? How Does It Work In Practice? What Is Happening When Data Gets
More informationReal-time Analytics at Facebook: Data Freeway and Puma. Zheng Shao 12/2/2011
Real-time Analytics at Facebook: Data Freeway and Puma Zheng Shao 12/2/2011 Agenda 1 Analytics and Real-time 2 Data Freeway 3 Puma 4 Future Works Analytics and Real-time what and why Facebook Insights
More informationGraySort on Apache Spark by Databricks
GraySort on Apache Spark by Databricks Reynold Xin, Parviz Deyhim, Ali Ghodsi, Xiangrui Meng, Matei Zaharia Databricks Inc. Apache Spark Sorting in Spark Overview Sorting Within a Partition Range Partitioner
More informationMesos: A Platform for Fine- Grained Resource Sharing in Data Centers (II)
UC BERKELEY Mesos: A Platform for Fine- Grained Resource Sharing in Data Centers (II) Anthony D. Joseph LASER Summer School September 2013 My Talks at LASER 2013 1. AMP Lab introduction 2. The Datacenter
More informationEclipse Exam Tutorial - Pros and Cons
Table of Contents 3. PaaS: the example of Google AppEngine Guillaume Pierre Université de Rennes 1 Fall 2012 http://www.globule.org/~gpierre/ 1 2 Developing Java applications in AppEngine 3 The Data Store
More informationLOAD BALANCING TECHNIQUES FOR RELEASE 11i AND RELEASE 12 E-BUSINESS ENVIRONMENTS
LOAD BALANCING TECHNIQUES FOR RELEASE 11i AND RELEASE 12 E-BUSINESS ENVIRONMENTS Venkat Perumal IT Convergence Introduction Any application server based on a certain CPU, memory and other configurations
More informationThe Top 10 7 Hadoop Patterns and Anti-patterns. Alex Holmes @
The Top 10 7 Hadoop Patterns and Anti-patterns Alex Holmes @ whoami Alex Holmes Software engineer Working on distributed systems for many years Hadoop since 2008 @grep_alex grepalex.com what s hadoop...
More informationReal-time Big Data Analytics with Storm
Ron Bodkin Founder & CEO, Think Big June 2013 Real-time Big Data Analytics with Storm Leading Provider of Data Science and Engineering Services Accelerating Your Time to Value IMAGINE Strategy and Roadmap
More informationBookKeeper. Flavio Junqueira Yahoo! Research, Barcelona. Hadoop in China 2011
BookKeeper Flavio Junqueira Yahoo! Research, Barcelona Hadoop in China 2011 What s BookKeeper? Shared storage for writing fast sequences of byte arrays Data is replicated Writes are striped Many processes
More informationA REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information
More informationGeoGrid Project and Experiences with Hadoop
GeoGrid Project and Experiences with Hadoop Gong Zhang and Ling Liu Distributed Data Intensive Systems Lab (DiSL) Center for Experimental Computer Systems Research (CERCS) Georgia Institute of Technology
More informationAffinity Aware VM Colocation Mechanism for Cloud
Affinity Aware VM Colocation Mechanism for Cloud Nilesh Pachorkar 1* and Rajesh Ingle 2 Received: 24-December-2014; Revised: 12-January-2015; Accepted: 12-January-2015 2014 ACCENTS Abstract The most of
More informationLoad Manager Administrator s Guide For other guides in this document set, go to the Document Center
Load Manager Administrator s Guide For other guides in this document set, go to the Document Center Load Manager for Citrix Presentation Server Citrix Presentation Server 4.5 for Windows Citrix Access
More informationManaging large clusters resources
Managing large clusters resources ID2210 Gautier Berthou (SICS) Big Processing with No Locality Job( /crawler/bot/jd.io/1 ) submi t Workflow Manager Compute Grid Node Job This doesn t scale. Bandwidth
More informationMANAGING RESOURCES IN A BIG DATA CLUSTER.
MANAGING RESOURCES IN A BIG DATA CLUSTER. Gautier Berthou (SICS) EMDC Summer Event 2015 www.hops.io @hopshadoop We are producing lot of data Where does they Come From? On-line services : PBs per day Scientific
More informationOverview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB
Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what
More informationMultifaceted Resource Management for Dealing with Heterogeneous Workloads in Virtualized Data Centers
Multifaceted Resource Management for Dealing with Heterogeneous Workloads in Virtualized Data Centers Íñigo Goiri, J. Oriol Fitó, Ferran Julià, Ramón Nou, Josep Ll. Berral, Jordi Guitart and Jordi Torres
More informationICS 143 - Principles of Operating Systems
ICS 143 - Principles of Operating Systems Lecture 5 - CPU Scheduling Prof. Nalini Venkatasubramanian nalini@ics.uci.edu Note that some slides are adapted from course text slides 2008 Silberschatz. Some
More informationParallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel
Parallel Databases Increase performance by performing operations in parallel Parallel Architectures Shared memory Shared disk Shared nothing closely coupled loosely coupled Parallelism Terminology Speedup:
More informationSla Aware Load Balancing Algorithm Using Join-Idle Queue for Virtual Machines in Cloud Computing
Sla Aware Load Balancing Using Join-Idle Queue for Virtual Machines in Cloud Computing Mehak Choudhary M.Tech Student [CSE], Dept. of CSE, SKIET, Kurukshetra University, Haryana, India ABSTRACT: Cloud
More informationA Brief Introduction to Apache Tez
A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value
More informationBig Data. A general approach to process external multimedia datasets. David Mera
Big Data A general approach to process external multimedia datasets David Mera Laboratory of Data Intensive Systems and Applications (DISA) Masaryk University Brno, Czech Republic 7/10/2014 Table of Contents
More informationMEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM?
MEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM? Ashutosh Shinde Performance Architect ashutosh_shinde@hotmail.com Validating if the workload generated by the load generating tools is applied
More informationScalability Factors of JMeter In Performance Testing Projects
Scalability Factors of JMeter In Performance Testing Projects Title Scalability Factors for JMeter In Performance Testing Projects Conference STEP-IN Conference Performance Testing 2008, PUNE Author(s)
More informationA Study on Workload Imbalance Issues in Data Intensive Distributed Computing
A Study on Workload Imbalance Issues in Data Intensive Distributed Computing Sven Groot 1, Kazuo Goda 1, and Masaru Kitsuregawa 1 University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan Abstract.
More informationproject collects data from national events, both natural and manmade, to be stored and evaluated by
Joseph Sebastian CS 2994 Spring 2014 Undergraduate Research Final Paper GOALS The goal of my research was to assist the Integrated Digital Event Archive (IDEAL) team in transferring their Twitter data
More informationKeywords Distributed Computing, On Demand Resources, Cloud Computing, Virtualization, Server Consolidation, Load Balancing
Volume 5, Issue 1, January 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Survey on Load
More informationMapReduce Algorithms. Sergei Vassilvitskii. Saturday, August 25, 12
MapReduce Algorithms A Sense of Scale At web scales... Mail: Billions of messages per day Search: Billions of searches per day Social: Billions of relationships 2 A Sense of Scale At web scales... Mail:
More informationCOSC1300, COSC1301. Web Servers and Web Technology
School of Computer Science and Information Technology COSC1300, COSC1301 Lecture 10: Load Balancing of Web Servers Lecture Overview Load Balancing What it is? Why use it? Where is it used? Benefits thereof?
More informationEnhancing MapReduce Functionality for Optimizing Workloads on Data Centers
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 10, October 2013,
More informationUsing Kafka to Optimize Data Movement and System Integration. Alex Holmes @
Using Kafka to Optimize Data Movement and System Integration Alex Holmes @ https://www.flickr.com/photos/tom_bennett/7095600611 THIS SUCKS E T (circa 2560 B.C.E.) L a few years later... 2,014 C.E. i need
More informationDirect NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle
Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Agenda Introduction Database Architecture Direct NFS Client NFS Server
More informationVDI FIT and VDI UX: Composite Metrics Track Good, Fair, Poor Desktop Performance
VDI FIT and VDI UX: Composite Metrics Track Good, Fair, Poor Desktop Performance Key indicators and classification capabilities in Stratusphere FIT and Stratusphere UX Whitepaper INTRODUCTION This whitepaper
More informationBig Graph Processing: Some Background
Big Graph Processing: Some Background Bo Wu Colorado School of Mines Part of slides from: Paul Burkhardt (National Security Agency) and Carlos Guestrin (Washington University) Mines CSCI-580, Bo Wu Graphs
More informationComparison of PBRR Scheduling Algorithm with Round Robin and Heuristic Priority Scheduling Algorithm in Virtual Cloud Environment
www.ijcsi.org 99 Comparison of PBRR Scheduling Algorithm with Round Robin and Heuristic Priority Scheduling Algorithm in Cloud Environment Er. Navreet Singh 1 1 Asst. Professor, Computer Science Department
More informationAgenda. Some Examples from Yahoo! Hadoop. Some Examples from Yahoo! Crawling. Cloud (data) management Ahmed Ali-Eldin. First part: Second part:
Cloud (data) management Ahmed Ali-Eldin First part: ZooKeeper (Yahoo!) Agenda A highly available, scalable, distributed, configuration, consensus, group membership, leader election, naming, and coordination
More informationTITAN BIG GRAPH DATA WITH CASSANDRA #TITANDB #CASSANDRA12
TITAN BIG GRAPH DATA WITH CASSANDRA #TITANDB #CASSANDRA12 Matthias Broecheler, CTO August VIII, MMXII AURELIUS THINKAURELIUS.COM Abstract Titan is an open source distributed graph database build on top
More information