Load Balancing in Stream Processing Engines. Anis Nasir Coauthors: Gianmarco, Nicolas, David, Marco
- Ralf Palmer
- 8 years ago
Transcription
1 Load Balancing in Stream Processing Engines. Anis Nasir. Coauthors: Gianmarco, Nicolas, David, Marco
2 Stream Processing Engines. Online Machine Learning; Real-Time Query Processing; Continuous Computation; Distributed RPC.
3 Stream Processing Engines. Streaming applications are represented by Directed Acyclic Graphs (DAGs). [Diagram: sources connected to workers]
4 Stream Grouping. Key or Fields Grouping: hash-based assignment; stateful operations, e.g., page rank, degree count. Shuffle Grouping: round-robin assignment; stateless operations, e.g., data logging, OLTP. All Grouping.
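The two main groupings on this slide can be illustrated with a minimal sketch. The names `key_grouping` and `ShuffleGrouping` are hypothetical and do not correspond to any engine's API; `zlib.crc32` is an assumed stand-in for whatever hash function a real engine uses.

```python
import zlib


def key_grouping(key: str, num_workers: int) -> int:
    """Hash-based assignment: the same key always reaches the same worker."""
    # crc32 is chosen only because it is deterministic across runs.
    return zlib.crc32(key.encode("utf-8")) % num_workers


class ShuffleGrouping:
    """Round-robin assignment: messages are spread evenly, ignoring the key."""

    def __init__(self, num_workers: int) -> None:
        self.num_workers = num_workers
        self._next = 0

    def route(self, key: str) -> int:
        worker = self._next
        self._next = (self._next + 1) % self.num_workers
        return worker
```

Key grouping keeps all state for a key on one worker, which is why it suits stateful operators; shuffle grouping gives up that locality in exchange for an even spread of messages.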
5 Data Distribution. Skewed distributions: power-law, Zipf distribution, log-normal, scale-free networks. Example applications: social networks, web users, economy, biology.
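To make the notion of skew concrete, here is a small sketch that builds a Zipf key distribution. The function name `zipf_probabilities` and the default exponent of 1.0 are assumptions for illustration, not values taken from the talk.

```python
from typing import List


def zipf_probabilities(num_keys: int, exponent: float = 1.0) -> List[float]:
    """Probability of each key by rank, proportional to 1 / rank**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_keys + 1)]
    total = sum(weights)
    return [w / total for w in weights]


probs = zipf_probabilities(1000)
# probs[0] is the probability of the most frequent key.
most_frequent_share = probs[0]
```

Under such a distribution a handful of keys account for a large share of the messages, so any scheme that pins each key to a single worker inherits that skew.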
6 Key Grouping [Diagram: sources routing messages to workers]
7 Shuffle Grouping [Diagram: sources routing messages to workers, followed by an aggregator]
8 Stream Grouping. Key Grouping: efficient routing; load imbalance. Shuffle Grouping: load balance; additional memory; additional aggregation phase.
9 A possible solution: dynamic load rebalancing (detect load imbalance, perform data migration). Challenges: how often to check the load imbalance; migration is not directly supported by most DSPEs and requires extra modification; state management for stateful operations.
10 Power of two choices (POTC). Balls-and-bins problem. Algorithm: for each ball, pick two bins uniformly at random; assign the ball to the least loaded of the two bins. Issues: non-uniform key distribution; load information.
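The balls-and-bins algorithm on this slide can be sketched as follows. The function name, the fixed seed, and the choice to sample the two bins with replacement are assumptions made for the example.

```python
import random
from typing import List


def power_of_two_choices(num_balls: int, num_bins: int, seed: int = 0) -> List[int]:
    """Place each ball in the less loaded of two uniformly random bins."""
    rng = random.Random(seed)
    loads = [0] * num_bins
    for _ in range(num_balls):
        a = rng.randrange(num_bins)
        b = rng.randrange(num_bins)  # sampled with replacement, so b may equal a
        target = a if loads[a] <= loads[b] else b
        loads[target] += 1
    return loads
```

Note that the sketch reads the true bin loads directly, which is exactly the "load information" issue the slide raises for a distributed setting.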
11 Power of two choices [Diagram: sources routing messages to workers]. Issues: consensus on keys; load information; load imbalance.
12 Partial Key Grouping (PKG). The Power of Both Choices: key splitting. Split each key across two servers; assign each instance using the power of two choices. Benefits: decentralized; stateless; handles skew.
13 Partial Key Grouping. Local load estimation: each source estimates the load on workers using its local routing history. Benefits: no coordination among sources; no communication with workers.
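Key splitting with local load estimation can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the class name `PartialKeyGrouping` is hypothetical, and `zlib.crc32` and `zlib.adler32` are stand-ins for two independent hash functions.

```python
import zlib
from typing import List, Tuple


class PartialKeyGrouping:
    """One instance per source; routes using only that source's own history."""

    def __init__(self, num_workers: int) -> None:
        self.num_workers = num_workers
        # Messages this source has sent to each worker (the local estimate).
        self.local_load: List[int] = [0] * num_workers

    def candidates(self, key: str) -> Tuple[int, int]:
        """The two workers a key may be sent to; fixed for a given key."""
        data = key.encode("utf-8")
        first = zlib.crc32(data) % self.num_workers
        second = zlib.adler32(data) % self.num_workers  # may equal first
        return first, second

    def route(self, key: str) -> int:
        first, second = self.candidates(key)
        if self.local_load[first] <= self.local_load[second]:
            worker = first
        else:
            worker = second
        self.local_load[worker] += 1
        return worker
```

Because the candidates depend only on the key, every source agrees on them without coordination, and because the load vector is local, no messages to or from the workers are needed.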
14 Partial Key Grouping [Diagram: sources routing messages to workers, followed by an aggregator]
15 Analysis: Problem Formulation. n workers -> bins; keys k_i ∈ K -> colors; m messages -> colored balls. Minimize the difference between the maximum and the average workload.
16 Analysis: Key Distribution. We pick each key k_i ∈ K with probability p_i from the distribution D, where p1 ≥ p2 ≥ p3 ≥ ... Maximum load is proportional to the most frequent key (with probability p1). If p1 > 2/n, the expected imbalance is lower bounded by I(m) = (p1/2 − 1/n) m.
17 Analysis. Assume a key distribution D with maximum probability p1 ≤ 2/n. Then the imbalance after m steps of the Greedy-d process satisfies, with probability at least 1 − 1/n, [bound formula not captured in the transcription]
18 Analysis: an example with four workers. In the ideal scenario, each worker should handle 25% of the keys. We need to consider three cases: p1 = 2/4 = 0.5, p1 > 0.5, and p1 < 0.5.
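The four-worker example can be worked through numerically. The sketch assumes the lower bound reads (p1/2 − 1/n) m for p1 > 2/n: the most frequent key is split over two workers, so one of them receives at least p1 m / 2 messages, while the average load is m / n. The function name and the sample values of p1 and m are hypothetical.

```python
def imbalance_lower_bound(p1: float, n: int, m: int) -> float:
    """Lower bound on max load minus average load after m messages.

    Clamped at zero because the bound says nothing when p1 <= 2/n.
    """
    return max(0.0, (p1 / 2 - 1 / n) * m)


n, m = 4, 1000
at_threshold = imbalance_lower_bound(0.5, n, m)     # p1 = 2/n: bound is zero
above_threshold = imbalance_lower_bound(0.6, n, m)  # about 50 messages
below_threshold = imbalance_lower_bound(0.4, n, m)  # clamped to zero
```

With p1 = 0.6 each of the two workers holding the hot key gets at least 30% of the stream against an ideal 25%, so some imbalance is unavoidable; at or below p1 = 0.5 two workers can, in principle, absorb the hot key within their fair share.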
21 Applications. Most algorithms that use Shuffle Grouping can be expressed using Partial Key Grouping to reduce memory footprint and aggregation overhead. Algorithms that use Key Grouping can be rewritten to achieve load balance.
22 Examples. Naïve Bayes Classifier; Streaming Parallel Decision Trees; Heavy Hitters and Space Saving.
23 Naïve Bayes Classifier. Counts co-occurrences of each feature and class value. Key Grouping (vertical parallelism): each feature is tracked by a single worker process. Shuffle Grouping (horizontal parallelism): each feature is tracked by all worker processes. Partial Key Grouping: each feature is tracked by exactly two processes.
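A rough sketch of how the co-occurrence counts might be kept under Partial Key Grouping: each feature has two candidate workers, updates go to the less loaded one, and a query sums at most two partial counters. The class name `PkgCooccurrenceCounter` is hypothetical, the workers are simulated in a single process, and the two `zlib` checksums are assumed stand-ins for independent hash functions.

```python
import zlib
from collections import Counter
from typing import List, Tuple


class PkgCooccurrenceCounter:
    """Simulates W workers, each holding partial (feature, class) counts."""

    def __init__(self, num_workers: int) -> None:
        self.partials: List[Counter] = [Counter() for _ in range(num_workers)]
        self.load: List[int] = [0] * num_workers

    def _candidates(self, feature: str) -> Tuple[int, int]:
        data = feature.encode("utf-8")
        n = len(self.partials)
        return zlib.crc32(data) % n, zlib.adler32(data) % n

    def update(self, feature: str, label: str) -> None:
        a, b = self._candidates(feature)
        worker = a if self.load[a] <= self.load[b] else b
        self.partials[worker][(feature, label)] += 1
        self.load[worker] += 1

    def count(self, feature: str, label: str) -> int:
        # set() avoids double counting when both hashes pick the same worker.
        workers = set(self._candidates(feature))
        return sum(self.partials[w][(feature, label)] for w in workers)
```

The aggregation step touches two workers regardless of how many workers exist, which is the constant-cost property the slide contrasts with shuffle grouping.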
24 Stream Groupings: A summary
Stream Grouping | Pros | Cons
Key Grouping | Scalable | Load imbalance
Shuffle Grouping | Load balance | Memory overhead; aggregation O(W)
Partial Key Grouping | Scalable; load balance | Memory cost; aggregation O(1)
25 Experiments. What is the effect of key splitting on POTC? How does local estimation compare to a global oracle? How robust is Partial Key Grouping? How does PKG perform on a real deployment on Apache Storm?
26 Metric. Load imbalance: the difference between the maximum and the average load of the workers at time t.
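The metric translates directly into code. The function name `load_imbalance` is hypothetical; `loads` is assumed to hold the number of messages each worker has handled up to time t.

```python
from typing import Sequence


def load_imbalance(loads: Sequence[float]) -> float:
    """Maximum worker load minus average worker load."""
    return max(loads) - sum(loads) / len(loads)
```

For example, worker loads of 10, 20 and 30 messages give an average of 20 and therefore an imbalance of 10, while perfectly even loads give an imbalance of 0.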
27 Effect of Key Splitting [Figure: panels for Wikipedia (WP) and Twitter (TW); x-axis: workers]
28 Local Load Estimation
29 Robustness. Changing trends in data: cashtags, used in the stock market to identify a publicly traded company, e.g., $AAPL for Apple. Skewed load at source: social networks; test different data distributions at the sources.
30 Robustness [Figure: panels for 5, 10, 50, and 100 workers]
31 Robustness [Figure: imbalance I (messages) versus workers; series: Uniform and Skewed for L 5, L 10, L 15, L 20]
32 Real deployment: Apache Storm [Figure: throughput (keys/s) for PKG, SG, and KG; panel (a) x-axis: CPU delay (ms); panel (b) x-axis: memory (keys), with point labels from 10s to 600s]
33 Future Work. Dynamic load balancing: key migration (moving top-k keys across workers); partition migration (moving a subset of keys across workers). Handling worker churn with Partial Key Grouping. Applying queuing theory with Partial Key Grouping for load balancing.
34 Conclusion. Partial Key Grouping (PKG) reduces the load imbalance by up to seven orders of magnitude compared to Key Grouping. PKG imposes constant memory and aggregation overhead, i.e., O(1), compared to Shuffle Grouping, which is O(W). On Apache Storm: 60% improvement in throughput; 45% improvement in latency.
36 Datasets. Twitter: 1.2G tweets (crawled July 2012). Wikipedia: 22M access logs. Twitter: 690K cashtags (crawled Nov 2013). Social networks: 69M edges. Synthetic: 10M keys.
37 Streaming Parallel Decision Tree. Uses shuffle grouping, where each worker generates a histogram. An aggregator is used to merge the results from W workers. The memory footprint is proportional to W workers. Partial Key Grouping reduces the overhead to 2 workers.
38 Heavy Hitters and Space Saving. Solve top-k items in constant time and space. Key Grouping: error depends on a single estimator; poor load balancing. Shuffle Grouping: error bounds depend on the number of workers. Partial Key Grouping: better load balancing; better estimation.
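For readers unfamiliar with Space Saving, here is a simplified single-worker sketch of the counting step. It is not the constant-time version the slide refers to: finding the minimum counter here costs O(k) per eviction, whereas the published algorithm uses a dedicated summary structure. The function name `space_saving` is hypothetical.

```python
from typing import Dict, Hashable, Iterable


def space_saving(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Track at most k counters; counts may overestimate, never underestimate."""
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # Evict the smallest counter and let the new item inherit its count.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters
```

Under Partial Key Grouping each key is counted on at most two workers, so an estimate for a key combines two such summaries instead of one per worker.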