How To Win At A Game Of Monopoly On The Moon
|
|
- Rebecca Edwards
- 3 years ago
- Views:
Transcription
1 Changing the Face of Database Cloud Services with Personalized Service Level Agreements Jennifer Ortiz, Victor Teixeira de Almeida, Magdalena Balazinska University of Washington, Computer Science and Engineering PETROBRAS S.A., Rio de Janerio, RJ, Brazil CIDR
2 2
3 Many Data Management & Analytics Systems Available 3
4 Many Systems are Available as Cloud Services 4
5 Which Hadoop Version? Cloud Services Today Amazon EMR Pig or Hive? How many instances of the service? 5
6 Which Hadoop Version? Cloud Services Today Amazon EMR Pig or Hive? How many instances of the service? 6
7 Cloud Services Today BigQuery How long will my query take? 7
8 Cloud Services Can Do Better! System Internals 8
9 Cloud Services Can Do Better! System Internals 9
10 Cloud Services Can Do Better! Query: Query Capabilities Time Money SELECT FROM WHERE 10
11 A new proposal Time to Re-think the interface Hide details of cluster deployment and resources Show users monetary costs and performance estimates on their data Let users pick the desired trade-off between options shown Personalized Service Level Agreements 11
12 Tier 1: $0.10/hour Within 20 seconds: SELECT <up to 10 attributes> FROM <Fact Dimension> WHERE <up to 100% of data> Within 1 minute: SELECT <up to 5 attributes> FROM <JOIN Fact + 4 Dimensions> WHERE <up to 10% of data> A PSLA Example Fixed, hourly price Expected performance Templates capture capabilities Within 10 minutes: SELECT <up to 10 attributes> FROM <JOIN Fact + 8 Dimensions> WHERE <up to 100% of data> 12
13 Tier 1: $0.10/hour Within 20 seconds: SELECT <up to 10 attributes> FROM <Fact Dimension> WHERE <up to 100% of data> Within 1 minute: SELECT <up to 5 attributes> FROM <JOIN Fact + 4 Dimensions> WHERE <up to 10% of data> A PSLA Example Fixed, hourly price Expected performance Templates capture capabilities Tier 2: $0.50/hour Within 1 second: SELECT <up to 10 attributes> FROM <Fact Dimension> WHERE <up to 100% of data> Different tiers of service Within 10 minutes: SELECT <up to 10 attributes> FROM <JOIN Fact + 8 Dimensions> WHERE <up to 100% of data> 13
14 Goals Cloud C Database D 14
15 Goals Cloud C PSLA P Database D Money, Time, Capabilities 15
16 Goals Cloud C PSLAManager PSLA P Database D Money, Time, Capabilities 16
17 Goals Cloud C PSLAManager PSLA P Database D Money, Time, Capabilities 17
18 Example of a Real PSLA 18
19 TPC-H Star Schema Benchmark Based on TPC-H 10GB 19
20 Myria is a data management service in the cloud that we built at UW. It has a parallel, shared-nothing back-end query execution engine called MyriaX 20
21 PSLA for Myria 21
22 PSLA for Myria 22
23 PSLA for Myria 23
24 PSLA for Myria 24
25 PSLA for Myria 25
26 PSLAManager Cloud C PSLAManager PSLA P Database D 26
27 PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 27
28 PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 28
29 PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 29
30 PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 30
31 PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 31
32 PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 32
33 PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 33
34 Query Workload Generation Which queries to generate? Joins drive performance Think about possible combinations of joins Consider: All possible 2-way joins Tables in Order by Size: Lineitem, Part, Customer, Supplier, Date (Lineitem Part) (Customer Date), (Lineitem Supplier), etc. Only consider most expensive queries Build toward more complex queries, include selections and projections 34
35 PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 35
36 PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 36
37 PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 37
38 Seconds Seconds Tier Selection Runtime Distributions of Query Workload Per Configuration in Myria EMD EMD 7.07 EMD Workers Workers (Configurations) 38
39 Tier Selection 39
40 PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 40
41 Workload Compression STEP 1: Query Clustering Threshold-based Density-based Workers 41
42 Workload Compression STEP 1: Query Clustering th Workers 42
43 Workload Compression STEP 1: Query Clustering Workers 43
44 Workload Compression STEP 1: Query Clustering Tier 1: $0.XX/hour Within Cluster Max Threshold: Query Templates in Cluster Within Cluster Max Threshold: Query Templates in Cluster 44
45 Time(s) Workload Compression STEP 2: Template Generation Queries Query Query Templates Dominance SELECT (5 ATT) FROM (5 TABLES) WHERE 10% SELECT (4 ATT) FROM (4 TABLES) WHERE 1% Configuration 45
46 Time(s) Workload Compression STEP 2: Template Generation Attributes Projected Query Dominance Tables SELECT (5 ATT) FROM (5 TABLES) WHERE 10% SELECT (4 ATT) FROM (4 TABLES) WHERE 1% Selectivity Configuration 46
47 Time(s) Workload Compression STEP 2: Template Generation Attributes Projected Query Dominance Tables SELECT (5 ATT) FROM (5 TABLES) WHERE 10% SELECT (4 ATT) FROM (4 TABLES) WHERE 1% Selectivity Given: Configuration 47
48 Time(s) Workload Compression STEP 2: Template Generation Configuration 48
49 Time(s) Workload Compression STEP 2: Template Generation Root Query Template: We call a query template a root query template if no other query template in the same cluster dominates it. Configuration 49
50 Time(s) Workload Compression STEP 3: Dropping Queries with Similar Times Configuration 1 50
51 Time(s) Workload Compression STEP 3: Dropping Queries with Similar Times Root Query Templates Queries Query Templates Configuration 1 Configuration 2 51
52 Time(s) Workload Compression STEP 3: Dropping Queries with Similar Times Configuration 1 Configuration 2 52
53 Time(s) Workload Compression STEP 3: Dropping Queries with Similar Times Configuration 1 Configuration 2 53
54 Time(s) Workload Compression STEP 3: Dropping Queries with Similar Times Configuration 1 Configuration 2 54
55 PSLA Quality Assessment 55
56 PSLA Quality Metrics PSLA Query Capabilities PSLA Complexity PSLA Performance Error Metric 56
57 Time(s) Time(s) Quality Metric Trade-offs High Complexity Low Error Low Complexity High Error Found that a log interval is best without tuning 57
58 PSLA Evaluation on Predicted Runtimes 58
59 PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 59
60 PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 60
61 Performance Model (Offline) Train model offline on other data and queries Cloud Configuration Query Feature Vector Est. Rows Est. IO Avg. Row runtime Query workload Predict runtime from query features Query Feature Vector Cloud Configuration Est. Rows Est. IO Avg. Row runtime Based on Predicting Multiple Metrics for Queries: Better Decisions enabled by Machine Learning [Ganapathi et. al. 2009] 61
62 Training Dataset Synthetic Dataset 10GB 6 Tables 61 Attributes 62
63 PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 63
64 Predicted Myria PSLA (Predicted Runtimes) 64
65 Looking Forward Direct extensions to the approach Add support for indexes Improve time predictions Longer-term future work Can we guarantee the runtimes? Can we update the PSLA as user queries data Goal is to show increasingly more complex queries Usability testing 65
66 Conclusion Many cloud DBMSs exist Require users to reason about resources We propose to re-think that interface Personalized Service Level Agreements Service Tiers Price/Capabilities/Performance Important direction for cloud DBMSs 66
Changing the Face of Database Cloud Services with Personalized Service Level Agreements
Changing the Face of Database Cloud Services with Personalized Service Level Agreements Jennifer Ortiz, Victor Teixeira de Almeida,, Magdalena Balazinska Computer Science and Engineering Department, University
More informationGetting Started with SandStorm NoSQL Benchmark
Getting Started with SandStorm NoSQL Benchmark SandStorm is an enterprise performance testing tool for web, mobile, cloud and big data applications. It provides a framework for benchmarking NoSQL, Hadoop,
More information: Tiering Storage for Data Analytics in the Cloud
: Tiering Storage for Data Analytics in the Cloud Yue Cheng, M. Safdar Iqbal, Aayush Gupta, Ali R. Butt Virginia Tech, IBM Research Almaden Cloud enables cost-efficient data analytics Amazon EMR Cloud
More informationA Vision for Personalized Service Level Agreements in the Cloud
A Vision for Personalized Service Level Agreements in the Cloud ABSTRACT Jennifer Ortiz, Victor Teixeira de Almeida,, Magdalena Balazinska Computer Science and Engineering Department, University of Washington
More informationA Novel Cloud Based Elastic Framework for Big Data Preprocessing
School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview
More informationCOURSE CONTENT Big Data and Hadoop Training
COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop
More informationImplement Hadoop jobs to extract business value from large and varied data sets
Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to
More informationAutomating Big Data Benchmarking for Different Architectures with ALOJA
www.bsc.es Jan 2016 Automating Big Data Benchmarking for Different Architectures with ALOJA Nicolas Poggi, Postdoc Researcher Agenda 1. Intro on Hadoop performance 1. Current scenario and problematic 2.
More informationBuilding a real-time, self-service data analytics ecosystem Greg Arnold, Sr. Director Engineering
Building a real-time, self-service data analytics ecosystem Greg Arnold, Sr. Director Engineering Self Service at scale 6 5 4 3 2 1 ? Relational? MPP? Hadoop? Linkedin data 350M Members 25B 3.5M 4.8B 2M
More informationPerformance Analysis of Cloud Relational Database Services
Performance Analysis of Cloud Relational Database Services Jialin Li lijl@cs.washington.edu Naveen Kr. Sharma naveenks@cs.washington.edu June 7, 203 Adriana Szekeres aaazs@cs.washington.edu Introduction
More informationA Generic Auto-Provisioning Framework for Cloud Databases
A Generic Auto-Provisioning Framework for Cloud Databases Jennie Rogers 1, Olga Papaemmanouil 2 and Ugur Cetintemel 1 1 Brown University, 2 Brandeis University Instance Type Introduction Use Infrastructure-as-a-Service
More informationBig data blue print for cloud architecture
Big data blue print for cloud architecture -COGNIZANT Image Area Prabhu Inbarajan Srinivasan Thiruvengadathan Muralicharan Gurumoorthy Praveen Codur 2012, Cognizant Next 30 minutes Big Data / Cloud challenges
More informationDuke University http://www.cs.duke.edu/starfish
Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University http://www.cs.duke.edu/starfish Practitioners of Big Data Analytics Google Yahoo! Facebook ebay Physicists Biologists Economists
More informationCan the Elephants Handle the NoSQL Onslaught?
Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented
More informationESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
More informationCloudRank-D:A Benchmark Suite for Private Cloud Systems
CloudRank-D:A Benchmark Suite for Private Cloud Systems Jing Quan Institute of Computing Technology, Chinese Academy of Sciences and University of Science and Technology of China HVC tutorial in conjunction
More informationBig Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division
Big Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division In this talk Big data storage: Current trends Issues with current storage options Evolution of storage to support big
More informationWhere We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344
Where We Are Introduction to Data Management CSE 344 Lecture 25: DBMS-as-a-service and NoSQL We learned quite a bit about data management see course calendar Three topics left: DBMS-as-a-service and NoSQL
More informationHadoop and MySQL for Big Data
Hadoop and MySQL for Big Data Alexander Rubin October 9, 2013 About Me Alexander Rubin, Principal Consultant, Percona Working with MySQL for over 10 years Started at MySQL AB, Sun Microsystems, Oracle
More informationPreview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.
Preview of Oracle Database 12c In-Memory Option 1 The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any
More informationHadoop s Entry into the Traditional Analytical DBMS Market. Daniel Abadi Yale University August 3 rd, 2010
Hadoop s Entry into the Traditional Analytical DBMS Market Daniel Abadi Yale University August 3 rd, 2010 Data, Data, Everywhere Data explosion Web 2.0 more user data More devices that sense data More
More informationA Brief Introduction to Apache Tez
A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value
More informationBig Data Use Case. How Rackspace is using Private Cloud for Big Data. Bryan Thompson. May 8th, 2013
Big Data Use Case How Rackspace is using Private Cloud for Big Data Bryan Thompson May 8th, 2013 Our Big Data Problem Consolidate all monitoring data for reporting and analytical purposes. Every device
More informationAlexander Rubin Principle Architect, Percona April 18, 2015. Using Hadoop Together with MySQL for Data Analysis
Alexander Rubin Principle Architect, Percona April 18, 2015 Using Hadoop Together with MySQL for Data Analysis About Me Alexander Rubin, Principal Consultant, Percona Working with MySQL for over 10 years
More informationA Comparison of Clouds: Amazon Web Services, Windows Azure, Google Cloud Platform, VMWare and Others (Fall 2012)
1. Computation Amazon Web Services Amazon Elastic Compute Cloud (Amazon EC2) provides basic computation service in AWS. It presents a virtual computing environment and enables resizable compute capacity.
More informationMurat Kantarcioglu, Joint work with (Sharad Mehrotra(UCI), Bhavani Thuraisingham, Kerim Oktay(UCI), Vaibhav Khadilkar, Erman Pattuk)
Murat Kantarcioglu, Joint work with (Sharad Mehrotra(UCI), Bhavani Thuraisingham, Kerim Oktay(UCI), Vaibhav Khadilkar, Erman Pattuk) 1 Cloud Computing App Server Database Cloud Computing Code Multimedia
More informationThe Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress
More informationMining Large Datasets: Case of Mining Graph Data in the Cloud
Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large
More informationPerformance and Scalability Overview
Performance and Scalability Overview This guide provides an overview of some of the performance and scalability capabilities of the Pentaho Business Analytics Platform. Contents Pentaho Scalability and
More informationQLIKVIEW INTEGRATION TION WITH AMAZON REDSHIFT John Park Partner Engineering
QLIKVIEW INTEGRATION TION WITH AMAZON REDSHIFT John Park Partner Engineering June 2014 Page 1 Contents Introduction... 3 About Amazon Web Services (AWS)... 3 About Amazon Redshift... 3 QlikView on AWS...
More informationBIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
More informationThe Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang
The Big Data Ecosystem at LinkedIn Presented by Zhongfang Zhuang Based on the paper The Big Data Ecosystem at LinkedIn, written by Roshan Sumbaly, Jay Kreps, and Sam Shah. The Ecosystems Hadoop Ecosystem
More informationMaximum performance, minimal risk for data warehousing
SYSTEM X SERVERS SOLUTION BRIEF Maximum performance, minimal risk for data warehousing Microsoft Data Warehouse Fast Track for SQL Server 2014 on System x3850 X6 (95TB) The rapid growth of technology has
More informationBest Practices for Hadoop Data Analysis with Tableau
Best Practices for Hadoop Data Analysis with Tableau September 2013 2013 Hortonworks Inc. http:// Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Apache Hadoop with Hortonworks
More informationDesigning a Data Solution with Microsoft SQL Server 2014
20465C - Version: 1 22 June 2016 Designing a Data Solution with Microsoft SQL Server 2014 Designing a Data Solution with Microsoft SQL Server 2014 20465C - Version: 1 5 days Course Description: The focus
More informationBig Data Benchmark. The Cloudy Approach. Dhruba Borthakur Nov 7, 2012 at the SDSC Industry Roundtable on Big Data and Big Value
Big Data Benchmark The Cloudy Approach Dhruba Borthakur Nov 7, 2012 at the SDSC Industry Roundtable on Big Data and Big Value My Asks for a Cloudy Benchmark 1 BigData system is not a Traditional Database
More informationDeliverable 2.1.4. 150 Billion Triple dataset hosted on the LOD2 Knowledge Store Cluster. LOD2 Creating Knowledge out of Interlinked Data
Collaborative Project LOD2 Creating Knowledge out of Interlinked Data Project Number: 257943 Start Date of Project: 01/09/2010 Duration: 48 months Deliverable 2.1.4 150 Billion Triple dataset hosted on
More informationResource Provisioning of Web Applications in Heterogeneous Cloud. Jiang Dejun Supervisor: Guillaume Pierre 2011-04-10
Resource Provisioning of Web Applications in Heterogeneous Cloud Jiang Dejun Supervisor: Guillaume Pierre -04-10 Background Cloud is an attractive hosting platform for startup Web applications On demand
More informationOracle Database 12c: Performance Management and Tuning NEW
Oracle University Contact Us: 1.800.529.0165 Oracle Database 12c: Performance Management and Tuning NEW Duration: 5 Days What you will learn In the Oracle Database 12c: Performance Management and Tuning
More informationSQL Server 2014. What s New? Christopher Speer. Technology Solution Specialist (SQL Server, BizTalk Server, Power BI, Azure) v-cspeer@microsoft.
SQL Server 2014 What s New? Christopher Speer Technology Solution Specialist (SQL Server, BizTalk Server, Power BI, Azure) v-cspeer@microsoft.com The evolution of the Microsoft data platform What s New
More informationTowards an Optimized Big Data Processing System
Towards an Optimized Big Data Processing System The Doctoral Symposium of the IEEE/ACM CCGrid 2013 Delft, The Netherlands Bogdan Ghiţ, Alexandru Iosup, and Dick Epema Parallel and Distributed Systems Group
More informationParquet. Columnar storage for the people
Parquet Columnar storage for the people Julien Le Dem @J_ Processing tools lead, analytics infrastructure at Twitter Nong Li nong@cloudera.com Software engineer, Cloudera Impala Outline Context from various
More informationOracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
More informationHerodotos Herodotou Shivnath Babu. Duke University
Herodotos Herodotou Shivnath Babu Duke University Analysis in the Big Data Era Popular option Hadoop software stack Java / C++ / R / Python Pig Hive Jaql Oozie Elastic MapReduce Hadoop HBase MapReduce
More informationThe Cubetree Storage Organization
The Cubetree Storage Organization Nick Roussopoulos & Yannis Kotidis Advanced Communication Technology, Inc. Silver Spring, MD 20905 Tel: 301-384-3759 Fax: 301-384-3679 {nick,kotidis}@act-us.com 1. Introduction
More informationCharacterizing Task Usage Shapes in Google s Compute Clusters
Characterizing Task Usage Shapes in Google s Compute Clusters Qi Zhang 1, Joseph L. Hellerstein 2, Raouf Boutaba 1 1 University of Waterloo, 2 Google Inc. Introduction Cloud computing is becoming a key
More informationReport 02 Data analytics workbench for educational data. Palak Agrawal
Report 02 Data analytics workbench for educational data Palak Agrawal Last Updated: May 22, 2014 Starfish: A Selftuning System for Big Data Analytics [1] text CONTENTS Contents 1 Introduction 1 1.1 Starfish:
More informationEstimate Performance and Capacity Requirements for Workflow in SharePoint Server 2010
Estimate Performance and Capacity Requirements for Workflow in SharePoint Server 2010 This document is provided as-is. Information and views expressed in this document, including URL and other Internet
More informationHadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
More informationBuilding your Big Data Architecture on Amazon Web Services
Building your Big Data Architecture on Amazon Web Services Abhishek Sinha @abysinha sinhaar@amazon.com AWS Services Deployment & Administration Application Services Compute Storage Database Networking
More informationPerformance and Scalability Overview
Performance and Scalability Overview This guide provides an overview of some of the performance and scalability capabilities of the Pentaho Business Analytics platform. PENTAHO PERFORMANCE ENGINEERING
More informationUnified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia
Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing
More informationKNIME & Avira, or how I ve learned to love Big Data
KNIME & Avira, or how I ve learned to love Big Data Facts about Avira (AntiVir) 100 mio. customers Extreme Reliability 500 employees (Tettnang, San Francisco, Kuala Lumpur, Bucharest, Amsterdam) Company
More informationSQL Server Instance-Level Benchmarks with HammerDB
SQL Server Instance-Level Benchmarks with HammerDB TPC-C is an older standard for performing synthetic benchmarks against an OLTP database engine. The HammerDB tool is an open-sourced tool that can run
More informationAzure Data Lake Analytics
Azure Data Lake Analytics Compose and orchestrate data services at scale Fully managed service to support orchestration of data movement and processing Connect to relational or non-relational data
More informationBENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next
More informationOn- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform
On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform Page 1 of 16 Table of Contents Table of Contents... 2 Introduction... 3 NoSQL Databases... 3 CumuLogic NoSQL Database Service...
More informationBig Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum
Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All
More informationTRAINING PROGRAM ON BIGDATA/HADOOP
Course: Training on Bigdata/Hadoop with Hands-on Course Duration / Dates / Time: 4 Days / 24th - 27th June 2015 / 9:30-17:30 Hrs Venue: Eagle Photonics Pvt Ltd First Floor, Plot No 31, Sector 19C, Vashi,
More informationComprehending the Tradeoffs between Deploying Oracle Database on RAID 5 and RAID 10 Storage Configurations. Database Solutions Engineering
Comprehending the Tradeoffs between Deploying Oracle Database on RAID 5 and RAID 10 Storage Configurations A Dell Technical White Paper Database Solutions Engineering By Sudhansu Sekhar and Raghunatha
More informationSavanna Hadoop on. OpenStack. Savanna Technical Lead
Savanna Hadoop on OpenStack Sergey Lukjanov Savanna Technical Lead Mirantis, 2013 Agenda Savanna Overview Savanna Use Cases Roadmap & Current Status Architecture & Features Overview Hadoop vs. Virtualization
More informationOptimizing Cost and Performance Trade-Offs for MapReduce Job Processing in the Cloud
Optimizing Cost and Performance Trade-Offs for MapReduce Job Processing in the Cloud Zhuoyao Zhang University of Pennsylvania zhuoyao@seas.upenn.edu Ludmila Cherkasova Hewlett-Packard Labs lucy.cherkasova@hp.com
More informationIntroduction to Decision Support, Data Warehousing, Business Intelligence, and Analytical Load Testing for all Databases
Introduction to Decision Support, Data Warehousing, Business Intelligence, and Analytical Load Testing for all Databases This guide gives you an introduction to conducting DSS (Decision Support System)
More informationInfomatics. Big-Data and Hadoop Developer Training with Oracle WDP
Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools
More informationWrangler: A New Generation of Data-intensive Supercomputing. Christopher Jordan, Siva Kulasekaran, Niall Gaffney
Wrangler: A New Generation of Data-intensive Supercomputing Christopher Jordan, Siva Kulasekaran, Niall Gaffney Project Partners Academic partners: TACC Primary system design, deployment, and operations
More informationChukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics
More informationNative Connectivity to Big Data Sources in MSTR 10
Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single
More informationBig Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect
on AWS Services Overview Bernie Nallamotu Principle Solutions Architect \ So what is it? When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze
More informationHow To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
More informationStorage Layout and I/O Performance in Data Warehouses
Storage Layout and I/O Performance in Data Warehouses Matthias Nicola 1, Haider Rizvi 2 1 IBM Silicon Valley Lab 2 IBM Toronto Lab mnicola@us.ibm.com haider@ca.ibm.com Abstract. Defining data placement
More informationAdobe Deploys Hadoop as a Service on VMware vsphere
Adobe Deploys Hadoop as a Service A TECHNICAL CASE STUDY APRIL 2015 Table of Contents A Technical Case Study.... 3 Background... 3 Why Virtualize Hadoop on vsphere?.... 3 The Adobe Marketing Cloud and
More informationEnterprise GIS Architecture Deployment Options. Andrew Sakowicz
Enterprise GIS Architecture Deployment Options Andrew Sakowicz Audience Audience - Architects - Developers - Administrators - Project Managers Level: - Beginner / Intermediate Introduction Andrew Sakowicz
More informationAlejandro Vaisman Esteban Zimanyi. Data. Warehouse. Systems. Design and Implementation. ^ Springer
Alejandro Vaisman Esteban Zimanyi Data Warehouse Systems Design and Implementation ^ Springer Contents Part I Fundamental Concepts 1 Introduction 3 1.1 A Historical Overview of Data Warehousing 4 1.2 Spatial
More informationReal Time Big Data Processing
Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure
More informationRackspace Cloud Databases and Container-based Virtualization
Rackspace Cloud Databases and Container-based Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many
More informationHadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
More informationMonitor and Manage Your MicroStrategy BI Environment Using Enterprise Manager and Health Center
Monitor and Manage Your MicroStrategy BI Environment Using Enterprise Manager and Health Center Presented by: Dennis Liao Sales Engineer Zach Rea Sales Engineer January 27 th, 2015 Session 4 This Session
More informationHIVE + AMAZON EMR + S3 = ELASTIC BIG DATA SQL ANALYTICS PROCESSING IN THE CLOUD A REAL WORLD CASE STUDY
HIVE + AMAZON EMR + S3 = ELASTIC BIG DATA SQL ANALYTICS PROCESSING IN THE CLOUD A REAL WORLD CASE STUDY Jaipaul Agonus FINRA Strata Hadoop World New York, Sep 2015 FINRA - WHAT DO WE DO? Collect and Create
More informationBenchmarking Cassandra on Violin
Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract
More informationData Warehouse in the Cloud Marketing or Reality? Alexei Khalyako Sr. Program Manager Windows Azure Customer Advisory Team
Data Warehouse in the Cloud Marketing or Reality? Alexei Khalyako Sr. Program Manager Windows Azure Customer Advisory Team Data Warehouse we used to know High-End workload High-End hardware Special know-how
More informationXpoLog Competitive Comparison Sheet
XpoLog Competitive Comparison Sheet New frontier in big log data analysis and application intelligence Technical white paper May 2015 XpoLog, a data analysis and management platform for applications' IT
More informationCSE-E5430 Scalable Cloud Computing Lecture 2
CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing
More informationThe Inside Scoop on Hadoop
The Inside Scoop on Hadoop Orion Gebremedhin National Solutions Director BI & Big Data, Neudesic LLC. VTSP Microsoft Corp. Orion.Gebremedhin@Neudesic.COM B-orgebr@Microsoft.com @OrionGM The Inside Scoop
More informationVirtualizing Apache Hadoop. June, 2012
June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING
More informationLeveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000
Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000 Alexandra Carpen-Amarie Diana Moise Bogdan Nicolae KerData Team, INRIA Outline
More informationUsing the Coherence Cloud Service
Using the Coherence Cloud Service An introduction Dave Felcey Coherence Product Manager July 2, 2015 Safe Harbor Statement The following is intended to outline our general product direction. It is intended
More informationApache Hadoop Ecosystem
Apache Hadoop Ecosystem Rim Moussa ZENITH Team Inria Sophia Antipolis DataScale project rim.moussa@inria.fr Context *large scale systems Response time (RIUD ops: one hit, OLTP) Time Processing (analytics:
More informationReal-time Ad-hoc Analytics on S3 with MemSQL
Real-time Ad-hoc Analytics on S3 with MemSQL Satish Cattamanchi 4INFO Sarvesh Gupta Tavant Technologies September, 2015 ABSTRACT Enterprises are witnessing a rapid increase in data volume with growing
More informationData Management in the Cloud
Data Management in the Cloud Ryan Stern stern@cs.colostate.edu : Advanced Topics in Distributed Systems Department of Computer Science Colorado State University Outline Today Microsoft Cloud SQL Server
More informationPetabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013
Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics
More informationManagement & Analysis of Big Data in Zenith Team
Management & Analysis of Big Data in Zenith Team Zenith Team, INRIA & LIRMM Outline Introduction to MapReduce Dealing with Data Skew in Big Data Processing Data Partitioning for MapReduce Frequent Sequence
More informationCRITEO INTERNSHIP PROGRAM 2015/2016
CRITEO INTERNSHIP PROGRAM 2015/2016 A. List of topics PLATFORM Topic 1: Build an API and a web interface on top of it to manage the back-end of our third party demand component. Challenge(s): Working with
More informationCAPTURING & PROCESSING REAL-TIME DATA ON AWS
CAPTURING & PROCESSING REAL-TIME DATA ON AWS @ 2015 Amazon.com, Inc. and Its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent
More informationGraph Mining on Big Data System. Presented by Hefu Chai, Rui Zhang, Jian Fang
Graph Mining on Big Data System Presented by Hefu Chai, Rui Zhang, Jian Fang Outline * Overview * Approaches & Environment * Results * Observations * Notes * Conclusion Overview * What we have done? *
More informationCloudera Manager Training: Hands-On Exercises
201408 Cloudera Manager Training: Hands-On Exercises General Notes... 2 In- Class Preparation: Accessing Your Cluster... 3 Self- Study Preparation: Creating Your Cluster... 4 Hands- On Exercise: Working
More informationArchitecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7
Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Yan Fisher Senior Principal Product Marketing Manager, Red Hat Rohit Bakhshi Product Manager,
More informationNext-Generation Cloud Analytics with Amazon Redshift
Next-Generation Cloud Analytics with Amazon Redshift What s inside Introduction Why Amazon Redshift is Great for Analytics Cloud Data Warehousing Strategies for Relational Databases Analyzing Fast, Transactional
More informationCray XC30 Hadoop Platform Jonathan (Bill) Sparks Howard Pritchard Martha Dumler
Cray XC30 Hadoop Platform Jonathan (Bill) Sparks Howard Pritchard Martha Dumler Safe Harbor Statement This presentation may contain forward-looking statements that are based on our current expectations.
More information