Database Scalability with Parallel Databases
|
|
- Kimberly Richards
- 8 years ago
- Views:
Transcription
1 Database Scalability with Parallel Databases Boston MySQL Meetup Monday, December 10, 2012
2 What should I do while you yabber along for 1 hour? 1. Eat Pizza 2. Ask questions, and keep me honest 3. Tweet about us, include #parelastic or #mysql and you will be part of the demo!
3 The background CDN CDN Application Cache database We try very hard to avoid hitting the database But there are times when you cannot avoid it! 3
4 Why? 4
5 Some alternatives Scale Up Scale Out 5
6 tpmc Viability of scale-up (1) Measured tpmc on Amazon RDS Small Large Xlarge 2XLarge 4XLarge RDS Instance Size Source: Amazon RDS TPC-C Benchmark. Md. Borhan Uddin, Bo He, Radu Sion, Cloud Computing Center, SUNY Stony Brook. Viewed online 6
7 Viability of scale-up (2) Source: Amazon RDS TPC-C Benchmark. Md. Borhan Uddin, Bo He, Radu Sion, Cloud Computing Center, SUNY Stony Brook. Viewed online 7
8 Database Scale Out: It s been around! [1986] The Case for Shared Nothing (Michael Stonebraker) [1991] Parallel database systems: the future of high performance database systems (David Dewitt, Jim Gray) [2004] MapReduce: simplified data processing on large clusters (Jeffrey Dean, Sanjay Ghemawat) [1984] Teradata releases the world's first parallel data warehouses and data marts. [1997] Teradata customer creates world's largest production database at 24 terabytes. [2001-3] Netezza, Vertica, Greenplum, Datallegro, [2004] MapReduce [2010] ParElastic 8
9 What is a parallel database? A system that seeks to improve performance through the parallelization of various operations such as scanning, restriction, aggregation, sorting, loading, index management, and system recovery. Typically based on the Shared Nothing architecture Shared disk is also common Shared memory is less common Often for Analytics (Netezza, Vertica, ) But not always (Oracle Exadata) 9
10 A shared-nothing parallel database Application Application Application Application Application System Management Parser & Planner System Metadata Transaction Management Query Orchestration database slice 1 database slice 2 database slice 3 database slice 4 database slice 5 database slice 6 10
11 database slice 1 database slice 2 database slice 3 How it works: The scenario 11
12 database slice 1 database slice 2 database slice 3 How it works: The scenario Customer (CID, ) 12
13 database slice 1 database slice 2 database slice 3 How it works: The scenario Customer (CID, ) Items (IID, ) 13
14 database slice 1 database slice 2 database slice 3 How it works: The scenario Customer (CID, ) Orders(OID, CID, IID) Items (IID, ) 14
15 database slice 1 database slice 2 database slice 3 How it works: The scenario CID CID CID 1 Abel 2 Baker 3 Charles 4 Dave 5 Ed 6 Fred 7 Grace 8 Henry 9 Ivan 10 Julia 11 Kim 12 Louis OID CID IID OID CID IID OID CID IID IID IID IID 101 Cup 102 Bowl 103 Fork 104 Plate 105 Spoon 106 Knife 15
16 database slice 1 database slice 2 database slice 3 How it works: The scenario CID CID CID 1 Abel 2 Baker 3 Charles 4 Dave 5 Ed 6 Fred 7 Grace 8 Henry 9 Ivan 10 Julia 11 Kim 12 Louis OID CID IID OID CID IID OID CID IID IID IID IID 101 Cup 102 Bowl 103 Fork 104 Plate 105 Spoon 106 Knife 16
17 database slice 1 database slice 2 database slice 3 How it works: The scenario CID CID CID 1 Abel 2 Baker 3 Charles 4 Dave 5 Ed 6 Fred 7 Grace 8 Henry 9 Ivan 10 Julia 11 Kim 12 Louis OID CID IID OID CID IID OID CID IID IID IID IID 101 Cup 102 Bowl 103 Fork 104 Plate 105 Spoon 106 Knife 17
18 database slice 1 database slice 2 database slice 3 How it works: The scenario CID CID CID 1 Abel 2 Baker 3 Charles 4 Dave 5 Ed 6 Fred 7 Grace 8 Henry 9 Ivan 10 Julia 11 Kim 12 Louis OID CID IID OID CID IID OID CID IID IID IID IID 101 Cup 102 Bowl 103 Fork 104 Plate 105 Spoon 106 Knife 18
19 database slice 1 database slice 2 database slice 3 How it works: The scenario CID CID CID 1 Abel 2 Baker 3 Charles 4 Dave 5 Ed 6 Fred 7 Grace 8 Henry 9 Ivan 10 Julia 11 Kim 12 Louis OID CID IID OID CID IID OID CID IID IID IID IID 101 Cup 102 Bowl 103 Fork 104 Plate 105 Spoon 106 Knife 19
20 database slice 1 database slice 2 database slice 3 How it works: The scenario CID CID CID 1 Abel 2 Baker 3 Charles 4 Dave 5 Ed 6 Fred 7 Grace 8 Henry 9 Ivan 10 Julia 11 Kim 12 Louis OID CID IID OID CID IID OID CID IID IID IID IID 101 Cup 102 Bowl 103 Fork 104 Plate 105 Spoon 106 Knife 20
21 database slice 1 database slice 2 database slice 3 How it works: The scenario CID CID CID 1 Abel 2 Baker 3 Charles 4 Dave 5 Ed 6 Fred 7 Grace 8 Henry 9 Ivan 10 Julia 11 Kim 12 Louis OID CID IID OID CID IID OID CID IID IID IID IID 101 Cup 102 Bowl 103 Fork 104 Plate 105 Spoon 106 Knife 21
22 database slice 1 database slice 2 database slice 3 How it works: The scenario CID CID CID 1 Abel 2 Baker 3 Charles 4 Dave 5 Ed 6 Fred 7 Grace 8 Henry 9 Ivan 10 Julia 11 Kim 12 Louis OID CID IID OID CID IID OID CID IID IID IID IID 101 Cup 102 Bowl 103 Fork 104 Plate 105 Spoon 106 Knife 22
23 database slice 1 database slice 2 database slice 3 Who bought a bowl? CID CID CID 1 Abel 2 Baker 3 Charles 4 Dave 5 Ed 6 Fred 7 Grace 8 Henry 9 Ivan 10 Julia 11 Kim 12 Louis OID CID IID OID CID IID OID CID IID IID IID IID 101 Cup 102 Bowl 103 Fork 104 Plate 105 Spoon 106 Knife SELECT C.NAME FROM CUSTOMERS C, ORDERS O, ITEMS I WHERE C.CID = O.CID AND O.IID = I.IID AND I.NAME = Bowl ; 23
24 database slice 1 database slice 2 database slice 3 Scanning in parallel CID CID CID 1 Abel 2 Baker 3 Charles 4 Dave 5 Ed 6 Fred 7 Grace 8 Henry 9 Ivan 10 Julia 11 Kim 12 Louis OID CID IID OID CID IID OID CID IID IID IID IID 101 Cup 102 Bowl 103 Fork 104 Plate 105 Spoon 106 Knife 24
25 Scanning in parallel 20: :00.0 Time to scan a 2b row table 16 m 08.4 s 10: : :00.0 MySQL Native M1.4XLarge ($1.80/hour) 03 m 11.4 s ParElastic (5 MySQL slices) 5 x M1.Large ($1.30/hour) Note: Amazon EC2 pricing for spot instances, December m1.4xlarge $1.80/hour, m1.large $0.26/hour 25
26 Counting rows in a table Table is distributed on a number of database sites I want to count the number of rows in the table SELECT COUNT(*) FROM CUSTOMERS; 26
27 Counting rows in a table SELECT COUNT(*) FROM CUSTOMERS; agg agg RETURN: SELECT SUM(CT) AS COUNT(*) FROM T1; CONSOLIDATE INTO T1: SELECT COUNT(*) AS CT FROM CUSTOMERS; CUSTOMERS 27
28 ParElastic: A parallel database Application Standard interfaces No changes to the application ParElastic Database Virtualization Engine Multiple off-the-shelf MySQL servers acting as one 28
29 A simple join SELECT C.NAME FROM CUSTOMERS C, ORDERS O, ITEMS I WHERE C.CID = O.CID AND O.IID = I.IID AND I.NAME = Bowl ; C.NAME JOIN O.CID = C.CID O.CID C.CID, C.NAME JOIN O.IID = I.IID CUSTOMERS I.IID O.IID ITEM = Bowl ORDERS 29
30 The parallel (distributed) join SELECT C.NAME FROM CUSTOMERS C, ORDERS O, ITEMS I WHERE C.CID = O.CID AND O.IID = I.IID AND I.NAME = Bowl ; How is this stream distributed? I.IID ITEM = Bowl 30
31 The parallel (distributed) join SELECT C.NAME FROM CUSTOMERS C, ORDERS O, ITEMS I WHERE C.CID = O.CID AND O.IID = I.IID AND I.NAME = Bowl ; Distributed according to table ITEMS (IID) I.IID ITEM = Bowl 31
32 The parallel (distributed) join SELECT C.NAME FROM CUSTOMERS C, ORDERS O, ITEMS I WHERE C.CID = O.CID AND O.IID = I.IID AND I.NAME = Bowl ; Distributed according to table ITEMS (IID) JOIN O.IID = I.IID How is this stream distributed? I.IID ITEM = Bowl ORDERS 32
33 The parallel (distributed) join SELECT C.NAME FROM CUSTOMERS C, ORDERS O, ITEMS I WHERE C.CID = O.CID AND O.IID = I.IID AND I.NAME = Bowl ; Distributed according to table ITEMS (IID) JOIN O.IID = I.IID Distributed according to ORDERS (CID) I.IID O.IID ITEM = Bowl ORDERS 33
34 The parallel (distributed) join SELECT C.NAME FROM CUSTOMERS C, ORDERS O, ITEMS I WHERE C.CID = O.CID AND O.IID = I.IID AND I.NAME = Bowl ; Distributed according to table ITEMS (IID) JOIN O.IID = I.IID Distributed according to ORDERS (CID) I.IID O.IID ITEM = Bowl ORDERS 34
35 The parallel (distributed) join SELECT C.NAME FROM CUSTOMERS C, ORDERS O, ITEMS I WHERE C.CID = O.CID AND O.IID = I.IID AND I.NAME = Bowl ; Broadcast to all sites for join JOIN O.IID = I.IID Distributed according to ORDERS (CID) I.IID O.IID ITEM = Bowl ORDERS 35
36 The parallel (distributed) join SELECT C.NAME FROM CUSTOMERS C, ORDERS O, ITEMS I WHERE C.CID = O.CID AND O.IID How is = this I.IID AND I.NAME stream = Bowl ; distributed? O.CID JOIN O.CID = C.CID C.CID, C.NAME How is this stream distributed? JOIN O.IID = I.IID CUSTOMERS I.IID O.IID ITEM = Bowl ORDERS 36
37 The parallel (distributed) join SELECT C.NAME FROM CUSTOMERS C, ORDERS O, ITEMS I WHERE C.CID = O.CID AND O.IID = I.IID AND I.NAME = Bowl ; C.NAME JOIN O.CID = C.CID O.CID C.CID, C.NAME JOIN O.IID = I.IID CUSTOMERS I.IID O.IID ITEM = Bowl ORDERS 37
38 The parallel (distributed) join SELECT C.NAME FROM CUSTOMERS C, ORDERS O, ITEMS I WHERE C.CID = O.CID AND O.IID = I.IID AND I.NAME = Bowl ; C.NAME JOIN O.CID = C.CID O.CID C.CID, C.NAME ORDERS CUSTOMERS I.IID ITEM = Bowl 38
39 The parallel (distributed) join SELECT C.NAME FROM CUSTOMERS C, ORDERS O, ITEMS I WHERE C.CID = O.CID AND O.IID = I.IID AND I.NAME = Bowl ; C.NAME BROADCAST TO T1: SELECT IID FROM ITEMS WHERE NAME = Bowl ; RETURN: SELECT C.NAME FROM CUSTOMERS C, ORDERS O, T1 WHERE C.CID = O.CID AND O.IID = T1.IID; O.CID ORDERS JOIN O.CID = C.CID I.IID C.CID, C.NAME CUSTOMERS ITEM = Bowl 39
40 Parallel Databases vs. Sharding Parallel Database Database architecture Application is data location agnostic Application perceives a single database Requires no application rewrites Application is not constrained by parallel database architecture A parallel database handles any schema Sharding Application architecture Application is data location aware Application perceives a collection of databases Requires application rewrites Application is constrained to the limitations of the sharding architecture Not all schemas are shard able 40
41 ParElastic Architecture ParElastic Database Virtualization Engine Application Standard Interfaces Multiple database servers act as one Processing Elastic database processing capacity (add and remove as needed) Storage Transparent partitioning, sharding and replication Add storage without moving existing data Off-the-Shelf MySQL Databases (dynamic) Off-the-Shelf MySQL Databases (persistent) 41
42 ParElastic data distribution Data distribution based on role in the schema Table data can be distributed Randomly Deterministically based on some attributes Broadcast, XA for transaction consistency Other distribution methods for multi-tenancy Later in this presentation 42
43 Simple Select SELECT * from customers where name like '%bel% ParElastic Database Virtualization Engine Application Standard Interfaces Processing Storage Off-the-Shelf MySQL Databases (dynamic) Off-the-Shelf MySQL Databases (persistent) 43
44 Query by Distribution Key SELECT * from orders where cid = 7; ParElastic Database Virtualization Engine Application Standard Interfaces Processing Storage Off-the-Shelf MySQL Databases (dynamic) Off-the-Shelf MySQL Databases (persistent) 45
45 Local Join select customers.name, orders.iid from customers, orders where customers.cid = orders.cid; ParElastic Database Virtualization Engine Application Standard Interfaces Processing Storage Off-the-Shelf MySQL Databases (dynamic) Off-the-Shelf MySQL Databases (persistent) 47
46 Ordered Result select customers.name, orders.iid from customers, orders where customers.cid = orders.cid order by customers.name; ParElastic Database Virtualization Engine Application Standard Interfaces Processing Storage Off-the-Shelf MySQL Databases (dynamic) Off-the-Shelf MySQL Databases (persistent) 49
47 A Cross Node Query select items.name, count(*) from customers, orders, items where customers.cid = orders.cid and orders.iid = items.iid and customers.name like '%e%' group by items.name; ParElastic Database Virtualization Engine Application Standard Interfaces Processing Storage Off-the-Shelf MySQL Databases (dynamic) Off-the-Shelf MySQL Databases (persistent) 51
48 What s the overhead? Query Time 15.72ms Test Client Machine 1 Query Time 17.03ms Test Client Machine 1 Network RTT 0.35ms ParElastic overhead ~ 1.31ms ParElastic Machine 2 mysqld Machine 2 mysqld Machine 3 mysqld Machine 4 53
49 The power of parallelism 54
50 The power of parallelism Let s look at this one some more 55
51 SaaS Applications and Multi-tenancy
52 CONS PROS SaaS application multi-tenancy Simple MT Each tenant gets their own database & tables Application switches to appropriate database Simplicity Isolation of data Many databases causes performance issues Bad neighbor effect In application MT Application code consolidates tenants into a single database Improves utilization, reduces # databases Code complexity Bad neighbor effect Lock-Step Upgrades Restricts customization ParElastic MT Each tenant believes it has its own database and tables ParElastic virtualizes and consolidates databases Simplicity Isolation of data Rolling upgrades Customizations supported 57
53 ParElastic Adaptive Multi-Tenancy Drupal: A SaaS Application Tens of thousands of Drupal sites Each site has its own database With its own tables and data Schema customizations / modules Thousands of MySQL servers Thousands of Application Servers Rolling software upgrades Some with schema changes 58
54 Database Scalability at Acquia Before ParElastic ZERO lines of code change in Drupal! After ParElastic 11/21/12 ParElastic Overview 59
55 Impact of large database count 62
56 ParElastic data distribution Data distribution based on role in the schema Table data can be distributed Randomly Deterministically based on some attributes Broadcast, XA for transaction consistency Multi-tenant systems use all of the above plus Tenant based distribution semantics 65
57 And now, a demonstration
58 The demonstration It s hard to demonstrate plumbing Much easier to demonstrate a faucet! We demonstrate a Drupal site powered by ParElastic And to simulate traffic We ingest data with specific #hashtags from Twitter So go ahead and tweet with #mysql or #parelastic and you will generate traffic for the demo 67
59 A local start-up So what s ParElastic? Based in Waltham, R&D office in Toronto We re hiring! ParElastic is Database Virtualization Software Make groups of MySQL databases act as one The cardinality of the group can change Based on data size Based on workload 68
60 I want it, where can I get it? We re in closed beta Working with some very exciting early adopters Happy to work with you as well! 69
61 ParElastic Contact information Web: Me 70
BIG DATA APPLIANCES. July 23, TDWI. R Sathyanarayana. Enterprise Information Management & Analytics Practice EMC Consulting
BIG DATA APPLIANCES July 23, TDWI R Sathyanarayana Enterprise Information Management & Analytics Practice EMC Consulting 1 Big data are datasets that grow so large that they become awkward to work with
More informationWhere We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344
Where We Are Introduction to Data Management CSE 344 Lecture 25: DBMS-as-a-service and NoSQL We learned quite a bit about data management see course calendar Three topics left: DBMS-as-a-service and NoSQL
More informationEnabling Database-as-a-Service (DBaaS) within Enterprises or Cloud Offerings
Solution Brief Enabling Database-as-a-Service (DBaaS) within Enterprises or Cloud Offerings Introduction Accelerating time to market, increasing IT agility to enable business strategies, and improving
More informationHow to Build a High-Performance Data Warehouse By David J. DeWitt, Ph.D.; Samuel Madden, Ph.D.; and Michael Stonebraker, Ph.D.
1 How To Build a High-Performance Data Warehouse How to Build a High-Performance Data Warehouse By David J. DeWitt, Ph.D.; Samuel Madden, Ph.D.; and Michael Stonebraker, Ph.D. Over the last decade, the
More informationIn Memory Accelerator for MongoDB
In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000
More informationCloud Computing Trends
UT DALLAS Erik Jonsson School of Engineering & Computer Science Cloud Computing Trends What is cloud computing? Cloud computing refers to the apps and services delivered over the internet. Software delivered
More information5 Signs You Might Be Outgrowing Your MySQL Data Warehouse*
Whitepaper 5 Signs You Might Be Outgrowing Your MySQL Data Warehouse* *And Why Vertica May Be the Right Fit Like Outgrowing Old Clothes... Most of us remember a favorite pair of pants or shirt we had as
More informationMain Memory Data Warehouses
Main Memory Data Warehouses Robert Wrembel Poznan University of Technology Institute of Computing Science Robert.Wrembel@cs.put.poznan.pl www.cs.put.poznan.pl/rwrembel Lecture outline Teradata Data Warehouse
More informationHow To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI
More informationBig Data With Hadoop
With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
More informationWhy compute in parallel? Cloud computing. Big Data 11/29/15. Introduction to Data Management CSE 344. Science is Facing a Data Deluge!
Why compute in parallel? Introduction to Data Management CSE 344 Lectures 23 and 24 Parallel Databases Most processors have multiple cores Can run multiple jobs simultaneously Natural extension of txn
More informationQLIKVIEW INTEGRATION TION WITH AMAZON REDSHIFT John Park Partner Engineering
QLIKVIEW INTEGRATION TION WITH AMAZON REDSHIFT John Park Partner Engineering June 2014 Page 1 Contents Introduction... 3 About Amazon Web Services (AWS)... 3 About Amazon Redshift... 3 QlikView on AWS...
More informationOracle Database 12c Plug In. Switch On. Get SMART.
Oracle Database 12c Plug In. Switch On. Get SMART. Duncan Harvey Head of Core Technology, Oracle EMEA March 2015 Safe Harbor Statement The following is intended to outline our general product direction.
More informationAdvanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
More informationPerformance and Scalability Overview
Performance and Scalability Overview This guide provides an overview of some of the performance and scalability capabilities of the Pentaho Business Analytics Platform. Contents Pentaho Scalability and
More informationApplication Development. A Paradigm Shift
Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the
More informationCan the Elephants Handle the NoSQL Onslaught?
Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented
More informationPreview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.
Preview of Oracle Database 12c In-Memory Option 1 The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any
More informationMySQL: Cloud vs Bare Metal, Performance and Reliability
MySQL: Cloud vs Bare Metal, Performance and Reliability Los Angeles MySQL Meetup Vladimir Fedorkov, March 31, 2014 Let s meet each other Performance geek All kinds MySQL and some Sphinx Working for Blackbird
More informationBig Data Technologies Compared June 2014
Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development
More informationStructured Data Storage
Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct
More informationHadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
More informationCloud Computing at Google. Architecture
Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale
More informationMeeting the Needs of Database Management for SaaS: Oracle Database 12c
WHITE PAPER Meeting the Needs of Database Management for SaaS: Oracle Database 12c Sponsored by: Oracle Corp. Carl W. Olofson September 2014 IDC OPINION The move of ISV applications to the cloud is growing
More informationRackspace Cloud Databases and Container-based Virtualization
Rackspace Cloud Databases and Container-based Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many
More informationNoSQL and Hadoop Technologies On Oracle Cloud
NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath
More informationbigdata Managing Scale in Ontological Systems
Managing Scale in Ontological Systems 1 This presentation offers a brief look scale in ontological (semantic) systems, tradeoffs in expressivity and data scale, and both information and systems architectural
More informationBig Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect
on AWS Services Overview Bernie Nallamotu Principle Solutions Architect \ So what is it? When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze
More informationAzure Scalability Prescriptive Architecture using the Enzo Multitenant Framework
Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework Many corporations and Independent Software Vendors considering cloud computing adoption face a similar challenge: how should
More informationHadoop s Entry into the Traditional Analytical DBMS Market. Daniel Abadi Yale University August 3 rd, 2010
Hadoop s Entry into the Traditional Analytical DBMS Market Daniel Abadi Yale University August 3 rd, 2010 Data, Data, Everywhere Data explosion Web 2.0 more user data More devices that sense data More
More informationCSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model
More informationOne-Size-Fits-All: A DBMS Idea Whose Time has Come and Gone. Michael Stonebraker December, 2008
One-Size-Fits-All: A DBMS Idea Whose Time has Come and Gone Michael Stonebraker December, 2008 DBMS Vendors (The Elephants) Sell One Size Fits All (OSFA) It s too hard for them to maintain multiple code
More informationCloudDB: A Data Store for all Sizes in the Cloud
CloudDB: A Data Store for all Sizes in the Cloud Hakan Hacigumus Data Management Research NEC Laboratories America http://www.nec-labs.com/dm www.nec-labs.com What I will try to cover Historical perspective
More informationData Warehousing Reinvented for the Cloud World. Benoit Dageville
Data Warehousing Reinvented for the Cloud World Benoit Dageville Snowflake? Startup founded in August 2012 with the ambition to build a data warehouse for the cloud Located downtown San Mateo 90+ employees,
More informationStorage Architectures for Big Data in the Cloud
Storage Architectures for Big Data in the Cloud Sam Fineberg HP Storage CT Office/ May 2013 Overview Introduction What is big data? Big Data I/O Hadoop/HDFS SAN Distributed FS Cloud Summary Research Areas
More informationJeffrey D. Ullman slides. MapReduce for data intensive computing
Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very
More informationBig Data Database Revenue and Market Forecast, 2012-2017
Wikibon.com - http://wikibon.com Big Data Database Revenue and Market Forecast, 2012-2017 by David Floyer - 13 February 2013 http://wikibon.com/big-data-database-revenue-and-market-forecast-2012-2017/
More informationData Management in the Cloud
Data Management in the Cloud Ryan Stern stern@cs.colostate.edu : Advanced Topics in Distributed Systems Department of Computer Science Colorado State University Outline Today Microsoft Cloud SQL Server
More informationCS2510 Computer Operating Systems
CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction
More informationCS2510 Computer Operating Systems
CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction
More informationPerformance and Scalability Overview
Performance and Scalability Overview This guide provides an overview of some of the performance and scalability capabilities of the Pentaho Business Analytics platform. PENTAHO PERFORMANCE ENGINEERING
More informationArchitecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
More informationNear Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya
Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming by Dibyendu Bhattacharya Pearson : What We Do? We are building a scalable, reliable cloud-based learning platform providing services
More informationDirect NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle
Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Agenda Introduction Database Architecture Direct NFS Client NFS Server
More informationIntroduction to Big Data! with Apache Spark" UC#BERKELEY#
Introduction to Big Data! with Apache Spark" UC#BERKELEY# This Lecture" The Big Data Problem" Hardware for Big Data" Distributing Work" Handling Failures and Slow Machines" Map Reduce and Complex Jobs"
More informationIntroduction to Hadoop
Introduction to Hadoop 1 What is Hadoop? the big data revolution extracting value from data cloud computing 2 Understanding MapReduce the word count problem more examples MCS 572 Lecture 24 Introduction
More informationReferences. Introduction to Database Systems CSE 444. Motivation. Basic Features. Outline: Database in the Cloud. Outline
References Introduction to Database Systems CSE 444 Lecture 24: Databases as a Service YongChul Kwon Amazon SimpleDB Website Part of the Amazon Web services Google App Engine Datastore Website Part of
More informationIntroduction to Database Systems CSE 444
Introduction to Database Systems CSE 444 Lecture 24: Databases as a Service YongChul Kwon References Amazon SimpleDB Website Part of the Amazon Web services Google App Engine Datastore Website Part of
More informationChapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
More informationScalable Architecture on Amazon AWS Cloud
Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies kalpak@clogeny.com 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect
More informationOLTP Meets Bigdata, Challenges, Options, and Future Saibabu Devabhaktuni
OLTP Meets Bigdata, Challenges, Options, and Future Saibabu Devabhaktuni Agenda Database trends for the past 10 years Era of Big Data and Cloud Challenges and Options Upcoming database trends Q&A Scope
More informationOracle Database Cloud Service Rick Greenwald, Director, Product Management, Database Cloud
Oracle Database Cloud Service Rick Greenwald, Director, Product Management, Database Cloud Agenda Oracle Cloud Database Service Overview Cloud taxonomy What is the Database Cloud Service? Architecture
More informationAn Oracle White Paper May 2012. Oracle Database Cloud Service
An Oracle White Paper May 2012 Oracle Database Cloud Service Executive Overview The Oracle Database Cloud Service provides a unique combination of the simplicity and ease of use promised by Cloud computing
More informationCloud Optimize Your IT
Cloud Optimize Your IT Windows Server 2012 The information contained in this presentation relates to a pre-release product which may be substantially modified before it is commercially released. This pre-release
More informationDistributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms
Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes
More informationOracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.
Oracle BI EE Implementation on Netezza Prepared by SureShot Strategies, Inc. The goal of this paper is to give an insight to Netezza architecture and implementation experience to strategize Oracle BI EE
More informationOn- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform
On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform Page 1 of 16 Table of Contents Table of Contents... 2 Introduction... 3 NoSQL Databases... 3 CumuLogic NoSQL Database Service...
More informationLARGE-SCALE DATA STORAGE APPLICATIONS
BENCHMARKING AVAILABILITY AND FAILOVER PERFORMANCE OF LARGE-SCALE DATA STORAGE APPLICATIONS Wei Sun and Alexander Pokluda December 2, 2013 Outline Goal and Motivation Overview of Cassandra and Voldemort
More informationNews and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren
News and trends in Data Warehouse Automation, Big Data and BI Johan Hendrickx & Dirk Vermeiren Extreme Agility from Source to Analysis DWH Appliances & DWH Automation Typical Architecture 3 What Business
More informationMySQL. Leveraging. Features for Availability & Scalability ABSTRACT: By Srinivasa Krishna Mamillapalli
Leveraging MySQL Features for Availability & Scalability ABSTRACT: By Srinivasa Krishna Mamillapalli MySQL is a popular, open-source Relational Database Management System (RDBMS) designed to run on almost
More informationPerformance And Scalability In Oracle9i And SQL Server 2000
Performance And Scalability In Oracle9i And SQL Server 2000 Presented By : Phathisile Sibanda Supervisor : John Ebden 1 Presentation Overview Project Objectives Motivation -Why performance & Scalability
More informationAnalytics in the Cloud. Peter Sirota, GM Elastic MapReduce
Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of
More informationR.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5
Distributed data processing in heterogeneous cloud environments R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5 1 uskenbaevar@gmail.com, 2 abu.kuandykov@gmail.com,
More informationCloud Service Model. Selecting a cloud service model. Different cloud service models within the enterprise
Cloud Service Model Selecting a cloud service model Different cloud service models within the enterprise Single cloud provider AWS for IaaS Azure for PaaS Force fit all solutions into the cloud service
More informationLogistics. Database Management Systems. Chapter 1. Project. Goals for This Course. Any Questions So Far? What This Course Cannot Do.
Database Management Systems Chapter 1 Mirek Riedewald Many slides based on textbook slides by Ramakrishnan and Gehrke 1 Logistics Go to http://www.ccs.neu.edu/~mirek/classes/2010-f- CS3200 for all course-related
More informationThe Vertica Analytic Database Technical Overview White Paper. A DBMS Architecture Optimized for Next-Generation Data Warehousing
The Vertica Analytic Database Technical Overview White Paper A DBMS Architecture Optimized for Next-Generation Data Warehousing Copyright Vertica Systems Inc. March, 2010 Table of Contents Table of Contents...2
More informationAffordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale
WHITE PAPER Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale Sponsored by: IBM Carl W. Olofson December 2014 IN THIS WHITE PAPER This white paper discusses the concept
More informationBringing Big Data into the Enterprise
Bringing Big Data into the Enterprise Overview When evaluating Big Data applications in enterprise computing, one often-asked question is how does Big Data compare to the Enterprise Data Warehouse (EDW)?
More informationExploring Amazon EC2 for Scale-out Applications
Exploring Amazon EC2 for Scale-out Applications Presented by, MySQL & O Reilly Media, Inc. Morgan Tocker, MySQL Canada Carl Mercier, Defensio Introduction! Defensio is a spam filtering web service for
More informationReal Life Performance of In-Memory Database Systems for BI
D1 Solutions AG a Netcetera Company Real Life Performance of In-Memory Database Systems for BI 10th European TDWI Conference Munich, June 2010 10th European TDWI Conference Munich, June 2010 Authors: Dr.
More informationChallenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
More informationLarge-Scale Web Applications
Large-Scale Web Applications Mendel Rosenblum Web Application Architecture Web Browser Web Server / Application server Storage System HTTP Internet CS142 Lecture Notes - Intro LAN 2 Large-Scale: Scale-Out
More informationBig Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
More informationScaling Database Performance in Azure
Scaling Database Performance in Azure Results of Microsoft-funded Testing Q1 2015 2015 2014 ScaleArc. All Rights Reserved. 1 Test Goals and Background Info Test Goals and Setup Test goals Microsoft commissioned
More informationMonetizing Millions of Mobile Users with Cloud Business Analytics
Monetizing Millions of Mobile Users with Cloud Business Analytics MicroStrategy World 2013 David Abercrombie Data Analytics Engineer Agenda Tapjoy Big Data Architecture MicroStrategy Cloud Implementation
More informationCloud Computing Is In Your Future
Cloud Computing Is In Your Future Michael Stiefel www.reliablesoftware.com development@reliablesoftware.com http://www.reliablesoftware.com/dasblog/default.aspx Cloud Computing is Utility Computing Illusion
More informationData Management in the Cloud. Zhen Shi
Data Management in the Cloud Zhen Shi Overview Introduction 3 characteristics of cloud computing 2 types of cloud data management application 2 types of cloud data management architecture Conclusion Introduction
More informationElastic Data Warehousing in the Cloud Is the sky really the limit?
Elastic Data Warehousing in the Cloud Is the sky really the limit? By Kees van Gelder Faculty of exact sciences Vrije Universiteit Amsterdam, the Netherlands Index Abstract... 3 1. Introduction... 3 2.
More informationCloud Optimize Your IT
Cloud Optimize Your IT Windows Server 2012 Michael Faden Partner Technology Advisor Microsoft Schweiz 1 Beyond Virtualization virtualization The power of many servers, the simplicity of one Every app,
More informationORACLE COHERENCE 12CR2
ORACLE COHERENCE 12CR2 KEY FEATURES AND BENEFITS ORACLE COHERENCE IS THE #1 IN-MEMORY DATA GRID. KEY FEATURES Fault-tolerant in-memory distributed data caching and processing Persistence for fast recovery
More informationPetabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013
Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics
More informationHow To Use Big Data For Telco (For A Telco)
ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call
More informationBig Data Analytics with MapReduce VL Implementierung von Datenbanksystemen 05-Feb-13
Big Data Analytics with MapReduce VL Implementierung von Datenbanksystemen 05-Feb-13 Astrid Rheinländer Wissensmanagement in der Bioinformatik What is Big Data? collection of data sets so large and complex
More informationBig Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum
Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All
More informationI/O Considerations in Big Data Analytics
Library of Congress I/O Considerations in Big Data Analytics 26 September 2011 Marshall Presser Federal Field CTO EMC, Data Computing Division 1 Paradigms in Big Data Structured (relational) data Very
More informationCiteSeer x in the Cloud
Published in the 2nd USENIX Workshop on Hot Topics in Cloud Computing 2010 CiteSeer x in the Cloud Pradeep B. Teregowda Pennsylvania State University C. Lee Giles Pennsylvania State University Bhuvan Urgaonkar
More informationCitusDB Architecture for Real-Time Big Data
CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing
More informationOracle Database In-Memory The Next Big Thing
Oracle Database In-Memory The Next Big Thing Maria Colgan Master Product Manager #DBIM12c Why is Oracle do this Oracle Database In-Memory Goals Real Time Analytics Accelerate Mixed Workload OLTP No Changes
More informationTransforming the Economics of Data Warehousing with Cloud Computing
Transforming the Economics of Data Warehousing with Cloud Computing How new frontiers in on-demand computing and DBMS technology will transform business. Copyright Vertica Systems Inc. November, 2008 Table
More informationUnderstanding Neo4j Scalability
Understanding Neo4j Scalability David Montag January 2013 Understanding Neo4j Scalability Scalability means different things to different people. Common traits associated include: 1. Redundancy in the
More informationCIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof.
CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing University of Florida, CISE Department Prof. Daisy Zhe Wang Cloud Computing and Amazon Web Services Cloud Computing Amazon
More informationOutdated Architectures Are Holding Back the Cloud
Outdated Architectures Are Holding Back the Cloud Flash Memory Summit Open Tutorial on Flash and Cloud Computing August 11,2011 Dr John R Busch Founder and CTO Schooner Information Technology JohnBusch@SchoonerInfoTechcom
More informationResearch on the data warehouse testing method in database design process based on the shared nothing frame
Research on the data warehouse testing method in database design process based on the shared nothing frame Abstract Keming Chen School of Continuing Education, XinYu University,XinYu University, JiangXi,
More informationCloud DBMS: An Overview. Shan-Hung Wu, NetDB CS, NTHU Spring, 2015
Cloud DBMS: An Overview Shan-Hung Wu, NetDB CS, NTHU Spring, 2015 Outline Definition and requirements S through partitioning A through replication Problems of traditional DDBMS Usage analysis: operational
More informationHigh Performance IT Insights. Building the Foundation for Big Data
High Performance IT Insights Building the Foundation for Big Data Page 2 For years, companies have been contending with a rapidly rising tide of data that needs to be captured, stored and used by the business.
More informationSession Storage in Zend Server Cluster Manager
Session Storage in Zend Server Cluster Manager Shahar Evron Technical Product Manager, Zend Technologies Welcome! All Phones are muted type your questions into the Webex Q&A box A recording of this session
More informationAnalyzing Big Data with AWS
Analyzing Big Data with AWS Peter Sirota, General Manager, Amazon Elastic MapReduce @petersirota What is Big Data? Computer generated data Application server logs (web sites, games) Sensor data (weather,
More informationTop Ten Data Management Trends
Top Ten Data Management Trends September, 2013 Raj Gill Founder and President, Scalability Experts Executive Summary The amount of data that companies need to manage is doubling every couple of years and
More informationArchitecting Distributed Databases for Failure A Case Study with Druid
Architecting Distributed Databases for Failure A Case Study with Druid Fangjin Yang Cofounder @ Imply The Bad The Really Bad Overview The Catastrophic Best Practices: Operations Everything is going to
More informationDatabase Internals (Overview)
Database Internals (Overview) Eduardo Cunha de Almeida eduardo@inf.ufpr.br Outline of the course Introduction Database Systems (E. Almeida) Distributed Hash Tables and P2P (C. Cassagne) NewSQL (D. Kim
More information