5 Database technology Trends. Guy Harrison, Executive Director, Information Management R&D




Introductions Web: guyharrison.net Email: guy.harrison@software.dell.com Twitter: @guyharrison

But Seriously

5 Database Technology Trends
1. The end of one size fits all
2. Big Data and Hadoop
3. NoSQL
4. Columnar architectures
5. In-memory databases

Trend #1: The end of one size fits all

History of databases (timeline, 1940-50 through 2000-2010)
Pre-computer technologies: printing press, Dewey decimal system, punched cards
Storage media: magnetic tape with flat (sequential) files, then magnetic disk with the Indexed Sequential Access Method (ISAM)
Pre-relational systems: IMS (hierarchical model), IDMS (network model), ADABAS
Relational model defined, followed by: System R, Oracle V2, Ingres, DB2, Informix, Sybase, SQL Server, dBase, Access, Postgres, MySQL
Newer systems: Hadoop, HBase, Dynamo, Cassandra, Riak, MongoDB, Redis, Neo4J, VoltDB, Vertica, Hana, Aerospike

Why? The 3rd Platform drives new demands on the database: global high availability, data volumes, unstructured data, transaction rates, latency. A single architecture cannot meet all those demands.

It takes all sorts: in-memory processing (Spark); analytic/BI software (SAS, Tableau); web server; data warehouse RDBMS (Oracle, Teradata); in-memory analytics (HANA, Exalytics); Hadoop; web DBMS (MySQL, Mongo, Cassandra); operational RDBMS (Oracle, SQL Server, ...); ERP and in-house CRM

Oracle engineered systems

Trend #2: Big Data and Hadoop

The 3-4 V's
Value
Volume: terabytes, petabytes, exabytes, zettabytes
Variety: structured, unstructured, human generated, machine generated
Velocity: transaction rates, user populations, machines

The Industrial revolution of data

2005

2009

The instrumented human: compass; camera; mike/earphones; heads-up display; emotion/attention monitor; Bluetooth personal area network; 3G/WiFi wide area network; GPS; storage; pulse and temperature monitor; silent alarms; pedometer and sleep monitoring

The instrumented world

Big Data is the culmination of cloud, social and mobile

More Data: storing all data, including machine-generated, social, community, and demographic data, in its original format forever.
To More Effect: smarter use of data (data science) to achieve competitive or human benefit.


Pioneers of big data

Google Software Architecture (circa 2005): Google Applications; Map Reduce; BigTable; Google File System (GFS)

Map Reduce (diagram): Start, then many Map tasks running in parallel, then Reduce
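The map-then-reduce flow on this slide can be sketched in a few lines of plain Python. This is only an illustration of the programming model, run in a single process; it is not Hadoop or Google code, and the word-count task, the function names and the sample input splits are assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse the list of values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input splits; on a real cluster each split would be
# handled by a separate Map task on a separate node.
splits = ["big data and hadoop", "hadoop and nosql", "big big data"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
word_counts = reduce_phase(shuffle(mapped))
```

Each Map call only sees its own split, which is what lets the framework spread the Map tasks across many machines before a Reduce step combines the results.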

Hadoop 1.0: Open Source Map-Reduce Stack

Hadoop at Yahoo
2010 (biggest cluster): 4,000 nodes, 16 PB disk, 64 TB of RAM, 32,000 cores
2014: 16 clusters, 32,500 nodes

Hadoop family
Oozie (workflow manager)
Hive (query)
Pig (scripting)
Sqoop (RDBMS loader)
Flume (log loader)
Map Reduce / YARN
HBase (database)
Zookeeper (locking)
Hadoop File System (HDFS)

Economies: Exadata vs Hadoop, $/TB (hardware only): Hadoop $750; Exadata $4,911

Hadoop is the most concrete Big Data technology Toad: your companion in the Big Data revolution


Big Data Analytics, AKA Data Science
Machine Learning: programs that evolve with experience
Collective Intelligence: programs that use inputs from crowds to simulate intelligence
Predictive Analytics: programs that extrapolate from past to future

Collective Intelligence: "Siri, call me an ambulance." "From now on, I'll call you 'An Ambulance'. OK?"

Data science: predictive analytics, classification, clustering, model training and deployment (chart: scatter plot with fitted line y = 0.9715x + 0.7191)
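A fitted line like the one on this chart is the simplest kind of predictive model: train on past points, then extrapolate. The sketch below fits a line by ordinary least squares in plain Python. The data points are synthetic, generated from the slide's own equation purely so the fit can be checked, and the function names are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Extrapolate from the trained model to a new x value."""
    slope, intercept = model
    return slope * x + intercept


# Synthetic training data lying exactly on the line shown on the slide.
xs = [0, 50, 100, 150, 200]
ys = [0.9715 * x + 0.7191 for x in xs]

model = fit_line(xs, ys)          # "model training"
forecast = predict(model, 250)    # "deployment": score an unseen input
```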

Trend #3: NoSQL

Web Servers; Memcached Servers; Database Servers: Read-Only Slaves, Shard (A-F), Shard (G-O), Shard (P-Z)
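The architecture on this slide puts two pieces of logic into the application tier: choosing a shard by key range, and checking a cache before touching a database. A minimal sketch of that logic follows, with plain dictionaries standing in for memcached and for the database shards; the class, the shard names and the invalidate-on-write policy are assumptions for illustration.

```python
# Key ranges taken from the slide: A-F, G-O, P-Z.
SHARD_RANGES = {
    "shard_a_f": ("A", "F"),
    "shard_g_o": ("G", "O"),
    "shard_p_z": ("P", "Z"),
}


def shard_for(key):
    """Pick the shard whose letter range covers the key's first letter."""
    first = key[0].upper()
    for name, (low, high) in SHARD_RANGES.items():
        if low <= first <= high:
            return name
    raise ValueError(f"no shard covers key {key!r}")


class ShardedStore:
    def __init__(self):
        self.cache = {}                                   # stand-in for memcached
        self.shards = {name: {} for name in SHARD_RANGES}  # stand-in for databases
        self.db_reads = 0

    def put(self, key, value):
        self.shards[shard_for(key)][key] = value
        self.cache.pop(key, None)      # invalidate any stale cached copy

    def get(self, key):
        if key in self.cache:          # cache hit: no database work
            return self.cache[key]
        self.db_reads += 1             # cache miss: go to the owning shard
        value = self.shards[shard_for(key)].get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

Because the routing lives in the application, a query that spans shards (for example a join across A-F and P-Z) has to be assembled by the application as well.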

CAP Theorem says something has to give
CAP (Brewer's) Theorem says you can only have two out of three of Consistency, Partition Tolerance, and Availability.
Consistency: everyone always sees the same data
Availability: system stays up when nodes fail
Partition Tolerance: system stays up when the network between nodes fails
Diagram labels: "Oracle RAC lives here"; "Most NoSQL lives here"; "NO GO"

Major influences on non-relational
Amazon Dynamo: eventually consistent transaction model; consistent hashing
Google BigTable: column family model for sparse distributed columnar data
OODBMS and XML DBs: paved the way for the document database
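Consistent hashing, named above as one of Dynamo's contributions, can be sketched as a ring of hashed positions on which each key belongs to the next node position clockwise. The code below is a simplified illustration, not Dynamo's implementation: the use of MD5, the 100 virtual nodes per server and the node names are all assumptions.

```python
import bisect
import hashlib


def ring_position(value):
    """Hash a string to an integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._positions = []   # sorted ring positions
        self._owners = {}      # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node is placed at several positions to even out the load.
        for i in range(self.vnodes):
            position = ring_position(f"{node}#{i}")
            self._owners[position] = node
            bisect.insort(self._positions, position)

    def node_for(self, key):
        if not self._positions:
            raise LookupError("ring has no nodes")
        # First position clockwise from the key, wrapping around at the end.
        index = bisect.bisect(self._positions, ring_position(key))
        return self._owners[self._positions[index % len(self._positions)]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}

ring.add_node("node-d")
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

The point of the technique is visible in `moved`: adding a node relocates only the keys that the new node takes over, rather than reshuffling everything as `hash(key) % node_count` would.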

Amazon Dynamo Model

BigTable Data Model

Relational (normalized) model:

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507,018
1       2       690,414
2       2       716,426
1       3       723,649
2       3       643,261
2       4       856,767
1       5       675,230

Joined view:

Name  Site             Counter
Dick  Ebay             507,018
Dick  Google           690,414
Jane  Google           716,426
Dick  Facebook         723,649
Jane  Facebook         643,261
Jane  ILoveLarry.com   856,767
Dick  MadBillFans.com  675,230

BigTable model (one sparse row per name, one column per site):

Id  Name  Ebay     Google   Facebook  (other columns)  MadBillFans.com
1   Dick  507,018  690,414  723,649   ...              675,230

Id  Name  Google   Facebook  (other columns)  ILoveLarry.com
2   Jane  716,426  643,261   ...              856,767
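The difference between the two models can be sketched with nested dictionaries: each row key maps to only the columns that row actually has. This is an illustration of the data model, not BigTable's API; the function name is an assumption, and the data is the slide's own.

```python
# The slide's joined view, as (name, site, counter) facts.
visits = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(facts):
    """Pivot facts into one sparse row per name, one column per site."""
    rows = {}
    for name, site, counter in facts:
        rows.setdefault(name, {})[site] = counter
    return rows


rows = to_wide_rows(visits)

# Reading everything about one person is a single row lookup, no join.
jane = rows["Jane"]
```

Rows need not share columns: Jane's row has no Ebay column at all, rather than an Ebay column holding NULL.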

OODBMS: 1990s
The OODBMS Manifesto (Atkinson/Bancilhon/DeWitt/Dittrich/Maier/Zdonik, '90)
"A relational database is like a garage that forces you to take your car apart and store the pieces in little drawers."
Also, SQL is ugly.
"An object database is like a closet which requires that you hang up your suit with tie, underwear, belt, socks and shoes all attached." (Dave Ensor)
http://4.bp.blogspot.com/-IPgd1Tg8ByE/UkOzHg1FmI/AAAAAAAACB0/QYg8kEVp5_0/s1600/db4o_vs_orm.png

Revenge of the Object Nerds: document databases
Structured documents: XML and JSON (JavaScript Object Notation) become more prevalent within applications
Web programmers start storing these in BLOBs in MySQL
Emergence of XML and JSON databases
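The "JSON in a BLOB" pattern described above looks roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table name, column names and sample documents are assumptions for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body BLOB)")


def save_document(doc_id, document):
    """Serialize the whole document and store it as one opaque BLOB."""
    blob = json.dumps(document).encode("utf-8")
    conn.execute("INSERT INTO documents (id, body) VALUES (?, ?)", (doc_id, blob))


def load_document(doc_id):
    row = conn.execute(
        "SELECT body FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return json.loads(row[0].decode("utf-8")) if row else None


def find_by_field(field, value):
    """The database cannot see inside the BLOB, so filtering means
    fetching and decoding every document in the application."""
    matches = []
    for (blob,) in conn.execute("SELECT body FROM documents"):
        document = json.loads(blob.decode("utf-8"))
        if document.get(field) == value:
            matches.append(document)
    return matches


save_document(1, {"name": "Dick", "sites": ["Ebay", "Google"]})
save_document(2, {"name": "Jane", "sites": ["Google"]})
```

The full scan in `find_by_field` is the weakness of the pattern, and it is the gap that native JSON and XML databases set out to close by indexing and querying inside the document.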

Non-relational database families
Key Value: MemcacheDB, Oracle NoSQL, Voldemort, Dynamo, DynamoDB, Riak
Document, JSON based: MongoDB, CouchDB, RethinkDB
Document, XML based: MarkLogic, BerkeleyDB XML
Table Based: BigTable, Cassandra, HBase, HyperTable, Accumulo
Graph Database: Neo4J, Infinite Graph, FlockDB

It's not a database, it's a key value store http://browsertoolkit.com/fault-tolerance.png

No Means Yes!

Trend #4: Column-oriented DB

Row orientation vs column orientation

Logical table:

ID    Name    DOB       Salary  Sales  Expenses
1001  Dick    21/12/60  67,000  78980  3244
1002  Jane    12/12/55  55,000  67840  2333
1003  Robert  17/02/80  22,000  67890  6436
1004  Dan     15/03/75  65,200  98770  2345
1005  Steven  11/11/81  76,000  43240  3214

Row oriented database (one row per block):

Block  ID    Name    DOB       Salary  Sales  Expenses
1      1001  Dick    21/12/60  67,000  78980  3244
2      1002  Jane    12/12/55  55,000  67840  2333
3      1003  Robert  17/02/80  22,000  67890  6436
4      1004  Dan     15/03/75  65,200  98770  2345
5      1005  Steven  11/11/81  76,000  43240  3214

Column oriented database (one column per block):

Block  Values
1      Dick, Jane, Robert, Dan, Steven
2      21/12/60, 12/12/55, 17/02/80, 15/03/75, 11/11/81
3      67,000, 55,000, 22,000, 65,200, 76,000
4      78980, 67840, 67890, 98770, 43240
5      3244, 2333, 6436, 2345, 3214

Analytical Queries: SELECT SUM(salary) FROM salesperson. In the row oriented database the query has to read all five blocks; in the column oriented database it reads only the Salary block (block 3).

Compression: the row oriented database gets a poor compression ratio (low repetition within a block); the column oriented database gets a good compression ratio (high repetition within a block).

Inserts: INSERT INTO salesperson. In the row oriented database the new row lands in a single block; in the column oriented database it has to touch every column block.
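The trade-off across these slides (cheap aggregates but expensive inserts for column stores, and the reverse for row stores) can be sketched by counting blocks touched. The code below is a toy model using the slide's table; treating one row or one column as exactly one block is a simplification, the functions are hypothetical, and the ID column gets its own block here, so the column store has six blocks rather than the five drawn on the slide.

```python
COLUMNS = ["id", "name", "dob", "salary", "sales", "expenses"]
ROWS = [
    (1001, "Dick", "21/12/60", 67000, 78980, 3244),
    (1002, "Jane", "12/12/55", 55000, 67840, 2333),
    (1003, "Robert", "17/02/80", 22000, 67890, 6436),
    (1004, "Dan", "15/03/75", 65200, 98770, 2345),
    (1005, "Steven", "11/11/81", 76000, 43240, 3214),
]

# Row store: one block per row. Column store: one block per column.
row_store = [dict(zip(COLUMNS, row)) for row in ROWS]
column_store = {col: [row[i] for row in ROWS] for i, col in enumerate(COLUMNS)}


def sum_salary_row_store(blocks):
    """Every block must be read to pick out one field from each row."""
    total = sum(block["salary"] for block in blocks)
    return total, len(blocks)


def sum_salary_column_store(store):
    """Only the salary block is read."""
    return sum(store["salary"]), 1


def insert_row_store(blocks, row):
    """A new row is appended as a single block."""
    blocks.append(dict(zip(COLUMNS, row)))
    return 1


def insert_column_store(store, row):
    """A new row must be split up and written to every column block."""
    for col, value in zip(COLUMNS, row):
        store[col].append(value)
    return len(COLUMNS)
```

Each function returns the number of blocks it touched, so the two layouts can be compared directly for the same query or insert.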

C-Store (Vertica) solution for inserts
Write Optimized Store: row oriented, uncompressed, single row inserts; receives continual parallel inserts
Asynchronous Tuple Mover: moves data from the Write Optimized Store to the Read Optimized Store
Read Optimized Store: columnar, disk-based, highly compressed, bulk loadable; receives bulk sequential loads
Merged Query: queries read from both stores
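The write-optimized/read-optimized split can be sketched as two in-memory structures with a batch step between them. This is a toy model of the idea only, not Vertica's implementation: compression and disk storage are left out, the tuple mover is called explicitly instead of running asynchronously, and all names are assumptions.

```python
class HybridStore:
    def __init__(self, columns):
        self.columns = list(columns)
        self.write_store = []                           # row oriented, uncompressed
        self.read_store = {c: [] for c in self.columns}  # columnar

    def insert(self, row):
        """Single-row insert: one cheap append to the write store."""
        self.write_store.append(tuple(row))

    def move_tuples(self):
        """Tuple mover: convert buffered rows to columns in one batch."""
        moved = len(self.write_store)
        for row in self.write_store:
            for column, value in zip(self.columns, row):
                self.read_store[column].append(value)
        self.write_store.clear()
        return moved

    def column_sum(self, column):
        """Merged query: combine the columnar data with rows not yet moved."""
        index = self.columns.index(column)
        pending = sum(row[index] for row in self.write_store)
        return sum(self.read_store[column]) + pending


store = HybridStore(["id", "salary"])
store.insert((1001, 67000))
store.insert((1002, 55000))
store.move_tuples()            # rows 1001 and 1002 are now columnar
store.insert((1003, 22000))    # still waiting in the write store
total_salary = store.column_sum("salary")
```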

Exadata Hybrid Columnar Compression (EHCC): a Compression Unit (~<1M) spans several 8K blocks; within the unit, the rows are stored column by column (Column 1, Column 2, Column 3, Column 4)

Exadata Hybrid Columnar Compression: SELECT SUM(Column4) FROM table
Provides high compression ratio
Manageable impact on row read/write operations
Some optimization of analytic queries

Trend #5: The End of Disk?

5MB HDD circa 1956

The more that things change...

Faster or slower? Percentage change: CPU 1,013; Disk Capacity 1,635; IO Rate 260; IO/CPU -390; IO/Capacity -630

Solid state disk to the rescue: DDR RAM drive; SATA flash drive; PCI flash drive; SSD storage server

Cheaper by the IO. Seek time (us): SSD DDR-RAM 15; SSD PCI flash 25; SSD SATA flash 80; magnetic disk 4,000

But not by the GB. $/GB by year:

Year  HDD   MLC SSD  SLC SSD
2011  0.35  2.9      10
2012  0.28  2.2      7.4
2013  0.21  1.7      5.3
2014  0.17  1.3      3.2
2015  0.13  1        2.3

Tiered storage management ($/GB against $/IOP): Main Memory; DDR SSD; Flash SSD; Fast Disk (SAS, RAID 0+1); Slow Disk (SATA, RAID 5); Tape, Flat Files, Hadoop

In-Memory databases
Cost of RAM falling 50% each 18 months.
Some databases can fit entirely within the RAM of a single server or cluster of servers.
(Chart: cost in US$/GB and size in GB, both on logarithmic scales, by year from 1990 to 2020.)
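The "50% each 18 months" rule is a simple exponential decay, cost(t) = cost(0) x 0.5^(months / 18). The sketch below works it through; the starting price of $100/GB is a made-up figure used only to show the arithmetic, and the function names are assumptions.

```python
import math


def projected_cost_per_gb(start_cost, years, halving_months=18):
    """Cost after `years`, if the price halves every `halving_months`."""
    return start_cost * 0.5 ** (years * 12 / halving_months)


def years_until(start_cost, target_cost, halving_months=18):
    """Years needed for the price to fall from start_cost to target_cost."""
    halvings = math.log2(start_cost / target_cost)
    return halvings * halving_months / 12


# Hypothetical starting point of $100/GB.
after_18_months = projected_cost_per_gb(100.0, 1.5)   # one halving
after_3_years = projected_cost_per_gb(100.0, 3)       # two halvings
wait_for_quarter = years_until(100.0, 25.0)           # two halvings
```

At this rate the cost falls roughly a hundredfold per decade, since ten years is about 6.7 halvings.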

Oracle TimesTen
In-memory transactional database
Disk-based checkpoints and disk-based logging
By default, COMMITs are not durable (writes to the transaction log are asynchronous)
Can configure synchronous replication or synchronous log writes to avoid data loss
Columnar compression and analytic functions in the Exalytics version
Diagram: clients commit to memory; checkpoints write a point-in-time snapshot to disk; commits write to the transaction logs

SAP Hana
Memory: row store; column store with delta store
Persistence layer: transaction logs, savepoints, data files
Note: a table must be either row or column, not both

Exalytics Instantaneous!

You keep using that word. I do not think it means what you think it means

Exalytics
Hardware: 2 TB RAM; 4 x 10GbE and 2 InfiniBand ports; 6 x 1.2 TB SAS (7.2 TB); 3 x 800 GB SSD (2.4 TB)
Software: Oracle BI; Essbase; Oracle R; TimesTen; 12c In-memory

VoltDB
Single threaded access to memory: no latch/mutex waits
Transactions in self-contained stored procedures: minimal locking
K-Safety for COMMIT: no sync waits
Diagram: clients connect to a set of in-memory partitions, each served by its own CPU
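The first two points can be sketched as partitions that each own their data and run whole procedures one at a time on a single worker thread, so the data itself needs no latches. This illustrates the design idea only; it is not VoltDB's API, K-safety (replication) is left out, and the class names, the CRC32-based routing and the `deposit` procedure are assumptions.

```python
import queue
import threading
import zlib


class Partition:
    """One in-memory partition served by exactly one worker thread."""

    def __init__(self):
        self.data = {}
        self.inbox = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            procedure, args, reply = self.inbox.get()
            if procedure is None:          # shutdown signal
                break
            # Only this thread ever touches self.data, so no lock is needed.
            reply.put(procedure(self.data, *args))

    def call(self, procedure, *args):
        reply = queue.Queue(maxsize=1)
        self.inbox.put((procedure, args, reply))
        return reply.get()

    def stop(self):
        self.inbox.put((None, (), None))
        self.worker.join()


class PartitionedDatabase:
    def __init__(self, partition_count):
        self.partitions = [Partition() for _ in range(partition_count)]

    def call(self, partition_key, procedure, *args):
        """Route the whole procedure to the partition that owns the key."""
        index = zlib.crc32(partition_key.encode("utf-8")) % len(self.partitions)
        return self.partitions[index].call(procedure, partition_key, *args)

    def stop(self):
        for partition in self.partitions:
            partition.stop()


def deposit(data, account, amount):
    """A self-contained 'stored procedure': read, modify, write."""
    data[account] = data.get(account, 0) + amount
    return data[account]
```

Many client threads can call `deposit` on the same account at once and the balance stays correct, because the calls are serialized by the partition's queue rather than by locks on the data.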

Spark: (sort of) in-memory Hadoop
Spark: in-memory distributed compute
Part of the Berkeley Data Analytics Stack
HDFS compatible
Python and Scala interfaces
Libraries for data processing, machine learning, streaming, SQL, etc.: Spark Streaming, MLlib (machine learning), SparkSQL
Stack: HDFS; Tachyon (in-memory file system); Mesos (cluster manager)

Oracle 12c in-memory database
Memory (SGA): row store (OLTP); column store (IMCU) with SMU (analytics)
Disk: redo logs, data files

What does all this mean for me?

Trend #6: shameless product plugs will increase over the next 120 seconds

Toad: your companion in the Big Data revolution

Toad for Hadoop

SharePlex for Hadoop: change data capture from redo logs; JMS queue; Hadoop poster; real-time replication to HBase; batched HDFS file copy; audit / change data

Toad BI Suite join and analyse data from any source

Dell Statistica

Dell In-Memory Appliances for Cloudera Enterprise
Starter Configuration: 8 node cluster; R720: 4 infrastructure nodes; R720XD: 4 data nodes; Force10 S55; ~176 TB (raw disk space); ~1.5 TB (raw memory)
Mid-Size Configuration: 16 node cluster; R720: 4 infrastructure nodes; R720XD: 12 data nodes; Force10 S4810P; Force10 S55; ~528 TB (raw disk space); ~4.5 TB (raw memory)
Small Enterprise Configuration: 24 node cluster; R720: 4 infrastructure nodes; R720XD: 20 data nodes; ~880 TB (raw disk space); ~7.5 TB (raw memory)
Expansion Unit: R720XD, 4 data nodes
Cloudera Enterprise Data Hub; scale in blocks

Dell appliances for any database
Dell provides appliances and reference architectures specifically designed for: Oracle; SQL Server; HANA; SSD database acceleration; large memory footprints

Big Data for the rest of us
Success in Big Data requires capabilities at multiple technology levels: hardware, software infrastructure, business intelligence and analytics
Only Dell can deliver capabilities at every technology layer
Only Dell's solutions are designed and priced to suit mid-market initial deployments and to scale to the largest enterprise
Technology layers: Advanced Analytics; Business Intelligence; Data Integration; Systems Management; Hadoop and database software; Server and Storage
Dell offerings: Toad Data Point; Boomi; Statistica; Boomi, Toad Intelligence Central; Dell Foglight and Toad; Dell appliances for Hadoop, Oracle, etc.; Dell servers and storage arrays

Thank you.