5 Database technology Trends. Guy Harrison, Executive Director, Information Management R&D

Size: px
Start display at page:

Download "5 Database technology Trends. Guy Harrison, Executive Director, Information Management R&D"

Transcription

1 5 Database technology Trends Guy Harrison, Executive Director, Information Management R&D

2 Introductions Web: guyharrison.net

3

4

5

6 But Seriously

7 5 Database Technology Trends 1. The end of one size fits all 2. Big Data and Hadoop 3. NoSQL 4. Columnar architectures 5. In-memory databases

8 Trend #1: The end of one size fits all 8

9 History of databases Pre-computer technologies: Printing press Dewey decimal system Punched cards Magnetic tape flat (sequential) files Magnetic Disk IDMS ADABAS System R Oracle V2 Access Postgres MySQL HBase Dynamo MongoDB Redis VoltDB Neo4J Relational Model defined IMS Network Model Hierarchical model Indexed-Sequential Access Mechanism (ISAM) SQL Server Sybase Informix Ingres DB2 dbase Aerospike Hana Riak Cassandra Vertica Hadoop

10 Why? 3 rd Platform drives new demands on the database: Global High Availability Data volumes Unstructured data Transaction rates Latency A single architecture cannot meet all those demands

11 It takes all sorts In-memory processing (Spark) Analytic/BI software (SAS, Tableau) Web Server Data Warehouse RDBMS (Oracle, Terradata ) In-memory Analytics (HANA, Exalytics ) Hadoop Web DBMS (MySQL, Mongo, Cassandra) Operational RDBMS (Oracle, SQL Server, ) ERP & inhouse CRM

12 Oracle engineered systems

13 Trend #2: Big Data and Hadoop 14

14 The 3-4 V s Value Volume Terabytes Petabytes Exabytes Zetabytes Variety Structured Unstructured Human Generated Machine Generated Velocity Transaction rates User populations Machines

15 The Industrial revolution of data

16 2005

17 2009

18 The instrumented human Compass Camera Mike/earphones Heads up display Emotion/Attention monitor Bluetooth Personal Area Network 3G/WiFi Wide Area Network GPS Storage Pulse, temp monitor Silent alarms Pedometer, sleep monitoring

19

20 The instrumented world

21 Big Data is the culmination of cloud, social and mobile

22 More Data Storing all data including machine generated and sol, Social, community, demographic data in original format for ever To More Effect Smarter use of data (data science) to achieve competitive or human benefit

23 More Data Storing all data including machine generated and sol, Social, community, demographic data in original format for ever To More Effect Smarter use of data (data science) to achieve competitive or human benefit

24 Pioneers of big data

25

26

27

28

29

30 Google Software Architecture (circa 2005) Google Applications Map Reduce BigTable Google File System (GFS)

31 Map Reduce Start Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Map Reduce

32 Hadoop: 1.0: Open Source Map-Reduce Stack

33 Hadoop at Yahoo 2010(biggest cluster): 4000 nodes 16PB disk 64 TB of RAM 32,000 Cores 2014: 16 Clusters 32,500 nodes

34

35 Hadoop family Oozie (Workflow manager) Hive (Query) Pig (Scripting) SQOOP (RDBMS loader) Flume (Log Loader) Map Reduce / YARN Hbase (database) Zookeeper (locking) Hadoop File System (HDFS)

36 Economies Exadata vs Hadoop $$/TB (Hardware only) Hadoop $750 Exadata $4,911 $0 $1,000 $2,000 $3,000 $4,000 $5,000 $6,000

37 Hadoop is the most concrete Big Data technology Toad: your companion in the Big Data revolution

38 More Data Storing all data including machine generated and sol, Social, community, demographic data in original format for ever To More Effect Smarter use of data (data science) to achieve competitive or human benefit

39 More Data Storing all data including machine generated and sol, Social, community, demographic data in original format for ever To More Effect Smarter use of data (data science) to achieve competitive or human benefit

40 Big Data Analytics AKA Data Science Machine Learning Programs that evolve with experience Collective Intelligence Programs that use inputs from crowds to simulate intelligence Predictive Analytics Programs that extrapolate from past to future

41

42 Collective Intelligence Siri call me an ambulance From now on, I ll call you An Ambulance. OK?

43 Data science 250 Predictive Analytics Classification Clustering Model training and deployment y = x

44 Trend #3: NoSQL

45 Web Servers Memcached Servers Database Servers Read Only Slaves Shard (A-F) Shard (G-O) Shard (P-Z)

46 CAP Theorem says something has to give CAP (Brewer s) Theorem says you can only have two out of three of Consistency, Partition Tolerance, Availability Partition Tolerance System stays up when network between nodes fail Consistency Everyone always sees the same data NO GO Availability System stays up when nodes fail Oracle RAC lives here Most NoSQL lives here

47 Major influences on non-relational Amazon Dynamo Eventually consistent transaction model Consistent hashing Google BigTable Column Family model for sparse distributed columnar data OODBMS and XML DBs Paved the way for the document database

48 Amazon Dynamo Model

49 BigTable Data Model NameId Name 1 Dick 2 Jane SiteId SiteName 1 Ebay 2 Google 3 Facebook 4 ILoveLarry.com 5 MadBillFans.com Name Site Counter Dick Ebay 507,018 Dick Google 690,414 Jane Google 716,426 Dick Facebook 723,649 Jane Facebook 643,261 Jane ILoveLarry.com 856,767 Dick MadBillFans.com 675,230 NameId SiteId Counter , , , , , , ,230 Id Name Ebay Google Facebook (other columns) MadBillFans.com 1 Dick 507, , , ,230 Id Name Google Facebook (other columns) ILoveLarry.com 2 Jane 716, , ,767

50 OODBMS -1990s The OODBMS Manifesto (Atkinson/Bancilhon/DeWitt/Dittrich/Maier/Zdo nik, '90) "A relational database is like a garage that forces you to take your car apart and store the pieces in little drawers Also SQL is ugly A Object database is like a closet which requires that you hang up your suit with tie, underwear, belt socks and shoes all attached (Dave Ensor) IPgd1Tg8ByE/UkOzHg1FmI/AAAAAAAACB0/QYg8kE Vp5_0/s1600/db4o_vs_orm.png

51 Revenge of the Object Nerds Document databases Structured documents XML and JSON (JavaScript Object Notation) become more prevalent within applications Web programmers start storing these in BLOBS in MySQL Emergence of XML and JSON databases

52 Memchache DB MongoDB Key Value Oracle NoSQL Voldemort JSON based CouchDB Dynamo DynamoDB Document RethinkDB Riak XML based MarkLogic BerkeleyDB XML Cassandra Hbase Neo4J Table Based BigTable HyperTable Graph Database Infinite Graph Accumulo FlockDB

53 It s not a database, it s a key value store

54 No Means Yes!

55 Trend #4: Column-oriented DB Dell - Restricted - Confidential

56 Row orientation vs column orientation Row oriented database ID Name DOB Salary Sales Expenses 1001 Dick 21/12/60 67, Jane 12/12/55 55, Robert 17/02/80 22, Dan 15/03/75 65, Steven 11/11/81 76, Block ID Name DOB Salary Sales Expenses Dick 21/12/60 67, Jane 12/12/55 55, Robert 17/02/80 22, Dan 15/03/75 65, Steven 11/11/81 76, Block 1 Dick Jane Robert Dan Steven 2 21/12/60 12/12/55 17/02/80 15/03/75 11/11/ ,000 55,000 22,000 65,200 76, Column oriented database

57 Analytical Queries Row oriented database SELECT SUM(salary) FROM saleperson Block ID Name DOB Salary Sales Expenses Dick 21/12/60 67, Jane 12/12/55 55, Robert 17/02/80 22, Dan 15/03/75 65, Steven 11/11/81 76, Block 1 Dick Jane Robert Dan Steven 2 21/12/60 12/12/55 17/02/80 15/03/75 11/11/ ,000 55,000 22,000 65,200 76, Column oriented database

58 Compression Row oriented database Poor compression ratio (low repetition) Block ID Name DOB Salary Sales Expenses Dick 21/12/60 67, Jane 12/12/55 55, Robert 17/02/80 22, Dan 15/03/75 65, Steven 11/11/81 76, Good compression ratio (high repetition) Block 1 Dick Jane Robert Dan Steven 2 21/12/60 12/12/55 17/02/80 15/03/75 11/11/ ,000 55,000 22,000 65,200 76, Column oriented database

59 Inserts Row oriented database INSERT INTO salesperson Block ID Name DOB Salary Sales Expenses Dick 21/12/60 67, Jane 12/12/55 55, Robert 17/02/80 22, Dan 15/03/75 65, Steven 11/11/81 76, Block 1 Dick Jane Robert Dan Steven 2 21/12/60 12/12/55 17/02/80 15/03/75 11/11/ ,000 55,000 22,000 65,200 76, Column oriented database

60 C-Store (Vertica) Solution for inserts Bulk sequential loads Merged Query Read Optimized Store Columnar Disk-based Highly Compressed Bulk loadable Asynchronous Tuple Mover Continual Parallel inserts Write Optimized Store Row oriented Uncompressed Single row inserts

61 Exadata Hybrid Columnar Compression (EHCC) Compression Unit (~<1M) Block (8K) Block Block Block Column 1 Column 2 Column 3 Column 4 Row Row Row

62 Exadata Hybrid Columnar Compression SELECT SUM(Column4) FROM table Provides high compression ratio Manageable impact on row read/write operations Some optimization of analytic queries

63 Trend #5: The End of Disk? 68

64 5MB HDD circa 1956

65 The more that things change...

66 Faster or slower? IO/CPU -390 CPU 1,013 IO/Capacity -630 Disk Capacity 1,635 IO Rate 260-1, ,000 1,500 2,000 %age change

67 Solid state disk to the rescue DDR RAM Drive SATA flash drive PCI flash drive SSD storage Server

68 Cheaper by the IO SSD DDR-RAM SSD PCI flash SSD SATA Flash Magnetic Disk 4, ,000 2,000 3,000 4,000 5,000 Seek time (us)

69 $$/GB $$/GB But not by the GB HDD MLC SDD SLC SSD

70 $/GB Tiered storage management Main Memory DDR SSD Flash SSD Fast Disk (SAS, RAID 0+1) $/IOP Slow Disk (SATA, RAID 5) Tape, Flat Files, Hadoop

71 Cost (US$/GB) Size (GB) In-Memory databases $100, Cost of RAM falling 50% each 18 months. $10, Some databases can fit entirely within the RAM of a single server or cluster of servers $1, $ US$/GB Size (GB) $ $ Year

72 Oracle Times Ten Clients In-memory transactional database Disk-based Checkpoints and disk-based logging By default, COMMITs are not durable (writes to the transaction log are asynchronous). Can configure synchronous replication or synchronous log writes to avoid data loss Columnar compression and analytic functions in the Exalytics version Memory Point in time snapshot Commits Checkpoints Transaction Logs

73 SAP Hana Memory Column store Persistence Layer Txn logs Row Store Savepoints Data files Delta store Note: Table must be either row or column not both

74 Exalytics Instantaneous!

75 You keep using that word. I do not think it means what you think it means

76 Exalytics Hardware: 2 TB RAM 4 10GBe, 2 InfiniBand ports 6x1.2TB SAS (7.2 TB) 3x800GB (2.4TB) SSD Software: Oracle BI ESSBase Oracle R Times-Ten 12c In-memory

77 VoltDB Clients Clients Clients Single threaded access to memory: no latch/mutex waits Transactions in selfcontained stored procedures: minimal locking K-Safety for COMMIT: No sync waits CPU CPU CPU CPU CPU CPU In-memory Partition In-memory Partition In-memory Partition In-memory Partition In-memory Partition In-memory Partition

78 Spark (sort of) in-memory Hadoop In Memory compute Spark Streaming Mlib Machine Learning SparkSQL HDFS compatible Libraries for data processing, machine learning, streaming, SQL, etc Spark: in-memory distributed compute Python and Scala interfaces Part of the Berkeley Data Analytic Stack HDFS Tachyon in memory File system Mesos Cluster manager

79 Oracle 12c in-memory database Column store Memory (SGA) Row store Column Store (IMCU) OLTP Analytics (SMU) Redo Logs Data files

80 What does all this mean for me?

81 Trend #6: shameless product plugs will increase over the next 120 seconds 89

82 Toad: your companion in the Big Data revolution

83 Toad for Hadoop

84 SharePlex for Hadoop JMS Queue Hadoop Poster HBase Real Time replication Change Data Capture Redo-logs Batched HDFS File Copy Audit / Change Data

85 Toad BI Suite join and analyse data from any source

86 Dell Statistica

87 Dell In-Memory Appliances for Cloudera Enterprise Starter Configuration 8 Node Cluster R720-4 Infrastructure Nodes R720XD- 4 Data Nodes Force10- S55 ~176TB (disk raw space) ~1.5TB (raw memory) Mid-Size Configuration 16 Node Cluster R720-4 Infrastructure Nodes R720XD- 12 Data Nodes Force10- S4810P Force10- S55 ~528TB (disk raw space) ~4.5 TB (raw memory) Small Enterprise Configuration 24 Node Cluster R720-4 Infrastructure Nodes R720XD- 20 Data Nodes ~880TB (disk raw space) ~7.5 TB (raw memory) Expansion Unit- R720XD-4 Data, Cloudera Enterprise Data Hub, Scale in Blocks

88 Dell appliances for any database Dell provides appliances and reference architectures specifically designed for: Oracle SQL Server HANA SSD database acceleration Large memory footprints

89 Big Data for the rest of us Success in Big Data requires capabilities at multiple technology levels: hardware, software infrastructure, business intelligence and analytics Only Dell can deliver capabilities at every technology layer Only Dell s solutions are designed and priced to suit mid-market initial deployments and to scale to the largest enterprise Advanced Analytics Business Intelligence Data Integration Systems Management Hadoop and database software Server and Storage Toad Data point Boomi Statistica Boomi, Toad Intelligence Central Dell Foglight and TOAD Dell appliances for Hadoop, Oracle, etc Dell servers and storage arrays

90 Thank you.

Cloud Scale Distributed Data Storage. Jürmo Mehine

Cloud Scale Distributed Data Storage. Jürmo Mehine Cloud Scale Distributed Data Storage Jürmo Mehine 2014 Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances INSIGHT Oracle's All- Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA

More information

Structured Data Storage

Structured Data Storage Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

Large scale processing using Hadoop. Ján Vaňo

Large scale processing using Hadoop. Ján Vaňo Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine

More information

So What s the Big Deal?

So What s the Big Deal? So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI

More information

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems

More information

Big Data Technologies Compared June 2014

Big Data Technologies Compared June 2014 Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development

More information

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL [email protected] / @marcua

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg Adam Marcus MIT CSAIL [email protected] / @marcua About Me Social Computing + Database Systems Easily Distracted: Wrote The NoSQL Ecosystem in

More information

<Insert Picture Here> Big Data

<Insert Picture Here> Big Data Big Data Kevin Kalmbach Principal Sales Consultant, Public Sector Engineered Systems Program Agenda What is Big Data and why it is important? What is your Big

More information

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren News and trends in Data Warehouse Automation, Big Data and BI Johan Hendrickx & Dirk Vermeiren Extreme Agility from Source to Analysis DWH Appliances & DWH Automation Typical Architecture 3 What Business

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2

More information

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords From A to Z By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords Big data is one of the, well, biggest trends in IT today, and it has spawned a whole new generation

More information

NoSQL Data Base Basics

NoSQL Data Base Basics NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS

More information

How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5

How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5 Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark

More information

Introduction to Big Data Training

Introduction to Big Data Training Introduction to Big Data Training The quickest way to be introduce with NOSQL/BIG DATA offerings Learn and experience Big Data Solutions including Hadoop HDFS, Map Reduce, NoSQL DBs: Document Based DB

More information

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate

More information

An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise

An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise Solutions Group The following is intended to outline our

More information

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved. Preview of Oracle Database 12c In-Memory Option 1 The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any

More information

Introduction to NOSQL

Introduction to NOSQL Introduction to NOSQL Université Paris-Est Marne la Vallée, LIGM UMR CNRS 8049, France January 31, 2014 Motivations NOSQL stands for Not Only SQL Motivations Exponential growth of data set size (161Eo

More information

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya Oracle Database - Engineered for Innovation Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya Oracle Database 11g Release 2 Shipping since September 2009 11.2.0.3 Patch Set now

More information

Moving From Hadoop to Spark

Moving From Hadoop to Spark + Moving From Hadoop to Spark Sujee Maniyam Founder / Principal @ www.elephantscale.com [email protected] Bay Area ACM meetup (2015-02-23) + HI, Featured in Hadoop Weekly #109 + About Me : Sujee

More information

Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing

More information

Big Data: Tools and Technologies in Big Data

Big Data: Tools and Technologies in Big Data Big Data: Tools and Technologies in Big Data Jaskaran Singh Student Lovely Professional University, Punjab Varun Singla Assistant Professor Lovely Professional University, Punjab ABSTRACT Big data can

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

Hurtownie Danych i Business Intelligence: Big Data

Hurtownie Danych i Business Intelligence: Big Data Hurtownie Danych i Business Intelligence: Big Data Robert Wrembel Politechnika Poznańska Instytut Informatyki [email protected] www.cs.put.poznan.pl/rwrembel Outline Introduction to Big Data

More information

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released General announcements In-Memory is available next month http://www.oracle.com/us/corporate/events/dbim/index.html X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

More information

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to

More information

Can the Elephants Handle the NoSQL Onslaught?

Can the Elephants Handle the NoSQL Onslaught? Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented

More information

Constructing a Data Lake: Hadoop and Oracle Database United!

Constructing a Data Lake: Hadoop and Oracle Database United! Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.

More information

Dell In-Memory Appliance for Cloudera Enterprise

Dell In-Memory Appliance for Cloudera Enterprise Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert [email protected]/

More information

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011 SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications Jürgen Primsch, SAP AG July 2011 Why In-Memory? Information at the Speed of Thought Imagine access to business data,

More information

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1 Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots

More information

Database Performance with In-Memory Solutions

Database Performance with In-Memory Solutions Database Performance with In-Memory Solutions ABS Developer Days January 17th and 18 th, 2013 Unterföhring metafinanz / Carsten Herbe The goal of this presentation is to give you an understanding of in-memory

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

Big Data Analytics - Accelerated. stream-horizon.com

Big Data Analytics - Accelerated. stream-horizon.com Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based

More information

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

TE's Analytics on Hadoop and SAP HANA Using SAP Vora TE's Analytics on Hadoop and SAP HANA Using SAP Vora Naveen Narra Senior Manager TE Connectivity Santha Kumar Rajendran Enterprise Data Architect TE Balaji Krishna - Director, SAP HANA Product Mgmt. -

More information

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what

More information

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010 System/ Scale to Primary Secondary Joins/ Integrity Language/ Data Year Paper 1000s Index Indexes Transactions Analytics Constraints Views Algebra model my label 1971 RDBMS O tables sql-like 2003 memcached

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

Applications for Big Data Analytics

Applications for Big Data Analytics Smarter Healthcare Applications for Big Data Analytics Multi-channel sales Finance Log Analysis Homeland Security Traffic Control Telecom Search Quality Manufacturing Trading Analytics Fraud and Risk Retail:

More information

Hadoop: Embracing future hardware

Hadoop: Embracing future hardware Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop

More information

Big Data Course Highlights

Big Data Course Highlights Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like

More information

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

A Survey of Distributed Database Management Systems

A Survey of Distributed Database Management Systems Brady Kyle CSC-557 4-27-14 A Survey of Distributed Database Management Systems Big data has been described as having some or all of the following characteristics: high velocity, heterogeneous structure,

More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies

Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies 1 Copyright 2011, Oracle and/or its affiliates. All rights Big Data, Advanced Analytics:

More information

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY ANN KELLY II MANNING Shelter Island contents foreword preface xvii xix acknowledgments xxi about this book xxii Part 1 Introduction

More information

Oracle Database 12c Plug In. Switch On. Get SMART.

Oracle Database 12c Plug In. Switch On. Get SMART. Oracle Database 12c Plug In. Switch On. Get SMART. Duncan Harvey Head of Core Technology, Oracle EMEA March 2015 Safe Harbor Statement The following is intended to outline our general product direction.

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 Ralph Kimball Associates 2014 The Data Warehouse Mission Identify all possible enterprise data assets Select those assets

More information

Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software. 22 nd October 2013 10:00 Sesión B - DB2 LUW

Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software. 22 nd October 2013 10:00 Sesión B - DB2 LUW Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software 22 nd October 2013 10:00 Sesión B - DB2 LUW 1 Agenda Big Data The Technical Challenges Architecture of Hadoop

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC,

Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC, Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC, Bellevue, WA Legal disclaimer The information in this

More information

Open source large scale distributed data management with Google s MapReduce and Bigtable

Open source large scale distributed data management with Google s MapReduce and Bigtable Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: [email protected] Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory

More information

NoSQL for SQL Professionals William McKnight

NoSQL for SQL Professionals William McKnight NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to

More information

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! [email protected]

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! [email protected] 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the

More information

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Dave Dykstra [email protected] Fermilab is operated by the Fermi Research Alliance, LLC under contract No. DE-AC02-07CH11359

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS

BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS WHAT IS BIG DATA? describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information

More information

3 Case Studies of NoSQL and Java Apps in the Real World

3 Case Studies of NoSQL and Java Apps in the Real World Eugene Ciurana [email protected] - pr3d4t0r ##java, irc.freenode.net 3 Case Studies of NoSQL and Java Apps in the Real World This presentation is available from: http://ciurana.eu/geecon-2011 About Eugene...

More information

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

Introduction to Big Data! with Apache Spark UC#BERKELEY# Introduction to Big Data! with Apache Spark" UC#BERKELEY# This Lecture" The Big Data Problem" Hardware for Big Data" Distributing Work" Handling Failures and Slow Machines" Map Reduce and Complex Jobs"

More information

Scalable Architecture on Amazon AWS Cloud

Scalable Architecture on Amazon AWS Cloud Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies [email protected] 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3

More information

Introducing Oracle Exalytics In-Memory Machine

Introducing Oracle Exalytics In-Memory Machine Introducing Oracle Exalytics In-Memory Machine Jon Ainsworth Director of Business Development Oracle EMEA Business Analytics 1 Copyright 2011, Oracle and/or its affiliates. All rights Agenda Topics Oracle

More information

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013 SAP HANA SAP s In-Memory Database Dr. Martin Kittel, SAP HANA Development January 16, 2013 Disclaimer This presentation outlines our general product direction and should not be relied on in making a purchase

More information

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08

More information

MongoDB in the NoSQL and SQL world. Horst Rechner [email protected] Berlin, 2012-05-15

MongoDB in the NoSQL and SQL world. Horst Rechner horst.rechner@fokus.fraunhofer.de Berlin, 2012-05-15 MongoDB in the NoSQL and SQL world. Horst Rechner [email protected] Berlin, 2012-05-15 1 MongoDB in the NoSQL and SQL world. NoSQL What? Why? - How? Say goodbye to ACID, hello BASE You

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Architectures for Big Data Analytics A database perspective

Architectures for Big Data Analytics A database perspective Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

Database Scalability and Oracle 12c

Database Scalability and Oracle 12c Database Scalability and Oracle 12c Marcelle Kratochvil CTO Piction ACE Director All Data/Any Data [email protected] Warning I will be covering topics and saying things that will cause a rethink in

More information

Big Data Technologies. Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015

Big Data Technologies. Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015 Big Data Technologies Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015 Situation: Bigger and Bigger Volumes of Data Big Data Use Cases Log Analytics (Web Logs, Sensor

More information

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform... Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data

More information

A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA

A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA Ompal Singh Assistant Professor, Computer Science & Engineering, Sharda University, (India) ABSTRACT In the new era of distributed system where

More information

Hadoop for MySQL DBAs. Copyright 2011 Cloudera. All rights reserved. Not to be reproduced without prior written consent.

Hadoop for MySQL DBAs. Copyright 2011 Cloudera. All rights reserved. Not to be reproduced without prior written consent. Hadoop for MySQL DBAs + 1 About me Sarah Sproehnle, Director of Educational Services @ Cloudera Spent 5 years at MySQL At Cloudera for the past 2 years [email protected] 2 What is Hadoop? An open-source

More information

Cost-Effective Business Intelligence with Red Hat and Open Source

Cost-Effective Business Intelligence with Red Hat and Open Source Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,

More information

MySQL and Hadoop. Percona Live 2014 Chris Schneider

MySQL and Hadoop. Percona Live 2014 Chris Schneider MySQL and Hadoop Percona Live 2014 Chris Schneider About Me Chris Schneider, Database Architect @ Groupon Spent the last 10 years building MySQL architecture for multiple companies Worked with Hadoop for

More information

MaxDeploy Ready. Hyper- Converged Virtualization Solution. With SanDisk Fusion iomemory products

MaxDeploy Ready. Hyper- Converged Virtualization Solution. With SanDisk Fusion iomemory products MaxDeploy Ready Hyper- Converged Virtualization Solution With SanDisk Fusion iomemory products MaxDeploy Ready products are configured and tested for support with Maxta software- defined storage and with

More information

Large-Scale Data Processing

Large-Scale Data Processing Large-Scale Data Processing Eiko Yoneki [email protected] http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase

More information

MapReduce with Apache Hadoop Analysing Big Data

MapReduce with Apache Hadoop Analysing Big Data MapReduce with Apache Hadoop Analysing Big Data April 2010 Gavin Heavyside [email protected] About Journey Dynamics Founded in 2006 to develop software technology to address the issues

More information

Comparing SQL and NOSQL databases

Comparing SQL and NOSQL databases COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2015 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations

More information

Database Revolution: Old SQL, NewSQL, NoSQL Huh? Michael Bowers April 9, 2013 v2.9

Database Revolution: Old SQL, NewSQL, NoSQL Huh? Michael Bowers April 9, 2013 v2.9 Database Revolution: Old SQL, NewSQL, NoSQL Huh? Michael Bowers April 9, 2013 v2.9 Database Revolution: Old SQL, NewSQL, NoSQL Huh? Prepared by Michael Bowers 2013 03 28 v. 2.9 Abstract We are in the middle

More information

Big Data Are You Ready? Thomas Kyte http://asktom.oracle.com

Big Data Are You Ready? Thomas Kyte http://asktom.oracle.com Big Data Are You Ready? Thomas Kyte http://asktom.oracle.com The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated

More information

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing Evaluating NoSQL for Enterprise Applications Dirk Bartels VP Strategy & Marketing Agenda The Real Time Enterprise The Data Gold Rush Managing The Data Tsunami Analytics and Data Case Studies Where to go

More information

Peninsula Strategy. Creating Strategy and Implementing Change

Peninsula Strategy. Creating Strategy and Implementing Change Peninsula Strategy Creating Strategy and Implementing Change PS - Synopsis Professional Services firm Industries include Financial Services, High Technology, Healthcare & Security Headquartered in San

More information

Main Memory Data Warehouses

Main Memory Data Warehouses Main Memory Data Warehouses Robert Wrembel Poznan University of Technology Institute of Computing Science [email protected] www.cs.put.poznan.pl/rwrembel Lecture outline Teradata Data Warehouse

More information

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 NoSQL Databases Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 Database Landscape Source: H. Lim, Y. Han, and S. Babu, How to Fit when No One Size Fits., in CIDR,

More information

2009 Oracle Corporation 1

2009 Oracle Corporation 1 The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material,

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Benchmarking Cassandra on Violin

Benchmarking Cassandra on Violin Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract

More information