A Big Data Storage Architecture for the Second Wave
David Sunny Sundstrom, Principal Product Director, Storage, Oracle




Growth in Data Diversity and Usage
1.8 Zettabytes of Data in 2011, 20x Growth by 2020
- Enterprise: 45%/year growth in database data
- Analytics: Customers top 50 PB. Keep all your data.
- Cloud: Public and private. IaaS, PaaS, SaaS.
- Regulation: 300 Exabytes in archives by 2015
- Mobile, POS, sensors: Mobile #1 Internet access device in 2013
- Social Media and Commerce: $30B/year in commerce by 2015
- Cloud Applications: 80% of new applications

Agenda
- Big Data Using Hadoop, The First Wave
- Big Data Today, The Second Wave
- A Storage Model for Big Data
- Implementations
- Plan for Action
- Summary

Unstructured Data Analytics Matures
- Hadoop: The engine of choice for unstructured data
- Chasm: By architecture an independent data store

Hadoop Ecosystem and Technologies

But Hadoop is not The Matrix


Data Types Used in Analytics
Defined by the 3 Vs: variety, volume, and velocity

Optimizing Big Data
Multiple Data Models with Multiple Strengths
Diagram labels: Relational DB; Distributed File System; Operational Data; Historical Data; Analytical Data; Spatial; XML; Binary; Graph or Document DB; NoSQL DB; Map-Reduce; Data Quality; Master/Reference; Metadata
Relational model:
- Explicit data model
- Large scale
- Structured data
- Power of relational model
- Real time and batch
- Robust security
Distributed file system and NoSQL model:
- No/minimal data model
- Extreme scale
- Unknown data structures
- Flexibility of no schema
- Batch and near-real time
- Basic to no security

Aligning Cost and Business Value
SLA tiers (axis: business value of data):
- Tier 1: Business, time-critical data
- Tier 2: Standard business data
- Tier 3: Infrequent access
Keeping most (or ALL) of your data:
- Regression analysis
- Trends
- Compliance
Cost versus value:
- Value varies widely
- Rising volume and use exceeds declining cost

Other Key Big Data Storage Factors
- Scalability with performance
- Security
- Data integrity


"With higher-end systems, there is a lot of data coming from all of the different business processes, from managing inventories to analyzing data for trends for future products. So in these systems, there is a real need to integrate a lot of different applications - a lot of different usages of the same massive amount of data - and that requires someone to think about how all of the pieces go together."
Rich Partridge, Senior Analyst, Ideas International

Big Data Storage Strategy
Reapplying What We Have Already Learned to Big Data
- Multiple storage silos are generally costly and inefficient
  - Hadoop storage is a silo by definition
- One centralized storage solution for all needs does not optimize either data or computation
- Storage architecture generally needs to evolve
  - Greenfield implementations have few constraints, but most data centers have existing, comprehensive network and storage architectures
- Store in concert with value
  - Storing petabytes on high-performance disks is not cost-effective

Big Data Functional Model
Analytics is a Multi-function Task
Functions:
- Ingest
- Consolidation
- Analytics and Visualization
- Value Retention and Reuse
Sources on the data bus:
- Oracle DB: Transactions and other structured data
- Hadoop: Unstructured data
- Other sources: NoSQL, etc.
- Media

Big Data Storage Architecture
- Data focused architecture
  - Integrated, end-to-end
  - Heterogeneous, purpose-aligned storage
  - Optimized data management
- Data aligned
  - Processing engines aligned with data types and sources
- Maximized efficiency
  - Minimized movement and copy
  - Leveraged reuse
  - Leveraged data transport (network)

Big Data Storage Architecture
Diagram labels:
- Shared storage
- Archival storage (Tape)
- Data management software
- Hadoop or NoSQL Cluster
- Data management software
- RDBMS (Transaction and data warehouse)
- Legacy servers and clusters

Big Data Integration Software
The Magic for Optimizing the Movement of Data
Functions (the Component column is empty on this slide):
- Unstructured data ingest
- Structured data ingest (ELT)
- RDBMS to HDFS
- HDFS to RDBMS
- HDFS to/from shared storage
- RDBMS to/from shared storage
- Tiered storage management

Aligning Cost and Business Value
SLA tiers (axis: business value of data):
- Tier 1: Business, time-critical data
- Tier 2: Standard business data
- Tier 3: Infrequent access
Automation becomes key
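The slide's point that automation becomes key for tier placement can be illustrated with a small policy function. This is only a sketch under stated assumptions: the idle-day thresholds, the `DataSet` fields, and the example data set names are hypothetical and do not come from the presentation; a real policy would be driven by the business SLA.

```python
from dataclasses import dataclass

# Hypothetical thresholds (assumptions, not from the slides).
TIER1_MAX_IDLE_DAYS = 7
TIER2_MAX_IDLE_DAYS = 90


@dataclass
class DataSet:
    name: str
    days_since_last_access: int
    time_critical: bool = False


def assign_tier(ds: DataSet) -> int:
    """Return 1, 2, or 3 following the slide's tier definitions."""
    if ds.time_critical or ds.days_since_last_access <= TIER1_MAX_IDLE_DAYS:
        return 1  # business, time-critical data
    if ds.days_since_last_access <= TIER2_MAX_IDLE_DAYS:
        return 2  # standard business data
    return 3  # infrequent access


# Hypothetical inventory used only to exercise the policy.
inventory = [
    DataSet("orders_current", 0, time_critical=True),
    DataSet("web_logs_last_quarter", 30),
    DataSet("sensor_archive_2011", 400),
]
tiers = {ds.name: assign_tier(ds) for ds in inventory}
print(tiers)
```

Running such a function on a schedule, and moving data whose tier has changed, is one simple form of the policy-based tiering the deck refers to.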


Big Data Storage Implementations
Unstructured-data dominated, greenfield solution/POC
- Hadoop Cluster
- Apache Flume

Big Data Storage Implementations
Existing Hadoop Cluster with NAS Hosted Replication
- Sun ZFS 7420
- Hadoop Cluster
- Apache Sqoop
- RDBMS (Transaction and data warehouse)

Big Data Storage Implementations
Existing Fibre Channel Datacenter Solution
- Fibre Channel SAN
- StorageTek SL8500
- HCA
- Oracle SAM
- Hadoop Cluster
- Oracle Hadoop Connectors
- RDBMS (Transaction and data warehouse)
- Legacy servers


Call to Action I
Implementing a Big Data Storage Architecture
- Start a Big Data Proof of Concept (POC)
  - Small Hadoop cluster using Cloudera CDH5
  - R for statistical computing
  - Build base expertise through the community

Call to Action II
Implementing a Big Data Storage Architecture
- Complete a current data inventory
  - Type, location, and profile (value)
  - Organic growth projection
- Project ingest requirements from contemporary sources
  - The three Vs: variety, volume, velocity
- Project your data retention needs
  - Consider keeping ALL your data
- Assess need for specialized analytics and visualization
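The organic growth projection in the inventory step is compound-growth arithmetic, which can be sketched as follows. The inventory entries and the 100% growth rate for log data are hypothetical; the 45% rate reuses the deck's figure for yearly growth in database data.

```python
def project_size_tb(current_tb: float, annual_growth: float, years: int) -> float:
    """Compound growth: size * (1 + rate) ** years."""
    return current_tb * (1.0 + annual_growth) ** years


# Hypothetical inventory: (name, type, current size in TB, assumed annual growth).
inventory = [
    ("erp_database", "structured", 20.0, 0.45),        # 45%/year from the deck
    ("clickstream_logs", "unstructured", 100.0, 1.00),  # assumed doubling per year
]


def project_inventory(items, years):
    """Projected size in TB per data set, rounded to one decimal place."""
    return {
        name: round(project_size_tb(size, growth, years), 1)
        for name, _type, size, growth in items
    }


projection = project_inventory(inventory, years=3)
total_tb = sum(projection.values())
print(projection, total_tb)
```

Keeping the growth rate per data set, rather than one global rate, makes it easy to see which types of data will dominate retention needs.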

Call to Action III
Implementing a Big Data Storage Architecture
- Complete a data flow analysis
  - Quantify data movement
- Map to datacenter assets to identify requirements
  - Storage and network
- Optimize flow and cost across processing engines
  - Identify and test software components
- Align data value with cost
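Quantifying data movement, as the data flow analysis step asks, can start from a simple table of flows between engines. The sketch below is illustrative only: the flow volumes, the 10 Gbit/s link, and the 70% usable-bandwidth factor are all assumptions, not figures from the presentation.

```python
from collections import defaultdict

# Hypothetical daily flows: (source, destination, terabytes moved per day).
flows = [
    ("ingest", "hadoop", 8.0),
    ("hadoop", "rdbms", 1.0),
    ("rdbms", "shared_storage", 2.0),
    ("hadoop", "shared_storage", 4.0),
]


def inbound_tb_per_day(flows):
    """Sum the daily volume arriving at each destination."""
    totals = defaultdict(float)
    for _src, dst, tb in flows:
        totals[dst] += tb
    return dict(totals)


def transfer_hours(tb: float, link_gbit_per_s: float, efficiency: float = 0.7) -> float:
    """Hours to move `tb` terabytes over a link at an assumed usable fraction."""
    bits = tb * 8e12  # 1 TB = 10**12 bytes = 8 * 10**12 bits
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 3600.0


inbound = inbound_tb_per_day(flows)
hours = transfer_hours(inbound["shared_storage"], link_gbit_per_s=10.0)
print(inbound, round(hours, 2))
```

Comparing the resulting transfer times with the available batch windows is one way to map flows to storage and network requirements.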


Big Data Storage Implementations
Greenfield, Oracle Purpose-Aligned Integrated Solution
- Pillar Axiom
- Sun ZFS
- StorageTek
- Oracle SAM
- Oracle Big Data Appliance
- Oracle Hadoop Connectors
- Oracle Exadata
- Oracle Exalytics

Big Data Storage Implementations
Virtualized, Network Infrastructure
- Shared storage
- Archival storage (Tape)
- Data management software
- Hadoop or NoSQL Cluster
- Data management software
- RDBMS (Transaction and data warehouse)
- Legacy servers and clusters

Big Data Storage Implementations
Moving the Processing to the Data
- Shared storage with embedded Hadoop engines
- Archival storage (Tape)
- Data management software
- Data management software
- RDBMS (Transaction and data warehouse)
- Legacy servers and clusters

Big Data Integration Software
Key components for optimizing the movement of data
Function: Oracle Component / Open source POC / Other
- Unstructured data ingest: Apache Flume (Big Data Appliance) / Apache Flume / Hadoop startups
- Structured data ingest (ELT): Oracle Data Integrator
- RDBMS to HDFS: Apache Sqoop (Big Data Appliance) / Apache Sqoop / Hadoop startup, Custom DBs
- HDFS to RDBMS: Oracle Hadoop Connectors
- HDFS to/from shared storage and RDBMS to/from shared storage: Custom / Custom / Custom/Vendor; Hybrid columnar compression (HCC)
- Tiered storage management: Oracle SAM
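For planning purposes, the function-to-component mapping on this slide can be held in a small lookup so that gaps in a planned pipeline show up early. The sketch below uses only the rows that read unambiguously in the transcript; the `components_for` helper and the example pipeline are hypothetical.

```python
# Function-to-component pairs from the slide's "Oracle Component" column.
# Only rows that are unambiguous in the transcript are included.
ORACLE_COMPONENTS = {
    "unstructured data ingest": "Apache Flume (Big Data Appliance)",
    "structured data ingest (ELT)": "Oracle Data Integrator",
    "RDBMS to HDFS": "Apache Sqoop (Big Data Appliance)",
    "HDFS to RDBMS": "Oracle Hadoop Connectors",
    "tiered storage management": "Oracle SAM",
}


def components_for(pipeline):
    """Return (function, component) pairs; unmapped functions yield None."""
    return [(step, ORACLE_COMPONENTS.get(step)) for step in pipeline]


# Hypothetical pipeline; the last step has no entry and surfaces as a gap.
plan = components_for(
    ["unstructured data ingest", "HDFS to RDBMS", "HDFS to/from shared storage"]
)
for step, component in plan:
    print(step, "->", component or "custom work needed")
```

Steps that come back as `None` correspond to the places where the slide lists custom or vendor-specific work.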

Summary
A Big Data Storage Model
- Assess the value of your data
- Design for an integrated whole, with consideration for each function (ingest, consolidation, analytics and visualization, value retention) across a high-performance network
- Optimize data usage and movement
- Incorporate existing data center architecture
- Store in concert with value

Thank You
Dave Sunny Sundstrom
Sunny.sundstrom@oracle.com

Oracle's Storage Strategy
Key Elements in Big Data Implementations
- Quality-of-Service for multi-application consolidation
- End-to-end data transport across IB networks without copy
- Automated, policy-based tiering across flash, SSD and tape
- Integration with Oracle applications and management
- Easy, in-place upgradeability from small to large storage systems
- Critical-path data access optimization
- Advanced fault prevention, detection and correction
- Maximum archival media life and automated data verification
- Data compression to disk and tape