Big Data and the Data Lake. February 2015





My Vision: Our Mission
Data Intelligence is a broad term that describes the real, meaningful insights that can be extracted from your data: truths that you can act on. The goal of any project, interface, or other work by this team should always be to get the data intelligence out of the data.

Precision Medicine at
Precision medicine is changing the landscape of cancer treatment at Wake Forest Baptist, allowing us to provide our patients with more precise, targeted therapies. Using the latest DNA sequencing technology, our experienced team of oncologists and geneticists can identify the genetic makeup of a patient's tumor and tailor treatment to the specific cancer mutations (abnormalities). Our goal is to provide the best individualized cancer therapy designed for you.

Targeted Cancer Therapy
No two cancers are alike. Every cancer has a unique genetic code, or blueprint, that shapes how it spreads and grows. Through genomic sequencing, our physicians can uncover genetic abnormalities or changes in a tumor that drive the growth of cancer. We then select treatments to specifically target these genes and attack the cancer, while sparing healthy tissues that the body needs. For adults and children who have active cancer and whose treatment is no longer working, precision medicine may be an option.

Patient-Centric/Care Coordination Focus
Internal Use Only. Subject to Change.

Wake Cloud: Hybrid Cloud Solution
Greater Agility, Leverage Existing Skills & Process
- SELF SERVICE
- EMC CLOUD SERVICE PROVIDER
- MANAGEMENT & ORCHESTRATION
- CONVERGED INFRASTRUCTURE
- SOFTWARE-DEFINED DATA CENTER: Compute, Storage, Network
- VMWARE VCLOUD AIR
- VIPR SW-DEFINED STORAGE
- DATA PROTECTION
- VMAX, VNX, Isilon, Data Domain, Avamar, VPLEX & RP

Business Benefits

Enterprise Information Market Trends
[Chart: trends plotted by Demand against Time, 2015 to 2016]
- Information Discovery and Visualization
- BI on the Go
- Content Intelligence and Discovery
- BI on the Cloud
- Big Data
- Information Discovery and Visualization
- Advanced Analytics
- Predictive, Statistical, Data & Decision Sciences
- Social decision making
- Social media and analytics
- Managing the Information Value Chain
- Big Data Architectures, Quality, Governance
- Cohesive Information Architectures With Master Data Management

Journey to Data Driven Enterprise

Steps
- Archive: Realize cost efficiencies and extend life of existing systems and Data migration
- Insights: Integrate all existing data to generate business insights (Data Analysis)
- Apps: Build Apps to assist/take (automated) actions from the insights generated (Data Driven Apps)
- Business Models: Create new revenue streams leveraging new data and new insights (Business Transformation)
- Repeatable Framework: Platform for experimenting data driven business models and innovation (Experimentation Platform)

Technology: Data Lake, Platform as a Service
Target Manager: IT Leaders, Business Leader, CEO

"Data lakes take advantage of commodity cluster computing techniques for massively scalable, low-cost storage of data files in any format."
Oliver Halter, PricewaterhouseCoopers LLP

Healthcare Data Lake Concept
Wake Lake

Wake Data Lake Functional Module

Pivotal Functional Big Data Module
- 76 TB Usable
- Allowance for Growth
- GemFire XD: Brings real-time data processing and analytics capabilities
- Future home of High Performance Computing, Research, Translational Medicine
- Integration of relational DBs with unstructured data

Greenplum Functional MPP Module
- 27.5 TB Greenplum DB Module (required)
- Future home of Enterprise Data Warehouse 2.0, TDW, other relational databases
- Commercial quality tools to manage Big Data, allowing relational data access (SQL)

Diagram labels: In Memory Data Grid, Analytic Data Warehouse, Applications, BI/Analytics Tools, (Big) Data Staging Platform, Data Science

EMC Data Lake Reference Architecture
Apache Hadoop is at the heart of a data lake. EMC supports data lakes with enterprise management and enhanced data services provided by Pivotal.
- HAWQ Advanced Database Services: A full-featured SQL interface to data in Hadoop.
- Spring XD: The Spring Programming Framework lets you build Hadoop applications in a standardized, extensible fashion.
- Hadoop Virtualization Extensions (HVE): Pivotal integrates the open source Hadoop Virtualization Extensions, bringing the flexibility of virtual infrastructure to Hadoop.
- Command Center: The Command Center makes Pivotal HD enterprise-ready with automated deployment, configuration, monitoring, and control.
- GemFire XD: Brings real-time data processing and analytics capabilities to the 3rd platform.
- Data Loader: High-performance data loading built to ingest 100s of TB an hour.

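And the data goes where?
Your Use Case: GemFire, Greenplum, or Pivotal HD with HAWQ

When do I need it?
- GemFire: Now
- Greenplum: Later
- Pivotal HD with HAWQ: Later

What do I want to do with it?
- GemFire: Singular event processing, transactions
- Greenplum: Structured analytics
- Pivotal HD with HAWQ: Exploratory analytics

How will I query and search?
- GemFire: Structured, regular
- Greenplum: Ad hoc SQL
- Pivotal HD with HAWQ: Unstructured/unknown

How do I need to store it?
- GemFire: Temporary
- Greenplum: I do, but I am not required to
- Pivotal HD with HAWQ: I must, and I am required to

Where is it coming from?
- GemFire: Events/stream, file, ETL
- Greenplum: File, ETL
- Pivotal HD with HAWQ: File, ETL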
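The use-case matrix above can be treated as a small lookup table. The following is a minimal sketch in Python, assuming the flattened slide cells line up with the three platforms as read here. The short question keys (`when`, `what`, `query`, `store`, `source`) and the function names are hypothetical labels introduced for illustration and do not come from the deck.

```python
# Decision matrix transcribed from the slide. The short keys are hypothetical
# labels for the five questions; the cell alignment is an assumption.
MATRIX = {
    "GemFire": {
        "when": "Now",
        "what": "Singular event processing, transactions",
        "query": "Structured, regular",
        "store": "Temporary",
        "source": "Events/stream, file, ETL",
    },
    "Greenplum": {
        "when": "Later",
        "what": "Structured analytics",
        "query": "Ad hoc SQL",
        "store": "I do, but I am not required to",
        "source": "File, ETL",
    },
    "Pivotal HD with HAWQ": {
        "when": "Later",
        "what": "Exploratory analytics",
        "query": "Unstructured/unknown",
        "store": "I must, and I am required to",
        "source": "File, ETL",
    },
}


def platforms_for(question: str, answer: str) -> list[str]:
    """Return platforms whose cell for `question` equals `answer` (case-insensitive)."""
    wanted = answer.strip().lower()
    return [
        name
        for name, row in MATRIX.items()
        if row[question].strip().lower() == wanted
    ]


def shortlist(**answers: str) -> list[str]:
    """Return platforms that match every supplied question/answer pair."""
    result = list(MATRIX)
    for question, answer in answers.items():
        matches = platforms_for(question, answer)
        result = [name for name in result if name in matches]
    return result
```

For example, `shortlist(when="Later", query="Ad hoc SQL")` narrows the three candidates to Greenplum alone, while `shortlist(when="Later")` leaves both Greenplum and Pivotal HD with HAWQ in play.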

In-Database Analytics: Detail
Data Access & Query Layer: ODBC, JDBC, SQL
In-Database Analytics
- Embedded: Greenplum DB Embedded Analytics, Greenplum Spatial, Greenplum Text
- Partner: SAS Scoring Accelerator, SAS/HPA High Performance Analytics, SAS Access, SAS Grid
- Open-Source: MADlib Open Source Analytical Algorithms
- Customized: Customized MADlib
- User-written: User-Written Analytical Algorithms
GREENPLUM DATABASE
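The idea behind the "User-written" category is that the analytic runs where the data lives and only the result travels back through the SQL layer. The sketch below illustrates that pattern with Python's built-in `sqlite3` module as a stand-in. It is not Greenplum or MADlib, and the table name, column names, sample values, and the `stddev_samp` function name are all assumptions made for illustration.

```python
import sqlite3
import statistics


class SampleStdDev:
    """User-written aggregate: sample standard deviation, callable from SQL."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per row in the group; NULLs are skipped.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # Called once per group; fewer than two values has no sample stdev.
        if len(self.values) < 2:
            return None
        return statistics.stdev(self.values)


# In-memory SQLite database as a stand-in for the analytic database.
conn = sqlite3.connect(":memory:")
conn.create_aggregate("stddev_samp", 1, SampleStdDev)

# Hypothetical table and data.
conn.execute("CREATE TABLE measurements (group_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(1, 2.0), (1, 4.0), (1, 6.0), (2, 10.0), (2, 10.0)],
)

# The analytic is expressed in SQL; the client receives one row per group
# instead of fetching every measurement and computing locally.
rows = conn.execute(
    "SELECT group_id, AVG(value), stddev_samp(value) "
    "FROM measurements GROUP BY group_id ORDER BY group_id"
).fetchall()
```

In SQLite the aggregate still executes in the client process, so this shows only the programming model. In an MPP database the same kind of query would be evaluated across the segments that hold the data.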

Chorus Analytics Studio
- Create, store, and share visual analytic workflows
- Build analytic flows for Greenplum, HAWQ, and Hadoop
- Powered by Alpine and MADlib
- 75+ drag-and-drop operators for the entire analytics process
- MADlib algorithms in-database

Data & Analytics Technology Ecosystem
Categories: Analytics; Business Intelligence; Data Integration; Social Media; Services; Data Modeling

How Does This Work in Practice?

Store Everything
- Obsessively collect data
- Keep it forever
- Put the data in one place

Analyze Anything
- Cleanse, organize, and manage your data lake
- Make the right tools available
- Use the resources wisely to compute, analyze, and understand data

Build the Right Thing
- Use insights to iteratively improve your product

Questions?