Modern Data Architecture for Predictive Analytics




Modern Data Architecture for Predictive Analytics
David Smith, VP Marketing and Community, Revolution Analytics
John Kreisa, VP Strategic Marketing, Hortonworks
Hortonworks Inc. 2013 Page 1

Your Presenters
David Smith (@revodavid): VP Marketing and Community at Revolution Analytics. Data scientist, blogger, and co-author of An Introduction to R.
John Kreisa (@marked_man): VP Strategic Marketing, Hortonworks. Over 20 years in data management as a developer and a marketer. Avid camper.

Today's Topics
- Introduction
- Drivers for the Modern Data Architecture (MDA)
- Apache Hadoop in the MDA
- R's role in the MDA
- Q&A

Poll #1: What stage are you at in looking at Hadoop?
- Research
- Evaluation
- Trial
- Haven't started research

Existing Data Architecture (diagram)
Applications: Business Analytics, Custom Applications, Packaged Applications
Data systems: RDBMS, EDW, MPP repositories
Sources: existing sources (CRM, ERP, clickstream, logs)
Tooling: dev and data tools (build and test), operational tools (manage and monitor)

Existing Data Architecture (diagram, annotated with data growth figures; source: IDC)
Applications: Business Analytics, Custom Applications, Packaged Applications
Data systems: RDBMS, EDW, MPP repositories
Sources: existing sources (CRM, ERP, clickstream, logs)
Data growth:
- 2.8 ZB in 2012
- 40 ZB by 2020
- 85% from new data types
- 15x machine data by 2020
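To put the two IDC volume figures on this slide in perspective, the short sketch below works out the growth they imply. It assumes smooth compound growth between 2012 and 2020, which the slide itself does not claim; the function name is illustrative only.

```python
# Figures quoted on the slide (source: IDC): 2.8 ZB in 2012, 40 ZB by 2020.
START_ZB, START_YEAR = 2.8, 2012
END_ZB, END_YEAR = 40.0, 2020


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1.0 / years) - 1.0


years = END_YEAR - START_YEAR
growth_factor = END_ZB / START_ZB
cagr = compound_annual_growth(START_ZB, END_ZB, years)

print(f"Overall growth: {growth_factor:.1f}x over {years} years")
print(f"Implied compound annual growth: {cagr:.1%}")
```

Under that assumption the data volume grows by a factor of about 14 in eight years, or a little under 40 percent per year.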

Modern Data Architecture Enabled (diagram)
Applications: Business Analytics, Custom Applications, Packaged Applications
Data systems: RDBMS, EDW, MPP repositories
Sources: existing sources (CRM, ERP, clickstream, logs) and emerging sources (sensor, sentiment, geo, unstructured)
Tooling: dev and data tools (build and test), operational tools (manage and monitor)

Hadoop Powers Modern Data Architecture
Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware. (Diagram: a Hadoop cluster drawn as a row of compute-and-storage nodes.)
Apache Hadoop is an open source project governed by the Apache Software Foundation (ASF) that allows you to gain insight from massive amounts of structured and unstructured data quickly and without significant investment.
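As a rough illustration of the distributed processing model behind this slide, here is a minimal in-process sketch of the MapReduce pattern (map, shuffle, reduce) applied to word counting. It is plain Python, not the Hadoop API: each inner list stands in for an input split that a real cluster would hand to a separate node, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(splits: Iterable[Iterable[str]]) -> Dict[str, int]:
    """Run map over every line of every split, then shuffle and reduce."""
    mapped = (pair for split in splits for line in split for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Two splits stand in for blocks stored on two different nodes.
    splits = [["big data on hadoop"], ["Hadoop stores big data"]]
    print(word_count(splits))
```

The point of the pattern is that the map and reduce functions never need to see the whole data set, which is what lets the work be spread across the compute-and-storage nodes.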

Drivers for Hadoop Adoption
Modern Data Architecture (driving efficiency): Hadoop has a central role in next-generation data architectures while integrating with existing data systems.
Business Applications (driving opportunity): Use Hadoop to extract insights that enable new customer value and competitive edge.
Big data sets:
- Existing (traditional): server log, clickstream
- Emerging: sentiment/social, machine/sensor, geo-locations

Opportunity in types of data
1. Sentiment: Understand how your customers feel about your brand and products right now.
2. Clickstream: Capture and analyze website visitors' data trails and optimize your website.
3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines.
4. Geographic: Analyze location-based data to manage operations where they occur.
5. Server Logs: Research logs to diagnose process failures and prevent security breaches.
6. Unstructured (text, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents.

Efficiency in the Modern Data Architecture
- Drive efficiency via modern data architecture
- Store data once and access it in many ways
- Often referred to as a data lake or data repository
- Infrastructure platform driven
- IT-oriented, TCO based
(Diagram: Business Analytics, Custom Applications, and Packaged Applications over RDBMS, EDW, and MPP repositories, fed by existing sources (CRM, ERP, clickstream, logs) and emerging sources (sensor, sentiment, geo, unstructured).)

Engineered for Interoperability (diagram)
Applications: BusinessObjects BI
Data systems: RDBMS, EDW, MPP, HANA
Sources: existing sources (CRM, ERP, clickstream, logs) and emerging sources (sensor, sentiment, geo, unstructured)
Tooling: dev and data tools, operational tools, infrastructure

Requirements for Hadoop Adoption
Requirements for Hadoop's role in the Modern Data Architecture:
- Integrated: interoperable with existing data center investments
- Skills: leverage your existing skills in development, operations, and analytics
- Key Services: platform, operational, and data services essential for the enterprise

Revolution R Enterprise Architecture (diagram; a legend marks the components where Revolution R Enterprise applies)
Applications: Business Analytics, Custom Applications, Packaged Applications
Data systems: RDBMS, EDW, MPP repositories
Sources: existing sources (CRM, ERP, clickstream, logs) and emerging sources (sensor, sentiment, geo, unstructured)
Tooling: dev and data tools (build and test), operational tools (manage and monitor)

Today's Topics
- Introduction
- Drivers for the Modern Data Architecture (MDA)
- Apache Hadoop's role in the MDA
- R's role in the MDA
- Q&A

Poll #2: Which of the following best describes your use of R and Hadoop?
- We have R + Hadoop in production
- We are testing R + Hadoop
- We have started to investigate but nothing is implemented
- No current plans

What is the Open Source R Project?
The R Language: object-oriented language for stats, math, and data science, with comprehensive data visualization and statistical modeling capabilities.
The R Community: 2M+ users with the skill to tackle big data statistical and numerical analysis and machine learning projects. New graduates with data skills learn R.
The R Ecosystem: 5000+ freely available algorithms in CRAN, with specialized methods for finance, economics, genomics, linguistics, and every data-driven domain.
Revolution Confidential

R is open source and drives analytic innovation but has some limitations for enterprises
Concern: open source R / enterprise counterpart
- Bigger data sizes: memory bound / big data
- Speed of analysis: single threaded / scale out, parallel processing, high speed
- Production support: community support / commercial production support
- Innovation and scale: innovative, 5000+ packages, exponential growth / combines with open source R packages where needed

Revolution R Enterprise
Revolution R Enterprise is the only commercial big data analytics platform based on the open source R statistical computing language.
- High Performance Analytics
- Big Data Analytics
- Cross-Platform
- Easier Build & Deploy
- Enterprise-Ready

Modern Data Architecture: Extract and Analyze (diagram)
Use cases: ad-hoc data distillation; exploratory data analysis and data visualization; model development
Source data: DBs, files, JMS queues (sensor logs, clickstream, flat files, unstructured, sentiment, customer, inventory)
Load: Sqoop, Flume, NFS, WebHDFS (REST, HTTP, stream)
Data refinement: Pig and Hive on MapReduce, YARN, and HDFS
Structure: HCatalog (metadata services)
Interactive: HiveServer2 serving query, visualization, reporting, and analytical tools and apps
Custom analytical: RHadoop and analytical tools
Management: Ambari
Load out: Sqoop/Hive and WebHDFS to data sources (CSV, databases)
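The diagram lists WebHDFS among the load paths. WebHDFS exposes HDFS over a REST interface with URLs of the form /webhdfs/v1/<path>?op=<OPERATION>, so a client mostly needs to build the right URL. The sketch below only constructs such URLs and makes no network call. The host name, path, and user are hypothetical, and the port is an assumption (50070 was a common NameNode HTTP port in Hadoop distributions of this period; check your cluster's configuration).

```python
from typing import Optional
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str,
                user: Optional[str] = None) -> str:
    """Build a WebHDFS REST URL for an absolute HDFS path and an operation."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute, e.g. /data/clickstream")
    query = {"op": op}
    if user is not None:
        # user.name is the query parameter WebHDFS uses for simple authentication.
        query["user.name"] = user
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


if __name__ == "__main__":
    # Hypothetical NameNode host and HDFS directory, for illustration only.
    print(webhdfs_url("namenode.example.com", 50070, "/data/clickstream",
                      "LISTSTATUS", user="analyst"))
    print(webhdfs_url("namenode.example.com", 50070,
                      "/data/clickstream/part-00000", "OPEN"))
```

LISTSTATUS lists a directory and OPEN reads a file; both are issued as HTTP GET requests against the resulting URL.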

The Data Scientist's Big Data Toolkit
- R Data Step
- Descriptive Statistics
- Statistical Tests
- Sampling
- Simulation
- Data Visualization
- Machine Learning
- Predictive Models

Parallel External-Memory Algorithms (diagram: an SMP server with multiple CPUs)

Parallel External-Memory Algorithms (diagram: a Hadoop cluster made up of multiple Hadoop nodes)
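The two diagrams above show the same algorithms running across CPUs in one server or across nodes in a cluster. The general idea of a parallel external-memory algorithm is that each worker reads one chunk of data that fits in memory, reduces it to a small summary, and the summaries are then merged. The sketch below illustrates this for mean and variance using the standard pairwise merge of running moments. It is a generic illustration, not Revolution R Enterprise code, and all names are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterator, List, Sequence


@dataclass
class Moments:
    """Small summary of a chunk: count, mean, and sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0


def chunk_moments(chunk: Sequence[float]) -> Moments:
    """Summarize one in-memory chunk; this is the work a single CPU or node does."""
    n = len(chunk)
    if n == 0:
        return Moments()
    mean = sum(chunk) / n
    return Moments(n, mean, sum((x - mean) ** 2 for x in chunk))


def merge(a: Moments, b: Moments) -> Moments:
    """Combine two summaries without revisiting the underlying data."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


def read_chunks(values: Sequence[float], size: int) -> Iterator[Sequence[float]]:
    """Stand-in for reading fixed-size blocks from disk or HDFS."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


if __name__ == "__main__":
    data: List[float] = [float(x) for x in range(1, 9)]
    total = reduce(merge, (chunk_moments(c) for c in read_chunks(data, 3)), Moments())
    print(total.n, total.mean, total.m2 / (total.n - 1))  # count, mean, sample variance
```

Because merge is associative, the chunk summaries can be computed in any order and on any number of workers, which is what makes the same algorithm usable on an SMP server or a Hadoop cluster.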

Modern Data Architecture with RRE7: In-Hadoop Predictive Analytics (diagram)
Use cases: production data distillation (e.g. semantic analysis); production model processing and re-estimation; production model scoring
Source data: DBs, files, JMS queues (sensor logs, clickstream, flat files, unstructured, sentiment, customer, inventory)
Load: Sqoop, Flume, NFS, WebHDFS (REST, HTTP, stream)
Data refinement: Pig and Hive on MapReduce, YARN, and HDFS, producing distilled data files
Structure: HCatalog (metadata services)
Interactive: HiveServer2 serving query, visualization, reporting, and analytical tools and apps
Custom analytical: Revolution R Enterprise and analytical tools
Management: Ambari
Load out: Sqoop/Hive and WebHDFS to data sources (CSV, databases)

Hadoop As An R Engine
- Use Revolution R Enterprise PEMAs in Hadoop
- No need to change existing R code
- Simple R programming
- No need to "think in MapReduce"
- Eliminate data movement to slash latencies
- Use Hadoop nodes as parallel R computation engines
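The claim that analysis code need not change when it moves onto a cluster can be illustrated in miniature. In the sketch below, a simple linear regression is written once in terms of per-block sufficient statistics, and the only thing that changes between a local run and a parallel run is the map function passed in. This is a generic Python analogy with hypothetical names, using a thread pool in place of Hadoop nodes; it is not the Revolution R Enterprise API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Stats = Tuple[float, float, float, float, float]  # n, sum_x, sum_y, sum_xx, sum_xy


def partial_stats(block: Sequence[Point]) -> Stats:
    """Sufficient statistics for one block; the work done next to the data."""
    n = float(len(block))
    sx = sum(x for x, _ in block)
    sy = sum(y for _, y in block)
    sxx = sum(x * x for x, _ in block)
    sxy = sum(x * y for x, y in block)
    return n, sx, sy, sxx, sxy


def fit_line(blocks: List[Sequence[Point]],
             run_map: Callable = map) -> Tuple[float, float]:
    """Least-squares slope and intercept from per-block statistics.

    Assumes at least one block and at least two distinct x values.
    `run_map` decides where the per-block work runs; the algorithm is unchanged.
    """
    n, sx, sy, sxx, sxy = (sum(col) for col in zip(*run_map(partial_stats, blocks)))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Synthetic data on the line y = 2x + 1, split into blocks of three points.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
blocks = [points[i:i + 3] for i in range(0, len(points), 3)]

local = fit_line(blocks)  # sequential, built-in map
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = fit_line(blocks, run_map=pool.map)  # same code, parallel workers

print(local, parallel)
```

Only the small statistics tuples travel back to the caller, never the raw blocks, which mirrors the slide's point about eliminating data movement.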

Poll #3: Which of the following would you most like to accomplish with R + Hadoop?
- Build a model to be put in production in Hadoop
- Build a model to be put in production elsewhere
- Create new data from Hadoop to supplement an existing analytics process
- Something else

Next Steps
More about Revolution Analytics and Hadoop: http://www.revolutionanalytics.com/products/r-forhadoop.php
Get started on Hadoop with Hortonworks Sandbox: http://hortonworks.com/sandbox
Follow us: @hortonworks @RevolutionR