Apache Hadoop's Role in Your Big Data Architecture




Apache Hadoop's Role in Your Big Data Architecture
Chris Harris, EMEA, Hortonworks
charris@hortonworks.com
Twitter: cj_harris5
Hortonworks Inc. 2012

Agenda
- The Growth of Enterprise Data
- Hadoop Market Drivers
- Hortonworks: an Overview
- The Future of Hadoop and Big Data

The Growth of Data in the Enterprise
Data Explosion
- 1 Zettabyte (ZB) = 1 billion TBs
- 15x growth rate of machine-generated data by 2020 (Source: IDC)
- By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent. (Gartner, Mark Beyer, Information Management in the 21st Century)
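The unit conversion on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal (SI) prefixes, which the slide does not specify; the function name is illustrative only.

```python
# Assumption: decimal (SI) prefixes. The slide does not state a convention.
BYTES_PER_TB = 10**12  # one terabyte
BYTES_PER_ZB = 10**21  # one zettabyte


def zettabytes_to_terabytes(zb: int) -> int:
    """Convert a whole number of zettabytes to terabytes."""
    return zb * BYTES_PER_ZB // BYTES_PER_TB


print(zettabytes_to_terabytes(1))  # 1000000000, i.e. 1 billion TBs
```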

Next Generation Data Architecture Drivers
Business Drivers
- From reactive analytics to proactive customer interaction
- Find insights for competitive advantage & optimal returns
Technical Drivers
- Data continues to grow exponentially
- Data is increasingly everywhere and in many formats
Financial Drivers
- Cost of data systems, as % of IT spend, continues to grow
- Cost advantages of commodity hardware & open source

Market Transitioning into Early Majority
[Figure: technology adoption curve, relative % of customers over time. Segments in order: innovators (technology enthusiasts); early adopters (visionaries); the chasm; early majority (pragmatists); late majority (conservatives); laggards (skeptics). Annotations: "Customers want technology & performance" and "Customers want solutions & convenience".]
Source: Geoffrey Moore, Crossing the Chasm

Most Common NEW TYPES OF DATA
1. Sentiment: Understand how your customers feel about your brand and products right now
2. Clickstream: Capture and analyze website visitors' data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Research logs to diagnose process failures and prevent security breaches
6. Unstructured (text, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents
Value
+ Keep existing data longer!

Apache Hadoop Enterprise Use Cases

Vertical            Use Case                              Data Type
Financial Services  New Account Risk Screens              Text, Server Logs
                    Fraud Prevention                      Server Logs
                    Trading Risk                          Server Logs
                    Maximize Deposit Spread               Text, Server Logs
                    Insurance Underwriting                Geographic, Sensor, Text
                    Accelerate Loan Processing            Text
Telecom             Call Detail Records (CDRs)            Machine, Geographic
                    Infrastructure Investment             Machine, Server Logs
                    Next Product to Buy (NPTB)            Clickstream
                    Real-time Bandwidth Allocation        Server Logs, Text, Sentiment
                    New Product Development               Machine, Geographic
Retail              360 View of the Customer              Clickstream, Text
                    Analyze Brand Sentiment               Sentiment
                    Localized, Personalized Promotions    Geographic
                    Website Optimization                  Clickstream
                    Optimal Store Layout                  Sensor

Call Detail Records (CDRs)
Telecom. Data: Machine, Geo
Business Problem
- Telcos perform forensics on dropped calls and sound quality
- Call detail records flow in at a rate of millions per second
- High volume makes pattern recognition and root cause analysis difficult, yet both need to happen in real time
- Delay causes attrition and harms servicing margins
Solution
- HDP can ingest millions of CDRs per second
- HDP facilitates data retention and root cause analysis
- Continuously improve call quality, customer satisfaction and servicing margins
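The root cause analysis described on this slide can be pictured as a grouping job over call records. The following is a minimal sketch in plain Python, written in a map/reduce style rather than against any Hadoop or HDP API. The record shape (`tower_id`, `dropped`), the function names and the sample data are all hypothetical and are not taken from the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical CDR shape: (tower_id, dropped). Real CDRs carry many more fields.
CDR = Tuple[str, bool]


def map_cdr(record: CDR) -> Tuple[str, Tuple[int, int]]:
    """Map step: emit (tower_id, (dropped_count, call_count)) for one record."""
    tower_id, dropped = record
    return tower_id, (1 if dropped else 0, 1)


def reduce_by_tower(pairs: Iterable[Tuple[str, Tuple[int, int]]]) -> Dict[str, float]:
    """Reduce step: sum the counts per tower and return each dropped-call rate."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for tower_id, (dropped, calls) in pairs:
        totals[tower_id][0] += dropped
        totals[tower_id][1] += calls
    return {tower: d / c for tower, (d, c) in totals.items()}


def drop_rates(records: Iterable[CDR]) -> Dict[str, float]:
    """Run the map step over every record, then reduce by tower."""
    return reduce_by_tower(map_cdr(r) for r in records)


# Made-up sample data for illustration only.
sample = [("T1", False), ("T1", True), ("T2", False), ("T1", True), ("T2", False)]
rates = drop_rates(sample)
worst = max(rates, key=rates.get)
print(worst, rates[worst])
```

In this toy data set the tower with the highest dropped-call rate is the first candidate for investigation. On a real cluster the same grouping would run as a distributed job over far larger volumes.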

Infrastructure Investment
Telecom. Data: Machine, Logs
Business Problem
- Telecom marketing and capacity planning are coordinated
- Consumption of bandwidth and services can be out of sync with plans for new towers and transmission lines
- Mismatch between infrastructure investments and the actual return on investment puts revenue at risk
Solution
- HDP helps telcos understand service consumption in a particular state, county or neighborhood
- Analyze Call Detail Records (CDRs) and network loads more intelligently, over longer periods of time
- Plan infrastructure with more precision and less variability

Assembly Line Quality Assurance
Manufacturing. Data: Sensor
Business Problem
- High-tech manufacturing uses sensors to capture data at critical steps in the manufacturing process
- Sensor data helps diagnose errors with returned products
- Much data is discarded because of high storage costs
- Lean margins mean small budgets for data analysis
Solution
- HDP stores unstructured, streaming, dirty sensor data
- Manufacturers can proactively analyze more data, over a longer time, to detect subtle issues otherwise undetected
- Sensor data managed with HDP can help a manufacturer reduce warranty costs and earn a reputation for quality

Supply Chain and Logistics
Manufacturing. Data: Sensor
Business Problem
- Manufacturers need just-in-time availability of components
- Stock-outs cause harmful production delays
- Sensors and RFID tags reduce the cost of capturing more supply chain data, which needs storage and processing
Solution
- HDP stores unstructured, streaming, dirty sensor data
- Manufacturers get lead time to make alternative arrangements for supply chain disruptions
- Prevent stock-outs, reduce supply chain costs and improve margins for the finished product

Fraud Prevention
Financial Services. Data: Server Logs
Business Problem
- Financial institutions are always at risk of fraud
- Fraudsters test bank systems for vulnerabilities
- This testing leaves subtle patterns often undetected by bank employees or law enforcement
- Fraud losses cost banks millions
Solution
- HDP reduces the cost to detect fraudulent activity
- HDP stores more types of data for longer
- Analysis of data in the data lake exposes fraudulent patterns that would have gone undetected

360° View of the Customer
Retail. Data: Clickstream, Text
Business Problem
- Retailers interact with customers across multiple channels
- Customer interaction and purchase data is often siloed
- Few retailers can correlate customer purchases with marketing campaigns and online browsing behavior
- Merging data in relational databases is expensive
Solution
- HDP gives retailers a 360° view of customer behavior
- Store data longer & track phases of the customer lifecycle
- Gain competitive advantage: increase sales, reduce supply chain expenses and retain the best customers

Analyze Brand Sentiment
Retail. Data: Sentiment
Business Problem
- Enterprises lack a reliable way to track their brand health
- It is difficult to analyze how advertising, competitor moves, product launches or news stories affect the brand
- Internal brand studies can be slow, expensive and flawed
Solution
- HDP allows quick, unbiased brand sentiment snapshots
- Analyze sentiment from Twitter, Facebook, LinkedIn or industry-specific social media streams
- Retailers better understand customer perceptions, to align their communications, products and promotions with those perceptions and expectations

Growth Pressures Existing Data Architectures
[Diagram: layers of an existing data architecture]
- Applications: Packaged Analytic App, Custom Analytic App
- Dev & Data Tools: Build & Test
- Data Systems: Traditional Repos (RDBMS, EDW, MPP)
- Operational Tools: Manage & Monitor
- Data Sources: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS Systems
Data growth 8% annually

An Emerging Data Architecture
[Diagram: layers of an emerging data architecture]
- Applications: Packaged Analytic App, Custom Analytic App
- Dev & Data Tools: Build & Test
- Data Systems: Traditional Repos (RDBMS, EDW, MPP); Enterprise Hadoop Platform
- Operational Tools: Manage & Monitor
- Data Sources: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS Systems; New Sources (web logs, email, sensors, social media)
Data growth 85% annually

Hortonworks & Teradata Unified Data Architecture
The right technology on the right analytical problems using best of breed technologies
[Diagram labels: Aster Connector for Hadoop (SQL-H); Aster-Teradata Connector; SQL-H; Teradata Connector for Hadoop]
- Viewpoint Integration: common management console for Aster, Teradata and Apache Hadoop
- TVI (Teradata Vital Infrastructure): proactive reliability, availability, and manageability support service
- Aster Connector for Hadoop: SQL-H integration
- Teradata Connector for Hadoop: Sqoop integration
- Pre-tuned HDFS and MapReduce parameters for Big Data workloads

Agenda
- The Growth of Enterprise Data
- Hadoop Market Drivers
- Hortonworks: an Overview
- The Future of Hadoop and Big Data

A Brief History of Apache Hadoop
[Timeline with axis ticks 2004, 2006, 2008, 2010, 2012, 2013. Milestones shown: Apache Project Established; Yahoo! begins to operate at scale; Hortonworks Data Platform; Enterprise Hadoop]
- 2005: Yahoo! creates team under E14 to work on Hadoop
- 2011: Hortonworks created to focus on Enterprise Hadoop

Leadership Starts at the Core
- Driving next generation Hadoop: YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery
- 420k+ lines authored since 2006; more than twice nearest contributor
- Deeply integrating w/ecosystem
  - Enabling new deployment platforms (ex. Windows & Azure, Linux & VMware HA)
  - Creating deeply engineered solutions (ex. Teradata big data appliance)
- All Apache, NO holdbacks: 100% of code contributed to Apache

Agenda
- The Growth of Enterprise Data
- Hadoop Market Drivers
- Hortonworks: an Overview
- The Future of Hadoop and Big Data

The 1st Generation of Hadoop: Batch
HADOOP 1.0: Built for Web-Scale Batch Apps
- All other usage patterns must leverage that same infrastructure
- Forces the creation of silos for managing mixed workloads
[Diagram: single-app silos (INTERACTIVE, ONLINE, BATCH, BATCH, BATCH) on separate HDFS instances]

The Enterprise Requirement: Beyond Batch
To become an enterprise viable data platform, customers have told us they want to store ALL DATA in one place and interact with it in MULTIPLE WAYS, simultaneously & with predictable levels of service.
[Diagram: BATCH, INTERACTIVE, ONLINE, STREAMING, GRAPH, IN-MEMORY, HPC MPI and SEARCH workloads on top of HDFS (Redundant, Reliable Storage)]

YARN: Taking Hadoop Beyond Batch
- Created to manage resource needs across all uses
- Ensures predictable performance & QoS for all apps
- Enables apps to run IN Hadoop rather than ON
- Key to leveraging all other common services of the Hadoop platform: security, data lifecycle management, etc.
Applications Run Natively IN Hadoop
- BATCH (MapReduce)
- INTERACTIVE (Tez)
- ONLINE (HBase)
- STREAMING (Storm, S4, ...)
- GRAPH (Giraph)
- IN-MEMORY (Spark)
- HPC MPI (OpenMPI)
- OTHER (Search) (Weave ...)
YARN (Cluster Resource Management)
HDFS2 (Redundant, Reliable Storage)

The Future of Hadoop and Big Data
- The next generation data architecture is evolving rapidly
  - Store ALL data in a Hadoop data reservoir
  - Push subsets of data to a final platform for processing
- Hadoop 2.0 takes Hadoop beyond batch
  - 2.0 YARN based architecture enabling mixed use workloads with enterprise resource management
- Enabling a new generation of applications at scale
  - Based on new data types (sensor, sentiment, clickstream, etc.) or keeping existing types for much longer

Hortonworks Sandbox
- Hands-on tutorials integrated into Sandbox
- HDP environment for evaluation

THANK YOU!
Chris Harris
charris@hortonworks.com
Download Sandbox: hortonworks.com/sandbox