HITACHI DATA SYSTEMS HADOOP SOLUTION JUNE 12, 2012



WEBTECH EDUCATIONAL SERIES
HITACHI DATA SYSTEMS HADOOP SOLUTION

Customers are seeing exponential growth of unstructured data, from their social media websites to operational sources. Their enterprise data warehouses are not designed to handle such high volumes and varieties of data. Hadoop, the latest software platform that scales to process massive volumes of unstructured and semi-structured data by distributing the workload across clusters of servers, gives customers a new option to tackle data growth and deploy big data analysis to better understand their business. Hitachi Data Systems is launching its latest Hadoop reference architecture, which is pretested with the Cloudera Hadoop distribution to provide a faster time to market for customers deploying Hadoop applications. HDS, Cloudera and Hitachi Consulting will present together and explain how to get you there.

Attend this WebTech and learn how to:
- Solve big-data problems with Hadoop.
- Deploy Hadoop in your data warehouse environment to better manage your unstructured and structured data.
- Implement Hadoop using the HDS Hadoop reference architecture.

PRESENTERS
- Shankar Radhakrishnan, Solutions Manager, Hitachi Data Systems
- Sai Saiprabhu, Director, Specialized Services, Hitachi Consulting
- Art Vancil, Big Data Senior Manager, Hitachi Consulting
- Daniel Templeton, Partner Manager, Cloudera

ASK BIGGER QUESTIONS
DANIEL TEMPLETON, PROGRAM MANAGER AT CLOUDERA

Enterprise Data Evolution
[Chart; axis: AMOUNT OF DATA]
- Data collection & reporting
- IMPROVE OPERATIONAL EFFICIENCY: Process data faster; Store data more cost-effectively; Simplify infrastructure
- CREATE COMPETITIVE ADVANTAGE: Combine data from across the business; Ask new questions immediately; Enable new real-time applications

DATA GROWTH
Data Has Changed in the Last 30 Years
- 1980 to 2012: END-USER APPLICATIONS, THE INTERNET, MOBILE DEVICES, SOPHISTICATED MACHINES
- UNSTRUCTURED DATA: 90%
- STRUCTURED DATA: 10%

Data Management Strategies Have Stayed the Same
- Raw data on SAN, NAS and tape
- Data moved from storage to compute
- Relational models with predesigned schemas

Too Much Data, Too Many Sources
- Can't ingest fast enough
- Costs too much to store
- Exists in different places
- Archived data is lost

Can't Use It The Way You Want To
- Analysis and processing takes too long
- Data exists in silos
- Can't ask new questions
- Can't analyze unstructured data

Cloudera: Transform The Way You Think About Data

The Cloudera Approach
Meet enterprise demands with a new way to think about data.

THE OLD WAY: Multiple platforms for multiple workloads
COMPLEX, FRAGMENTED, COSTLY
- Data silos by department or LOB
- Lots of data stored in expensive specialized systems
- Analysts pull select data into EDW
- No one has a complete view

THE CLOUDERA WAY: Single data platform to support BI, Reporting & App Serving
SIMPLIFIED, UNIFIED, EFFICIENT
- Bulk of data stored on scalable low cost platform
- Perform end-to-end workflows
- Specialized systems reserved for specialized workloads
- Provides data access across departments or LOB

Hadoop complements the Data Warehouse
[Diagram labels]
- Enterprise Applications: OLTP
- Data Warehouse: Query (High $/Byte), Operational BI, Business Intelligence
- Data flows: ETL, Load, Archive
- CLOUDERA: Transform, Math, Query, Store (Archival Data, Exploration, Analytics)

Cloudera Enterprise: The Platform for Big Data
A Revolutionary Solution Built on Apache Hadoop
- Workflow: INGEST, STORE, EXPLORE, PROCESS, ANALYZE, SERVE
- Components: CDH, CLOUDERA MANAGER, CLOUDERA NAVIGATOR, CLOUDERA SUPPORT
- BRINGS STORAGE & COMPUTE TOGETHER
- WORKS WITH EVERY TYPE OF DATA
- CHANGES THE ECONOMICS OF DATA MANAGEMENT

CDH4
Big Data Storage, Processing & Analytics Based on Apache Hadoop
1. Store: Land structured and unstructured data in a scalable, cost-effective repository
2. Process & Analyze: Transform data in parallel and query at the speed of thought
3. Integrate: Interoperate with existing platforms, systems and applications
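To make "transform data in parallel" more concrete, the following is a minimal single-process sketch of the map, shuffle and reduce pattern that Hadoop MapReduce applies across a cluster. It is an illustration only and does not use any Hadoop or CDH API; the function names and the sample log lines are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (word, 1) pairs for one input record (a line of text)."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an iterable of records.

    On a real cluster each map call could run on the node that stores the
    record and each reduce call on a separate node; here everything runs
    in one process.
    """
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input: three short log lines.
    logs = ["disk error on node7", "disk ok on node3", "fan error on node7"]
    counts = run_job(logs)
    print(counts["error"], counts["node7"])
```

Because each map call depends only on its own record and each reduce call only on one key, the work can be split across many servers, which is the property the scale-out claim relies on.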

Cloudera Manager
End-to-End Administration for CDH
1. Deploy: Install, configure & start your cluster in 3 simple steps
2. Configure & Optimize: Ensure optimal settings for all hosts & services
3. Monitor, Diagnose & Report: Find & fix problems quickly, view current & historical activity & resource usage

Cloudera Navigator
Data Management Layer for Cloudera Enterprise
1. Audit & Access Control (AVAILABLE NOW): Ensuring appropriate permissions and reporting on data access for compliance
2. Exploration & Lineage (COMING SOON): Finding out what data is available, what it looks like and where it came from
3. Lifecycle Management (COMING SOON): Migration of data based on policies

Cloudera Support
Our Team of Experts on Call to Help You Meet Your SLAs
1. Extend Your Team: Get a dedicated team at your disposal to help you solve problems quickly
2. Leverage the Experts: Take advantage of our expertise to make sure your cluster operates at its best
3. Influence Roadmaps: Get advocacy with the open source community to build the features and functionality you need

Cloudera Enterprise
The Best Hadoop-Based Platform

CDH4
- The only solution with real time query (Impala)
- The only solution with HDFS high availability
- The most widely deployed & proven
- The broadest ecosystem of certified partners
- 100% open source & built for the enterprise

Cloudera Manager
- Management for the complete Hadoop system
- The most mature & functionally advanced
- The easiest to use w/built-in intelligence
- Integration w/enterprise monitoring tools

Cloudera Navigator
- The only data management tool for Hadoop
- Cloudera Navigator 1.0: Data audit & access control

Cloudera Support
- Dedicated team with a global presence
- Contributors and committers for every part of CDH
- Tens of thousands of nodes under management across industries

A Complete Solution
- Workflow: INGEST, STORE, EXPLORE, PROCESS, ANALYZE, SERVE
- Platform: CDH, CLOUDERA MANAGER, CLOUDERA NAVIGATOR, CLOUDERA SUPPORT
- CLOUDERA UNIVERSITY: DEVELOPER TRAINING, ADMINISTRATOR TRAINING, DATA SCIENCE TRAINING, CERTIFICATION PROGRAMS

CHOOSING THE RIGHT INFRASTRUCTURE FOR HADOOP
SHANKAR RADHAKRISHNAN, SOLUTIONS MANAGER, ORACLE, SAP HANA AND BIG DATA SOLUTIONS
Hitachi Data Systems Corporation 2013. All Rights Reserved.

HADOOP APPLICATION EXAMPLE: GENOME ANALYSIS
National Institute of Genomics Japan
Challenge:
- Accelerate the speed of analysis for genome data from next-generation sequencers
- 4 PB of data
Solution:
- 115-node Hadoop cluster using Hitachi Compute Rack servers
- Reliable and scalable solution

PROACTIVE MAINTENANCE AT HITACHI SERVER DIVISION
Challenge:
- Proactive hardware maintenance from logs, call center data, and product information
- Leverage historical data for future product development
Solution: Hadoop + SAP HANA + SAP Visual Intelligence
Data sources shown in the diagram: User Inquiry, CRM Customer Data, Callcenter Log, Maintenance Report, Server Log, Sales/Financial Data, Location Information, Hardware Auditing Log, Operation History, Distribution/Stock Data, BOM Data, Production Data of Business System

INFRASTRUCTURE REQUIREMENTS FOR HADOOP
DATA GROWTH
- Scale out to handle petabytes of unstructured and semistructured data
- Keep data closer to CPU
COST
- Cost-effective for low-fidelity data
- Increase efficiency and utilization of resources and meet required service levels
COMPLEXITY
- Hardware less prone to failures
- Easy to manage

HADOOP IN THE ENTERPRISE: ARCHITECTURE
One Platform for All Data, All Applications (Hitachi Strength and Focus)
[Architecture diagram labels]
- Users: CxOs, Data Scientist, Business Users / Customers
- Business Intelligence Dashboard, Data Warehouse, Business Apps, RDB
- Data Connector, Hadoop, Real-Time Computer (Streaming)
- Other Big Data Sources (Email, Audio, Documents, etc.)
- Outside Services (Connect to Facebook for CRM, etc.)

INTRODUCING HITACHI REFERENCE ARCHITECTURE FOR HADOOP
ENTERPRISE-READY INFRASTRUCTURE FOR HADOOP
- Pretested and validated for interoperability, performance, and scalability
- Flexible: customize to fit the application
- Pre-validated using Cloudera, the leading Hadoop distribution (certification in progress)
- Complementary to existing Hitachi platforms for block, file, and object
- Seamless management integration with other Hitachi solutions
[Diagram: Management node, Name Node + Job Tracker, and Secondary Name Node connected over the LAN to data nodes, each running TASK TRACKER and DATA NODE (HDFS)]

REFERENCE ARCHITECTURE: HARDWARE COMPONENTS

Qty | Form factor | Component | Description
1 | 1U | Management node | Hitachi server CR 210H: 2 x quad-core E2600 series, 64GB main memory, 2 x GigE (onboard), 5 x 3.5-inch 3TB NL-SAS 7200 RPM
1 | 2U | HDFS master name node (name node, job tracker) | Hitachi server CR 220S: 2 x quad-core E2600 series, 64GB main memory, 2 x GigE (onboard), 12 x 3.5-inch 3TB NL-SAS 7200 RPM
1 | 2U | Secondary name node | Hitachi server CR 220S: 2 x quad-core E2600 series, 64GB main memory, 2 x GigE (onboard), 12 x 3.5-inch 3TB NL-SAS 7200 RPM
As needed | 2U | Data nodes (data node, task tracker) | Hitachi server CR 220S: 2 x quad-core E2600 series, 64GB main memory, 2 x GigE (onboard), 12 x 3.5-inch 3TB NL-SAS 7200 RPM
2 | 1U or 2U | Ethernet switches (10 GbE network) | Cisco Nexus 5548 (48 x GigE / 10GigE) or Brocade VDX 6720-60 (40 x GigE / 10GigE, form factor = 2U)

[Rack diagram: 42U rack with Switch-1 (1U), Switch-2 (1U), and 2U CR220S servers with internal HDD]

Why Compute Rack Servers?
- High density (2U), high processing power (2 CPU sockets), large data storage (12 HDD)
- Redundant power supplies
- Eco-friendly power saving capabilities
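The hardware list above can be turned into a rough capacity estimate. The sketch below is a back-of-the-envelope calculation, not an HDS sizing tool. The disk count, disk size, node height and 42U rack come from the slide. The replication factor of 3 (the HDFS default), the choice of 1U switches, and the assumption that all non-data-node equipment shares the same rack are assumptions made for illustration.

```python
# Figures taken from the hardware slide.
RACK_UNITS = 42            # rack height shown in the diagram
DATA_NODE_UNITS = 2        # CR 220S form factor
DISKS_PER_DATA_NODE = 12   # 12 x 3.5-inch drives per data node
DISK_TB = 3                # 3TB NL-SAS drives

# Assumptions for illustration (not stated on the slide).
REPLICATION_FACTOR = 3     # HDFS default block replication
# Management node (1U) + name node (2U) + secondary name node (2U)
# + two switches assumed to be 1U each.
FIXED_UNITS = 1 + 2 + 2 + 1 + 1


def data_nodes_per_rack(rack_units=RACK_UNITS, fixed_units=FIXED_UNITS,
                        node_units=DATA_NODE_UNITS):
    """Number of 2U data nodes that fit after the fixed equipment."""
    return (rack_units - fixed_units) // node_units


def raw_tb(nodes):
    """Raw disk capacity across the given number of data nodes, in TB."""
    return nodes * DISKS_PER_DATA_NODE * DISK_TB


def usable_tb(nodes, replication=REPLICATION_FACTOR):
    """Capacity left for unique data once every block is replicated.

    Ignores space reserved for the operating system and for temporary
    job output, so real usable capacity would be lower.
    """
    return raw_tb(nodes) / replication


if __name__ == "__main__":
    nodes = data_nodes_per_rack()
    print(nodes, raw_tb(nodes), usable_tb(nodes))
```

Under these assumptions a single rack holds 17 data nodes with 612 TB of raw disk, or about 204 TB of unique data at three-way replication.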

REFERENCE ARCHITECTURE: SOFTWARE COMPONENTS

Tested software:
Component | Version | Description
Operating system | 6.3 | Red Hat or CentOS 64-bit Linux distribution
Hadoop distribution | CDH4 | Cloudera Hadoop distribution
Hadoop management | 4.0.1 | Cloudera Manager
Management framework | n/a | Hitachi Compute Systems Manager

Reference Architecture White Paper: Targeted for June 2013

[Diagram: Management node, Name Node + Job Tracker, and HA Name Node connected over the LAN to data nodes, each running TASK TRACKER and DATA NODE (HDFS)]

WHY HITACHI FOR HADOOP INFRASTRUCTURE
- Enterprise-ready (RAS) for Hadoop: Less worry about hardware failure, more focus on business value
- Seamless management integration with Hitachi solutions: Lower opex
- Competitive pricing with commodity hardware: Lower capex
- One platform solution for all your data volumes, velocity and types: Lower TCO, faster ROI for your big data initiatives

HITACHI CONSULTING
SAI SAIPRABHU, DIRECTOR, SPECIALIZED SERVICES
ART VANCIL, BIG DATA SENIOR MANAGER

HITACHI CONSULTING
- As the global consulting company of Hitachi, Ltd., Hitachi Consulting brings business visions to life through in-depth industry expertise combined with innovative technology solutions and services.
- From articulating strategy through deploying and maintaining applications, Hitachi Consulting helps clients quickly realize measurable business value and achieve sustainable ROI.
- The Hitachi Consulting client base includes 35 percent of the Fortune 100 and 25 percent of the Fortune Global 100, along with many mid-market leaders.
- With offices in North America, Europe, the Middle East, and Asia, the company employs more than 5,000 professionals, with delivery centers in India and China for global delivery scale.

WHAT DO WE SEE WITH OUR CLIENTS?
- Business Objectives Refinement
- Emerging Businesses
- Technology Adoption Without Disruption
- Business Intelligence Jump Start With Big Data Technologies
- Data Science Practice Adoption
- Business Intelligence Practice Adoption

DO YOU NEED AN EXECUTIVE SPONSOR?
- Perhaps your company has not yet started using Hadoop for big data initiatives. Or perhaps you are stuck in "discovery mode," trying to find that golden-nugget big idea from big data.
- If your company is like mine, you will not be given permission to simply play with Hadoop for months on end. In most companies, your time spent on a project needs to be backed by someone with a budget who wants to get something done. Let's look at successful methods to secure your big data executive sponsorship.
- The Internet has driven most businesses, across almost every industry, to demand better information much faster than ever before.
- Examples: Retailers can influence the next shopping visit based on analytics; Amazon can tailor a shopping visit on a variety of dimensions (personalization, price incentives, product combinations, etc.). How will similar dynamics impact your company?

HOW DO I GET STARTED?
If you have no budget for big data, then perhaps you are waiting for a stroke of luck?
1. Award-winning luck #1: Your executive brings to you the justification for big data.
2. Award-winning luck #2: Your subject matter expert and your data scientist pore over the data until they find the golden nugget of justification.
Stop waiting, and begin now to collaborate with your business consultant to discover the data value and the essence of your big data business opportunity.

THE NITTY-GRITTY DETAILS
Hitachi helps you to choose your big data solution by targeting the message to your sponsor's role and asking the BIG QUESTIONS:
- CEO/CSO: Predict the Future
- COO: Optimize the Business Process
- CMO: Nurture the Customer Relationship
- CFO/CTO: Deliver Faster and Cheaper

FOR EXAMPLE
A high-end disk storage manufacturer collects daily performance data from its customers' storage devices, but cannot effectively analyze it BECAUSE OF THE VOLUME.
The big questions to ask: If we stored the data in Hadoop, then
- Could we detect operational patterns that predict device failure worldwide?
- Could we anticipate the failure AND suggest a replacement without downtime?
- Could we sell the data analysis back to the customer for a fee?
- Could we reduce the support effort by delivering proactive notifications?
- How much revenue would we gain, and what costs would we eliminate?

SOLUTION SELECTION FRAMEWORK
The solution discovery and evaluation process is a top-down survey of organizational leadership, followed by a prioritization and ranking based upon business value and organizational priorities.
[Funnel diagram: All Possible Solutions and Purposes, narrowed to Feasible Solutions, narrowed to the Prioritized Big Data Solution Selection]
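The funnel described above (all candidates, then feasible ones, then a ranked selection) can be sketched as a filter followed by a weighted sort. This is one possible way to mechanize the ranking, not the method Hitachi Consulting uses. The 1 to 5 scoring scale, the weights, and the candidate names and scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate big data solution gathered from the leadership survey."""
    name: str
    feasible: bool
    business_value: int   # hypothetical 1-5 score
    org_priority: int     # hypothetical 1-5 score


def prioritize(candidates, value_weight=0.6, priority_weight=0.4):
    """Drop infeasible candidates, then rank the rest by weighted score.

    The weights are illustrative assumptions; a real engagement would
    agree on them with the executive sponsor.
    """
    feasible = [c for c in candidates if c.feasible]
    return sorted(
        feasible,
        key=lambda c: value_weight * c.business_value
        + priority_weight * c.org_priority,
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical candidates with made-up scores.
    survey = [
        Candidate("Predict device failure", True, 5, 4),
        Candidate("Sell analysis back to customers", False, 5, 5),
        Candidate("Proactive notifications", True, 3, 5),
        Candidate("Archive offload", True, 4, 2),
    ]
    for rank, c in enumerate(prioritize(survey), start=1):
        print(rank, c.name)
```

The first entry of the returned list corresponds to the prioritized selection at the bottom of the funnel.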

SPONSOR CONVERSATIONS: ESTABLISHED BUSINESS INTELLIGENCE ENVIRONMENT
- Specific use cases that address chosen pain points to be tackled using big data capabilities
- Measures that show how the use cases alleviate current pain points
- External expertise needed to augment your big data jump start
- Action plan to implement prioritized use cases and evaluate larger adoption of big data capabilities
Outcomes: Executive sponsor buy-in, Executive sponsor oversight, Funding

LEVERAGE BIG DATA CAPABILITIES
[Diagram labels]
- Extend Historical Transactions Availability
- Ability to Process Large Volumes
- Extend Data Staging, Volume Processing and Complex Data Processing
- Flexibility and Complexity Management
- Extend Complex Data Processing
- Extends Existing Data Management Environment
- Introduces New Analytic Capabilities
- Leverage Emerging Capabilities

BIG DATA TECHNOLOGIES: ADOPTION STRATEGY
Protect existing investments that are already in the right place. Introduce big data technologies to enable new and evolving business needs.
[Diagram labels]
- Sources: Existing Transactional Sources, Social Media Sources
- Existing environment: Structured Data Management and Existing Data Management, Enterprise Analytics, Existing Analytic Capabilities (Batch or Stream)
- Current state: Augmentation to Structured Data Management (Limited), Sporadic Analytic Capabilities
- Big Data Appliance: Big Volume Data Analyses, High Velocity Data Analyses, Unstructured Data Analyses (Stream and Organize)
- Steps: Protect Investments as Needed; Introduce New Capabilities; Introduce, Consolidate and Expand New Capabilities; Expand as Demand Grows; Streamline as the Environment Matures

SPONSOR CONVERSATIONS: EMERGING BUSINESS INTELLIGENCE ENVIRONMENT
- Measures that help monitor business operations alignment with business strategies
- Business intelligence competencies needed to attain and sustain competitive edge
- External expertise needed to augment your big data and business intelligence jump start
- Action plan to implement and evaluate larger adoption of big data business intelligence capabilities
Outcomes: Executive sponsor buy-in, Executive sponsor oversight, Funding

NEXT STEPS
- Hitachi Unified Compute Platform for Business Analytics web page: http://www.hds.com/products/hitachi-unified-compute-platform/business-analytics.html
- Contact your HDS sales rep for more information

QUESTIONS AND DISCUSSION

UPCOMING WEBTECHS
WebTechs:
- Take SAP HANA From Proof of Value Through Production Deployment, June 20, 9 a.m. PT, noon ET
- A Cloud You Can Trust: Improve Datacenter Efficiency and Agility, June 26, 9 a.m. PT, noon ET
Check www.hds.com/webtech for:
- Links to the recording, the presentation, and Q&A (available next week)
- Schedule and registration for upcoming WebTech sessions

THANK YOU