Cisco IT Hadoop Journey

Cisco IT Hadoop Journey
Alex Garbarini, IT Engineer, Cisco
2015 MapR Technologies

Agenda
- Hadoop Platform Timeline
- Key Decisions / Lessons Learnt
- Data Lake
- Hadoop's place in IT Data Platforms
- Use Cases
2013 Cisco and/or its affiliates. All rights reserved. Cisco Confidential

Bringing Hadoop into Cisco IT in 2011-2012
- Paradigm shift from the database-based application development of the last two decades at Cisco IT
  - Cost Structure
  - Development Methodology & Project lifecycle
  - Programming Model
  - Maturity curve of the technology is different
- FUD: Fear, Uncertainty and Doubt
- Availability of skilled workforce
- Rapid pace of innovation and constantly changing industry dynamics

Hadoop Journey in Cisco IT
- 2011: POCs
- July 2012: Multi-tenant Shared Platform
- Starting 2013: Use Cases Deployment; Growth & Expanding Ecosystem
- 2014: Enterprise Data Lake

Key Decisions and Rationale
- Open Source vs Distribution; Architecture
  - Operational Excellence, Availability, Performance, Skill set
  - UCS Common Platform Architecture
  - Support Growth & Leverage Ecosystem
  - Hive (SQL), Mahout, HBase, Cost & Ecosystem
- Environment Lifecycle
  - Production, Stage, Development & Technical POC (isolate usage by risk & development lifecycle)
- Data Lake
  - Data Governance, Reduce cost, Eliminate duplication

Lessons from Technology Journey
- Architecture Choice(s)
- Multi-tenant
- Mission critical features
- Start Small & Grow
- Support: Open Source or Distribution
- Leverage Skills: use components that help users leverage their existing skills, like Informatica and SQL
- Tiered Integrated Architecture to manage data across multiple platforms

Lessons from Technology Journey
- Hive doesn't support ANSI SQL
- Reusable UDFs for Hive were created
- Tidal Enterprise Scheduler allowed for easy workload management and error handling
- Hadoop scales linearly, and our platform grew 100% in the first year. Invest in an architecture that allows you to grow.
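The slides do not show the reusable UDFs themselves, and Hive UDFs are usually written as Java classes. As a lighter illustration of the same idea (packaging row-level logic that HiveQL lacks so it can be reused across queries), the sketch below uses Hive's TRANSFORM streaming interface instead, which passes rows to a script as tab-separated lines on standard input. The function `normalize_phone`, the script name, and the table and column names in the comment are hypothetical.

```python
import sys


def normalize_phone(raw: str) -> str:
    """Keep only the digits of a phone number (hypothetical cleanup rule)."""
    return "".join(ch for ch in raw if ch.isdigit())


def transform(lines):
    """Apply normalize_phone to the last tab-separated field of each row."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        fields[-1] = normalize_phone(fields[-1])
        yield "\t".join(fields)


# Assumed HiveQL usage (table and column names are illustrative only):
#   ADD FILE normalize_phone.py;
#   SELECT TRANSFORM(id, phone) USING 'python normalize_phone.py'
#          AS (id, phone)
#   FROM contacts;
if __name__ == "__main__":
    for row in transform(sys.stdin):
        print(row)
```

Keeping the cleanup rule in one small function means the same script can be attached to any query that needs it, which is the reuse the slide describes.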

Data Platform Reference Architecture v3
[Diagram; labels grouped under its three column headings]
Data Sources:
- Databases
- Customer Registry, ERP, SFDC
- Docs, Cases, Content, Social Media, Clickstream
- Customer Network, Product Usage
- Internet of Everything (IoE)
- IT App & System Logs & Config.
- ALL other Sources
Data Storage and Processing:
- Cisco Data Virtualization (Composite): Logical Data Abstraction Layer across transactional, SaaS, Big Data & DW
- Experience Toolkit: Rapid Prototyping / Data Integration / Data Services
- Databases
- Big Data Platform, Hadoop & Spark on UCS: Machine Learning, Data Archiving, Data Science
- Network of Truth, SAP HANA on UCS: Predictive Engine, Real time BI, Mission Critical Reporting
- Legacy EDW: Financial SSOTs, Stable core, Controlled Change
- Index & Search
Data Consumption (Mobile / Browser / Data Service):
- Cisco Data Virtualization (Composite)
- Agile Analytics, Self Service Dashboard, Rapid Business Intell.
- Analytics & Modeling: HANA, Hadoop & Spark, SAS
- Data Exploration
- Real time Predictive Data Analysis, Analytics
- Mission Critical Operational Reports
- Text Machine Learning, Statistical Analysis (R)
- Machine Data Insights (e.g. in supply chain)
- Financial Reporting & Extract
- Operational Intelligence (Splunk UI)

Shared Data, Rich Analytics
[Diagram] Enterprise Platform(s) shared by: Engineering, Advanced Services, Cisco Services, Marketing, IT, Sales, Security, Finance, Supply Chain

Enterprise Data Lake
- Metadata-driven utilities to automate ingestion of data
- Access management driven by metadata
- Scalable
- Cost effective
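The slides do not describe how the metadata-driven ingestion utilities work. A minimal sketch of the general idea, assuming each source table is described by a metadata record and ingestion is done with Sqoop (named elsewhere in the deck as the tool for moving data to and from non-Hadoop stores), might look like the following. The `SourceTable` record, the `CATALOG` entries, the JDBC URL, and the target paths are all hypothetical; only the `sqoop import` flags are real Sqoop options.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SourceTable:
    """One hypothetical metadata record describing a table to ingest."""
    jdbc_url: str
    table: str
    target_dir: str
    mappers: int = 4


def build_sqoop_import(meta: SourceTable) -> List[str]:
    """Turn a metadata record into a `sqoop import` argument list."""
    return [
        "sqoop", "import",
        "--connect", meta.jdbc_url,
        "--table", meta.table,
        "--target-dir", meta.target_dir,
        "--num-mappers", str(meta.mappers),
    ]


# Hypothetical catalog; a real one would live in a metadata store.
CATALOG = [
    SourceTable("jdbc:oracle:thin:@//db.example.com:1521/ERP",
                "ORDERS", "/datalake/raw/erp/orders"),
    SourceTable("jdbc:oracle:thin:@//db.example.com:1521/ERP",
                "CUSTOMERS", "/datalake/raw/erp/customers", mappers=2),
]


def plan_ingestion(catalog: List[SourceTable]) -> List[List[str]]:
    """Build one import command per catalog entry."""
    return [build_sqoop_import(meta) for meta in catalog]


if __name__ == "__main__":
    for cmd in plan_ingestion(CATALOG):
        print(" ".join(cmd))
```

Onboarding a new source then means adding a catalog record rather than writing a new job, which is what makes the approach scale.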

Hadoop Use Cases: Organization (vs) Adoption Level
[Chart; adoption levels: Production, Pipeline]
- EDS / CSTG: icam; Party Ranking Service; Teradata ETL Offload; Data Lake; Connected Analytics Network Deployment (CAND); Smart Call Home; Cloud Consumption (Sentinel); NOS Online; Network SSOT
- Marketing: Multi-Channel Scoring; Automatic Qualified Leads
- CWCS: Metadata - Content Auto-Tagging
- CITS: Cisco Partner Annuity Initiative; Social Media Services
- GIS: Collaboration Dashboard; Item, BOM & Compliance Data Analytics
- Legal / Supply Chain: Data Warehouse Expansion; Measurement; ACTS; TST

Cisco IT Use Cases for Hadoop in Production
- Data Platform Option to Reduce Cost
  - Migrate ETL Processing from EDW (Teradata)
  - Data Lake & Adhoc Data Analysis
  - Data Archiving
- Marketing & Content Management
  - Customer Segmentation
  - Multi-Channel Scoring
  - Content Autotagging
- Services
  - Smart Analytics Offerings
  - Service Opportunity Identification
- Risk & Compliance
  - Organization Network Analytics
  - Engineering Source Code Monitoring

Hadoop Distribution: MapR Advantage(s) for Cisco IT
- High Availability
  - Distributed Name Node
  - Snapshots
  - Volume Based Disaster Recovery
- Performance
  - Higher performance and fewer nodes ($)
- Operational Cost / Productivity
  - HBase (MapR DB) and Hadoop on the same cluster
  - NFS (Fully Read & Write)
  - Multiple simultaneous versions on same cluster

Thank You

Cisco Hadoop Platform Physical Architecture
- Multi UCS cluster Hadoop environment
- Multi-Tenant model for PROD and DEV/Stage
[Diagram labels]
- N7K
- Cisco UCS 62XXUP Fabric Interconnects (per domain): 8 x 10 Gb/s each (80 Gb/s)
- Cisco Nexus 2232PP 10 GE Fabric Extenders (per rack)
- Cisco Unified Computing System C240 M3
- Callouts: Scalability, High Performance, High Availability, Operational Simplicity, Unified Management
Production Capacity (Components: Details)
- OS: RHEL 6.4
- Distribution: MapR (M7)
- Server (node): UCS 240 M3
- Processor: E5-2655, 16 cores (32 with Hyper-Threading)
- Memory/Node: 256 GB
- Storage/Node: 24*1 TB (22 HDFS)
- No. of Nodes: 54
- Cores: 864 (Hyper Threading enabled)
- Total Memory: 13824 GB
- Storage: 1188 TB
- No-SQL: HBASE (MapR - M7)
Service placement
- ZooKeeper, CLDB, WebServer, JobTracker: 3 nodes each
- File Server, TaskTracker: across all nodes
- Platfora: 4 nodes

Hadoop Lifecycles
Components                  | POC                     | DEV                               | QA                      | Production
Software
OS                          | RHEL 6.4                | RHEL 6.4                          | RHEL 6.4                | RHEL 6.4
Hadoop Distribution         | MapR M7 3.1.0           | MapR M7 3.1.0                     | MapR M7 3.1.0           | MapR M7 3.1.0
Server-Cluster
Cisco UCS Servers           | UCS C210 M2             | UCS C210 M2 / C240 M3             | UCS C240 M3             | UCS C240 M3
Processor                   | Intel Xeon X5675        | Intel Xeon X5675                  | Intel Xeon X5675        | Intel Xeon E5-2655
Memory per Node             | 48 GB                   | 48 GB / 256 GB                    | 256 GB                  | 256 GB
Storage per Node (HDFS)     | 14*1 TB 7200 RPM SATA   | 14*1 TB / 22*1 TB 7200 RPM SATA   | 22*1 TB 7200 RPM SATA   | 22*1 TB 7200 RPM SATA
Rack Level
No. of Nodes                | 4                       | 18                                | 8                       | 54
Processors/Cores            | 48                      | 240                               | 128                     | 864
Memory                      | 4x48 = 192 GB           | 12x48 + 6x256 GB                  | 8x256 GB                | 54x256 = 13.8 TB
Storage Capacity (3 way Replication, Compression) | 4x18 = 72 TB | 12x14 + 6x22 = 257 TB | 150 TB | 1188 TB
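The production totals quoted on the slides (864 cores, 13824 GB of memory, and 1188 TB of storage across 54 nodes) follow directly from the per-node figures: 16 cores, 256 GB, and 22 one-terabyte HDFS disks. The sketch below reproduces that arithmetic; the `NodeSpec` type and the `cluster_totals` function are illustrative names, not part of the original deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node hardware figures, as listed on the slides."""
    cores: int
    memory_gb: int
    data_disks: int
    disk_tb: int = 1


def cluster_totals(nodes: int, spec: NodeSpec) -> dict:
    """Multiply per-node figures by the node count."""
    return {
        "cores": nodes * spec.cores,
        "memory_gb": nodes * spec.memory_gb,
        "storage_tb": nodes * spec.data_disks * spec.disk_tb,
    }


# Production column: 54 UCS C240 M3 nodes.
PRODUCTION = NodeSpec(cores=16, memory_gb=256, data_disks=22)
PRODUCTION_TOTALS = cluster_totals(54, PRODUCTION)

if __name__ == "__main__":
    for name, value in PRODUCTION_TOTALS.items():
        print(f"{name}: {value}")
```

The same function can be used to size an expansion, for example by doubling the node count to model the 100% first-year growth mentioned earlier in the deck.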

Cisco UCS Big Data Common Platform Architecture (CPA)
A Highly Scalable Architecture Designed to Meet a Variety of Scale-Out Application Demands
- UCS Fabric Interconnects provide high-speed, fully redundant, active-active connectivity
  - Unified fabric (single wire management)
  - 66% reduction in switch ports
  - 66% reduction in cables
- Powered by UCS C-Series Rack servers
  - Form factor extension to UCS blade system
- UCS Manager
  - Global view of the cluster
  - Proactive monitoring of health
  - 1 Click system software management
- UCS Central
  - Unified management across cluster (up to 10,000 nodes)
  - Application isolation
Business Benefits
- Operational Simplification: Simplified and policy-based management
- Modular Solution: Modular framework that can scale from small to very large
- Risk Reduction: Pre-validation, tighter integration and optimizations reduce integration and deployment risk
- Lower TCO: Unified fabric, unified management and infrastructure optimized for performance lower TCO significantly
Architectural Benefits
- Scalability: Modular building block, scalable up to 7.2 PB with a single management domain
- Performance: Best-in-class performance of compute and network for massively scale-out applications
- Management and Monitoring: Unified management across cluster (up to 10,000 nodes)
Hadoop Requirements
- Distributed powerful computing
- Reliable Hardware
- Local storage in PB
- Low Latency
- Low Cost
- Scalability and Performance
- Manageability
2013-2014 Cisco and/or its affiliates. All rights reserved.

Hadoop Platform Security: Current State
[Diagram labels]
- Hadoop Admins; Business User; Hadoop Developer / Data Analyst
- Pentaho BI & DI Platform; Tableau Dashboards; icam Servers
- Generic User ID Replication used for authentication
- Port opened for Hadoop services (CLDB, JobTracker, File System & ZooKeeper); load balanced
- CLDB, MapR-FS, Job Tracker, ZooKeeper
- Admin ACL to limit access
- Edge Servers: Secure Shell login, job submission
Tools
- Sqoop: a tool for moving data to/from non-Hadoop data stores
- Pig: a high-level data flow language
- Hive: SQL-like language to query and analyze data using MR
- Impala: interactive SQL tool on Hadoop
- Mahout: data mining algorithms using MR
- R: statistical & machine learning language
- Oozie: a job control workflow
- Flume: tool to ingest/stream log data
- TES Agent: allows scheduled jobs to execute