Cisco IT Hadoop Journey
Srini Desikan, Program Manager IT
2015 MapR Technologies

Agenda
- Hadoop Platform Timeline
- Key Decisions / Lessons Learnt
- Data Lake
- Hadoop's place in IT Data Platforms
- Use Cases
2013 Cisco and/or its affiliates. All rights reserved.

Bringing Hadoop into Cisco IT in 2011-2012
- Paradigm shift from the database-based application development of the last 2 decades at Cisco IT
  - Cost Structure
  - Development Methodology & Project lifecycle
  - Programming Model
  - Maturity curve of the technology is different
- FUD: Fear, Uncertainty and Doubt
- Availability of skilled workforce
- Rapid pace of innovation and constantly changing industry dynamics

Hadoop Journey in Cisco IT
- 2011: POCs
- July 2012: Multi-tenant Shared Platform
- Starting 2013: Use Cases Deployment; Growth & Expanding Ecosystem
- 2014: Enterprise Data Lake

Key Decisions and Rationale
- Open Source vs Distribution: Operational Excellence, Availability, Performance, Skill set
- Architecture: UCS Common Platform Architecture; Support Growth & Leverage Ecosystem
- Hive (SQL), Mahout, HBase: Cost & Ecosystem
- Environment Lifecycle: Production, Stage, Development & Technical POC (isolate usage by risk and development lifecycle)
- Data Lake: Data Governance, Reduce cost, Eliminate duplication

Lessons from Technology Journey
- Architecture Choice(s): Multi-tenant, Mission critical features
- Start Small & Grow
- Support: Open Source or Distribution
- Leverage Skills: use components that help users leverage their existing skills, like Informatica and SQL
- Tiered Integrated Architecture to manage data across multiple platforms

Lessons from Technology Journey
- Hive doesn't fully support ANSI SQL, so reusable UDFs for Hive were created
- Tidal Enterprise Scheduler allowed for easy workload management and error handling
- Hadoop scales linearly, and our platform grew 100% in the first year. Invest in architecture that allows you to grow.
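
To make the growth point concrete, here is a minimal capacity sketch. It assumes capacity scales linearly with node count, as the slide states. Every number in it (100 TB starting volume, 10 TB usable per node, the 100% growth rate continuing for three years) is a hypothetical input chosen for illustration, not a Cisco figure, and the function names are invented for this example.

```python
import math


def nodes_needed(data_tb, usable_tb_per_node):
    """Nodes required when capacity scales linearly with node count."""
    return math.ceil(data_tb / usable_tb_per_node)


def project_volumes(start_tb, yearly_growth, years):
    """Data volume at year 0..years under compound yearly growth."""
    return [start_tb * (1 + yearly_growth) ** year for year in range(years + 1)]


# Hypothetical inputs: 100 TB today, 10 TB usable per node,
# and 100% yearly growth (yearly_growth=1.0) sustained for 3 years.
volumes = project_volumes(100.0, 1.0, 3)
node_plan = [nodes_needed(v, 10.0) for v in volumes]
print(node_plan)
```

Under these assumptions the node count doubles every year, which is why the rack, network and environment design needs headroom from the start.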

Data Platform Reference Architecture v3
- Data Sources: Databases; all other sources; Customer Registry; ERP; SFDC; Docs, Cases, Content, Social Media, Clickstream; Customer Network, Product Usage; Internet of Everything (IoE); IT App & System Logs & Config.
- Cisco Data Virtualization (Composite): Logical Data Abstraction Layer across transactional, SaaS, Big Data & DW
- Data Storage and Processing
  - Big Data Platform (Hadoop & Spark on UCS): Machine Learning, Data Archiving, Data Science
  - Network of Truth (SAP HANA on UCS): Predictive Engine, Real time BI, Mission Critical Reporting
  - Legacy EDW: Financial SSOTs, Stable core, Controlled Change
  - Operational Intelligence: Index & Search
- Data Consumption (Mobile / Browser / Data Service)
  - Experience Toolkit: Rapid Prototyping / Data Integration / Data Services
  - Agile Analytics, Self Service Dashboard, Rapid Business Intell.
  - Analytics & Modeling (HANA, Hadoop & Spark, SAS): Data Exploration; Real time Predictive Data Analysis, Analytics; Mission Critical Operational Reports; Text Machine Learning, Statistical Analysis (R); Machine Data Insights (e.g. in supply chain); Financial Reporting & Extract
  - Operational Intelligence (Splunk UI)

Enterprise Platform(s): Shared Data, Rich Analytics
- Engineering
- Advanced Services
- Cisco Services
- Marketing
- IT
- Sales
- Security
- Finance
- Supply Chain

Enterprise Data Lake
- Metadata driven utilities to automate ingestion of data
- Access management driven by metadata
- Scalable
- Cost effective
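
The metadata-driven idea above can be sketched as follows. This is an illustrative outline only, not the utilities Cisco IT built: the `DatasetMeta` record, the catalog entries, the URIs, the paths and the group names are all hypothetical. The point is that one metadata record per dataset drives both the ingestion step and the access decision, so onboarding a dataset means adding a record rather than writing new code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMeta:
    """One catalog record per dataset (hypothetical schema)."""
    name: str
    source_uri: str
    lake_path: str
    reader_groups: frozenset


# Hypothetical catalog; in practice this would live in a metadata store.
CATALOG = [
    DatasetMeta("orders", "jdbc://erp.example/orders", "/lake/raw/orders",
                frozenset({"finance", "supply_chain"})),
    DatasetMeta("clickstream", "file:///landing/clicks", "/lake/raw/clickstream",
                frozenset({"marketing"})),
]


def plan_ingestion(catalog):
    """Turn each metadata record into one copy step, with no per-dataset code."""
    return [
        {"dataset": m.name, "copy_from": m.source_uri, "copy_to": m.lake_path}
        for m in catalog
    ]


def can_read(catalog, dataset, group):
    """Access decision taken from the same metadata that drives ingestion."""
    by_name = {m.name: m for m in catalog}
    meta = by_name.get(dataset)
    return meta is not None and group in meta.reader_groups


for step in plan_ingestion(CATALOG):
    print(step)
```

A real implementation would hand each planned step to an ingestion tool and push the reader groups into the platform's permission system; both are left out here because the slides do not describe them.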

Hadoop Use Cases: Organization (vs) Adoption Level
EDS, CSTG
- icam
- Party Ranking Service
- Teradata ETL Offload
- Data Lake Production
- Connected Analytics Network Deployment (CAND)
- Smart Call Home
- Cloud Consumption (Sentinel)
- NOS Online
- Network SSOT Pipeline
Marketing
- Multi-Channel Scoring
- Automatic Qualified Leads
CWCS Metadata
- Content Auto-Tagging
CITS
- Cisco Partner Annuity Initiative
- Social Media Services
GIS
- Collaboration Dashboard
- Item, BOM & Compliance Data Analytics
Legal, Supply Chain
- Data Warehouse Expansion
- Measurement
- ACTS
- TST

Cisco IT Use Cases for Hadoop in Production
- Data Platform Option to Reduce Cost
  - Migrate ETL Processing from EDW (Teradata)
  - Data Lake & Adhoc Data Analysis
  - Data Archiving
- Marketing & Content Management
  - Customer Segmentation
  - Multi-Channel Scoring
  - Content Autotagging
- Services
  - Smart Analytics Offerings
  - Service Opportunity Identification
- Risk & Compliance
  - Organization Network Analytics
  - Engineering Source Code Monitoring

Hadoop Distribution: MapR Advantage(s) for Cisco IT
- High Availability
  - Distributed Name Node
  - Snapshots
  - Volume Based Disaster Recovery
- Performance
  - Higher performance and fewer nodes ($)
- Operational Cost / Productivity
  - HBase (MapR DB) and Hadoop on the same cluster
  - NFS (Fully Read & Write)
  - Multiple simultaneous versions on same cluster

Thank You