FlexPod Big Data Solutions for Hadoop
Iyer Venkatesan, Solutions Marketing Manager, NetApp
Ashok Rajagopalan, Product Manager, SAVTG, Cisco
May 30, 2013
FlexPod Summer 2013
Michael Harding, Marketing Manager, NetApp
FlexPod Summer 2013 Release
New FlexPod validated designs for data protection, business applications, and non-disruptive operations
- New FlexPod category naming
- Four new validated FlexPod designs within a single grouped payload
  - New core data protection design, enhanced networking with multihop FCoE capability, and updates to Microsoft and SAP workload-specific reference architectures; also Hadoop validations
- FlexPod event presence in the quarter
  - Cisco Live (Orlando), Microsoft TechEd, SAP SAPPHIRE, Citrix Synergy, Cisco Partner Summit
- Field, channel, and end-customer communications
- Updated sales tools and collateral
- Social media outreach
FlexPod Summer 2013 Payload
Four new published FlexPod reference architectures:
- FlexPod with SnapProtect (7-Mode): May-31
- FlexPod with Microsoft Private Cloud and System Center (7-Mode): Jun-4
- FlexPod with SAP (refresh): Jun-21
- FlexPod with Nexus 7000 Series Switch for FCoE (Wilbur): Jul-10
Also, two new Hadoop validated designs: Cloudera and Hortonworks
Key New Resources/Capabilities
- New unified NetApp-Cisco financing program
  - Removes overlap between Cisco and NetApp programs
  - Allows the program to be discussed earlier in the sales cycle
  - Demonstrates shared innovation
  - Targeted for mid-June
- Mobile app
  - Smartphone-based interactive brochure
  - Available for iPhone and iPad
FlexPod Big Data Solutions for Hadoop
Objective: Help you position and sell the business value of FlexPod Big Data Solutions for Hadoop
Topics covered:
- Big Data Overview and Hadoop Fundamentals
- Solution Overview
  - Hadoop distributions, Cisco UCS, E-Series
  - Positioning vs. classic FlexPod, Cisco Big Data Solution
- Use Cases
- Selling Strategies
  - Customer profile, messaging, value proposition
- Support Model
- Sales Resources
- Next Steps, Call to Action
Introducing the Big Data World: Too much data, too many sources, can't use it the way you want to
- More content, more devices, broader consumption
- Big Data refers to datasets whose size is beyond the ability of current tools to capture, store, manage, and analyze
- Structured vs. unstructured data: 80% of data is unstructured
- Explosive data growth: doubling every two years
[Chart: data growth, 2005 / 2010 / 2015]
How Big Data Challenges Show Up in Enterprises
[Diagram: Enterprise applications (ERP, CRM) -> transactions database -> Extract, Transform, Load (ETL: database-to-database transfer) -> data warehouse -> query -> business intelligence]
Pain points marked in the diagram:
- 1: Slow data transformations. Missed SLAs.
- 2: Slow queries. Frustrated business and IT.
- 3: Must archive. Archived data doesn't provide value.
Challenges:
- Data exists in different places
- Can't ingest fast enough
- Analysis and processing takes too long
- Can't ask new questions
- Can't analyze unstructured data
Hadoop Overview and Why It Was Created
Exploding data volumes and types:
- Files, ad impressions, digital content, social media, web logs, transactional data, smart grids, operational data, R&D data
- It's difficult to handle data this diverse, at this scale. Traditional platforms can't keep pace.
- New opportunities to derive value from all your data.
Hadoop is an open-source software framework that:
- Is scalable by enabling applications to work with thousands of nodes and petabytes of data
- Handles large files / data throughput and supports data-intensive distributed applications
Two main components:
- HDFS: the file system, which stores the data
- MapReduce: the process or algorithm for parallel computation
Good fit:
- Very large files (GB, TB, PB)
- Streaming data access
- Write once, read many
- Fault-prone hardware
Not a fit:
- Low-latency data access
- Lots of small files
- Multiple writers, arbitrary file modifications
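To make the MapReduce idea concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps using a word count. It is plain Python rather than the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and a real cluster would run the map and reduce steps in parallel across DataNodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (hypothetical helper)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["web logs", "web data"]))
```

Each map call depends only on its own input line, and each reduce call only on its own key, which is what lets a framework spread the work over many nodes.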
Hadoop Building Blocks
Hadoop Distributed File System (HDFS):
- Divides a file into blocks; default block size is 64MB
- A file smaller than a single block does not occupy a full block's worth of storage
- Large block size to minimize seek time
- HDFS replication, set at three, works at block level
- Replication factor can be adjusted
- Layers on top of a typical Linux file system
NameNode:
- Coordinates data storage activities
- Manages the file system namespace
- Stores all the metadata in RAM
- Tells DataNodes their associated blocks
- Manages block replication
- Single point of failure in the cluster
DataNode:
- Stores blocks of files on top of the native file system
- Serves read/write requests from clients directly
- Performs block creation, deletion, replication
- Same block can be stored on multiple DataNodes for redundancy
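The block and replication rules above can be sketched in a few lines of Python. This is a simplified model, not HDFS code: the 64MB block size and replication factor of three come from the slide, while the function names and the round-robin placement are assumptions made for illustration (real HDFS placement is rack-aware and decided by the NameNode).

```python
BLOCK_SIZE_MB = 64   # default block size stated on the slide
REPLICATION = 3      # default replication factor stated on the slide


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size in MB of each block a file is divided into.

    The last block holds only the remainder, so a small file does not
    occupy a full block's worth of storage.
    """
    if file_size_mb <= 0:
        return []
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * int(full_blocks)
    if remainder:
        sizes.append(remainder)
    return sizes


def raw_storage_mb(file_size_mb, replication=REPLICATION):
    """Raw capacity consumed once every block is replicated."""
    return file_size_mb * replication


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Toy round-robin placement of block replicas on distinct DataNodes."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(200)
    print(blocks, raw_storage_mb(200))
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

In this model a 200MB file becomes three full blocks plus one 8MB block, and consumes 600MB of raw capacity at the default replication factor.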
Big Data and Hadoop Market Opportunity
Use cases/workloads driving the market:
- Finance: fraud detection, loan docs, insurance risk and rates, credit card transactions, images
- Healthcare: managing and storing images, clinical trial effectiveness, genome analysis
- Government: counter-terrorism, energy usage analysis, surveillance video ingest
- Identity theft, forecasting and modeling
- Retail: customer patterns, social media analysis, inventory management, loyalty information
Big Data fuels infrastructure needs:
- Big Data TAM is $10B today
- Growth is 43%
- 2013: $2.4B storage, $1B server, $400M networking
[Chart: Big Data market by segment, 2010-2016 (software, cloud infrastructure, services, networking, servers, storage). Source: IDC]
Enterprise Class Solution
FlexPod Big Data Solutions for Hadoop
FlexPod Big Data Solutions for Hadoop
Converged big data platform from NetApp and Cisco for Hadoop
Components:
- Validated for Cloudera and Hortonworks Hadoop distributions
- Cisco UCS C-Series Rack Mount Servers
- Cisco UCS Fabric Interconnect
- Cisco UCS Manager
- NetApp FAS Storage Systems
- NetApp E-Series Storage Array
Benefits:
- Enterprise-class Hadoop: innovative storage, servers, and networking validated with leading Hadoop distributions
- Faster time to value: pre-validated configuration accelerates deployment
- High availability: less downtime, higher serviceability to meet tight SLAs around data applications and processes
- Flexible scaling: independently scale servers and storage. Modular design for scaling as data needs grow.
* NetApp 50% Storage Guarantee: http://www.netapp.com/us/solutions/infrastructure/virtualization/guarantee.html
Position E-Series Storage Systems for Big Data
- Connectivity with Fibre Channel, InfiniBand, SAS, and iSCSI interfaces
- Up to 384 SAS drives, 1.44PB
- Up to 6,000 MBps sustained
- Performance: outstanding bandwidth drives bigger/faster solutions
- Extreme density: saves floor space and lowers operational costs
- Modular flexibility: custom configurations tuned to your needs
- Reliability, availability: ensure continuous high-speed data delivery; service disk drives while the cluster is running
Systems = Disk Shelves + Controllers
Flexible, modular architecture: any disk shelf with any controller (E5400 or E2600)
- DE6600 shelf (4U/60): (60) 3.5-inch drives; highest throughput, largest capacity/density; NL-SAS and SSD drives; systems: E5460 / E2660
- DE5600 shelf (2U/24): (24) 2.5-inch drives; highest throughput/RU, great performance/watt; 10K SAS and SSD drives; systems: E5424 / E2624
- DE1600 shelf (2U/12): (12) 3.5-inch drives; lowest entry price; NL-SAS drives; systems: E5412 / E2612
NetApp FAS and E-Series Storage Comparison
Clustered ONTAP: flagship operating system
- Ideal for data centers deploying shared virtual infrastructures and/or cloud services
- Built for business and engineering applications and enterprise content repositories
- Offers advanced data management from the storage layer
E-Series: performance-optimized for dedicated workloads
- Perfect for application-specific infrastructures with unique requirements
- Designed for high-performance applications and digital content workflows
- Ideal when data management resides in the application layer
FlexPod Classic vs. FlexPod Big Data Solutions
FlexPod Datacenter (Classic):
- Architecture includes UCS B-Series or C-Series servers managed by UCS Manager/Fabric Interconnect
- NetApp FAS storage connected to N5K or N7K access layer (which is connected to UCS FI)
- Storage connectivity through fabric
- Data protocol is FC/FCoE or Ethernet-based iSCSI/NFS/CIFS
- Targeted for data center enterprise applications such as MSFT apps, VDI, SAP, Oracle, and so on
FlexPod Big Data Solution for Hadoop:
- Architecture includes UCS C-Series servers only, managed by UCS Manager/Fabric Interconnect
- Primarily NetApp E-Series storage for data store
- E-Series connected to UCS C-Series servers directly (SAS attached)
- Storage connectivity through direct-attach to server
- Data protocol is SAS
- Targeted for Hadoop and big data workloads such as Cloudera Hadoop, Hortonworks Hadoop
- NetApp FAS is used for NameNode store (not for data store)
Cisco CPA Solution vs. FlexPod Big Data Solution
Cisco Common Platform Architecture (CPA):
- 6200 Series UCS FIs and 2232 Nexus FEX
- 16 x C240 M3 SFF (Performance) or 16 x C240 M3 LFF (Capacity)
- Cisco UCS Manager/Central unified manageability
- Up to 7.2 PB per rack
- Internal LFF or SFF HDD for storage
- Modular architecture
Cisco CPA vs. FlexPod Big Data:
- Cisco CPA is based on UCS C240 M3 servers with internal SFF or LFF HDD; FlexPod Big Data Solution is based on NetApp E-Series storage
- Key solution differentiators:
  - Enterprise-class solution with hot spares
  - FlexPod Big Data provides disk encryption
  - FlexPod Big Data allows for 2 copies of Hadoop data (compared to 3)
  - FlexPod Big Data provides backup and SANtricity-based tools
  - FlexPod Big Data provides single-node data redundancy
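The capacity effect of keeping two Hadoop copies instead of three can be estimated with simple arithmetic. The sketch below is an illustration only: the function names are hypothetical, the 100 TB dataset is an example figure rather than a number from the deck, and the model ignores array-side overhead such as RAID and hot spares, as well as Hadoop temporary space.

```python
def raw_needed_tb(dataset_tb, replication):
    """Raw capacity needed to hold a dataset at a given replication factor."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return dataset_tb * replication


def savings_fraction(baseline_replication, new_replication):
    """Fraction of raw capacity saved by lowering the replication factor."""
    return 1 - new_replication / baseline_replication


if __name__ == "__main__":
    dataset_tb = 100  # example dataset size, not taken from the deck
    print(raw_needed_tb(dataset_tb, 3))   # three copies
    print(raw_needed_tb(dataset_tb, 2))   # two copies
    print(round(savings_fraction(3, 2), 3))
```

Under these assumptions, moving from three copies to two cuts raw capacity for the same dataset by about one third.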
Extending FlexPod Enterprise Application Ecosystem to Big Data
[Diagram labels: Big Data Ecosystem; Enterprise Applications; UCS Manager (deploy, manage, monitor); Cisco Tidal Enterprise Scheduler; Hadoop connectors; UCS Rack-Mount Servers; UCS Blade Servers; NetApp FAS (availability, backup, Snapshot)]
Extendable to multi-data center implementations for disaster recovery and business continuity
FlexPod Big Data Solution for Cloudera
FlexPod for Cloudera is tested and documented by Cisco and NetApp
- Documented deployment process
  - Deployment instructions
  - Tuning
  - Bill of materials
  - High-level sizing guidance
- Cisco Validated Design
[Document cover: Cisco Validated Design, FlexPod for Cloudera; Raghunath Nambiar, Cisco; Doug Rady, NetApp; June 2013; TR-XYZA]
FlexPod Big Data Solution for Hortonworks
Hortonworks validation
- Documented technical report
  - Deployment instructions
  - Wiring diagrams
  - Bill of materials
  - High-level sizing guidance
- Implementation or Build Guide
[Document cover: Implementation Guide, FlexPod for Hortonworks; Raghunath Nambiar, Cisco; Prem Jain, NetApp; June 2013; TR-XYZA]
Use Cases
- Financial: fraud detection, risk analysis, and recommendation engine for services
- Utility: provide millions of customers with personalized smart utility guidance based on capacity and comparative use data
- Service provider: identify customer retention issues by improving user experience and operational effectiveness
- Retail: near real-time customer recommendation engine
- Health: data analysis of clinical study data records to measure the effectiveness
- Public sector: from simple, resilient data storage to advanced analytics
Messaging and Positioning
FlexPod Big Data Solutions for Hadoop
Pre-validated single-rack and multi-rack solution
- 6296 Fabric Interconnects (connectivity and management), C220 M3 Servers (compute), NetApp E5460 (data storage), and FAS 2240 (metadata storage) with Hadoop
- Expands the successful FlexPod model of pre-sized, rack-level configurations, available through the well-established FlexPod sales engagement and channel, to purpose-built big data workloads
Architectural benefits:
- UCS Fabric Interconnects provide high-speed, fully redundant, active-active connectivity
  - Unified fabric (single-wire management)
  - 66% reduction in switch ports
  - 66% reduction in cables
  - Direct SAN access with no additional hardware
- Powered by UCS C-Series Rack servers
  - Form factor extension to UCS blade system
- UCS Manager
  - Global view of the cluster
  - Proactive monitoring of health
  - 1-click system software management
- UCS Central
  - Unified management across cluster (up to 10,000 nodes)
  - Application isolation
- Scalability: modular building block, scalable up to 7.2 PB with single domain
- Performance: best-in-class performance of compute, network, and storage
- Management: unified management across cluster (up to 10,000 nodes)
Business benefits:
- Operational simplification: simplified and policy-based management
- Expand as data needs grow: modular framework that can scale from small to very large
- Risk reduction: prevalidation and tighter integration reduce deployment risks
- Availability: higher uptime for Hadoop clusters
FlexPod Big Data Solutions for Hadoop
NetApp and Cisco deliver enterprise-class Hadoop for high availability, performance, and scalability
[Diagram: Cloudera or Hortonworks Hadoop; master and expansion configurations]
- Architected for the enterprise
  - Superior NameNode protection
  - Faster recovery from failover
  - Lower cluster downtime
- Faster time to value
  - Validated, pre-sized configurations
  - Low-latency, high-bandwidth networking
  - 12 DataNodes in master, 16 in expansion
- Co-existence with current applications and infrastructure
  - Supports existing applications from SAP, Microsoft, Oracle
  - Data management and monitoring with Cloudera Manager, Cisco UCS Manager
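For rough sizing conversations, the pre-sized building blocks translate into a simple DataNode count. The sketch below assumes one master configuration with 12 DataNodes plus any number of expansion configurations with 16 DataNodes each, as stated on the slide; the function names are hypothetical, and actual sizing should follow the validated design documents.

```python
import math

MASTER_DATANODES = 12      # DataNodes in the master configuration (from the slide)
EXPANSION_DATANODES = 16   # DataNodes per expansion configuration (from the slide)


def total_datanodes(expansion_units):
    """DataNodes in one master plus the given number of expansion units."""
    if expansion_units < 0:
        raise ValueError("expansion_units must be zero or more")
    return MASTER_DATANODES + EXPANSION_DATANODES * expansion_units


def expansions_needed(target_datanodes):
    """Smallest number of expansion units that reaches the target DataNode count."""
    extra = max(0, target_datanodes - MASTER_DATANODES)
    return math.ceil(extra / EXPANSION_DATANODES)


if __name__ == "__main__":
    print(total_datanodes(2))
    print(expansions_needed(40))
```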
Service Level Expectations Around Data
- High-value, time-sensitive problems
- Accelerate time to insights
- Fast deployment with validated, pre-configured reference designs
- Store, process, and analyze all data for new opportunities and business impact
- More time to focus on data analysis rather than dealing with cluster downtime
Making the Hadoop experience better
- Optimized, tuned, fully configured cluster
- Hadoop integrated with storage, compute, and networking
- Monitoring and management tools with SANtricity and from partners (Cloudera Manager, Cisco UCS Manager)
- High density and capacity reduce data center footprint
Reduce risk in an open ecosystem
- Compatibility with existing infrastructure and applications
- Best-of-breed partnerships, not an entire stack from one vendor
- Future-proof against lock-in and benefit from an evolving ecosystem
FlexPod Big Data Solution with Cloudera
Ease of Setup and Deployment
Where is the sweet spot?
- Existing FlexPod customer
  - Needs help in deploying Hadoop
  - Running Hadoop pilot but not in production
- Tight SLAs around data applications and processes
  - Customer is running frequent ETLs
  - ETLs are getting slow and becoming bottlenecks
- Challenging data requirements
  - Lots of unstructured data
  - Doesn't use one vendor for analytic stack
FlexPod Big Data Solutions for Hadoop Unified Support
[Diagram: support paths for the end user]
- Hadoop (Cloudera, Hortonworks): Cloudera or Hortonworks provides support
- Optional reseller support: reseller can offer optional L1/L2** support; not required
- FlexPod support (server, storage, network): Unified Support*
* Uses existing Unified Support feature within the SupportEdge Premium escalation processes, but not CSL lab environments
** Typically holds NetApp support entitlement
Call to Action
- Identify customers within the FlexPod installed base that could be a good fit
- Start a pilot or proof of concept
  - Try and Buy, evaluation equipment
- Download the CVD and Implementation Guides
  - How the solution is built in the channel
  - What services can be provided
- Learn about Big Data and Hadoop
  - Cloudera and Hortonworks will offer sales and product training to Cisco and NetApp employees (sales, technical architects, etc.)
  - Cloudera.com and hortonworks.com for more general information
Sales Resources
- Joint collateral on Cisco-NetApp portal
  - Sales presentation, customer presentation, technical presentation
  - ABCs
  - Web-based training
  - Customer-facing solution briefs, white papers
- CVD and Implementation Guide
  - http://www.cisco.com/en/us/docs/unified_computing/ucs/ucs_cvds/UCS_CVDshadoop_on_netapp.html
  - Hortonworks (June availability)
- TAC training available here (p/w is TrainEm)
  - https://cisco.webex.com/cisco/lsr.php?AT=pb&SP=TC&rID=67335587&act=pb&rKey=14ce58ec47d6463b
- Hadoop distribution information for partners
  - http://www.cloudera.com/content/cloudera/en/partners.html
  - http://hortonworks.com/resources/
Thank You