White Paper. Cisco and Greenplum Partner to Deliver High-Performance Hadoop Reference Configurations

Similar documents
Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System

Get More Scalability and Flexibility for Big Data

Cisco for SAP HANA Scale-Out Solution on Cisco UCS with NetApp Storage

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Platfora Big Data Analytics

Cisco UCS with ParAccel Analytic Platform Solution: Deliver Powerful Analytics to Transform Business

Cisco Unified Data Center Solutions for MapR: Deliver Automated, High-Performance Hadoop Workloads

IT Agility Delivered: Cisco Unified Computing System

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database

Accelerate Cloud Initiatives with Cisco UCS and Ubuntu OpenStack

Cisco UCS Business Advantage Delivered: Data Center Capacity Planning and Refresh

How Cisco IT Built Big Data Platform to Transform Data Management

Unified Computing Systems

How To Build A Cisco Ukcsob420 M3 Blade Server

Support a New Class of Applications with Cisco UCS M-Series Modular Servers

Veeam Backup & Replication Enterprise Plus Powered by Cisco UCS: Reliable Data Protection Designed for Virtualized Environments

Cisco UCS B-Series M2 Blade Servers

Architectural Comparison: Cisco UCS and the Dell FX2 Platform

Boost Database Performance with the Cisco UCS Storage Accelerator

Cisco IT Hadoop Journey

A Platform Built for Server Virtualization: Cisco Unified Computing System

Cisco Nexus Planning and Design Service

Cloud Ready: Architectural Integration into FlexPod with Microsoft Private Cloud Solution

Cisco and VMware: Transforming the End-User Desktop

MarkLogic and Cisco: A Next-Generation, Real-Time Solution for Big Data

Optimally Manage the Data Center Using Systems Management Tools from Cisco and Microsoft

The Future of Computing Cisco Unified Computing System. Markus Kunstmann Channels Systems Engineer

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Cisco UCS C220 M3 Server

Integrated Grid Solutions. and Greenplum

Achieve Automated, End-to-End Firmware Management with Cisco UCS Manager

Cisco UCS and HP BladeSystem: A Comparison

Cisco UCS C24 M3 Server

IBM System x reference architecture for Hadoop: MapR

Cisco UCS B440 M2 High-Performance Blade Server

The virtualization of SAP environments to accommodate standardization and easier management is gaining momentum in data centers.

Cisco UCS C220 M3 Server

How To Build A Cisco Uniden Computing System

Cisco UCS B460 M4 Blade Server

Cisco UCS Architecture and Management:

Cisco Unified Computing System: Meet the Challenges of Microsoft SharePoint Server Workloads

Going Big (Data) with MapR Hadoop and Cisco UCS

Cisco Unified Computing System Hardware

Powerful Duo: MapR Big Data Analytics with Cisco ACI Network Switches

Cisco UCS C420 M3 Rack Server

Large Unstructured Data Storage in a Small Datacenter Footprint: Cisco UCS C3160 and Red Hat Gluster Storage 500-TB Solution

Cisco-EMC Microsoft SQL Server Fast Track Warehouse 3.0 Enterprise Reference Configurations. Data Sheet

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

Cisco Unified Computing System: Meet the Challenges of Microsoft SharePoint Server Workloads

Cisco, Citrix, Microsoft, and NetApp Deliver Simplified High-Performance Infrastructure for Virtual Desktops

Cisco UCS B200 M3 Blade Server

Apache Hadoop: Past, Present, and Future

White Paper. OLTP Reference Configuration Guidelines for Microsoft SQL Server 2008 R2 on Cisco UCS C-Series Rack-Mount Servers

HadoopTM Analytics DDN

BIG DATA TRENDS AND TECHNOLOGIES

Cisco Unified Data Center: The Foundation for Private Cloud Infrastructure

Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and IBM FlexSystem Enterprise Chassis

Cisco UCS: Unified Infrastructure Management That HP OneView Still Can t Match

Cisco Data Center Network Manager for SAN

Dell Reference Configuration for Hortonworks Data Platform

Cisco Unified Computing System: Meet the Challenges of Virtualization with Microsoft Hyper-V

Cisco Unified Computing. Optimization Service

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

Cisco Solution for EMC VSPEX Server Virtualization

Insurance Company Deploys UCS and Gains Speed, Flexibility, and Savings

Cisco Unified Computing System and EMC VNXe3300 Unified Storage System

StackIQ Enterprise Data Reference Architecture

EMC s Enterprise Hadoop Solution. By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst

Dell In-Memory Appliance for Cloudera Enterprise

Dell s SAP HANA Appliance

Virtualizing Apache Hadoop. June, 2012

Protecting Big Data Data Protection Solutions for the Business Data Lake

nexsan NAS just got faster, easier and more affordable.

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

I/O Considerations in Big Data Analytics

Oracle PeopleSoft Payroll 9.0 for North America on Cisco Unified Computing System. Contents. March 2012

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Deploying Hadoop with Manager

Cisco Unified Communications on the Cisco Unified Computing System

High Performance Server SAN using Micron M500DC SSDs and Sanbolic Software

Real-Time Big Data Analytics for the Enterprise

Capitalizing on Big Data Analytics. for the Insurance Industry 3

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012

IBM Netezza High Capacity Appliance

Cisco Unified Computing System and EMC VNX5300 Unified Storage Platform

SQL Server Consolidation on VMware Using Cisco Unified Computing System

How To Write An Article On An Hp Appsystem For Spera Hana

Cisco UCS C-Series Rack-Mount Servers The Computing Platform for Virtualised Data Centres. Business Overview

EMC ISILON OneFS OPERATING SYSTEM Powering scale-out storage for the new world of Big Data in the enterprise

Cisco UCS B200 M1 and UCS B250 M1 Blade Servers. Table 1 compares the features of the Cisco UCS B-Series Blade Servers.

Deliver Fabric-Based Infrastructure for Virtualization and Cloud Computing

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

How To Get More Value From Your Data With Hitachi Data Systems And Sper On An Integrated Platform For Sper.Com

Lenovo ThinkServer and Cloudera Solution for Apache Hadoop

EMC BACKUP MEETS BIG DATA

Transcription:

White Paper Cisco and Greenplum Partner to Deliver High-Performance Hadoop Reference Configurations

Contents Next-Generation Hadoop Solution... 3 Greenplum MR: Hadoop Reengineered... 3 : The Exclusive Platform for Greenplum MR... 6 Reference Configurations... 7 Excellence from Cisco and Greenplum... 12 Complete Big Data Analysis Solution... 12 Designed for High Availability and Reliability... 12 High Performance and Exceptional Scalability... 12 Simplified Management... 13 Coexistence with Enterprise Applications... 13 Rapid Deployment... 13 Enterprise Service and Support... 13 For More Information... 13 2012 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public information. Page 2

Cisco and Greenplum Partner to Deliver High-Performance Hadoop Reference Configurations White Paper Highlights Optimized for Performance Greenplum and Cisco deliver an integrated Hadoop solution specifically engineered to handle the most demanding Hadoop workloads. Creates a Flexible Appliance Platform The Cisco Unified Computing System ( ) provides a flexible, high-performance platform that can be optimized and easily scaled for any size of Hadoop cluster. Ease of Management Manager provides unified, embedded management of all computing, networking, and storageaccess resources. Choice of Configurations The solution provides a choice of configurations, letting organizations select performance and capacity as their needs dictate Enterprise-Class Support and Services The Greenplum MR on Reference Configurations combine the support and services of two of the world s largest technology companies. Greenplum MR on provides companies with an integrated Hadoop solution that delivers advanced performance, full data protection, no single point of failure, and improved data-access features that can expedite the implementation of big data analytics environments. Next-Generation Hadoop Solution The worldwide leader in data center networking, and now a leading competitor in the server market, Cisco is partnering with Greenplum to provide a best-in-class big data solutions that meet a range of needs. The Greenplum MR on Reference Configurations deliver integrated, end-to-end software and hardware infrastructure that accelerates big data initiatives with a choice of performance and capacity. The combination of world-leading Cisco Unified Computing System () and Greenplum MR enables companies to significantly reduce time-tovalue and the operating expenses associated with Apache Hadoop implementations. Greenplum MR: Hadoop Reengineered Greenplum MR, based on the MapR M5 Distribution, is an implementation of the Apache Hadoop stack that enables near-real-time collection and organization of high volumes of structured, semistructured, and unstructured data distributed across a cluster of servers. Greenplum MR provides direct data input and output to the cluster with MapR Direct Access Network File System (NFS), offers real-time analytics, and is the first distribution to provide true high availability at all levels. Greenplum MR introduces the concept of logical volumes to Hadoop: a means of grouping data and applying policy consistently across an entire data set. Greenplum MR provides hardware status information and control with the MapR Control System, a comprehensive user interface that includes a heatmap that displays the health of the entire cluster at a glance. 2012 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public information. Page 3

and Greenplum MR can help businesses manage many different big data scenarios. The examples in Table 1 show how the Greenplum MR on Reference Configurations can accelerate big data initiatives. Table 1. Sample Use Cases for and Greenplum MR Scenario Cisco Greenplum MR Reference Configuration Capabilities Content management Collect and store unstructured and semi-structured data in a fault-resilient, scalable data store that can be organized and sorted for indexing and analysis. Batch processing unstructured data Batch-process large quantities of unstructured and semi-structured data: for example, data warehousing extract, transform, and load (ETL) processing. Medium-term data archive Archive data (medium-term, 12 to 36 months) from an enterprise data warehouse (EDW) database management system (DBMS) to increase the length of time that data is retained or to meet data retention and compliance policies. Integration with data warehouse Transfer data stored in Hadoop to and from a separate DBMS for advanced analytics. Customer risk analysis Perform a comprehensive data assessment of customer-side risk, based on activity and behavior across products and accounts. Personalization and asset management Create and model investor strategy and goals based on market data, individual asset characteristics, and reports entered into an online recommendation system. Trade analytics Analyze historical volume and trading data for individual stock symbols, variable cost of trades, and allocation of expenses. Credit scoring Update credit-scoring models using cross-functional transaction data and recent outcomes, to respond to changes such as the collapse of bubble markets. Sweep recent credit history to build transactional and temporal models. Retailer compromise Prevent or catch fraud resulting from a breach of retailer cards or accounts by monitoring, modeling, and analyzing high volumes of transaction data and extracting features and patterns. Miscategorized credit card fraud Reduce false positives and prevent miscategorization of legitimate transactions as fraud, using high volumes of data to build good models. Next-generation credit card fraud Perform daily cross-sectional analysis of portfolio using transaction similarities to find accounts that are being cultivated for eventual fraud, using common application elements, temporal patterns, vendors, and transaction amounts to detect similar accounts before the fraud is perpetrated. Customer retention Combine transactional customer contact information and social network data to perform attrition modeling to learn social and transaction markers for attrition and retention. Sentiment analysis (opinion mining) and bankruptcy Find better indicators to predict bankruptcy among existing customers using sentiment analysis from social networking, responding quickly before the warning horizon. 2012 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public information. Page 4

Greenplum MR Complete enterprise-class solution Elimination of the common problems experienced with HDFS Direct access through NFS High availability through JobTracker enhancements and a no- NameNode architecture From two to five times faster performance than other Hadoop distributions Advanced management for clusters of all sizes Robust data protection features, including snapshots and intercluster mirroring Comprehensive network of enterprise business intelligence tools As shown in Figure 1, the major components of Greenplum MR provide a Hadoop solution that is easy, dependable, and fast, with components that include: Advanced storage services: A replacement for the Hadoop Distributed File System (HDFS), MapR advanced storage services provide multidimensional scalability and accelerate MapReduce performance. The services allow random read and write operations while automatically compressing data in real time. MapR Heatmap: MapR Heatmap provides visibility, access, and tools that offer insight into the state of the cluster. Graphical and programmatic interfaces are designed to scale with the largest clusters. MapR Control System: MapR Control System provides real-time monitoring of the cluster health, including alarms to notify you of conditions that need to be corrected. Alarms can also be configured to trigger email notifications. MapReduce: Part of the Apache Hadoop framework, MapReduce simplifies the creation of applications that process large amounts of unstructured and structured data in parallel. Underlying hardware failures are handled transparently for user applications, providing a reliable and fault-tolerant capability. Hive: Hive is a data warehouse system for Hadoop that facilitates data summarization, impromptu queries, and analysis of large data sets. This SQL-like interface increases the compression of stored data for improved storage resource utilization without affecting access speed. Pig: Pig is a high-level procedural language for processing data sets in parallel using the Hadoop MapReduce platform. Its intuitive syntax simplifies the Greenplum MR for Apache Hadoop MapR Control System MapR Heatmap LDAP and NIS Integration Quotas, Alerts, and Alarms CLI and REST API Hive Pig Oozie Sqoop HBase Whirr HCatalog Mahout Cascading Nagios Integration Ganglia Integration Flume Zookeeper Easy Dependable Fast Direct Access NFS Real-Time Streaming Volumes Mirrors Snapshots Data Placement No NameNode Architecture High-Performance Direct Shuffle Stateful Failover and Self-Healing MapR Storage Services Figure 1. Greenplum MR Architecture and Components 2012 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public information. Page 5

Cisco and Greenplum Partner to Deliver High-Performance Hadoop Reference Configurations development of MapReduce jobs, providing an alternative programming language to Java. HBase: HBase is the distributed, versioned, column-oriented database that delivers random, real-time, read-write access to big data. ZooKeeper: ZooKeeper is a highly available system for coordinating distributed processes. Applications use ZooKeeper to store and mediate updates to important configuration information. : The Exclusive Platform for Greenplum MR Validated through an extensive testing and development process at Greenplum and Cisco, is the exclusive hardware platform for Greenplum MR. innovations combine industry-standard, x86-architecture servers with networking and storage access into a single converged system (Figure 2). The system is entirely programmable using unified, model-based management to simplify and accelerate the deployment of enterprise-class applications and services running in bare-metal, virtualized, and cloud-computing environments. Cisco Unified Computing System Components Fabric Interconnects Integrate all components into a single management domain Up to 2 matching interconnects per system Low-latency, lossless, 10 GE and FCoE connectivity Uplinks to data center networks: 10 GE, native Fibre Channel, or FCoE Embedded unified, model-based management 6200 Series Fabric Interconnects 1 2 4 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Manager (Embedded) 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 6248UP or 6296UP Data and Management Planes 2208XP Fabric Extender Cisco Fabric Extenders Distribute the unified fabric to blade and rack servers Scale data and management planes without complexity 2208XP: Up to 160 Gbps per blade chassis (with 6200 Series Fabric Interconnects) 2104XP: Up to 80 Gbps per blade chassis Cisco Nexus 2232PP: Integrates rack-mount servers into the system Data and Management Planes Cisco Nexus 2232PP 10GE Fabric Extender 2104XP Fabric Extender 5108 Blade Chassis Accommodates up to 8 half-width blade servers or 4 full-width blade servers Accommodates up to two fabric extenders for connectivity and management Straight-through airflow, 92% efficient power supplies, N+1, and grid redundancy 5108 Blade Server Chassis B-Series Blade Servers C-Series Rack Servers Scale Out - and Web 2.0 UCS B22 M3 CONSOLE B22 M3 UCS UCS C24 M3 C22 M3 C22 M3 C24 M3 B200 M3 B420 M3 Enterprise - Class - CONSOLE CONSOLE CONSOLE C220 M3 C240 M3 C260 M2 C460 M2 B250 M2 1 2 B230 M2 B440 M2 Mission - Critical - Servers Powered exclusively by Intel Xeon processors World-record-setting performance Comprehensive product line for ease of matching servers to workloads Every aspect of identity, personality, and connectivity configured by Manager Figure 2. Components 2012 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public information. Page 6

The Cisco Unified Computing System delivers a radical simplification of traditional architecture with the first self-aware, self-integrating, converged system that automates system configuration in a reproducible, scalable manner More than 60 world records on critical benchmarks The benefits of centralized computing, through a single point of management, delivered to massive scale-out applications Self-aware and self-integrating system Automatic server provisioning through association of models with system resources Standards-based, high-bandwidth, low-latency, lossless Ethernet network helps organizations gain more than efficiency: it helps them become more effective through technologies that foster simplicity rather than complexity. The result is a flexible, agile, and high-performance platform that reduces operating costs with increased uptime through automation and enables more rapid return on investment (ROI). Reference Configurations The Greenplum MR on Reference Configurations are based on Cisco s big data common platform architecture (CPA), a highly scalable architecture designed to meet variety of scale-out application demands with transparent data integration and management integration capabilities using the following components: 6200 Series Fabric Interconnects: The 6200 Series Fabric Interconnects are a core part of, providing both network connectivity and management capabilities across 5100 Series Blade Server Chassis and C-Series Rack Servers. Deployed in redundant pairs, the fabric Interconnects offer line-rate, low-latency, lossless 10 Gigabit Ethernet connectivity and unified management with Manager in a highly available management domain. 2200 Series Fabric Extenders: 2200 Series Fabric Extenders behave like remote line cards for a parent switch and provide a highly scalable and extremely cost-effective unified server-access platform. rack servers: Specific models are used to support the base, highperformance, and high-capacity configurations: -- C210 M2 General-Purpose Rack Server: C210 M2 servers are general-purpose 2-socket platforms based on the Intel Xeon processor 5600 series. These servers support up to 192 GB of main memory and 16 internal front-accessible, hot-swappable, Small Form Factor (SFF) disk drives, with a choice of one or two RAID controllers for data performance and protection. -- C240 M3 Rack Server: C240 M3 Servers are designed for both performance and expandability over a wide range of storageintensive infrastructure workloads. Each server provides sockets for up to two processors from the Intel Xeon processor E5-2600 product family and up to 768 GB of main memory. Up to 24 SFF or 12 Large Form Factor (LFF) disk drives are supported, along with four Gigabit Ethernet LAN-on-motherboard (LOM) ports. -- virtual interface cards (VICs): Unique to Cisco, VICs incorporate next-generation converged network adapter (CNA) technology from Cisco and offer dual10-gbps ports designed for use with C-Series Rack Servers. Optimized for virtualized networking, these cards deliver high performance and bandwidth utilization and support up to 256 virtual devices. 2012 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public information. Page 7

Manager: Manager resides in the 6200 Series Fabric Interconnects. It makes the system self-aware and self-integrating, managing all the system components as a single logical entity. Manager can be accessed through an intuitive GUI, a command-line interface (CLI), or an XML API. Manager uses service profiles to define the personality, configuration, and connectivity of all resources within, radically simplifying provisioning of resources so that the process takes minutes instead of days. This simplification allows IT departments to shift their focus from constant maintenance to strategic business initiatives. It also provides the most streamlined, simplified approach commercially available today to firmware updating for all server components. Organizations deploying Hadoop clusters have different needs depending on the nature of their computational applications and storage requirements. Cisco understands these important distinctions and has structured its Hadoop reference configurations to accommodate a range of diverse needs Base reference configuration: Cisco s base reference configuration is designed for organizations that require balanced computing and storage capacity. The reference configuration uses C210 M2 servers, each with two Intel Xeon processors X5675, 96 GB of memory, and 16 1-TB 7,200-rpm SATA disk drives (Figure 3). 2 Intel Xeon Processors X5675 1 P81E VIC (2x 10 Gbps) 96 GB Memory Embedded Cisco IMC (2x 1 Gbps) 1 LSI 6G MegaRAID 9261-8i Card 2 Integrated Gigabit Ethernet Ports 16 1-TB 200 RPM SFF SATA Disk Drives Red Hat Enterprise Linux Server Standard Redundant Hot-Swappable Power Supplies and Fans Figure 3. B210 M2 Cluster Node for Base Configuration High-performance reference configuration: Many organizations need to optimize computing and memory performance in their Hadoop clusters, so Cisco s highperformance reference configuration uses C240 M3 servers, each with two Intel Xeon processors E5-2665, 256 GB of memory, and 24 1-TB 7,200-rpm SFF SATA disk drives (Figure 4). High-capacity reference configuration:. For organizations that require abundant storage capacity for the Hadoop cluster, the high-capacity reference configuration uses C240 M3 servers, each with two Intel Xeon processors E5-2640, 128 GB of memory, and 12 3-TB 7,200-rpm SAS disk drives (Figure 4). 2012 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public information. Page 8

2 Intel Xeon Processors E5 Family High-Performance Configuration (shown): VIC 1225 (2x 10 Gbps) 256 GB Memory Embedded Cisco IMC (2x 1 Gbps) 24 1-TB SATA 7200 RPM SFF Disk Drive LSI MegaRAID SAS 9226CV-8i Card High-Capacity Configuration (not shown): 4 Integrated Gigabit Ethernet Ports 128 GB Memory Red Hat Enterprise Linux Server Standard 12 3-TB SAS 7200 RPM LFF Disk Drives Redundant Hot-Swappable Power Supplies Figure 4. C240 M3 Cluster Node for High Performance and High Capacity Table 2 summarizes the capabilities of the three reference configurations. Table 2. Base, High-Performance, and High-Capacity Reference Configurations. Component Base Configuration High- Performance Configuration High Capacity Configuration Server-level capacity servers C210 M2 C240 M3 C240 M3 Processor Intel Xeon X5675 Intel Xeon M5-2665 Intel Xeon M5-2640 Memory 96 GB 256 GB 128 GB Storage 16 1-TB SFF 7.2K-RPM SATA 24 1-TB SFF 7.2K-RPM SATA 12 3-TB LFF 7.2K-RPM SAS Rack-level capacity (16 servers) Processor cores and threads 192 cores and 384 threads 256 cores and 512 threads 192 cores and 384 threads Memory 1536 GB 4096 GB 2048 GB Typical user storage capacity (3-way replication, compressed) 320 terabytes (TB) 480 TB 720 TB I/O bandwidth 21 GBps 32 GBps 17 GBps 2012 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public information. Page 9

Component Base Configuration High- Performance Configuration High Capacity Configuration Domainlevel capacity (10 racks) Processor cores and threads 1920 cores and 3840 threads 2560 cores and 5120 threads 1920 cores and 3840 threads Typical user storage capacity (3-way replication, compressed) 3.2 petabytes (PB) 4.8 PB 7.2 PB I/O bandwidth 210 GBps 320 GBps 170 GBps The Greenplum MR on Reference Configurations come in single- and multiple-rack form factors. The single-rack configuration consists of two fully redundant 6248UP 48-Port Fabric Interconnects (for up to five racks) or 6296UP 96-port Fabric Interconnects (for up to 10 racks) along with two Cisco Nexus 2232PP 10GE Fabric Extenders (Figure 5). Each node in the configuration connects to the unified fabric through two active-active 10-Gbps links using a VIC (for data traffic) and Cisco Integrated Management Controller (IMC; for management traffic). Multiple-rack configurations include the components for a single rack and two Cisco Nexus 2232PP fabric extenders for every additional rack. 6248UP or 6296UP Fabric Interconnects Cisco Nexus 2232PP 10GE Fabric Extenders C210 M2 or C240 M3 Rack Server Combined Data and Management Traffic Management Traffic Data Traffic Figure 5. Fabric Architecture 2012 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public information. Page 10

Figure 6 shows the components of the single-rack and multiple-rack configurations. Table 3 provides general guidelines for the number of instances of each service to run in a solution. 2x 6248UP or 6296UP Fabric Interconnects 2x Cisco Nexus 2232PP 10GE Fabric Extenders 2x Cisco Nexus 2232PP 10GE Fabric Extenders for Each Additional Rack 16x Servers 16x Servers for Each Additional Rack Single-Rack Configuration Multiple-Rack Configuration Figure 6. Base, High-Performance, and High-Capacity Configurations Can Be Built as Single- and Multiple-Rack Configurations Table 3. Node Recommendations for Greenplum MR Services Service Number of Nodes (per Rack) Container location database 1 to 3 FileServer Most or all nodes HBase Master 1 to 3 HBase RegionServer Varies JobTracker 1 to 3 NFS Varies TaskTracker Most or all nodes WebServer 1 or more Zookeeper 1, 3, 5, or a higher odd number 2012 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public information. Page 11

Excellence from Cisco and Greenplum Hadoop implementations can present a number of challenges to enterprise environments. Many of these challenges arise from the dichotomy between the introduction of innovative new technology and the enterprise-class performance, reliability, and support demanded by mission-critical systems. The collaboration between Cisco and Greenplum is specifically designed to provide a solution to these challenges. The joint Cisco and Greenplum solution delivers all the characteristics expected of a fully integrated solution, including radically simplified deployment and management, high availability, excellent performance, exceptional scalability, and enterprise-class service and support. Complete Big Data Analysis Solution The comprehensive solution from Cisco and Greenplum helps organizations deploy big data solutions quickly, with validated configurations that scale easily and predictably, as demand dictates. The Greenplum MR on Reference Configurations provide an end-to-end solution that has been tested and validated and that enables enterprise customers to accelerate big data initiatives. Designed for High Availability and Reliability Every component in the Cisco and Greenplum MR Hadoop solution is fully redundant. The Greenplum MR architecture provides JobTracker high availability (HA) and a no-namenode architeture to prevent lost jobs from causing timeconsuming restarts or failover incidents. Combining the core networking capabilities of Cisco, this solution can be extended to include remote mirroring to help ensure data reliability by synchronizing a copy of the cluster s data at a remote site so that data analysis can continue in the event of a disaster. Locally, snapshots protect data from application errors or accidental deletion. Snapshots also enable easy data recovery to a specific point in time by simply copying a file or directory from the snapshot location to the desired destination directory. High Performance and Exceptional Scalability unified fabric architecture provides fully redundant, highly scalable lossless 10-Gbps unified fabric connectivity for big data traffic. Powered by the latest Intel Xeon processor, the Cisco and Greenplum solution delivers best-inclass performance and internal storage capacity to gain at least two times faster performance than Apache Hadoop. The Greenplum MR on Reference Configurations can easily scale to support a large number of nodes when required by business demands. The advanced management capabilities of radically simplify this process with a single point of management that spans all nodes in the cluster. 2012 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public information. Page 12

Simplified Management Hadoop implementations tend to involve very large numbers of servers. In traditional environments, it can be challenging to manage these large numbers of servers effectively. Manager delivers unified, model-based management that applies personality and configures server, network, and storage connectivity resources, making it as easy to deploy hundreds of servers as it is to deploy a single server. Additionally, Manager can perform system maintenance activities such as firmware update operations across the entire cluster as a single operation. To ease the monitoring of these large clusters, the Greenplum MR control system raises alarms and sends notifications to alert IT personnel about cluster health and the status of services. Coexistence with Enterprise Applications In addition to introducing a Hadoop deployment, organizations need ways to transfer data transparently between their enterprise applications and Hadoop. This solution can connect, across the same management plane, to other deployments running enterprise applications, thereby radically simplifying data center management and connectivity. Rapid Deployment Deployment of large numbers of servers can take time. Systems need to be racked, networked, configured, and provisioned before they can be put into use. Manager uses a model-based approach to provision servers by applying a desired configuration to the physical infrastructure quickly, accurately, and automatically. The capability to create consistent configurations improves business agility and eliminates a major source of errors that can cause downtime. Enterprise Service and Support Enterprises want know that the vendors providing a solution have the expertise to help them quickly proceed through the design, deployment, and testing of strategic big data initiatives. Businesses also need to have confidence that if a critical system fails, they will be able to get timely and competent support. The Greenplum MR on Reference Configurations bring together world-class service and support from long-time collaborators Cisco and Greenplum. For More Information For complete details about, please visit http://www.cisco.com/go/ucs. For more information about Greenplum MR on, please visit: http://www.cisco.com/go/greenplum. 2012 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public information. Page 13

Americas Headquarters Cisco Systems, Inc. San Jose, CA Asia Pacific Headquarters Cisco Systems (USA) Pte. Ltd. Singapore Europe Headquarters Cisco Systems International BV Amsterdam, The Netherlands Cisco has more than 200 offices worldwide. Addresses, phone numbers, and fax numbers are listed on the Cisco Website at www.cisco.com/go/offices. Cisco and the Cisco logo are trademarks or registered trademarks of Cisco and/or its affiliates in the U.S. and other countries. To view a list of Cisco trademarks, go to this URL: www.cisco.com/go/trademarks. Third party trademarks mentioned are the property of their respective owners. The use of the word partner does not imply a partnership relationship between Cisco and any other company. (1110R) LE-35101-04 09/12