Dell Cloudera Solution Reference Architecture v2.1.0




A Dell Reference Architecture Guide
November 2012
Next Generation Cloud Solutions

Table of Contents

Tables
Figures
Overview
  Summary
  Abbreviations
Dell Cloudera Solution
  Solution Overview
  Solution Taxonomy
Dell Cloudera Solution Hardware Architecture
  High-level Architecture
  Dell Cloudera Sizing Terms
  Server Infrastructure Options
Dell Cloudera Solution Network Architecture
  Network Components
  Dell Open Switch Solution
  IPv6 Capabilities
  Network Connectivity
Dell Cloudera Solution Software Architecture
  Linux File System Configuration Definition
  Disk Partitioning Recommendation for the Name Node
  Configuration Parameters: Recommended Values
Dell Cloudera Solution Cloudera Enterprise Software
  Deployment Integration
  Cloudera Manager
  Cloudera Support
  HDFS Highly Available Name Nodes
Dell Cloudera Solution Deployment Methodology
  Site Preparation Needed for the Deployment
Dell Cloudera Solution Hardware Monitoring and Alerting
  Nagios
  Ganglia
Dell Cloudera Solution Security Design
  What is Available in CDH4?
  Implementing Secure Hadoop
Appendix A: Bill of Materials PowerEdge C8000 Series
Appendix B: Bill of Materials PowerEdge R720xd Nodes
Appendix C: Bill of Materials PowerEdge R720 Nodes
Appendix D: Bill of Materials PowerEdge R720xd Data node
Appendix E: Bill of Materials PowerEdge C6105
Appendix F: Bill of Materials Force10 Network Equipment
  Network Equipment Notes
Appendix G: Bill of Materials Dell 6248 Network Equipment
Appendix H: Bill of Materials Software and Support
Appendix I: Dell Cloudera Solution Components Decoder Ring
Appendix J: External References
Update History
  Changes in Version 2.1
To Learn More

Tables

Table 1: Dell Cloudera Solution Use Cases
Table 2: Dell Cloudera Solution Software Locations
Table 3: Cluster Sizes PowerEdge C8000 Series
Table 4: Hardware Configurations PowerEdge C8000 Compute Sleds
Table 5: Hardware Configurations PowerEdge C8000 Storage Sleds
Table 6: Chassis Configuration PowerEdge C8000 Master Chassis
Table 7: Chassis Configuration PowerEdge C8000 High Availability Chassis
Table 8: Chassis Configuration PowerEdge C8000 Data Nodes
Table 9: Chassis Configuration PowerEdge C8000 Heavy Data Nodes
Table 10: Rack Configuration PowerEdge C8000
Table 11: Rack Configuration PowerEdge C8000 Heavy nodes
Table 12: Cluster Sizes PowerEdge R720xd
Table 13: Hardware Configurations PowerEdge R720xd
Table 14: Hardware Configurations PowerEdge R720/R720xd
Table 15: Rack Configuration PowerEdge R720xd (or R720/R720xd)
Table 16: Cluster Sizes PowerEdge C6105 with PowerEdge R720
Table 17: Hardware Configurations PowerEdge C6105 with PowerEdge R720
Table 18: Rack Configuration PowerEdge C6100
Table 19: Single Rack Network Equipment
Table 20: Multi Rack Network Equipment
Table 21: Dell Cloudera Solution Support Matrix
Table 22: HDFS parameters
Table 23: mapred parameters
Table 24: default environment
Table 25: hadoop-env.sh
Table 26: /etc/fstab
Table 27: hdfs (core-site)
Table 28: /etc/security/limits.conf
Table 29: Differences between Cloudera Manager Free Edition and Enterprise Edition
Table 30: Master Chassis PowerEdge C8000
Table 31: HA Chassis PowerEdge C8000
Table 32: Data Node Chassis PowerEdge C8000
Table 33: Heavy Data Node Chassis PowerEdge C8000
Table 34: Active and Standby Name, Admin, Edge and HA Nodes PowerEdge R720xd
Table 35: Active and Standby Name, Admin, Edge and HA Nodes PowerEdge R720
Table 36: Data node PowerEdge R720xd
Table 37: Data node PowerEdge C6105
Table 38: Network Equipment 1GbE Dell Force10
Table 39: Network Equipment 10GbE Dell Force10
Table 40: Network Equipment Dell 6248 (Optional)

Figures

Figure 1: Dell Cloudera Solution Taxonomy
Figure 2: Dell Cloudera Hardware Architecture
Figure 3: PowerEdge C8000 Chassis
Figure 4: PowerEdge R720xd Server
Figure 5: PowerEdge C6105
Figure 6: Single Rack Networking Equipment
Figure 7: Multi-rack networking equipment
Figure 8: Multi-Rack View for 10G Servers Using Force10 S4810 Switches
Figure 9: Multi-Rack View for 10G Servers Using Force10 Z9000 (based on Layer-3)
Figure 10: Multi-Rack View Using Force10 Z9000 Switches (Based on Layer-2)
Figure 11: Dell Cloudera Compute PowerEdge R720xd Node Network Interconnects
Figure 12: Network Connections
Figure 13: HDFS with Highly Available Name Node
Figure 14: Kerberos Authentication in Hadoop

THIS PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.

© 2011-2012 Dell Inc. All rights reserved. Dell, the DELL logo, the DELL badge and PowerEdge are trademarks of Dell Inc. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell disclaims proprietary interest in the marks and names of others. This document is for informational purposes only. Dell reserves the right to make changes without further notice to the products herein. The content provided is as-is and without expressed or implied warranties of any kind.

2.1.0 Dell Confidential

Overview

Summary

This document presents the reference architecture of the Dell Cloudera Solution for Apache Hadoop, which Dell designed jointly with Cloudera. The reference architecture introduces the high-level hardware and software components included in the stack, and then describes each component individually.

Abbreviations

Abbreviation | Definition
BMC | Baseboard management controller
CDH | Cloudera Distribution for Hadoop
DBMS | Database management system
EDW | Enterprise data warehouse
EoR | End-of-row switch/router
HDFS | Hadoop Distributed File System
IPMI | Intelligent Platform Management Interface
NIC | Network interface card
LOM | Local area network on motherboard
OS | Operating system
ToR | Top-of-rack switch/router

Dell Cloudera Solution

Solution Overview

The Dell Cloudera Solution lowers the barrier to adoption for organizations intending to use Hadoop in production.

Hadoop is an Apache open source project, written in the Java programming language, that is built and used by a global community of contributors. Yahoo! has been the largest contributor to the project, and uses Hadoop extensively across its businesses. Other contributors and users include Facebook, LinkedIn, eHarmony, and eBay.

However, installing, configuring, and running Hadoop is not trivial. Different roles and configurations need to be deployed on various nodes. Designing, deploying, and optimizing the network layer to match Hadoop's scalability requires consideration of the type of workloads that will run on the Hadoop cluster. These issues are complicated by both the fast-moving pace of the core Hadoop project and the challenges of managing a system designed to scale to thousands of nodes in a cluster.

Dell's customer-centered approach is to create rapidly deployable and highly optimized end-to-end Hadoop solutions running on hyperscale hardware. Dell listened to its customers and designed a Hadoop solution that combines optimized hardware, software, and services to streamline deployment and improve the customer experience.

The Dell Cloudera Solution embodies all the hardware, software, resources, and services needed to run Hadoop in a production environment. This end-to-end approach means that you can be in production with Hadoop in a shorter time than is typically possible with homegrown solutions.

The Dell Cloudera Solution is based on the Cloudera Enterprise distribution of Hadoop, including CDH4. Cloudera has created a quality-controlled distribution of Hadoop and offers commercial management software, updates, support, and consulting services.

The hardware platform for the Dell Cloudera Solution is the Dell PowerEdge C or R series. Dell PowerEdge servers are focused on hyperscale and cloud capabilities. Rather than emphasizing gigahertz and gigabytes, these servers deliver maximum density, memory, and serviceability while minimizing total cost of ownership.

Dell's solution includes components that span the entire solution stack:
- Reference architecture and best practices
- Optimized server configurations
- Optimized network infrastructure
- Dell Crowbar software framework for deployment and management at scale
- Cloudera CDH Enterprise software
- Hadoop infrastructure management tools
- Monitoring with Ganglia and Nagios
- Integration with components from other ecosystem partners, including Pentaho and Datameer

One of Dell's core contributions to the Dell Cloudera Solution is a method to rapidly deploy and integrate Hadoop in production through the Dell Crowbar framework. Crowbar automates the deployment of the cluster from bare metal (no operating system installed) all the way to installing and configuring the Cloudera software components to your specific requirements. Going far beyond the capabilities of a simple PXE-boot installer, Crowbar handles system BIOS update and configuration, RAID/SAS configuration, operating system deployment, Hadoop software deployment, Hadoop software configuration, and integration with monitoring and alerting. These complementary functions are designed and implemented side-by-side with Hadoop core technology.

This solution provides a foundation for Dell to offer additional solutions as the Hadoop environment evolves and expands.

The Dell Cloudera Solution is designed to address the following use cases:

Table 1: Dell Cloudera Solution Use Cases

Use case | Description
Data storage | The user would like to be able to collect and store unstructured and semi-structured data in a fault-resilient scalable data store that can be organized and sorted for indexing and analysis.
Batch processing of unstructured data | The user would like to batch-process (index, analyze, etc.) large quantities of unstructured and semi-structured data.
Data archive | The user would like medium-term (12-36 months) archival of data from EDW/DBMS to increase the length of time that data is retained or to meet data-retention policies/compliance.
Integration with data warehouse | The user would like to transfer data stored in Hadoop into a separate DBMS for advanced analytics. The user may also want to transfer the data from the DBMS back to Hadoop.

Solution Taxonomy

Figure 1: Dell Cloudera Solution Taxonomy

Figure 1 describes the primary components in the Dell Cloudera Solution. The PowerEdge servers, the operating system, and the Java Virtual Machine make up the foundation on which the Hadoop software stack runs. The dark blue layer, depicting the core Hadoop components, comprises two frameworks:
- The Data Storage Framework is the file system that Hadoop uses to store data on the cluster nodes. Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable file system.
- The Data Processing Framework (MapReduce) is a massively-parallel compute framework inspired by Google's MapReduce papers.

The next layer of the stack is the network layer. This is a dedicated cluster network, implemented from a blueprint using tested and qualified components. This implementation provides predictable high performance without interference from other applications.

The next three frameworks (the Orchestration, the Data Access Framework, and the Client Access Tools) are utilities that are part of the Hadoop ecosystem and provided by the CDH distribution.

Dell Cloudera Solution Hardware Architecture

High-level Architecture

Figure 2: Dell Cloudera Hardware Architecture

The Dell Cloudera environment consists of multiple software services, running on multiple server nodes. The solution implementation divides the server nodes into several categories, and each node has a configuration optimized for its purpose in the cluster. The categories are:
- Admin Node: provides cluster deployment and management capabilities through Crowbar.
- Master Name Node: runs all the services needed to manage the HDFS data storage and MapReduce task distribution and tracking. This is sometimes called the Name Node or Active Name Node. There are two types of services running on the Master Nodes:
  - JobTracker (to support MapReduce job distribution)
  - NameNode (to support HDFS data storage)
- Secondary Name Node: runs the secondary namenode process, which provides a checkpoint function for the Master Name Node. In Active/Standby HA, this node runs a second namenode process, usually called the standby namenode process. In quorum-based HA mode, this node runs the second journal node.
- HA Node: for Active/Standby HA, provides an NFS mount with durable storage that allows the Master Name Node and the Standby Name Node to share an edits file so they stay in sync. For quorum-based HA, provides the third journal node (the Master and Secondary Name Nodes provide the first and second journal nodes).
- Edge Node: provides an interface between the data and processing capacity available in the Hadoop cluster and a user of that capacity. The Edge Node is connected to the main access LAN, and is sometimes called a Gateway Node.
- Data Node: runs all the services required to store blocks of data on the local hard drives and execute processing tasks against that data. The majority of the nodes in a cluster are Data Nodes. There are two types of services running on the Data Nodes:
  - TaskTracker Daemon (to support MapReduce job execution)
  - DataNode Daemon (to support HDFS data storage)

Table 2: Dell Cloudera Solution Software Locations

Daemon | Primary Location
JobTracker | Master Name Node
TaskTracker | Data Node(x)
NameNode | Master Name Node
Secondary namenode | Secondary Name Node
Operating System Provisioning | Admin Node
Chef | Admin Node
Yum Repositories | Admin Node
Cloudera Manager | Edge Node(x)
Zookeeper | Data Node(x)
HMaster | Master Name Node
RegionServer | Data Node(x)
Crowbar Admin | Admin Node
Journal | Master Name Node, Secondary Name Node, HA Node

Master Nodes can be further combined based on the size of the solution deployment. Consult with your Dell solution team for guidance on sizing and configuration.

In addition to the Hadoop processes listed above, all nodes run additional software, such as Nagios, Ganglia, and chef-client. This software is used for cluster management through Crowbar.

Dell Cloudera Sizing Terms

The Dell Cloudera Solution Reference Architecture is organized into three units for sizing as the Hadoop environment grows. From smallest to largest, they are rack, pod, and cluster. Each has specific characteristics and sizing considerations documented in this reference architecture. The design goal for the Hadoop environment is to let you scale the environment by adding capacity as needed, without replacing any existing components.

Rack

A rack is the smallest size designation for a Hadoop environment. A rack consists of all the necessary power, the network cabling, and the two Ethernet switches necessary to support up to 20 data nodes. These nodes should use their own power connectivity and space within the data center, separate from other racks, and be treated as a fault zone.

Pod

A pod is an installation composed of three racks, based on server and network sizing. The three racks can support enough Hadoop server nodes and network switches for a minimum commercial-scale installation. This reference architecture discusses the administration and operational infrastructure needed to support three racks.

Cluster

A cluster is a set of racks dedicated to Hadoop that can be attached to a pair of distribution switches. It is a set of Hadoop nodes that share the same Name Node and management tools for operating the Hadoop environment. The size of the cluster can vary depending on the capacity of the aggregation network. For example, a Dell Force10 Z9000 aggregation switch can support a larger cluster than the Dell Force10 S4810 switches.
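The rack and pod definitions above translate into simple arithmetic. The following is a minimal sketch, not part of the Dell tooling: the function name `size_cluster` and its parameters are hypothetical, and the defaults (20 data nodes per rack, three racks per pod, two Ethernet switches per rack) are taken from the sizing terms in this section. Infrastructure nodes such as the Admin, Name, HA, and Edge nodes are not counted.

```python
import math


def size_cluster(data_nodes: int, max_per_rack: int = 20, racks_per_pod: int = 3) -> dict:
    """Estimate racks, pods, and rack switches for a given number of data nodes.

    Hypothetical helper: defaults follow the rack (up to 20 data nodes, two
    Ethernet switches) and pod (three racks) definitions in the text.
    """
    if data_nodes < 1:
        raise ValueError("data_nodes must be at least 1")
    racks = math.ceil(data_nodes / max_per_rack)   # a partly filled rack still counts
    pods = math.ceil(racks / racks_per_pod)        # a partly filled pod still counts
    return {"racks": racks, "pods": pods, "rack_switches": racks * 2}


if __name__ == "__main__":
    print(size_cluster(45))
```

For denser or sparser platforms, pass the per-rack maximum from the relevant cluster sizing table as `max_per_rack`. Actual sizing should still be confirmed with the Dell solution team, since the aggregation network also limits cluster size.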

Server Infrastructure Options

The Dell Cloudera Solution includes three choices for server infrastructure:
- Dell PowerEdge C8000 series
- Dell PowerEdge R720(xd) series
- Dell PowerEdge C6105 series

These alternatives provide density and capacity choices to match customer requirements. The PowerEdge C8000 series and PowerEdge R720 series are recommended for new installations. The following sections describe the configurations required and the rack layouts.

PowerEdge C8000 Series

The PowerEdge C8000 series is Dell's hyperscale-inspired 4U shared infrastructure server that allows the mixing and matching of compute, storage, and GPU sleds in one chassis. The PowerEdge C8000 chassis holds up to eight single-wide PowerEdge C8220 compute sleds, up to four double-wide PowerEdge C8220X compute/GPU sleds or PowerEdge C8000XD storage sleds, or a combination of these, plus two power sleds. This design makes it possible to balance the CPU-to-memory-to-disk ratio and to build large-scale storage nodes with 24 or more hard drives for big data applications.

The flexible PowerEdge C8000 can run Master, Slave, and Edge Hadoop nodes and multiple workloads from the same chassis or across racks, allowing for better use of IT resources, lower total cost of ownership over the lifecycle of the server, and more efficient use of space while increasing Hadoop pod compute/storage density and performance.
Figure 3: PowerEdge C8000 Chassis

PowerEdge C8000 feature summary:
- Up to eight independently serviceable PowerEdge C8220 compute sleds, four PowerEdge C8220X compute sleds, or four PowerEdge C8000XD storage sleds in a 4U rack chassis
- Cold aisle service
- Intel E5-2600 series processors with up to eight cores and support for up to 130W TDP
- Up to 256GB of memory with 16 DDR3 slots at 1600MHz per node (512GB RTS+)
- PowerEdge C8220 Single Width Compute (SWC)
  - Up to 2 x 2.5-inch non-hot-plug hard drives per PowerEdge C8220 compute sled
- PowerEdge C8220X Double Width Compute (DWC)
  - Up to 12 x 2.5-inch or 4 x 3.5-inch hot-plug hard drives per PowerEdge C8220X compute sled
  - Up to 2 x 2.5-inch non-hot-plug hard drives per PowerEdge C8220X compute sled
  - Up to 2 x 2.5-inch hot-plug hard drives per PowerEdge C8220X compute sled
- PowerEdge C8000XD Double Width Storage (DWS)
  - Up to 12 x 3.5-inch or 12 x 2.5-inch hot-plug hard drives or 24 x 2.5-inch SSDs per PowerEdge C8000XD storage sled

Cluster Sizing

The minimum configuration supported is eight nodes:
- Crowbar Administration Node
- Master Name Node
- Secondary Name Node
- High Availability (HA) Node
- Edge (or Gateway) Node
- Three Data nodes

A minimum configuration can be implemented in three PowerEdge C8000 chassis, if one of the data nodes is installed in the HA chassis.

When using NFS-based HA, the HA node provides a shared NFS mount. In quorum-based HA mode, this node is used as one of the three required quorum nodes, so the node counts remain the same.

Table 3 shows the minimum and maximum numbers of nodes for each rack, pod, and cluster within this reference architecture.

Table 3: Cluster Sizes PowerEdge C8000 Series

Machine Function | Min per rack | Max per rack | Min per pod | Max per pod | Min per cluster | Max per cluster
Admin Node | 1 | 1 | 1 | 1 | 1 | 1
Master Name Node | 1 | 1 | 1 | 1 | 1 | 1
Secondary Name Node | 1 | 1 | 1 | 1 | 1 | 1
HA Node | 1 | 1 | 1 | 2 | 2 | TBD
Edge Node | 1 | 1 | 1 | 2 | 3 | TBD
Data node | 3 | 12 | 3 | 36 | 36 | TBD
Heavy Data node | 3 | 8 (12 in racks 4 and up) | 3 | 24 | 24 | TBD

TBD: To be determined based on sizing criteria.
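The eight-node minimum above can be checked mechanically before a deployment is planned. The following is a minimal sketch under stated assumptions: the role keys (`admin`, `master_name`, and so on) and the function `missing_roles` are hypothetical names chosen for illustration and do not correspond to any Crowbar or Cloudera Manager identifier.

```python
from collections import Counter

# Minimum node count per role, taken from the eight-node minimum configuration.
# The role keys are hypothetical labels, not Crowbar or Cloudera identifiers.
MINIMUM_ROLES = {
    "admin": 1,
    "master_name": 1,
    "secondary_name": 1,
    "ha": 1,
    "edge": 1,
    "data": 3,
}


def missing_roles(node_roles: list) -> dict:
    """Return how many nodes of each role are still needed to reach the minimum."""
    counts = Counter(node_roles)
    return {
        role: required - counts[role]
        for role, required in MINIMUM_ROLES.items()
        if counts[role] < required
    }


if __name__ == "__main__":
    planned = ["admin", "master_name", "secondary_name", "edge", "data", "data"]
    print(missing_roles(planned))
```

An empty result means the planned node list satisfies the minimum. The check is the same for NFS-based and quorum-based HA, because the text notes that the node counts do not change between the two modes.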

Hardware Configurations

Table 4: Hardware Configurations PowerEdge C8000 Compute Sleds

Machine Function | Active and Secondary Name Node, Admin Node, HA Node, Edge Node | Data Node | Heavy Data Node
Sled 1 | PowerEdge C8220X | PowerEdge C8220X | PowerEdge C8220X
Processor | 2 x E5-2600 (8-core) | 2 x E5-2600 (8-core) | 2 x E5-2600 (8-core)
RAM (Minimum) | 128 GB | 64 GB | 64 GB
LOM | 2 x 1GbE | 2 x 1GbE | 2 x 1GbE
Network Controller | 2 x Intel X520 10GbE NIC, Dual Port, SFP+, Low Profile | Intel X520 10GbE NIC, Dual Port, SFP+, Low Profile | Intel X520 10GbE NIC, Dual Port, SFP+, Low Profile
DISK (onboard) | None | None | None
DISK (hotswap) | N/A | 2 x 2.5-inch 1 TB | 2 x 2.5-inch 1 TB
DISK (side) | 6 x 1 TB 2.5-inch SATA | 4 x 3 TB 3.5-inch NL-SAS | 4 x 3 TB 3.5-inch NL-SAS
DISK (expansion) | None | 1 x C8220XD 36 TB | 2 x C8220XD 72 TB
Storage Controller | LSI 2008 (Mezzanine) | LSI 2008 (Mezzanine) | LSI 2008 (Mezzanine)
Storage Controller 2 | None | LSI 9202 (PCI) | LSI 9202 (PCI)
RAID | RAID 1 | JBOD | JBOD

Table 5: Hardware Configurations PowerEdge C8000 Storage Sleds

Machine Function | Active and Secondary Name Node, Admin Node, HA Node, Edge Node | Data Node
Sled 2 | N/A | PowerEdge C8220XD
DISK | N/A | 12 x 3 TB 3.5-inch Nearline SAS (NL-SAS)
Sled 3 | N/A | PowerEdge C8220XD
DISK | N/A | 12 x 3 TB 3.5-inch Nearline SAS (NL-SAS)
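The expansion figures in Table 4 (36 TB for one storage sled, 72 TB for two) follow from the 12 x 3 TB drives per sled listed in Table 5. The following is a minimal sketch of that arithmetic: the function names are hypothetical, counting the four 3 TB side drives toward data capacity is an assumption, and the replication factor of 3 is the stock HDFS default rather than a value stated in this section.

```python
def raw_capacity_tb(storage_sleds: int, drives_per_sled: int = 12, drive_tb: int = 3,
                    side_drives: int = 4, side_drive_tb: int = 3) -> int:
    """Raw disk capacity of one data node in TB.

    Defaults follow Tables 4 and 5: 12 x 3 TB drives per storage sled plus
    4 x 3 TB side drives on the compute sled. Counting the side drives as
    data capacity is an assumption; pass side_drives=0 to exclude them.
    """
    if storage_sleds < 0:
        raise ValueError("storage_sleds must not be negative")
    return storage_sleds * drives_per_sled * drive_tb + side_drives * side_drive_tb


def usable_capacity_tb(raw_tb: float, replication: int = 3) -> float:
    """Rough usable HDFS capacity; replication=3 is the HDFS default (assumption)."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return raw_tb / replication


if __name__ == "__main__":
    for label, sleds in (("data node", 1), ("heavy data node", 2)):
        raw = raw_capacity_tb(sleds)
        print(label, raw, "TB raw,", usable_capacity_tb(raw), "TB after replication")
```

The estimate ignores file system overhead and space reserved for intermediate MapReduce output, so it is an upper bound rather than a planning figure.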

Table 6: Chassis Configuration PowerEdge C8000 Master Chassis

C8220X DWC (Master) | C8220X DWC (Admin) | Power | Power | C8220X DWC (Edge) | Empty | Empty

Refer to Table 30 in Appendix A for the bill of materials for this chassis.

Table 7: Chassis Configuration PowerEdge C8000 High Availability Chassis

C8220X DWC (Secondary) | C8220X DWC (HA) | Power | Power | C8220X DWC | C8220XD DWS

Refer to Table 31 in Appendix A for the bill of materials for this chassis.

Table 8: Chassis Configuration PowerEdge C8000 Data Nodes

C8220X DWC | C8220XD DWS | Power | Power | C8220X DWC | C8220XD DWS

Refer to Table 32 in Appendix A for the bill of materials for this chassis.

Table 9: Chassis Configuration PowerEdge C8000 Heavy Data Nodes

C8220XD DWS | C8220X DWC | Power | Power | C8220XD DWS | C8220XD DWS
C8220XD DWS | C8220X DWC | Power | Power | C8220XD DWS | C8220X DWC
C8220XD DWS | C8220X DWC | Power | Power | C8220XD DWS | C8220XD DWS

Refer to Table 32 and Table 33 in Appendix A for the bill of materials for these chassis.

The Heavy Data node configuration is ordered in groups of three chassis: two heavy data node chassis and one data node chassis.

Table 10: Rack Configuration PowerEdge C8000

RU | RACK1 | RACK2 | RACK3
42 | R1 - Switch 2: 10Gb S4810 | R2 - Switch 2: 10Gb S4810 | R3 - Switch 2: 10Gb S4810
41 | R1 - Switch 1: 10Gb S4810 | R2 - Switch 1: 10Gb S4810 | R3 - Switch 1: 10Gb S4810
40 | Cable Management | Cable Management | Cable Management
39 | Cable Management | Cable Management | Cable Management
38-37 | Master Chassis (RU 38-35) | HA Chassis (RU 38-35) | R3 - Switch 1: Force10 S4810 (1 RU) OR Force10 Z9000 (2 RU)
36-35 | Master Chassis (continued) | HA Chassis (continued) | R3 - Switch 1: Force10 S4810 (1 RU) OR Force10 Z9000 (2 RU)
34 | Cable Management | Cable Management | Cable Management
33 | Cable Management | Cable Management | Cable Management
32-29 | R1 - Chassis06: Data node x 2 | R2 - Chassis06: Data node x 2 | R3 - Chassis06: Data node x 2
28-21 | Empty | Empty | S55 iDRAC Mgmt switch
20-17 | R1 - Chassis05: Data node x 2 | R2 - Chassis05: Data node x 2 | R3 - Chassis05: Data node x 2
16-13 | R1 - Chassis04: Data node x 2 | R2 - Chassis04: Data node x 2 | R3 - Chassis04: Data node x 2
12-9 | R1 - Chassis03: Data node x 2 | R2 - Chassis03: Data node x 2 | R3 - Chassis03: Data node x 2
8-5 | R1 - Chassis02: Data node x 2 | R2 - Chassis02: Data node x 2 | R3 - Chassis02: Data node x 2
4-1 | R1 - Chassis01: Data node x 2 | R2 - Chassis01: Data node x 2 | R3 - Chassis01: Data node x 2

Table 11: Rack Configuration PowerEdge C8000 Heavy nodes

RU | RACK1 | RACK2 | RACK3
42 | R1 - Switch 2: 10Gb S4810 | R2 - Switch 2: 10Gb S4810 | R3 - Switch 2: 10Gb S4810
41 | R1 - Switch 1: 10Gb S4810 | R2 - Switch 1: 10Gb S4810 | R3 - Switch 1: 10Gb S4810
40 | Cable Management | Cable Management | Cable Management
39 | Cable Management | Cable Management | Cable Management
38-37 | Master Chassis (RU 38-35) | HA Chassis (RU 38-35) | R3 - Switch 1: Force10 S4810 (1 RU) OR Force10 Z9000 (2 RU)
36-35 | Master Chassis (continued) | HA Chassis (continued) | R3 - Switch 2: Force10 S4810 (1 RU) OR Force10 Z9000 (2 RU)
34 | Cable Management | Cable Management | Cable Management
33 | Cable Management | Cable Management | Cable Management
32-25 | Empty | Empty | S55 iDRAC Mgmt switch
24-21 | R1 - Chassis06: Data node x 4 (chassis 1 of 3) | R2 - Chassis06: Data node x 4 (chassis 1 of 3) | R3 - Chassis06: Data node x 4 (chassis 1 of 3)
20-17 | R1 - Chassis05: Data node x 4 (chassis 2 of 3) | R2 - Chassis05: Data node x 4 (chassis 2 of 3) | R3 - Chassis05: Data node x 4 (chassis 2 of 3)
16-13 | R1 - Chassis04: Data node x 4 (chassis 3 of 3) | R2 - Chassis04: Data node x 4 (chassis 3 of 3) | R3 - Chassis04: Data node x 4 (chassis 3 of 3)
12-9 | R1 - Chassis03: Data node x 4 (chassis 1 of 3) | R2 - Chassis03: Data node x 4 (chassis 1 of 3) | R3 - Chassis03: Data node x 4 (chassis 1 of 3)
8-5 | R1 - Chassis02: Data node x 4 (chassis 2 of 3) | R2 - Chassis02: Data node x 4 (chassis 2 of 3) | R3 - Chassis02: Data node x 4 (chassis 2 of 3)
4-1 | R1 - Chassis01: Data node x 4 (chassis 3 of 3) | R2 - Chassis01: Data node x 4 (chassis 3 of 3) | R3 - Chassis01: Data node x 4 (chassis 3 of 3)

NOTE: Four Heavy data nodes require 12U of rack space.

Configuration Notes

- Appendix A contains the complete bill of materials (BOM) listing for the C8000 server configurations.
- Support nodes (Name Node, Crowbar Admin Node, HA Node, Edge Node) are configured with the LSI 2008 controller connected to the front hot-swap drives in the PowerEdge C8220X compute sled.
- The two rear motherboard drives in the PowerEdge C8220X compute sled are not required for any nodes.
- Data nodes require one PowerEdge C8220XD sled.
- Data nodes can alternatively be configured with two PowerEdge C8220XD sleds; these are referred to as heavy data nodes.
- Data nodes use an LSI 9202 PCI HBA to connect to one or two PowerEdge C8220XD storage sleds. The connection requires one SAS extender cable per external sled.
- The reference BOMs in the appendices are organized by chassis to simplify ordering.
- Some configurations may require sled blanks for empty slots; the reference BOMs in the appendices account for this.
- The PowerEdge C8000 series is designed for cold-aisle service, with cabling in front of the chassis. Verify that your racks are compatible with this arrangement.
- Be sure to consult your Dell account representative before changing the recommended disk sizes.

PowerEdge R720(xd) Server

The PowerEdge R720 and R720xd servers are Dell's 12G PowerEdge mainstream 2S 2U rack servers. They are designed to deliver the most competitive feature set, best performance, and best value. In this generation, Dell offers a large storage footprint, best-in-class I/O capabilities, and more advanced management features. The PowerEdge R720 and R720xd are technically similar, except that the R720xd has a backplane that can accommodate more drives (up to 24).

Figure 4: PowerEdge R720xd Server

PowerEdge R720xd feature summary:
- Intel Romley platform and Intel Xeon E5-2600 processors
- 1600MHz DDR3
- Network daughter cards for customer choice of LOM speed, fabric, and brand at point of sale
- PCIe SSD in a front-accessible, hot-plug format
- Internal GPGPU support
- Intel Node Manager power management technology
- Software RAID
- Platinum efficiency power supplies, common across 600 and 700 series platforms

Cluster Sizing

The minimum configuration supported is eight nodes:
- Crowbar Administration Node
- Master Name Node
- Secondary Name Node
- High Availability (HA) Node
- Edge (or Gateway) Node
- Three Data nodes

When using NFS-based HA, the HA node provides a shared NFS mount. In quorum-based HA mode, this node is used as one of the three required quorum nodes, so the node counts remain the same. Table 12 shows the minimum and maximum numbers of nodes in each rack, pod, and cluster within this reference architecture.

Table 12: Cluster Sizes PowerEdge R720xd Machine Function Min Per Rack Max Per Rack Min per pod Max per pod Min per cluster Max Per Cluster Admin Node 1 1 1 1 1 1 Master Name Node 1 1 1 1 1 1 Secondary Name Node 1 1 1 1 1 1 HA Node Edge Node Data node 1 1 1 2 2 1 1 1 2 3 3 20 3 60 3 To be determined based on sizing criteria To be determined based on sizing criteria To be determined based on sizing criteria Hardware Configurations Table 13: Hardware Configurations PowerEdge R720xd Machine Function Platform Active and Secondary Name Node Admin Node, HA Node Edge Node Data node PowerEdge R720xd CPU 2 x E5-2640 (6-core) RAM (Minimum) 96 GB 48 GB LOM 4 x 1GbE DISK Storage Controller 6 x 600-GB 10K SAS 2.5-inch PERC H710 24 x 1-TB SATA 7.2K 2.5- inch 2.1.0 20 Dell Confidential

RAID                 RAID 10                            Single Drive RAID 0

Notes: Be sure to consult your Dell account representative before changing the recommended disk sizes.

Table 14: Hardware Configurations PowerEdge R720/R720xd

Machine Function     Active and Secondary Name Node,    Data node
                     Admin Node, HA Node, Edge Node
Platform             PowerEdge R720                     PowerEdge R720xd
CPU                  2 x E5-2640 (6-core)               2 x E5-2640 (6-core)
RAM (Minimum)        96 GB                              48 GB
LOM                  4 x 1GbE                           4 x 1GbE
DISK                 6 x 600-GB 10K SAS 3.5-inch        24 x 1-TB SATA 7.2K 2.5-inch
Storage Controller   PERC H710                          PERC H710
RAID                 RAID 10                            Single Drive RAID 0

Notes:
- Be sure to consult your Dell account representative before changing the recommended disk sizes.
- Refer to the JBOD versus single disk RAID 0 Configuration section for more information.
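As a rough illustration of what the data node disk configuration in Table 14 means for cluster capacity, the following sketch estimates raw and usable HDFS capacity. It assumes 24 x 1-TB data drives per node and a replication factor of 3. The function name is hypothetical, and ignoring reserved space, intermediate MapReduce data, and file system overhead is a simplification made for this example rather than part of the reference architecture.

```python
def hdfs_capacity_tb(data_nodes, disks_per_node=24, disk_tb=1.0, replication=3):
    """Return (raw_tb, usable_tb) for a cluster of identical data nodes.

    Hypothetical helper for illustration only: usable capacity is raw
    capacity divided by the HDFS replication factor. Reserved space and
    temporary job data are ignored.
    """
    if data_nodes < 1 or replication < 1:
        raise ValueError("data_nodes and replication must be at least 1")
    raw_tb = data_nodes * disks_per_node * disk_tb
    usable_tb = raw_tb / replication
    return raw_tb, usable_tb


if __name__ == "__main__":
    # Minimum cluster in this architecture: three data nodes.
    raw, usable = hdfs_capacity_tb(3)
    print(f"raw={raw:.0f} TB, usable={usable:.0f} TB")
```

Real sizing should follow the sizing criteria agreed with your Dell account representative; this estimate is only a starting point.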

Table 15: Rack Configuration PowerEdge R720xd (or R720/R720xd) RU RACK1 RACK2 RACK3 42 R1- Switch 2: Force10 S60 R2- Switch2: Force10 S60 R3- Switch2: Force10 S60 41 R1- Switch 1: Force10 S60 R2- Switch1: Force10 S60 R3- Switch1: Force10 S60 40 Cable Management Cable Management Cable Management 39 Cable Management Cable Management Cable Management 38 R3 - Switch 1: Force10 S4810 Master Name Node:R720xd or R720 Edge01: R720xd or R720 37 R3 - Switch 2: Force10 S4810 36 Cable Management Cable Management Cable Management 35 Cable Management Cable Management Cable Management 34 33 Admin Node R720xd or R720 Secondary Name Node R720xd or R720 HA Node: R720xd or R720 32 R3 - S55 idrac Mgmt switch Empty Empty 31 Empty 21-30 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 Empty Empty Empty R1- Chassis10: R720xd R2- Chassis10: R720xd R3- Chassis10: R720xd R1- Chassis09: R720xd R2- Chassis09: R720xd R3- Chassis09: R720xd R1- Chassis08: R720xd R2- Chassis08: R720xd R3- Chassis08: R720xd R1- Chassis07: R720xd R2- Chassis07: R720xd R3- Chassis07: R720xd R1- Chassis06: R720xd R2- Chassis06: R720xd R3- Chassis06: R720xd R1- Chassis05: R720xd R2- Chassis05: R720xd R3- Chassis05: R720xd R1- Chassis04: R720xd R2- Chassis04: R720xd R3- Chassis04: R720xd R1- Chassis03: R720xd R2- Chassis03: R720xd R3- Chassis03: R720xd R1- Chassis02: R720xd R2- Chassis02: R720xd R3- Chassis02: R720xd R1- Chassis01: R720xd R2- Chassis01: R720xd R3- Chassis01: R720xd 2.1.0 23 Dell Confidential

Configuration Notes

Appendix B, Appendix C, and Appendix D contain the full bill of materials (BOM) listing for the PowerEdge R720 and R720xd server configurations.

The R720 and R720xd configurations can be used with 10GbE networking. To use 10GbE networking, an additional network card is required in each node; refer to the BOM for details on the supported card.

JBOD versus single disk RAID 0 Configuration

The Hadoop community's strong advocacy for the non-RAID drive configuration known as Just a Bunch of Disks, or JBOD, has caused some confusion for readers of our reference architecture. We fully endorse this approach but feel a need for clarification because there are multiple valid ways to achieve this configuration.

Normally, the optimum disk configuration for Hadoop data nodes is considered to be JBOD mode rather than RAID. This is because HDFS provides its own data replication, eliminating the need for the redundancy provided by RAID levels 1-6. HDFS also implements efficient round-robin parallel I/O across multiple drives, eliminating the need for the parallelism provided by the striping capabilities of RAID 0.

Some drive controllers support only RAID mode, and so can't be used in a plain host bus adapter (HBA) mode for JBOD. For these situations, configuring each drive as its own single-drive RAID 0 array allows HDFS to address every array as an individual drive. In this configuration, the controller is effectively operating just like a standard HBA in JBOD mode, and the RAID 0 and JBOD performance characteristics are comparable. While a RAID controller adds minor latency, this is offset by adaptive read-ahead caching.

PowerEdge C6105 Server

The PowerEdge C6105 is Dell's hyperscale-inspired building block for high-performance cluster computing (HPCC), Web 2.0 environments, and cloud builders where performance and power consumption are key. It is purpose-built for scale-out rack deployments and large homogeneous cloud/cluster application environments where density is required and the software stack provides platform availability and resiliency.

Figure 5: PowerEdge C6105

PowerEdge C6105 feature summary:
- Up to 4 server nodes in 2U
- 2 x Opteron 4180
- 12 x DDR3 RDIMM
- 24 x 2.5-inch or 12 x 3.5-inch HDD
- 2 x 1GbE Intel 82576

Cluster Sizing The minimum configuration supported is 8 nodes: Crowbar Administration Node Master Name Node Secondary Name Node High Availability (HA) Node Edge (or Gateway) Node Three Data nodes Configurations based on the C6105 use R720 servers for the infrastructure nodes, and C6105 for data nodes. When using NFS based HA, the HA node provides a shared NFS mount. In quorum based HA mode, this node is used as one of the three required quorum nodes, so the node counts remain the same. Table 16 shows the minimum and maximum numbers of nodes for in each rack, pod, and cluster within this reference architecture. Table 16: Cluster Sizes PowerEdge C6105 with PowerEdge R720 Machine Function Min Per Rack Max Per Rack Min per pod Max per pod Min per cluster Max Per Cluster Admin Node 1 1 1 1 1 1 Master Name Node 1 1 1 1 1 1 Secondary Name Node 1 1 1 1 1 1 HA Node Edge Node 1 1 1 2 3 1 1 1 2 3 To be determined based on sizing criteria To be determined based on sizing criteria Data node 2 PowerEdge C6105 chassis (3 nodes) 10 PowerEdge C6105 chassis (20 nodes) 2 PowerEdge C6105 chassis (3 nodes) 30 chassis (60 nodes) 30 chassis (60 nodes) To be determined based on sizing criteria 2.1.0 25 Dell Confidential

Hardware Configurations

Table 17: Hardware Configurations PowerEdge C6105 with PowerEdge R720

Machine Function     Edge Node, Active and Secondary    Data node
                     Name Node, Admin Node, HA Node
Platform             PowerEdge R720                     PowerEdge C6105 (2-node)
CPU                  2 x E5-2640 (6-core)               2 x Opteron 4180 (6-core)
RAM (Minimum)        96 GB                              48 GB
LOM                  4 x 1GbE                           2 x 1GbE
Add-in NIC           None                               None
DISK                 6 x 600-GB 10K SAS 2.5-inch        12 x 1-TB SATA 7.2K 2.5-inch
Storage Controller   PERC H710                          LSI2008
RAID                 RAID 10                            JBOD

Notes: Be sure to consult your Dell account representative before changing the recommended disk sizes.

Configuration Notes

- Appendix E contains the full bill of materials (BOM) for the PowerEdge C6105 data node configurations.
- Appendix C contains the BOM for the PowerEdge R720 infrastructure nodes that should be used with this configuration.

Table 18: Rack Configuration PowerEdge C6100 RU RACK1 RACK2 RACK3 42 R1- Switch 2: Force10 S60 R2- Switch2: Force10 S60 R3- Switch2: Force10 S60 41 R1- Switch 1: Force10 S60 R2- Switch1: Force10 S60 R3- Switch1: Force10 S60 40 Cable Management Cable Management Cable Management 39 Cable Management Cable Management Cable Management 38 R3 - Switch 1: Force10 S4810 Master Name Node :R720 Edge01: R720 37 R3 - Switch 2: Force10 S4810 36 Cable Management Cable Management Cable Management 35 Cable Management Cable Management Cable Management 34 33 Admin: R720 Secondary Name Node R720 HA Node R720 32 Empty Empty Cable Management 31 Empty Empty Cable Management 21-30 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 Empty Empty Empty R1- Chassis10: C6105 (2-Node) R2- Chassis10: C6105 (2-Node) R3- Chassis10: C6105 (2-Node) R1- Chassis09: C6105 (2-Node) R2- Chassis09: C6105 (2-Node) R3- Chassis09: C6105 (2-Node) R1- Chassis08: C6105 (2-Node) R2- Chassis08: C6105 (2-Node) R3- Chassis08: C6105 (2-Node) R1- Chassis07: C6105 (2-Node) R2- Chassis07: C6105 (2-Node) R3- Chassis07: C6105 (2-Node) R1- Chassis06: C6105 (2-Node) R2- Chassis06: C6105 (2-Node) R3- Chassis06: C6105 (2-Node) R1- Chassis05: C6105 (2-Node) R2- Chassis05: C6105 (2-Node) R3- Chassis05: C6105 (2-Node) R1- Chassis04: C6105 (2-Node) R2- Chassis04: C6105 (2-Node) R3- Chassis04: C6105 (2-Node) R1- Chassis03: C6105 (2-Node) R2- Chassis03: C6105 (2-Node) R3- Chassis03: C6105 (2-Node) R1- Chassis02: C6105 (2-Node) R2- Chassis02: C6105 (2-Node) R3- Chassis02: C6105 (2-Node) R1- Chassis01: C6105 (2-Node) R2- Chassis01: C6105 (2-Node) R3- Chassis01: C6105 (2-Node) 2.1.0 27 Dell Confidential

Dell Cloudera Solution Network Architecture

The Dell Cloudera Solution uses Dell Force10 S60 or Dell PowerConnect 6248 (optional) Gigabit Ethernet switches as the top-of-rack connectivity to all Hadoop-related nodes. This reference architecture keeps differences in network configuration to a minimum in order to support consistent, rapid deployments.

This reference architecture implements, at a minimum, three distinct, separate VLANs:
- Hadoop Cluster Data LAN: connects the compute node NICs into the fabric used for sharing data and distributing work tasks among compute nodes.
- Hadoop Cluster Management LAN: connects all the iDRAC/BMCs in the cluster nodes.
- Hadoop Cluster Edge LAN: connects the cluster to the outside world.

The network consists of three major network infrastructure layouts:
- Data network infrastructure: the data network consists of the server NICs, the top-of-rack (ToR) switches, and the aggregation switches.
- Management network infrastructure: the BMC management network, consisting of iDRAC ports and the out-of-band management ports of the switches, is aggregated into a 1-RU S55 switch in one of the three racks in the POD. This 1-RU switch in turn can connect to one of the aggregation or core switches to create a separate network with a separate VLAN.
- Core network infrastructure: the connectivity of aggregation switches to the core for external connectivity.

Network Components

The data network is primarily composed of the ToR and the aggregation switches. Configurations for 1GbE and 10GbE are included in this reference architecture. The following component blocks make up this network:

Server Nodes

Server connections to the network switches could be one of four possible configurations:
A. Active-Active LAG in load-balance bond formation
B. Active-Backup in failover/failback formation
C. Active-Active round robin based on gratuitous ARP
D. Single port

In case A, the connectivity on the switch side must be in a LAG (or port-channel).
In cases B and C, we recommend that you do not configure the switch ports as a LAG, but the ports should still be part of the same layer-2 domain. In some cases all members of the LAG connect to a single ToR switch. In others the LAG splits across two ToR switches. This is an optional setup, as Hadoop has redundancy built into the application and high availability is not compromised by connecting into a single switch.

The teaming configuration that Dell recommends is balance-alb (mode=6). This configuration setting is explained in greater detail in the Dell Cloudera Solution Deployment Guide. Please contact your sales representative for a copy of the deployment guide. The Dell Crowbar deployment software automatically configures this setting for Hadoop environments.

Access Switch or Top of Rack (ToR)

The servers connect to ToR switches. Typically there are two in each rack. The switches recommended by Dell are the Force10 S60 for 1GbE connectivity and the S4810 for 10G servers. The 1GbE option is for PowerEdge C2100 and PowerEdge C6100 servers, while the PowerEdge C8000 requires the 10GbE option. PowerEdge R720 configurations can use 1GbE or 10GbE.

For 1GbE, the Force10 S60 ToR switches in the same rack stack together. This makes it possible to manage the two switches as a single unit and allows the servers to connect into two different switches for redundancy. The ToR switches each have two expansion slots that can accept a two-port 10G module or a two-port stacking module. This architecture recommends one of each type in the two slots. The 10GbE module would be used to connect into the pod-interconnect switches, one port to each switch, forming a LAG. The stacking module connects the switches together as a single unit. The uplinks to the aggregation pair would be a single LAG of four 10GbE ports, two from each switch. Because each rack connects to the pod-interconnect independently, scaling is easier.

For the 10GbE configuration, the ToR switches are Force10 S4810, and we recommend that this pair of switches run a high availability feature called Virtual Link Trunking (VLT). This feature allows the servers to terminate their LAG interfaces into two different switches instead of one, which provides HA as well as active-active bandwidth utilization. It gives redundancy within the rack if one switch fails or needs maintenance. The uplink to the aggregation pair is 80Gb, using a LAG from each ToR switch. This is achieved using two 40G interfaces in a LAG connecting to the aggregation pair. Therefore, from each rack there is a collective bandwidth of 160G available. Each rack is managed as a separate entity from a switching perspective, and ToR switches connect only to the aggregation switches.

Aggregation Switches

For a medium-scale deployment of one to three PODs of 1G servers (12 racks max), the Dell Force10 S4810 is the recommended aggregation switch. It is both 10GbE and 40GbE capable. Each 40GbE interface on the S4810 can be converted into four 10GbE interfaces, giving the switch 64 10GbE-capable ports. This potentially scales Hadoop deployments into tens of nodes. Hadoop ToR switches connect to aggregation switches via 10GbE uplinks from the ToR Force10 S60 to the Force10 S4810. The recommended architecture uses Virtual Link Trunking (VLT) between the two Force10 S4810 switches in aggregation. This feature enables a multi-chassis LAG from the stacked ToR switches in each rack. The stacks in each rack divide their links between this pair of switches to achieve active-active forwarding at full bandwidth, without any requirement for spanning tree.
The aggregation switches can also run layer-3 from the ToR, as layer-2 alone is not a requirement from Hadoop's perspective. Therefore, for scaling to large deployments, layer-3 routing is a good option. Running 40GbE switches like the Dell Force10 Z9000 in aggregation can achieve a scale of up to hundreds of 1G deployed nodes.

For the 10G server deployment, depending on the scale at which the PODs are planned and on how much future scale is required, we recommend the Force10 S4810 for aggregation at smaller scale and the Force10 Z9000 for larger deployments. The Force10 Z9000 is a 32-port, 40G high-capacity switch. It can aggregate up to 15 racks of high-density PowerEdge C8000 servers. The rack-to-rack bandwidth needed in Hadoop would be most suitably handled by a 40G-capable, non-blocking switch. The Force10 Z9000 can provide a cumulative bandwidth of 1.5TB of throughput at line-rate traffic from every port.

Core

The aggregation layer could itself be the network core in many cases, but where it is not, it would connect to a larger core, which is represented by the cloud in Figure 7. Details on this topic are beyond the scope of this document.

Layer-2 and Layer-3

The layer-2 and layer-3 boundaries are separated at either the ToR or the aggregation layer. Either option is equally viable. The colors blue and red in Figure 7 represent the layer-2 and layer-3 boundaries. This document uses layer-2 as the reference up to the aggregation layer. That is why VLT is used on the aggregation switches.

Single Rack Configuration

Figure 6 shows the single rack equipment. Dell recommends using Force10 S60 ToR switches in the rack. Each rack could have a maximum of 20 servers in some configurations, while a dense packing of sleds in the C-series server chassis can hold even more. Each rack has two ToR Force10 S60 switches that are stacked, and this stack connects to the two Force10 S4810 aggregation switches. The Force10 S60 stack offers a single switch view to the servers. Each Data node can have up to four 1G NIC ports. It forms a LAG of two ports, with one port on each switch in the stack. The 2GbE LAG offers switch redundancy within the rack and enables high availability.
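The relationship between server-facing bandwidth and uplink bandwidth for the 1GbE rack can be checked with simple arithmetic. The sketch below is an illustration only: it assumes 20 servers per rack, a two-port 1GbE LAG per server, and a ToR uplink LAG of four 10GbE ports, all taken from the description above. The function name and parameters are hypothetical.

```python
def oversubscription_ratio(servers, ports_per_server, port_gbps,
                           uplink_ports, uplink_gbps):
    """Ratio of server-facing bandwidth to uplink bandwidth for one rack.

    A result of 1.0 means the rack is not oversubscribed (1:1).
    """
    downlink = servers * ports_per_server * port_gbps
    uplink = uplink_ports * uplink_gbps
    if uplink <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return downlink / uplink


if __name__ == "__main__":
    # 20 servers x 2 x 1GbE down, 4 x 10GbE up from the stacked S60 pair.
    ratio = oversubscription_ratio(20, 2, 1, 4, 10)
    print(f"oversubscription {ratio:.1f}:1")
```

With these inputs the server side and the uplink side both total 40Gb, which is consistent with the 1:1 over-subscription figure listed for the ToR in Table 19.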

Table 19: Single Rack Network Equipment

Total Racks                 1 (6-20 nodes)
Top-of-rack switch          2 Force10 S60 (2 per rack)
Aggregation switch          Not needed for a single rack
Server                      2RU PowerEdge R720/R720xd/C2100/C6100
Over-subscription at ToR    1:1
Modules in each ToR         1x 12-2port Stacking, 1x 10G 2-port uplink

Figure 6: Single Rack Networking Equipment

Figure 7 shows the Force10 S4810 switch aggregating the pods to enable inter-rack traffic and the management network. There are two separate VLANs for data and management. All port-channels on the Force10 S4810 and ToR are tagged in these two VLANs.

Figure 7: Multi-rack networking equipment

The following table shows the network inventory details in a cluster of three racks.

Table 20: Multi Rack Network Equipment

Total racks                 3 (15-20 nodes per rack)
Top-of-rack switch          6 Force10 S60 (2 per rack)
Pod-interconnect switch     2 Force10 S4810
Server                      2RU PowerEdge R720/R720xd/C2100
Over-subscription at ToR    1:1
Modules in each ToR         1x 12-2port Stacking, 1x 10G 2-port uplink

Multi-rack configuration for 10GbE with Force10 S4810

In this reference architecture we define a 10G solution with the PowerEdge C8000 and R720xd servers. Hadoop applications are increasingly being deployed on 10GbE servers for the scale and price advantages they bring, which yields a considerable economy of scale in the use of hardware. This in turn requires 10GbE switches in the racks. This can be achieved using the Force10 S4810 as a top-of-rack switch, with either the Force10 S4810 or the Z9000, the 10G/40G high-density switch, in the aggregation. The scale achieved by that configuration can grow into thousands of nodes using a CLOS architecture, which was used in the 600-node 1GbE solution above. Running 40GbE switches like the Force10 Z9000 in aggregation can achieve a scale of hundreds of nodes using high-density data center class switches.

Figure 8: Multi-Rack View for 10G Servers Using Force10 S4810 Switches

In Figure 8 we see that each rack, with its pair of switches, aggregates into the pair of Force10 S4810 switches. This connection could be based on layer-2, if the aggregation runs the VLT feature. Alternatively, it could run on layer-3 point-to-point routing using a routing protocol such as OSPF. In both cases, load balancing lets all links use their full bandwidth. The first scenario creates a complete layer-2 domain between all racks in the cluster.

The Force10 S4810-based aggregation design is preferred for lower cost and medium scalability. This design can handle up to six racks or two PODs. The ToR Force10 S4810 uplinks using its 40G interfaces in quad mode, where each 40G interface runs as 4x 10G. The figure below shows the cables needed for this design. If using a passive copper break-out cable, there is no need for any QSFP+ or SFP+ optics, as the connectors are built into the copper twin-ax cable. This makes the arrangement cost less than the fiber option, which needs a QSFP+ optic and four SFP+ optics plus the break-out fiber. The benefit of the fiber option is its longer reach compared to twin-ax, which is limited to 5 meters.
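One way to see where a six-rack limit can come from is to count 10GbE ports on the aggregation pair. The sketch below is a back-of-the-envelope illustration under stated assumptions, not the sizing method used by this reference architecture: it assumes each ToR S4810 splits its two 40G uplinks evenly across the two aggregation switches, that each 40G uplink is broken out into four 10G links, and that each aggregation S4810 dedicates 48 10GbE ports to ToR uplinks. The function name, parameter names, and the 48-port budget are hypothetical inputs for the example.

```python
def max_racks(agg_10g_ports=48, tors_per_rack=2, uplinks_40g_per_tor=2,
              agg_switches=2, lanes_per_40g=4):
    """Estimate how many racks one aggregation pair can terminate.

    All defaults are assumptions for illustration. Each 40G ToR uplink is
    treated as lanes_per_40g separate 10G links, spread evenly across the
    aggregation switches.
    """
    links_per_rack = tors_per_rack * uplinks_40g_per_tor * lanes_per_40g
    # 10G links that each rack lands on ONE aggregation switch (rounded up).
    links_per_agg = -(-links_per_rack // agg_switches)
    return agg_10g_ports // links_per_agg


if __name__ == "__main__":
    print(max_racks())
```

Under these assumptions each rack consumes eight 10G ports on each aggregation switch, so 48 ports terminate six racks, which matches the limit quoted above.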

Multi-rack Configuration for 10GbE Using Force10 Z9000 Switches

For a scale-out deployment that needs to expand its Hadoop environment into a larger setup, or that needs the Hadoop cluster co-located with other applications in different racks, the recommended option is the Force10 Z9000 core switch. The Force10 Z9000 does not need to connect into any other higher-tier core switches, as its capacity is enough for a data center with hundreds of servers.

Figure 9: Multi-Rack View for 10G Servers Using Force10 Z9000 (based on Layer-3)

Figure 9 shows a routed version where each S4810 in the rack has a layer-3 connection to the core switch. A LAG is not required for this option, as each connection could be a point-to-point routed link. Optionally, each switch could form a layer-2 LAG as shown in Figure 10. This assumes that the Z9000 pair in the aggregation forms a VLT pair for HA. There are then two tiers of VLT, one at the ToR for the servers and another at the aggregation for the top-of-rack switches. Each option has its own merits, and both are equally recommended; the choice depends on how well each fits the larger network in the data center.

Figure 10: Multi-Rack View Using Force10 Z9000 Switches (Based on Layer-2)

In Figure 10 we see an example of a CLOS fabric that grows horizontally. This technique of network fabric deployment has been used in the data centers of some of the largest Web 2.0 companies, whose businesses range from social media to public cloud. Some of the largest recent Hadoop deployments also use this approach to networking. Dell Data Center Solutions has hands-on experience in building Hadoop and Big Data analytics farms, while Dell Force10 is a trusted vendor in the field of networking. Dell can help an enterprise solve its Big Data needs with a scalable end-to-end solution.

Management Network

The management network of all the servers and switches is aggregated into a Dell Force10 S55 switch that is located in rack 3 of the POD. It uplinks on a 10G link to the aggregation switches, or directly to the core, wherever the split for out-of-band management is required.

Dell Open Switch Solution

In addition to the Dell switch-based reference architecture, Dell provides an open standard that allows you to choose other brands and configurations of switches for your Hadoop environment. The following requirements enable other brands of switches to operate properly with the tools and configurations in the Dell Hadoop Reference Architecture:
- Support for IEEE 802.1Q VLAN traffic and port tagging
- The ability to provide a minimum of 170 Gigabit Ethernet ports in a non-blocking configuration within VLAN 100
  o Configuration can be a single switch or a combination of stacked switches to meet the additional requirements
- The ability to create link aggregation groups (LAGs) with a minimum of two physical links in each LAG
- If multiple switches are stacked:
  o The ability to create a LAG across stacked switches
  o Full-bisection bandwidth
  o Support for VLANs to be available across all switches in the stack
- The ability to provide a minimum of 65 10/100 Ethernet ports on the untagged VLAN
- 250,000 packets-per-second capability per switch
- The ability to provide 12 10Gb ports for redundant uplinks contained in VLAN 10
- A managed switch that supports SSH and serial line configuration
- SNMP v3 support
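The numeric and feature requirements in the list above lend themselves to a simple checklist. The following sketch is one hedged way to encode them; the `SwitchSpec` class, its field names, and the `unmet_requirements` function are hypothetical and exist only for this example. The stacking-specific requirements (cross-stack LAG, full-bisection bandwidth, stack-wide VLANs) are left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class SwitchSpec:
    """Hypothetical summary of a candidate switch or switch stack."""
    vlan_8021q: bool            # IEEE 802.1Q tagging supported
    gige_ports: int             # non-blocking Gigabit Ethernet ports
    fast_ethernet_ports: int    # 10/100 Ethernet ports
    uplink_10g_ports: int       # 10Gb ports available for uplinks
    max_links_per_lag: int      # physical links allowed in one LAG
    packets_per_second: int     # forwarding capability per switch
    ssh: bool
    serial_console: bool
    snmp_v3: bool


def unmet_requirements(spec):
    """Return labels for the listed requirements that the spec fails."""
    checks = [
        (spec.vlan_8021q, "IEEE 802.1Q VLAN and port tagging"),
        (spec.gige_ports >= 170, "170 non-blocking Gigabit Ethernet ports"),
        (spec.max_links_per_lag >= 2, "LAGs with at least two physical links"),
        (spec.fast_ethernet_ports >= 65, "65 10/100 Ethernet ports"),
        (spec.packets_per_second >= 250_000, "250,000 packets per second"),
        (spec.uplink_10g_ports >= 12, "12 10Gb uplink ports"),
        (spec.ssh and spec.serial_console, "SSH and serial line configuration"),
        (spec.snmp_v3, "SNMP v3"),
    ]
    return [label for ok, label in checks if not ok]


if __name__ == "__main__":
    candidate = SwitchSpec(True, 48, 96, 4, 8, 1_000_000, True, True, False)
    for label in unmet_requirements(candidate):
        print("missing:", label)
```

An empty result means the candidate passes every check encoded here; it does not replace validation against the deployment guide.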

IPv6 Capabilities

At this time, the Dell Cloudera Solution does not support or allow for the use of IPv6 for network connectivity. All deployments are configured by Crowbar based on IPv4, with IPv6 explicitly disabled on all nodes within the Hadoop environment.

Network Connectivity

The network interconnects between various hardware components of the Hadoop solution are depicted in Figure 11 and Figure 12. For more information, please see the deployment guide.

Figure 11: Dell Cloudera Compute PowerEdge R720xd Node Network Interconnects (diagram labels: BMC, NIC1-4)

Figure 12: Network Connections

Dell Cloudera Solution Software Architecture

Table 21: Dell Cloudera Solution Support Matrix

RA Version   OS Version                     Hadoop Version          Available Support          Supported JVM
2.1          Red Hat Enterprise Linux 6.2   CDH 4.1,                Dell Hardware support,     Sun Oracle JVM
                                            Cloudera Manager 4.1    Cloudera Hadoop support,
                                                                    Red Hat Linux support
2.1          CentOS 6.2                     CDH 4.1,                Dell Hardware support      Sun Oracle JVM
                                            Cloudera Manager 4.1

Linux File System Configuration Definition

Dell and Cloudera recommend and support the use of ext4 for all HDFS disks.

Disk Partitioning Recommendation for the Name Node

All disk configuration parameters are documented in the Dell Cloudera Solution Deployment Guide, along with Linux Kickstart scripts for proper configuration at the time of operating system installation.

Configuration Parameters: Recommended Values

Table 22: HDFS parameters

dfs.block.size
  Description: Lower value offers parallelism
  Value: 134217728 (128 MB)

dfs.name.dir
  Description: Comma-separated list of folders (no space) where the Name Node stores its metadata
  Value: /mnt/hdfs/hdfs01/meta1

dfs.datanode.handler.count
  Description: Number of handlers dedicated to serve data block requests in Hadoop SlaveNodes
  Value: 16 (Start with 2 x CORE_COUNT in each SlaveNode)

dfs.namenode.handler.count
  Description: More Master Node server threads to handle RPCs from a large number of SlaveNodes
  Value: Start with 10, increase for large clusters (a higher count will drive higher CPU, RAM, and network utilization)

dfs.datanode.du.reserved
  Description: The amount of space on each storage volume which HDFS should not use
  Value: 10M

dfs.replication
  Description: Data replication factor; default is 3
  Value: 3 (default)

fs.trash.interval
  Description: Time interval between HDFS space reclaiming
  Value: 1440 (minutes)

dfs.permissions
  Value: true (default)

dfs.datanode.handler.count
  Value: 8

dfs.data.dir
  Description: Hadoop Data Node Location
  Value: /mnt/hdfs/hdfs01/data1/hdfs comma-separated through /mnt/hdfs/hdfs01/datan/hdfs
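To make the "data1 through datan" directory pattern and the property layout concrete, the following sketch builds the comma-separated dfs.data.dir value and renders a few of the Table 22 values in the name/value XML layout used by Hadoop configuration files. It is an illustration under assumptions: the helper names, the choice of four data directories, and the subset of properties are hypothetical, and in this solution the actual files are produced by the deployment tooling rather than by hand.

```python
import xml.etree.ElementTree as ET


def data_dirs(count, suffix="hdfs"):
    """Build the comma-separated (no spaces) directory list data1..dataN."""
    return ",".join(
        f"/mnt/hdfs/hdfs01/data{i}/{suffix}" for i in range(1, count + 1)
    )


def render_hadoop_config(properties):
    """Render a dict as a Hadoop-style <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Example subset of Table 22; four data directories is an arbitrary choice.
    hdfs_site = {
        "dfs.block.size": 134217728,
        "dfs.replication": 3,
        "dfs.namenode.handler.count": 10,
        "dfs.data.dir": data_dirs(4),
    }
    print(render_hadoop_config(hdfs_site))
```

Generating the directory list programmatically avoids the stray spaces that the "no space" note in the table warns about.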

Table 23: mapred parameters

mapred.child.java.opts
  Description: Larger heap-size for child JVMs of maps/reduces
  Value: -Xmx1024M

mapred.job.tracker
  Description: Hostname or IP address and port of the JobTracker
  Value: namenode:8021

mapred.job.tracker.handler.count
  Description: More JobTracker server threads to handle RPCs from a large number of TaskTrackers
  Value: Start with 32, increase for large clusters (a higher count will drive higher CPU, RAM, and network utilization)

mapred.reduce.tasks
  Description: The number of Reduce tasks per job
  Value: Set to a prime close to the number of available hosts

mapred.local.dir
  Description: Comma-separated list of folders (no space) where a TaskTracker stores runtime information
  Value: /mnt/hdfs/hdfs01/data1/mapred comma-separated through /mnt/hdfs/hdfs01/datan/mapred

mapred.tasktracker.map.tasks.maximum
  Description: Maximum number of map tasks to run on the node
  Value: 2 + (2/3) * number of cores per node

mapred.tasktracker.reduce.tasks.maximum
  Description: Maximum number of reduce tasks to run per node
  Value: 2 + (1/3) * number of cores per node

mapred.child.ulimit
  Value: 2097152

mapred.map.tasks.speculative.execution
  Value: FALSE

mapred.reduce.tasks.speculative.execution
  Value: FALSE

mapred.job.reuse.jvm.num.tasks
  Value: 1

Table 24: default environment

SCAN_IPC_CACHE_LIMIT
  Description: Number of rows cached in search engine for each scanner next call over the wire; it reduces the network round trip by 300 times caching 300 rows in each trip
  Value: 100

LOCAL_JOB_HANDLER_COUNT
  Description: Number of parallel queries executed at one go; query requests above this limit are queued up
  Value: 30

Table 25: hadoop-env.sh

java.net.preferIPv4Stack
  Value: true

HADOOP_*_OPTS
  Value: -Xmx2048m

Table 26: /etc/fstab

File system mount options
  Value: data=writeback,nodiratime,noatime

Table 27: hdfs (core-site)

io.file.buffer.size
  Description: The size of buffer for use in sequence files; the size of this buffer should probably be a multiple of hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations
  Value: 65536 (64 KB)

fs.default.name
  Description: The name of the default file system; a URI whose scheme and authority determine the file system implementation
  Value: hdfs://namenode:8020

fs.checkpoint.dir
  Description: Comma-separated list of directories on the local file system of the Secondary Master Node where its checkpoint images are stored
  Value: TBD

io.sort.factor
  Value: 80

io.sort.mb
  Value: 512

Table 28: /etc/security/limits.conf

mapred nofile
  Value: 32768

hdfs nofile
  Value: 32768

hbase nofile
  Value: 32768
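The task-slot formulas and the "prime close to the number of available hosts" guidance in Table 23 can be worked through numerically. The sketch below is a hedged illustration: the function names are hypothetical, rounding the fractional term down is an assumption (the table does not say how to round), and choosing the smaller prime on a tie is likewise an arbitrary choice for this example.

```python
def task_slots(cores_per_node):
    """Map and reduce slot maximums: 2 + (2/3)*cores and 2 + (1/3)*cores.

    Fractions are rounded down; that rounding rule is an assumption.
    """
    map_slots = 2 + (2 * cores_per_node) // 3
    reduce_slots = 2 + cores_per_node // 3
    return map_slots, reduce_slots


def is_prime(n):
    """Trial-division primality test, adequate for cluster-sized numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def prime_near(hosts):
    """Closest prime to the host count; the smaller prime wins a tie."""
    if hosts < 2:
        return 2
    offset = 0
    while True:
        if is_prime(hosts - offset):
            return hosts - offset
        if is_prime(hosts + offset):
            return hosts + offset
        offset += 1


if __name__ == "__main__":
    # Two 6-core processors per data node, as in the hardware tables.
    print(task_slots(12))
    # Candidate mapred.reduce.tasks value for a 20-host cluster.
    print(prime_near(20))
```

For a data node with two 6-core processors this gives 10 map slots and 6 reduce slots, and for 20 hosts the nearest prime is 19. Treat these as starting points to be tuned against the actual workload.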

Dell Cloudera Solution Cloudera Enterprise Software

The Dell Cloudera Solution is based on Cloudera Enterprise, which includes Cloudera's distribution for Hadoop (CDH) 4.1 and Cloudera Manager.

Deployment Integration

The Dell Cloudera Solution v2.1 uses Crowbar to deploy the infrastructure: BIOS and RAID configurations, and the base operating system. Cloudera Manager is used to deploy the rest of the Hadoop software stack. Cloudera Manager installs the core Hadoop components (HDFS and MapReduce) and some of the ecosystem components. The Pig, Hive, and Sqoop ecosystem components have to be installed via Crowbar barclamps. Please refer to the Crowbar Administration User Guide for details on how to deploy Cloudera Manager and the ecosystem services supported by the solution.

Cloudera Manager

Cloudera Manager deploys and centrally operates a complete Hadoop stack. The application automates the installation process, thereby reducing deployment time from weeks to minutes. These are the functional characteristics of Cloudera Manager:
- Provides a cluster-wide, real-time view of the services running and the status of their hosts
- Provides a single, central place to enact configuration changes across the cluster
- Incorporates a full range of reporting and diagnostic tools to help optimize cluster performance and utilization
- Provides full lifecycle management for Hadoop deployments
- Enables the configuration of server roles and services across the cluster
- Provides the interface to gracefully start, stop, and restart services as needed

Cloudera Manager Editions

There are two editions of Cloudera Manager: the Free Edition and the Enterprise Edition. A license is not needed to use the Free Edition, but the number of hosts supported is limited to 50.
The Enterprise Edition supports an unlimited number of hosts, requires a license, and provides service monitoring and additional management features that are not included in the Free Edition.

Table 29: Differences between Cloudera Manager Free Edition and Enterprise Edition

Cloudera Manager Editions: Free, Enterprise

Maximum Number of Nodes Supported: 50 (Free), Unlimited (Enterprise)

Automated Deployment & Hadoop Readiness Checks
Comprehensive API
Service & Configuration Management
Deploy & Configure HDFS, MapReduce, Flume, HBase, Hue, Impala*, Oozie & Zookeeper Services
Configure High Availability & Federation
Automated Configuration
Client Configuration Management
Audit Trail
Add/Delete/Stop/(Re)Start/Decommission Role Instances
Configuration Versioning & History
Service Monitoring & Management
Monitor HDFS, MapReduce, Hue, Flume, Oozie & Zookeeper (Cloudera Enterprise Core)
Monitor HBase (Cloudera Enterprise RTD)
Monitor Impala* (Cloudera Enterprise RTQ)
Proactive Health Checks
Status & Health Summary
Heatmaps/Performance Monitoring
Host Monitoring
Security
LDAP Authentication
Kerberos Configuration
Multi-Cluster Management
Intelligent Log Management
Events Management & Alerts
Activity Monitoring
Operational Reporting
File Browser & Quota Management
Global Time Control
Support Integration
Maintenance Mode

http://www.cloudera.com/content/cloudera/en/products/cloudera-manager.html

Cloudera Support

As the use of Hadoop grows and an increasing number of groups and applications move into production, your Hadoop users will expect greater levels of performance and consistency. Cloudera's proactive production-level support gives your administrators the expertise and responsiveness they need. Cloudera Support includes:

- Flexible Support Windows: Choose 8x5 or 24x7 to meet SLA requirements.
- Configuration Checks: Verify that your Hadoop cluster is fine-tuned for your environment.
- Escalation and Issue Resolution: Resolve support cases with maximum efficiency.
- Comprehensive Knowledge Base: Expand your Hadoop knowledge with hundreds of articles and tech notes.
- Support for Certified Integration: Connect your Hadoop cluster to your existing data analysis tools.
- Proactive Notification: Stay up-to-speed on new developments and events.

With Cloudera Enterprise, you can leverage your existing team's experience and Cloudera's expertise to put your Hadoop system into effective operation. Built-in predictive capabilities anticipate shifts in the Hadoop infrastructure to support reliable function. Cloudera Enterprise makes it easy to run open source Hadoop in production by:

- Simplifying and accelerating Hadoop deployment
- Reducing the costs and risks of adopting Hadoop in production
- Reliably operating Hadoop in production with repeatable success
- Applying SLAs to Hadoop
- Increasing control over Hadoop cluster provisioning and management

HDFS Highly Available Name Nodes

The Dell Cloudera Solution v2.1 includes CDH 4.1 and Cloudera Manager 4.1, which feature Highly Available Name Nodes for HDFS. There are two options for name node high availability:

- Active/Passive using Shared Storage
- Quorum-based

Only one of these can be used in a single cluster. Quorum-based high availability is recommended for new installations. Additional information about the high availability features of CDH4 can be found in the CDH4 High Availability Guide (http://ccp.cloudera.com/display/cdh4doc/cdh4+high+availability+guide).

Active/Passive HA using Shared Storage

This configuration sets up the name nodes in an Active/Passive configuration. The Active Name Node (formerly Master Name Node) is responsible for all client operations in the cluster, while the Standby (formerly Secondary Name Node) acts as a slave, maintaining enough shared state via the filer to provide a fast failover if necessary. For the Standby node to keep its state synchronized with the Active node, the active/passive implementation requires that both nodes have access to a directory on a shared storage device, in this case an NFS mount from the HA Node (http://www.cloudera.com/blog/2012/03/highavailability-for-the-hadoop-distributed-file-system-hdfs/).

Quorum-based HA

In Quorum-based HA, the Standby node keeps its state synchronized with the Active node by communicating with a group of separate daemons called JournalNodes. When the Active node performs any namespace modification, it durably logs a record of the modification to a majority of these JournalNodes. The Standby node reads the edits from the JournalNodes and constantly watches them for changes to the edit log. As the Standby node sees the edits, it applies them to its own namespace. In the event of a failover, the Standby ensures that it has read all of the edits from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.

There should be an odd number of JournalNode daemons, and at least three, since edit log modifications must be written to a majority of JournalNodes. The JournalNode daemons run on the Master, Secondary, and HA nodes in this reference architecture.
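The majority requirement explains the preference for an odd number of JournalNodes. The short Python sketch below is illustrative arithmetic only (the function names are hypothetical, not part of CDH): it computes the majority size for a given number of JournalNodes and how many of them can fail while writes to the edit log can still reach a majority.

```python
def majority(journal_nodes):
    """Smallest number of JournalNodes that forms a majority."""
    if journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    return journal_nodes // 2 + 1


def tolerated_failures(journal_nodes):
    """Number of JournalNodes that can fail while a majority stays reachable."""
    return journal_nodes - majority(journal_nodes)


# Compare the three-node layout used in this architecture with larger groups.
for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

With three JournalNodes, two acknowledgements form a majority and one failure is tolerated. A fourth JournalNode raises the majority to three without tolerating any additional failure, which is why even counts add cost but no resilience.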

Figure 13: HDFS with Highly Available Name Node

Dell Cloudera Solution Deployment Methodology

Site Preparation Needed for the Deployment

The heating, ventilation, and air conditioning (HVAC) and power requirements for deployment can be estimated using the Dell Energy Smart Solution Advisor at:
http://www.dell.com/content/topics/topic.aspx/global/products/pedge/topics/en/config_calculator?c=us&cs=555&l=en&s=biz

Using this tool, you can plan the needs for your solution, order the correct PDUs, and have the proper HVAC ready for the installation. Detailed deployment instructions are documented in the Dell Cloudera Solution Deployment Guide.

Dell Cloudera Solution Hardware Monitoring and Alerting

To automate alerting on and response to unexpected events and failures within the Dell Cloudera Solution, the software stack includes Nagios and Ganglia. The Dell Cloudera Solution includes capabilities for three primary components of the monitoring environment:

- Monitoring of cluster activities: The Dell Cloudera Solution uses Nagios to monitor the cluster, including hardware, software, and users. The Nagios deployment included in the Dell Cloudera Solution keeps historical information regarding system availability, maintenance, and failure events.
- Alerts on unexpected events: The Dell Cloudera Solution uses Nagios to alert system operations staff to events that deviate from normal operation, if the administrator has designated them for notification.
- Debugging of cluster runtime operations: The Dell Cloudera Solution uses Cloudera Enterprise to provide the users and administrators of the Hadoop environment with the necessary tools for tracking, debugging, and monitoring job performance and characteristics.

The Dell Cloudera Solution is designed to include the necessary components to monitor and respond to events in your Hadoop environment. It is flexible enough to allow integration with existing operations management frameworks in your environment.
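Nagios alerting of the kind described above is driven by plug-ins that report a state through their return code: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. The Python sketch below shows that convention for a generic usage percentage. It is an illustration only and is not one of the plug-ins installed by the Dell Cloudera Solution; the function names and the 80% and 90% thresholds are assumptions.

```python
# Standard Nagios plug-in return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_usage(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to a (return code, status line) pair.

    The warn and crit thresholds are hypothetical defaults.
    """
    if percent_used is None or not 0.0 <= percent_used <= 100.0:
        return UNKNOWN, "UNKNOWN - usage value unavailable"
    if percent_used >= crit:
        code = CRITICAL
    elif percent_used >= warn:
        code = WARNING
    else:
        code = OK
    return code, f"{LABELS[code]} - {percent_used:.1f}% used"


def run_check(percent_used):
    """Print the status line and return the code a plug-in would exit with."""
    code, message = check_usage(percent_used)
    print(message)  # Nagios displays the plug-in output as the status text
    return code     # a real plug-in would pass this value to sys.exit()
```

A caller would collect the measurement (for example, disk usage on a data node) and hand it to `run_check`; Nagios then decides on notifications based on the returned state.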

The monitoring components of the Dell Cloudera Solution Reference Architecture are designed to be proactive in nature; they alert the IT operations team when a failure in the environment occurs, and they do so before the failure causes an outage that affects production workloads and users.

Nagios

Nagios is an open source solution for enterprise monitoring. Its pluggable architecture allows for consistent event handling. It supports a wide variety of sensors, plug-ins, applications, servers, and hardware platforms.

The Dell Cloudera Solution includes Nagios as part of all default installations. The Dell Cloudera Solution automatically installs the Nagios console and the necessary Nagios plug-ins for monitoring the Hadoop cluster, including processes, operating systems, and physical servers.

Ganglia

Ganglia is a scalable distributed monitoring system for high-performance computing systems, such as clusters and grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies and uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust. It has been ported to an extensive set of operating systems and processor architectures, and it has been used to link clusters across university campuses and around the world. It can scale to handle clusters with 2,000 nodes.

The Dell Cloudera Solution automates the installation and configuration of Ganglia within the Hadoop cluster, giving IT operations staff detailed reporting on the status and utilization of all Hadoop nodes.

Dell Cloudera Solution Security Design

What is Available in CDH4?

Cloudera's CDH4 release offers the following security features:

- Two levels of authentication:
  o Cluster: SlaveNode to NameNode, TaskTracker to JobTracker
  o User: Unix-style file permissions
  NOTE: Access to the cluster can be restricted to Kerberos-authenticated users.
- Sqoop and Pig support for security with no configuration required
- The capability to integrate with a Kerberos server

Implementing Secure Hadoop

https://ccp.cloudera.com/display/ent4doc/configuring+hadoop+security+with+cloudera+manager
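In this solution, Hadoop security is configured through Cloudera Manager as described in the guide linked above. For orientation only, the Python sketch below renders the two standard Hadoop core-site properties that switch a cluster to Kerberos authentication and enable service-level authorization. The helper name is hypothetical, and a working secure cluster needs further settings (principals, keytabs, and per-service options) that are not shown here.

```python
import xml.etree.ElementTree as ET

# Standard Hadoop core-site properties for Kerberos-based security.
SECURITY_PROPERTIES = {
    "hadoop.security.authentication": "kerberos",
    "hadoop.security.authorization": "true",
}


def render_configuration(properties):
    """Build a Hadoop-style <configuration> document from a dict of properties."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


core_site_fragment = render_configuration(SECURITY_PROPERTIES)
```

The resulting string has the same `<property>`, `<name>`, `<value>` structure as a hand-written core-site.xml, so it can be compared against the configuration that Cloudera Manager generates.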

Figure 14: Kerberos Authentication in Hadoop

Appendix A: Bill of Materials - PowerEdge C8000 Series

For the PowerEdge C8000 series, the bill of materials is organized by chassis rather than node, to simplify ordering.

Table 30: Master Chassis - PowerEdge C8000

The master chassis includes the Administration Node, a Master Name Node, and an Edge Node.

SKU       Component

Group: 1  Quantity: 1
225-3550  PE C8000 Enclosure, Dual Power Supply
331-8341  PowerEdge C8000 Shipping
331-9573  SHIP,C8000,DAO
420-3323  No Factory Installed Operating System
318-2363  PowerEdge C8000 Sled Blank, Single Width - Quantity 2
330-7353  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 1 - Quantity 4
331-8218  PowerEdge C8000 Static Rails, Toolless
936-3965  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
936-4695  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
936-4705  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
936-6035  Dell Hardware Limited Warranty Plus On Site Service Initial Year
936-6145  Dell Hardware Limited Warranty Plus On Site Service Extended Year
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/prosupport or call 1-800-945-3355
900-9997  On-Site Installation Declined

Group: 2  Quantity: 3
225-3555  PowerEdge C8220X, Double Width Compute Sled
331-4428  Performance Optimized
330-4118  System ordered as part of Multipack order
421-8663  No Factory Installed Operating System, v.2
430-3638  Intel X520 DA 10GB, Dual Port SFP+,PCIe-8 NIC
430-3643  Intel DA 10GbE NIC, Dual Port, SFP+,Low Profile
331-8223  C2 LSI 2008 Mezz Card supporting up to 8 Hard Drives
331-8996  Cable for 2.5in Rear Hard Drives, PE-C8220X
342-5079  LSI 2008 SAS Controller Card, 6G, PE C8XXX
317-4928  Dual Processor Option
317-9596  Intel Xeon E5-2670 2.60GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W, Max Mem 1600MHz
317-9610  Intel Xeon E5-2670 2.60GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W
318-2308  Thermal Heatsink
318-2308  Thermal Heatsink
317-8998  128GB Memory (16x8GB),1600Mhz, Dual Ranked RDIMMs for 2 Processors
468-7687  Info, Memory for Dual Processor selection
342-4986  2.5in HDD Enclosure, PE-C8220X
342-5057  2.5in HDD Blank, PE-C8220X - Quantity 2
342-4821  Hard Drive Carrier 2.5 C8000 - Quantity 6
342-4871  1TB 7.2K RPM SATA 3Gbps 2.5in Hard Drive - Quantity 6
342-4983  Hot Plug Hard Drive Carrier, PE-C8220X
934-0626  Dell Hardware Limited Warranty Plus On Site Service Extended Year
934-9845  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
935-0575  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
935-0585  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/prosupport or call 1-800-945-3355
996-9927  Dell Hardware Limited Warranty Plus On Site Service Initial Year
900-9997  On-Site Installation Declined

Table 31: HA Chassis - PowerEdge C8000

The HA Chassis includes a secondary name node, the HA node, and one data node.

SKU       Component

Group: 1  Quantity: 1
225-3550  PE C8000 Enclosure, Dual Power Supply
331-8341  PowerEdge C8000 Shipping
331-9573  SHIP,C8000,DAO
420-3323  No Factory Installed Operating System
330-7353  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 1 - Quantity 4
331-8218  PowerEdge C8000 Static Rails, Toolless
936-3965  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
936-4695  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
936-4705  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
936-6035  Dell Hardware Limited Warranty Plus On Site Service Initial Year
936-6145  Dell Hardware Limited Warranty Plus On Site Service Extended Year
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/prosupport or call 1-800-945-3355
900-9997  On-Site Installation Declined

Group: 2  Quantity: 2
225-3555  PowerEdge C8220X, Double Width Compute Sled
331-4428  Performance Optimized
330-4118  System ordered as part of Multipack order
421-8663  No Factory Installed Operating System, v.2
430-3638  Intel X520 DA 10GB, Dual Port SFP+,PCIe-8 NIC
430-3643  Intel DA 10GbE NIC, Dual Port, SFP+,Low Profile
331-8223  C2 LSI 2008 Mezz Card supporting up to 8 Hard Drives
331-8996  Cable for 2.5in Rear Hard Drives, PE-C8220X
342-5079  LSI 2008 SAS Controller Card, 6G, PE C8XXX
317-4928  Dual Processor Option
317-9596  Intel Xeon E5-2670 2.60GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W, Max Mem 1600MHz
317-9610  Intel Xeon E5-2670 2.60GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W
318-2308  Thermal Heatsink
318-2308  Thermal Heatsink
317-8998  128GB Memory (16x8GB),1600Mhz, Dual Ranked RDIMMs for 2 Processors
468-7687  Info, Memory for Dual Processor selection
342-4986  2.5in HDD Enclosure, PE-C8220X
342-5057  2.5in HDD Blank, PE-C8220X - Quantity 2
342-4821  Hard Drive Carrier 2.5 C8000 - Quantity 6
342-4871  1TB 7.2K RPM SATA 3Gbps 2.5in Hard Drive - Quantity 6
342-4983  Hot Plug Hard Drive Carrier, PE-C8220X
934-0626  Dell Hardware Limited Warranty Plus On Site Service Extended Year
934-9845  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
935-0575  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
935-0585  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/prosupport or call 1-800-945-3355
996-9927  Dell Hardware Limited Warranty Plus On Site Service Initial Year
900-9997  On-Site Installation Declined

Group: 3  Quantity: 1
225-3555  PowerEdge C8220X, Double Width Compute Sled
331-4428  Performance Optimized
330-4118  System ordered as part of Multipack order
421-8663  No Factory Installed Operating System, v.2
331-9532  LSI 9202 SAS Controller Cable
430-3643  Intel DA 10GbE NIC, Dual Port, SFP+,Low Profile
342-4851  LSI 9202-16E, LP, Controller, CE
331-8224  C2B LSI 2008 Mezz Card plus Onboard Controller supporting up to 12 Hard Drives
331-8999  SAS Controller Cable, PE-C8220X
342-5079  LSI 2008 SAS Controller Card, 6G, PE C8XXX
317-4928  Dual Processor Option
317-9596  Intel Xeon E5-2670 2.60GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W, Max Mem 1600MHz
317-9610  Intel Xeon E5-2670 2.60GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W
318-2308  Thermal Heatsink
318-2308  Thermal Heatsink
317-8810  Memory Filler Blank Dimm - Quantity 8
317-8994  64GB Memory (8x8GB), 1600Mhz, Dual Ranked RDIMMs
468-7687  Info, Memory for Dual Processor selection
342-4820  Hard Drive Carrier 3.5 C8000 - Quantity 4
342-4874  3TB,7.2K RPM,Near Line SAS,6Gps,3.5in, Hard Drive - Quantity 4
342-4987  3.5in HDD Enclosure, PE-C8220X
342-4841  Hard Drive,2.5 Rear Carrier,C8220
342-4841  Hard Drive,2.5 Rear Carrier,C8220
342-4861  1TB,7.2K RPM,SATA,3Gbps,2.5in, Hard Drive
342-4861  1TB,7.2K RPM,SATA,3Gbps,2.5in, Hard Drive
342-4983  Hot Plug Hard Drive Carrier, PE-C8220X
934-0626  Dell Hardware Limited Warranty Plus On Site Service Extended Year
934-9845  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
935-0575  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
935-0585  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/prosupport or call 1-800-945-3355
996-9927  Dell Hardware Limited Warranty Plus On Site Service Initial Year
900-9997  On-Site Installation Declined

Group: 4  Quantity: 1
225-3558  PowerEdge C8220XD Storage Sled, Single, 12 Hard Drives
330-4118  System ordered as part of Multipack order
420-3323  No Factory Installed Operating System
342-3923  3TB, Near Line SAS 6Gps, 7.2K RPM, 3.5 in Hard Drive - Quantity 12
342-4824  Hard Drive Carrier,3.5,Expanded,Double Wide Storage,C8000 - Quantity 12
934-3976  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
934-4706  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
934-4716  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
934-6046  Dell Hardware Limited Warranty Plus On Site Service Initial Year
934-6156  Dell Hardware Limited Warranty Plus On Site Service Extended Year
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/prosupport or call 1-800-945-3355
900-9997  On-Site Installation Declined

Table 32: Data Node Chassis - PowerEdge C8000

The Data node chassis includes two data nodes.

SKU       Component

Group: 1  Quantity: 1
225-3550  PE C8000 Enclosure, Dual Power Supply
331-8341  PowerEdge C8000 Shipping
331-9573  SHIP,C8000,DAO
420-3323  No Factory Installed Operating System
330-7353  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 1 - Quantity 4
331-8218  PowerEdge C8000 Static Rails, Toolless
936-3965  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
936-4695  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
936-4705  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
936-6035  Dell Hardware Limited Warranty Plus On Site Service Initial Year
936-6145  Dell Hardware Limited Warranty Plus On Site Service Extended Year
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/prosupport or call 1-800-945-3355
900-9997  On-Site Installation Declined

Group: 2  Quantity: 2
225-3555  PowerEdge C8220X, Double Width Compute Sled
331-4428  Performance Optimized
330-4118  System ordered as part of Multipack order
421-8663  No Factory Installed Operating System, v.2
331-9532  LSI 9202 SAS Controller Cable
430-3638  Intel X520 DA 10GB, Dual Port SFP+,PCIe-8 NIC
342-4851  LSI 9202-16E, LP, Controller, CE
331-8224  C2B LSI 2008 Mezz Card plus Onboard Controller supporting up to 12 Hard Drives
331-8999  SAS Controller Cable, PE-C8220X
342-5079  LSI 2008 SAS Controller Card, 6G, PE C8XXX
317-4928  Dual Processor Option
317-9596  Intel Xeon E5-2670 2.60GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W, Max Mem 1600MHz
317-9610  Intel Xeon E5-2670 2.60GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W
318-2308  Thermal Heatsink
318-2308  Thermal Heatsink
317-8810  Memory Filler Blank Dimm - Quantity 8
317-8994  64GB Memory (8x8GB), 1600Mhz, Dual Ranked RDIMMs
468-7687  Info, Memory for Dual Processor selection
342-4820  Hard Drive Carrier 3.5 C8000 - Quantity 4
342-4874  3TB,7.2K RPM,Near Line SAS,6Gps,3.5in, Hard Drive - Quantity 4
342-4987  3.5in HDD Enclosure, PE-C8220X
342-4841  Hard Drive,2.5 Rear Carrier,C8220
342-4841  Hard Drive,2.5 Rear Carrier,C8220
342-4861  1TB,7.2K RPM,SATA,3Gbps,2.5in, Hard Drive
342-4861  1TB,7.2K RPM,SATA,3Gbps,2.5in, Hard Drive
342-4983  Hot Plug Hard Drive Carrier, PE-C8220X
934-0626  Dell Hardware Limited Warranty Plus On Site Service Extended Year
934-9845  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
935-0575  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
935-0585  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/prosupport or call 1-800-945-3355
996-9927  Dell Hardware Limited Warranty Plus On Site Service Initial Year
900-9997  On-Site Installation Declined

Group: 3  Quantity: 2
225-3558  PowerEdge C8220XD Storage Sled, Single, 12 Hard Drives
330-4118  System ordered as part of Multipack order
420-3323  No Factory Installed Operating System
342-3923  3TB, Near Line SAS 6Gps, 7.2K RPM, 3.5 in Hard Drive - Quantity 12
342-4824  Hard Drive Carrier,3.5,Expanded,Double Wide Storage,C8000 - Quantity 12
934-3976  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
934-4706  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
934-4716  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
934-6046  Dell Hardware Limited Warranty Plus On Site Service Initial Year
934-6156  Dell Hardware Limited Warranty Plus On Site Service Extended Year
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/prosupport or call 1-800-945-3355
900-9997  On-Site Installation Declined

Table 33: Heavy Data Node Chassis - PowerEdge C8000

The Heavy Data node chassis is used to configure four heavy data nodes in three chassis. Order two heavy data node chassis and one data node chassis for this configuration.

SKU       Component

Group: 1  Quantity: 1
225-3550  PE C8000 Enclosure, Dual Power Supply
331-8341  PowerEdge C8000 Shipping
331-9573  SHIP,C8000,DAO
420-3323  No Factory Installed Operating System
330-7353  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 1 - Quantity 4
331-8218  PowerEdge C8000 Static Rails, Toolless
936-3965  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
936-4695  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
936-4705  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
936-6035  Dell Hardware Limited Warranty Plus On Site Service Initial Year
936-6145  Dell Hardware Limited Warranty Plus On Site Service Extended Year
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/prosupport or call 1-800-945-3355
900-9997  On-Site Installation Declined

Group: 2  Quantity: 2
225-3555  PowerEdge C8220X, Double Width Compute Sled
331-4428  Performance Optimized
330-4118  System ordered as part of Multipack order
421-8663  No Factory Installed Operating System, v.2
331-9532  LSI 9202 SAS Controller Cable
430-3638  Intel X520 DA 10GB, Dual Port SFP+,PCIe-8 NIC
342-4851  LSI 9202-16E, LP, Controller, CE
331-8224  C2B LSI 2008 Mezz Card plus Onboard Controller supporting up to 12 Hard Drives
331-8999  SAS Controller Cable, PE-C8220X
342-5079  LSI 2008 SAS Controller Card, 6G, PE C8XXX
317-4928  Dual Processor Option
317-9596  Intel Xeon E5-2670 2.60GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W, Max Mem 1600MHz
317-9610  Intel Xeon E5-2670 2.60GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W
318-2308  Thermal Heatsink
318-2308  Thermal Heatsink
317-8810  Memory Filler Blank Dimm - Quantity 8
317-8994  64GB Memory (8x8GB), 1600Mhz, Dual Ranked RDIMMs
468-7687  Info, Memory for Dual Processor selection
342-4820  Hard Drive Carrier 3.5 C8000 - Quantity 4
342-4874  3TB,7.2K RPM,Near Line SAS,6Gps,3.5in, Hard Drive - Quantity 4
342-4987  3.5in HDD Enclosure, PE-C8220X
342-4841  Hard Drive,2.5 Rear Carrier,C8220
342-4841  Hard Drive,2.5 Rear Carrier,C8220
342-4861  1TB,7.2K RPM,SATA,3Gbps,2.5in, Hard Drive
342-4861  1TB,7.2K RPM,SATA,3Gbps,2.5in, Hard Drive
342-4983  Hot Plug Hard Drive Carrier, PE-C8220X
934-0626  Dell Hardware Limited Warranty Plus On Site Service Extended Year
934-9845  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
935-0575  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
935-0585  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/prosupport or call 1-800-945-3355
996-9927  Dell Hardware Limited Warranty Plus On Site Service Initial Year
900-9997  On-Site Installation Declined

Group: 3  Quantity: 2
225-3558  PowerEdge C8220XD Storage Sled, Single, 12 Hard Drives
330-4118  System ordered as part of Multipack order
420-3323  No Factory Installed Operating System
342-3923  3TB, Near Line SAS 6Gps, 7.2K RPM, 3.5 in Hard Drive - Quantity 12
342-4824  Hard Drive Carrier,3.5,Expanded,Double Wide Storage,C8000 - Quantity 12
934-3976  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
934-4706  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
934-4716  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
934-6046  Dell Hardware Limited Warranty Plus On Site Service Initial Year
934-6156  Dell Hardware Limited Warranty Plus On Site Service Extended Year
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/prosupport or call 1-800-945-3355
900-9997  On-Site Installation Declined

Appendix B: Bill of Materials - PowerEdge R720xd Nodes

Table 34: Active and Standby Name, Admin, Edge and HA Nodes - PowerEdge R720xd

SKU       Component
225-2110  PowerEdge R720xd Quantity: 1
331-4437  PowerEdge R720 Shipping Quantity: 1
421-5339  iDRAC7 Enterprise Quantity: 1
430-4418  Broadcom 5720 QP 1Gb Network Daughter Card Quantity: 1
342-3566  Chassis with up to 24, 2.5" Hard Drives Quantity: 1
318-1375  Bezel Quantity: 1
330-5116  Power Saving Dell Active Power Controller Quantity: 1
331-4557  Unconfigured RAID for H710P/H710/H310 (1-24 HDDs) Quantity: 1
342-3529  PERC H710 Integrated RAID Controller, 512MB NV Cache Quantity: 1
317-9595  Intel Xeon E5-2640 2.50GHz, 15M Cache, 7.2GT/s QPI, Turbo, 6C, 95W Quantity: 1
331-4508  Heat Sink for PowerEdge R720 and R720xd Quantity: 1
317-8688  DIMM Blanks for Systems with 2 Processors Quantity: 4
317-9609  Intel Xeon E5-2640 2.50GHz, 15M Cache, 7.2GT/s QPI, Turbo, 6C, 95W Quantity: 1
331-4508  Heat Sink for PowerEdge R720 and R720xd Quantity: 1
317-9648  8GB RDIMM, 1600 MHz, Standard Volt, Dual Rank, x4 Quantity: 12
331-4424  1600 MHz RDIMMS Quantity: 1
331-4428  Performance Optimized Quantity: 1
342-1998  HDD, 1TB 7.2K SATA,3G,2.5,HP Quantity: 6
331-5914  Electronic System Documentation and OpenManage DVD Kit for R720 and R720xd Quantity: 1
331-4433  ReadyRails Sliding Rails With Cable Management Arm Quantity: 1
331-4605  Dual, Hot-plug, Redundant Power Supply Quantity: 1
330-3151  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 1 Quantity: 2
420-6320  No Operating System Quantity: 1
421-5736  No Media Required Quantity: 1
936-0967  Dell Hardware Limited Warranty Plus On Site Service Initial Year Quantity: 1
936-7243  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended Quantity: 1
936-7263  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year Quantity: 1
939-3398  Dell Hardware Limited Warranty Plus On Site Service Extended Year Quantity: 1
989-2701  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year Quantity: 1
989-3439  Thank you choosing Dell ProSupport Quantity: 1
900-9997  On-Site Installation Declined Quantity: 1
926-2979  Proactive Maintenance Service Declined Quantity: 1

Add for 10GbE networking support:
430-444   Intel X520 DP 10Gb DA/SFP+ Server Adapter

Appendix C: Bill of Materials - PowerEdge R720 Nodes

Table 35: Active and Standby Name, Admin, Edge and HA Nodes - PowerEdge R720

SKU       Component
225-2133  PowerEdge R720
331-4437  PowerEdge R720 Shipping
331-4439  Risers with up to 4, x8 PCIe Slots + 2, x16 PCIe Slot
421-5339  iDRAC7 Enterprise
430-4418  Broadcom 5720 QP 1Gb Network Daughter Card
342-3587  3.5" Chassis with up to 8 Hard Drives
318-1375  Bezel
330-5116  Power Saving Dell Active Power Controller
331-4379  No RAID for H310 (1-16 HDDs)
342-3529  PERC H710 Integrated RAID Controller, 512MB NV Cache Quantity: 1
317-9595  Intel Xeon E5-2640 2.50GHz, 15M Cache, 7.2GT/s QPI, Turbo, 6C, 95W Quantity: 1
331-4508  Heat Sink for PowerEdge R720 and R720xd Quantity: 1
317-8688  DIMM Blanks for Systems with 2 Processors Quantity: 4
317-9595  Intel Xeon E5-2640 2.50GHz, 15M Cache, 7.2GT/s QPI, Turbo, 6C, 95W Quantity: 1
331-4508  Heat Sink for PowerEdge R720 and R720xd Quantity: 1
317-9648  8GB RDIMM, 1600 MHz, Standard Volt, Dual Rank, x4 - Quantity 12
331-4424  1600 MHz RDIMMS
331-4428  Performance Optimized
342-2098  1TB 7.2K RPM Near-Line SAS 6Gbps 3.5in Hot-plug Hard Drive - Quantity 6
310-5171  No System Documentation, No OpenManage DVD Kit
313-9090  DVD+/-RW, SATA, INTERNAL
331-4434  ReadyRails Sliding Rails Without Cable Management Arm
331-4607  Dual, Hot-plug, Redundant Power Supply (1+1), 1100W
330-3150  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 foot, Qty 1 - Quantity 2
420-6320  No Operating System
421-5736  No Media Required
936-4543  Basic Hardware Services: Business Hours (5X10) Next Business Day On Site Hardware Warranty Repair 2 Year Exten
939-2678  Dell Hardware Limited Warranty Plus On Site Service Extended Year
939-2768  Dell Hardware Limited Warranty Plus On Site Service Initial Year
988-9131  Basic Hardware Services: Business Hours (5X10) Next Business Day On Site Hardware Warranty Repair Initial Year
994-4019  Basic support covers SATA Hard Drive for 1 year only regardless of support duration on the system
996-8029  DECLINED CRITICAL BUSINESS SERVER OR STORAGE SOFTWARE SUPPORT PACKAGE-CALL YOUR DELL SALES REP IF UPGRADE NEED
900-9997  On-Site Installation Declined
926-2979  Proactive Maintenance Service Declined

Add for 10GbE networking support:
430-444   Intel X520 DP 10Gb DA/SFP+ Server Adapter

Appendix D : SKU Bill of Materials PowerEdge R720xd Data node

Table 36: Data node PowerEdge R720xd

SKU       Quantity  Component
225-2110  1         PowerEdge R720xd
331-4437  1         PowerEdge R720 Shipping
421-5339  1         iDRAC7 Enterprise
430-4418  1         Broadcom 5720 QP 1Gb Network Daughter Card
342-3566  1         Chassis with up to 24, 2.5" Hard Drives
318-1375  1         Bezel
330-5116  1         Power Saving Dell Active Power Controller
331-4557  1         Unconfigured RAID for H710P/H710/H310 (1-24 HDDs)
342-3529  1         PERC H710 Integrated RAID Controller, 512MB NV Cache
317-9595  1         Intel Xeon E5-2640 2.50GHz, 15M Cache, 7.2GT/s QPI, Turbo, 6C, 95W
331-4508  1         Heat Sink for PowerEdge R720 and R720xd
317-8688  1         DIMM Blanks for Systems with 2 Processors
317-9609  1         Intel Xeon E5-2640 2.50GHz, 15M Cache, 7.2GT/s QPI, Turbo, 6C, 95W
331-4508  1         Heat Sink for PowerEdge R720 and R720xd
317-9648  4         8GB RDIMM, 1600 MHz, Standard Volt, Dual Rank, x4
317-9647  4         4GB RDIMM, 1600 MHz, SV, DR, x8
331-4424  1         1600 MHz RDIMMs
331-4428  1         Performance Optimized
342-1998  24        HDD, 1TB 7.2K SATA, 3G, 2.5, HP
331-5914            Electronic System Documentation and OpenManage DVD Kit for R720 and R720xd
331-4433  1         ReadyRails Sliding Rails With Cable Management Arm
331-4605  1         Dual, Hot-plug, Redundant Power Supply
330-3151  2         Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 1
420-6320  1         No Operating System
421-5736  1         No Media Required
936-0967  1         Dell Hardware Limited Warranty Plus On Site Service Initial Year
936-7243  1         ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
936-7263  1         ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
939-3398  1         Dell Hardware Limited Warranty Plus On Site Service Extended Year
989-2701  1         ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
989-3439  1         Thank you choosing Dell ProSupport
900-9997  1         On-Site Installation Declined
926-2979  1         Proactive Maintenance Service Declined

Add for 10GbE networking support:
430-444             Intel X520 DP 10Gb DA/SFP+ Server Adapter


Appendix E : Bill of Materials PowerEdge C6105

Table 37: Data node PowerEdge C6105

Component: PowerEdge C6105
Description: 2MB PowerEdge C6105 Chassis w/ 2 System Boards and support for 2.5-inch Hard Drives
SKU: [225-0024]

Component: Operating System
Description: No Factory Installed Operating System
SKU: [420-3323][420-3323]

Component: Processor
Description: 2x AMD Opteron 4180, 6C 2.6GHz, 3M L2/6M L3, 1333MHz Max Mem
SKU: [317-4928][317-5758][317-5766]

Component: Memory
Description: 48GB Memory (4x4GB/4x8GB), 1333 MHz, Dual Ranked RDIMMs for 2 Processors, Low Volt
SKU: [317-3699][317-6758][317-6758][468-7687]

Component: Documentation/Disks
Description: C6105 Documentation
SKU: [331-0589]

Component: Hard Drive Controller and Configuration
Description: Add-in LSI 2008 SAS/SATA Mezz Card supporting up to 12, 2.5-inch HDs SAS/SATA - No RAID
SKU: [342-1050][342-1050][342-1883][342-1883]

Component: Rails
Description: C6100/C6105 Static Rails, Tool-less
SKU: [330-8483]

Component: Hardware Support Services
Description: 3 Year ProSupport and NBD On-site Service
SKU: [925-0527][929-5569][929-5692][929-5732][931-6268][935-0180][989-3439]

Component: Power Supply
SKU: [330-8537][330-8537][331-0588]

Component: Hard Drive
Description: 1TB 7.2K RPM SATA 3Gbps 2.5in Hot Plug Hard Drive, Quantity 24
SKU: [342-1032][342-1032][342-3063][342-3063]
     [342-1032][342-1032][342-3063][342-3063]
     [342-1032][342-1032][342-3063][342-3063]
     [342-1032][342-1032][342-3063][342-3063]
     [342-1032][342-1032][342-3063][342-3063]
     [342-1032][342-1032][342-3063][342-3063]
     [342-1032][342-1032][342-3063][342-3063]
     [342-1032][342-1032][342-3063][342-3063]
     [342-1032][342-1032][342-3063][342-3063]
     [342-1032][342-1032][342-3063][342-3063]
     [342-1032][342-1032][342-3063][342-3063]
     [342-1032][342-1032][342-3063][342-3063]

Appendix F : Bill of Materials Force10 Network Equipment

Table 38: Network Equipment 1GbE Dell Force10

SKU       Description
225-2446  Force10, Z9000, 2U, 32 x 40GbE QSFP+ Ports, 1 AC Pwr Supply, Fan w/ IO Panel to PSU (Normal) Airflow (Non-Redundant Pwr)
331-5996  Force10, Power Cord, 125V, 15A, 10 Feet, NEMA 5-15/C13, S-Series
331-5343  Force10, Z9000, AC Power Supply for Chassis with IO Panel to PSU (Normal) Airflow
430-4543  Force10, Transceiver, 40GE QSFP+ Short Reach Optics, 850nm Wavelength, 100-150m Reach on OM3/OM4
331-7279  Force10, Z9000 Cable Management Kit
225-2477  Force10, S4810P, 48 x 10GbE SFP+, 4 x QSFP 40GbE, 1 x AC PSU, 2 x Fans, IO Panel to PSU Airflow
225-2479  Force10, S4810P, 48 x 10GbE SFP+, 4 x QSFP 40GbE, 1 x AC PSU, 2 x Fans, PSU to IO Panel Airflow
331-5103  Force10, S4810, AC Power Supply, IO Panel to PSU Airflow
331-5105  Force10, S4810, AC Power Supply, PSU to IO Panel Airflow
331-5258  Force10, Cable, SFP+ to SFP+, 10GbE, Copper Twinax Direct Attach Cable, 2 Meters
331-5996  Force10, Power Cord, 125V, 15A, 10 Feet, NEMA 5-15/C13, S-Series
421-6981  Force10, Software, L3 Latest Version, S4810
430-4543  Force10, Transceiver, 40GE QSFP+ Short Reach Optics, 850nm Wavelength, 100-150m Reach on OM3/OM4
331-5274  Force10, Transceiver, SFP+, 10GbE, SR, 850nm Wavelength, 300m Reach
430-4543  Force10, Transceiver, 40GE QSFP+ Short Reach Optics, 850nm Wavelength, 100-150m Reach on OM3/OM4
331-5393  Force10, Rear Rack Mounting Bracket, 4 Post, S4810
225-2450  Force10, S60, 44 x 10/100/1000 BASE-T, 4 x SFP, 2 Expansion Slots, 1 x AC PSU, 2 x Fans, PSU to IO Panel Airflow
331-5233  Force10, SFP+ Expansion Module, 2 x 10 GbE Ports, S60 Series (SFP+ optics required)
331-5996  Force10, Power Cord, 125V, 15A, 10 Feet, NEMA 5-15/C13, S-Series
331-5226  Force10, S60, AC Power Supply, PSU to IO Panel Airflow
331-5398  Force10, Rear Rack Mounting Bracket, Metal, 4 Post, S60
          Force10 S60 2 port, 12G, Stacking module
          Force10 S60 12 Gig 60cms stacking cable

Table 39: Network Equipment 10GbE Dell Force10

Quantity  SKU       Description

Cluster Network
50        331-5274  Dell Networking, Transceiver, SFP+, 10GbE, SR, 850nm Wavelength, 300m Reach
36        330-8723  SFP+, Short Range, Optical Transceiver, LC Connector, 10Gb and 1Gb compatible (Intel 10G SFP+)
2         225-2477  Force10, S4810P, 48 x 10GbE SFP+, 4 x QSFP 40GbE, 1 x AC PSU, 2 x Fans, IO Panel to PSU Airflow
2         331-5996  Force10, Power Cord, 125V, 15A, 10 Feet, NEMA 5-15/C13, S-Series
2         331-5272  Dell Networking, Transceiver, SFP, 1000BASE-LX, 1310nm Wavelength, 10km Reach
2         331-5393  Force10, Rear Rack Mounting Bracket, 4 Post, S4810
2         331-6279  Force10, User Documentation for S4810, DAO/BCC
2         935-0103  SW Support, Force10 Software, 3 Years
2         935-0143  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Years
2         931-3856  ProSupport: 4-Hour 7x24 Parts Only After Problem Diagnosis, Initial Year
2         989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/prosupport or call 1-800-945-3355
2         996-2760  Dell Hardware Limited Warranty Extended Year(s)
2         935-0123  ProSupport: 4-Hour 7x24 Parts Only After Problem Diagnosis, 2 Year Extended
2         996-2670  Dell Hardware Limited Warranty Initial Year
2         900-9997  On-Site Installation Declined
2         996-3080  ProSupport for Force10, Layer 3 Enablement, 1 Year
2         331-9460  Force10, Software, iSCSI-optimized Configuration, S4810
1         331-5217  Customer Kit, Dell Networking, Cable, QSFP+, 40GbE SFP+ Passive Copper Direct Attach Cable, 1 Meter

Administration Network
1         225-2503  Force10, S55, 44 x 10/100/1000 BASE-T, 4 x SFP, 2 Expansion Slots, 1 x AC PSU, 2 x Fans, IO Panel to PSU Airflow
1         331-5233  Force10 SFP+ Expansion Module, 2 x 10 GbE Ports
1         331-5243  Force10, S55, AC Power Supply, IO Panel to PSU Airflow
1         331-5996  Force10, Power Cord, 125V, 15A, 10 Feet, NEMA 5-15/C13, S-Series
1         331-5252  Force10, Rear Rack Mounting Bracket, 4 Post, S55
1         331-9233  No Returns Allowed on Dell Force10 Switches
1         331-6271  Force10, User Documentation for S55/S60, DAO/BCC
1         935-1367  Dell Hardware Limited Warranty Initial Year
1         938-7578  Dell Hardware Limited Warranty Extended Year(s)
1         989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/prosupport or call 1-800-945-3355
1         995-0592  ProSupport: Next Business Day Parts Delivery, 2 Year Extended
1         995-0622  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Years
1         995-9649  SW Support, Force10 Software, 5 Years
1         996-0530  ProSupport: Next Business Day Parts Delivery, Initial Year
1         996-0540  Force10, 5 Year Return To Depot Service, Base Warranty
1         900-9997  On-Site Installation Declined

Network Equipment Notes

The SKU lists above include switches with specific airflow options. Both I/O panel to PSU SKUs and PSU to I/O panel (reverse airflow) SKUs are available. Redundant fans (other than the minimum supplied with the chassis) should use the same airflow direction as the base switch. The airflow cannot be reversed in the field at this time.

The lists above show AC power supplies only. All switch models are also available with DC power.
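The airflow rule in the notes above can be checked mechanically before an order is placed. The following is a minimal sketch, not a Dell tool: the function name `airflow_mismatches` and the lookup table are hypothetical, the SKU-to-airflow mapping is copied from the S4810 entries in Table 38, and it assumes the matching rule that the notes state for fans also applies to the airflow-specific power supply SKUs.

```python
# Airflow direction per SKU, copied from the S4810 descriptions in Table 38.
# The mapping and helper below are illustrative only (hypothetical names).
IO_TO_PSU = "IO Panel to PSU"
PSU_TO_IO = "PSU to IO Panel"

AIRFLOW_BY_SKU = {
    "225-2477": IO_TO_PSU,  # S4810P switch
    "225-2479": PSU_TO_IO,  # S4810P switch
    "331-5103": IO_TO_PSU,  # S4810 AC power supply
    "331-5105": PSU_TO_IO,  # S4810 AC power supply
}


def airflow_mismatches(base_sku, component_skus):
    """Return the component SKUs whose airflow differs from the base switch.

    SKUs with no airflow direction in the table (cables, optics, and so on)
    are ignored rather than reported.
    """
    base_direction = AIRFLOW_BY_SKU[base_sku]
    return [
        sku
        for sku in component_skus
        if sku in AIRFLOW_BY_SKU and AIRFLOW_BY_SKU[sku] != base_direction
    ]


if __name__ == "__main__":
    # An I/O-to-PSU switch ordered with a PSU-to-I/O power supply is flagged.
    print(airflow_mismatches("225-2477", ["331-5105", "331-5258"]))
```

A lookup keyed by SKU keeps the check independent of the free-text descriptions, which vary in spelling across the tables.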

Appendix G : Bill of Materials Dell 6248 Network Equipment

Table 40: Network Equipment Dell 6248 (Optional)

Component                         SKU         Description
PowerConnect 6248P                [222-6714]  PowerConnect 6248, 48 GbE Ports, Managed Switch, 10GbE and Stacking Capable
Front-end SFP Fiber Transceivers  -           None
Modular Upgrade Bay 1: Modules    [320-5171]  Stacking Module, 48Gbps, Includes 1m Stacking Cable
Modular Upgrade Bay 1: Optics     -           None
Modular Upgrade Bay 2: Modules    -           None
Modular Upgrade Bay 2: Optics     -           None
Cables                            [320-5171]  Stacking Cable, 3m
External Redundant Power Supply   -           None
Hardware Support Services                     3 Year ProSupport and NBD On-site Service
Installation Services                         No Installation Services Selected
Asset Recovery Services                       None
Cables (optional)                 [320-5168]  Stacking Cable, 3m

SKUs listed for the three service rows: [980-5492] [981-1260] [985-6027] [985-6038] [991-8459]

Appendix H : Bill of Materials Software and Support

Software, training, and support SKUs change regularly and are specific to global regions. Please refer to the Hadoop Solution SKUs document on Dell SalesEdge (Dell internal link) or contact your Dell account representative for the latest information.

The sample Bill of Materials appendices include service and support SKUs for the United States. These SKUs need to be changed for other regions.

Appendix I : Dell Cloudera Solution Components Decoder Ring

Source: http://hadoop.apache.org/

1. Hadoop: http://en.wikipedia.org/wiki/hadoop
2. Hadoop Distributed File System (HDFS): a distributed file system that provides high-throughput access to application data: http://en.wikipedia.org/wiki/hadoop_distributed_filesystem#hadoop_distributed_file_system
3. MapReduce: a software framework for distributed processing of large data sets on compute clusters: http://en.wikipedia.org/wiki/mapreduce
4. Avro: a data serialization system.
5. HBase: a scalable, distributed database that supports structured data storage for large tables.
6. Hive: a data warehouse infrastructure that provides data summarization and ad-hoc querying.
7. ZooKeeper: a high-performance coordination service for distributed applications.
8. Pig: a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
9. Sqoop: a tool designed to import data from relational databases into Hadoop; Sqoop uses JDBC to connect to a database.
10. Flume: a distributed service for collecting, aggregating, and moving large amounts of log data; its architecture is based on streaming data flows.
11. Oozie: an open-source workflow engine and coordination service to manage data processing jobs within Hadoop.
12. Hue: a browser-based interface for interacting with Hadoop clusters.
13. Crowbar: a toolset provided, supported, and maintained by Dell for system deployment and configuration automation; Crowbar supports the bare-metal bring-up of new hardware and configuration management of existing hardware.

Appendix J : External References

Nagios: www.nagios.org
Ganglia: ganglia.sourceforge.net/
Cloudera: www.cloudera.com

Update History

Changes in Version 2.1

The following changes have been made to this guide since the 2.0 release:

- Added support for the PowerEdge C8000 series
- Added support for 10Gb networking on the PowerEdge C8000 and R720 series
- Updated for Cloudera CDH 4.1 and Cloudera Manager CM 4.1
- Removed support for PowerEdge C2100 series (hardware end of life)
- Removed support for PowerEdge C6100 series (hardware end of life)

To Learn More

For more information on the Dell Cloudera Solution, visit: www.dell.com/hadoop

© 2011-2012 Dell Inc. All rights reserved. Trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Specifications are correct at date of publication but are subject to availability or change without notice at any time. Dell and its affiliates cannot be responsible for errors or omissions in typography or photography. Dell's Terms and Conditions of Sales and Service apply and are available on request. Dell service offerings do not affect consumer's statutory rights. Dell, the DELL logo, and the DELL badge, PowerConnect, and PowerVault are trademarks of Dell Inc.