Dell Cloudera Solution Reference Architecture v2.1.0
- Frederick Stone
A Dell Reference Architecture Guide
November 2012
Next Generation Cloud Solutions
Table of Contents

Tables
Figures
Overview
  Summary
  Abbreviations
Dell Cloudera Solution
  Solution Overview
  Solution Taxonomy
Dell Cloudera Solution Hardware Architecture
  High-level Architecture
  Dell Cloudera Sizing Terms
  Server Infrastructure Options
Dell Cloudera Solution Network Architecture
  Network Components
  Dell Open Switch Solution
  IPv6 Capabilities
  Network Connectivity
Dell Cloudera Solution Software Architecture
  Linux File System Configuration Definition
  Disk Partitioning Recommendation for the Name Node
  Configuration Parameters: Recommended Values
Dell Cloudera Solution Cloudera Enterprise Software
  Deployment Integration
  Cloudera Manager
  Cloudera Support
  HDFS Highly Available Name Nodes
Dell Cloudera Solution Deployment Methodology
  Site Preparation Needed for the Deployment
Dell Cloudera Solution Hardware Monitoring and Alerting
  Nagios
  Ganglia
Dell Cloudera Solution Security Design
  What is Available in CDH4?
  Implementing Secure Hadoop
Appendix A: Bill of Materials PowerEdge C8000 Series
Appendix B: Bill of Materials PowerEdge R720xd Nodes
Appendix C: Bill of Materials PowerEdge R720 Nodes
Appendix D: Bill of Materials PowerEdge R720xd Data node
Appendix E: Bill of Materials PowerEdge C
Appendix F: Bill of Materials Force10 Network Equipment
  Network Equipment Notes
Appendix G: Bill of Materials Dell 6248 Network Equipment
Appendix H: Bill of Materials Software and Support
Appendix I: Dell Cloudera Solution Components Decoder Ring
Appendix J: External References
Update History
  Changes in Version
To Learn More

Tables

Table 1: Dell Cloudera Solution Use Cases
Table 2: Dell Cloudera Solution Software Locations
Table 3: Cluster Sizes PowerEdge C8000 Series
Table 4: Hardware Configurations PowerEdge C8000 Compute Sleds
Table 5: Hardware Configurations PowerEdge C8000 Storage Sleds
Table 6: Chassis Configuration PowerEdge C8000 Master Chassis
Table 7: Chassis Configuration PowerEdge C8000 High Availability Chassis
Table 8: Chassis Configuration PowerEdge C8000 Data Nodes
Table 9: Chassis Configuration PowerEdge C8000 Heavy Data Nodes
Table 10: Rack Configuration PowerEdge C8000
Table 11: Rack Configuration PowerEdge C8000 Heavy nodes
Table 12: Cluster Sizes PowerEdge R720xd
Table 13: Hardware Configurations PowerEdge R720xd
Table 14: Hardware Configurations PowerEdge R720/R720xd
Table 15: Rack Configuration PowerEdge R720xd (or R720/R720xd)
Table 16: Cluster Sizes PowerEdge C6105 with PowerEdge R720
Table 17: Hardware Configurations PowerEdge C6105 with PowerEdge R720
Table 18: Rack Configuration PowerEdge C
Table 19: Single Rack Network Equipment
Table 20: Multi Rack Network Equipment
Table 21: Dell Cloudera Solution Support Matrix
Table 22: HDFS parameters
Table 23: mapred parameters
Table 24: default environment
Table 25: hadoop-env.sh
Table 26: /etc/fstab
Table 27: hdfs (core-site)
Table 28: /etc/security/limits.conf
Table 29: Differences between Cloudera Manager Free Edition and Enterprise Edition
Table 30: Master Chassis PowerEdge C
Table 31: HA Chassis PowerEdge C
Table 32: Data Node Chassis PowerEdge C
Table 33: Heavy Data Node Chassis PowerEdge C
Table 34: Active and Standby Name, Admin, Edge and HA Nodes PowerEdge R720xd
Table 35: Active and Standby Name, Admin, Edge and HA Nodes PowerEdge R
Table 36: Data node PowerEdge R720xd
Table 37: Data node PowerEdge C
Table 38: Network Equipment 1GbE Dell Force10
Table 39: Network Equipment 10GbE Dell Force10
Table 40: Network Equipment Dell 6248 (Optional)

Figures

Figure 1: Dell Cloudera Solution Taxonomy
Figure 2: Dell Cloudera Hardware Architecture
Figure 3: PowerEdge C8000 Chassis
Figure 4: PowerEdge R720xd Server
Figure 5: PowerEdge C6105
Figure 6: Single Rack Networking Equipment
Figure 7: Multi-rack networking equipment
Figure 8: Multi-Rack View for 10G Servers Using Force10 S4810 Switches
Figure 9: Multi-Rack View for 10G Servers Using Force10 Z9000 (based on Layer-3)
Figure 10: Multi-Rack View Using Force10 Z9000 Switches (Based on Layer-2)
Figure 11: Dell Cloudera Compute PowerEdge R720xd Node Network Interconnects
Figure 12: Network Connections
Figure 13: HDFS with Highly Available Name Node
Figure 14: Kerberos Authentication in Hadoop

THIS PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.

Dell Inc. All rights reserved. Dell, the DELL logo, the DELL badge and PowerEdge are trademarks of Dell Inc. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell disclaims proprietary interest in the marks and names of others. This document is for informational purposes only. Dell reserves the right to make changes without further notice to the products herein. The content provided is as-is and without expressed or implied warranties of any kind.

Dell Confidential
Overview

Summary

This document presents the reference architecture of the Dell Cloudera Solution for Apache Hadoop, which Dell designed jointly with Cloudera. The reference architecture introduces the high-level components, hardware, and software that are included in the stack. Each high-level component is then described individually.

Abbreviations

Abbreviation | Definition
BMC | Baseboard management controller
CDH | Cloudera Distribution for Hadoop
DBMS | Database management system
EDW | Enterprise data warehouse
EoR | End-of-row switch/router
HDFS | Hadoop Distributed File System
IPMI | Intelligent Platform Management Interface
NIC | Network interface card
LOM | Local area network on motherboard
OS | Operating system
ToR | Top-of-rack switch/router
Dell Cloudera Solution

Solution Overview

The Dell Cloudera Solution lowers the barrier to adoption for organizations intending to use Hadoop in production.

Hadoop is an Apache open source project, written in the Java programming language, that is built and used by a global community of contributors. Yahoo! has been the largest contributor to the project, and uses Hadoop extensively across its businesses. Other contributors and users include Facebook, LinkedIn, eHarmony, and eBay.

However, installing, configuring, and running Hadoop is not trivial. Different roles and configurations need to be deployed on various nodes. Designing, deploying, and optimizing the network layer to match Hadoop's scalability requires consideration of the type of workloads that will run on the Hadoop cluster. These issues are complicated by both the fast-moving pace of the core Hadoop project and the challenges of managing a system designed to scale to thousands of nodes in a cluster.

Dell's customer-centered approach is to create rapidly deployable and highly optimized end-to-end Hadoop solutions running on hyperscale hardware. Dell listened to its customers and designed a Hadoop solution that is unique in the marketplace, combining optimized hardware, software, and services to streamline deployment and improve the customer experience.

The Dell Cloudera Solution embodies all the hardware, software, resources, and services needed to run Hadoop in a production environment. This end-to-end approach means that you can be in production with Hadoop in a shorter time than is typically possible with homegrown solutions.

The Dell Cloudera Solution is based on the Cloudera Enterprise distribution of Hadoop, including CDH4. Cloudera has created a quality-controlled distribution of Hadoop and offers commercial management software, updates, support, and consulting services.

The hardware platform for the Dell Cloudera Solution is the Dell PowerEdge C or R series.
Dell PowerEdge servers are focused on hyperscale and cloud capabilities. Rather than emphasizing gigahertz and gigabytes, these servers deliver maximum density, memory, and serviceability while minimizing total cost of ownership.

Dell's solution includes components that span the entire solution stack:

- Reference architecture and best practices
- Optimized server configurations
- Optimized network infrastructure
- Dell Crowbar software framework for deployment and management at scale
- Cloudera CDH Enterprise software
- Hadoop infrastructure management tools
- Monitoring with Ganglia and Nagios
- Integration with components from other ecosystem partners, including Pentaho and Datameer

One of Dell's core contributions to the Dell Cloudera Solution is a method to rapidly deploy and integrate Hadoop in production through the Dell Crowbar framework. Crowbar automates the deployment of the cluster from bare metal (no operating system installed) all the way to installing and configuring the Cloudera software components to your specific requirements. Going far beyond the capabilities of a simple PXE-boot installer, Crowbar handles system BIOS update and configuration, RAID/SAS configuration, operating system deployment, Hadoop software deployment, Hadoop software configuration, and integration with monitoring and alerting. These complementary functions are designed and implemented side-by-side with Hadoop core technology.

This solution provides a foundation for Dell to offer additional solutions as the Hadoop environment evolves and expands.
The Dell Cloudera Solution is designed to address the following use cases:

Table 1: Dell Cloudera Solution Use Cases

Use case | Description
Data storage | The user would like to be able to collect and store unstructured and semi-structured data in a fault-resilient scalable data store that can be organized and sorted for indexing and analysis.
Batch processing of unstructured data | The user would like to batch-process (index, analyze, etc.) large quantities of unstructured and semi-structured data.
Data archive | The user would like medium-term (12 to 36 months) archival of data from EDW/DBMS to increase the length that data is retained or to meet data-retention policies/compliance.
Integration with data warehouse | The user would like to transfer data stored in Hadoop into a separate DBMS for advanced analytics. Also the user may want to transfer the data from the DBMS back to Hadoop.

Solution Taxonomy

Figure 1: Dell Cloudera Solution Taxonomy

Figure 1 describes the primary components in the Dell Cloudera Solution. The PowerEdge servers, the operating system, and the Java Virtual Machine make up the foundation on which the Hadoop software stack runs. The dark blue layer, depicting the core Hadoop components, comprises two frameworks:

- The Data Storage Framework is the file system that Hadoop uses to store data on the cluster nodes. Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable file system.
- The Data Processing Framework (MapReduce) is a massively-parallel compute framework inspired by Google's MapReduce papers.
The next layer of the stack is the network layer. This is a dedicated cluster network, implemented from a blueprint using tested and qualified components. This implementation provides predictable high performance without interference from other applications.

The next three frameworks (the Orchestration, the Data Access Framework, and the Client Access Tools) are utilities that are part of the Hadoop ecosystem and are provided by the CDH distribution.

Dell Cloudera Solution Hardware Architecture

High-level Architecture

Figure 2: Dell Cloudera Hardware Architecture

The Dell Cloudera environment consists of multiple software services running on multiple server nodes. The solution implementation divides the server nodes into several categories, and each node has a configuration optimized for its purpose in the cluster. The categories are:

- Admin Node: provides cluster deployment and management capabilities through Crowbar.
- Master Name Node: runs all the services needed to manage the HDFS data storage and MapReduce task distribution and tracking. This is sometimes called the Name Node or Active Name Node. There are two types of services running on the Master Nodes:
  - JobTracker (to support MapReduce job distribution)
  - NameNode (to support HDFS data storage)
- Secondary Name Node: runs the secondary namenode process, which provides a checkpoint function for the Master Name Node. In Active/Standby HA, this node runs a second namenode process, usually called the standby namenode process. In quorum-based HA mode, this node runs the second journal node.
- HA Node: for Active/Standby HA, provides an NFS mount with durable storage that allows the Master Name Node and the Standby Name Node to share an edits file so they stay in sync. For quorum-based HA, provides the third journal node; the Master and Secondary Name Nodes provide the first and second journal nodes.
- Edge Node: provides an interface between the data and processing capacity available in the Hadoop cluster and a user of that capacity. The Edge Node is connected to the main access LAN, and is sometimes called a Gateway Node.
- Data Node: runs all the services required to store blocks of data on the local hard drives and execute processing tasks against that data. The majority of the nodes in a cluster are Data Nodes. There are two types of services running on the Data Nodes:
  - TaskTracker Daemon (to support MapReduce job execution)
  - DataNode Daemon (to support HDFS data storage)

Table 2: Dell Cloudera Solution Software Locations

Daemon | Primary Location
JobTracker | Master Name Node
TaskTracker | Data Node(x)
NameNode | Master Name Node
Secondary namenode | Secondary Name Node
Operating System Provisioning | Admin Node
Chef | Admin Node
Yum Repositories | Admin Node
Cloudera Manager | Edge Node(x)
Zookeeper | Data Node(x)
HMaster | Master Name Node
RegionServer | Data Node(x)
Crowbar Admin | Admin Node
Journal | Master Name Node, Secondary Name Node, HA Node

Master Nodes can be further combined based on the size of the solution deployment. Consult with your Dell solution team for guidance on sizing and configuration.

In addition to the Hadoop processes listed above, all nodes run additional software, such as Nagios, Ganglia, and chef-client. This software is used for cluster management through Crowbar.

Dell Cloudera Sizing Terms

The Dell Cloudera Solution Reference Architecture is organized into three components for sizing as the Hadoop environment grows. From smallest to largest, they are rack, pod, and cluster. Each has specific characteristics and sizing considerations documented in this reference architecture. The design goal for the Hadoop environment is to enable you to scale the environment by adding capacity as needed, without the need to replace any existing components.

Rack

A rack is the smallest size designation for a Hadoop environment.
A rack includes the power, the network cabling, and the two Ethernet switches needed to support up to 20 data nodes. These nodes should have their own power connectivity and space within the data center, separate from other racks, and should be treated as a fault zone.
Pod

A pod is an installation composed of three racks, based on server and network sizing. The three racks can support enough Hadoop server nodes and network switches for a minimum commercial-scale installation. This reference architecture discusses the administration and operational infrastructure needed to support three racks.

Cluster

A cluster is a set of racks dedicated to Hadoop that can be attached to a pair of distribution switches. It is a set of Hadoop nodes that share the same Name Node and management tools for operating the Hadoop environment. The size of the cluster can vary depending on the capacity of the aggregation network. For example, a Dell Force10 Z9000 aggregation switch can support a larger cluster than the Dell Force10 S4810 switches.
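The rack and pod definitions above lend themselves to a quick back-of-the-envelope calculation. The following is a minimal sketch, assuming only the two figures stated in the text (up to 20 data nodes per rack, three racks per pod); the function name `size_environment` is hypothetical, and the platform-specific cluster size tables may impose different per-rack limits.

```python
import math

# Figures taken from the sizing terms above.
DATA_NODES_PER_RACK = 20  # a rack supports up to 20 data nodes
RACKS_PER_POD = 3         # a pod is composed of three racks


def size_environment(data_nodes: int) -> dict:
    """Return the number of racks and pods needed for a data node count.

    Hypothetical helper for illustration; it ignores support nodes
    (Admin, Name, HA, Edge) and any platform-specific rack limits.
    """
    if data_nodes < 1:
        raise ValueError("at least one data node is required")
    racks = math.ceil(data_nodes / DATA_NODES_PER_RACK)
    pods = math.ceil(racks / RACKS_PER_POD)
    return {"racks": racks, "pods": pods}


if __name__ == "__main__":
    for n in (20, 60, 61):
        print(n, size_environment(n))
```

Under these assumptions, 60 data nodes fill exactly one pod, and the 61st data node requires a fourth rack and therefore a second pod.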
Server Infrastructure Options

The Dell Cloudera Solution includes three choices for server infrastructure:

- Dell PowerEdge C8000 series
- Dell PowerEdge R720(xd) series
- Dell PowerEdge C6105 series

These alternatives provide density and capacity choices to match customer requirements. The PowerEdge C8000 series and PowerEdge R720 series are recommended for new installations. The following sections describe the configurations required and the rack layouts.

PowerEdge C8000 Series

The PowerEdge C8000 series is Dell's hyperscale-inspired 4U shared infrastructure server that allows the mixing and matching of compute, storage, and GPU sleds in one chassis. The PowerEdge C8000 chassis holds up to eight single-wide PowerEdge C8220 compute sleds, up to four double-wide PowerEdge C8220X compute/GPU sleds or PowerEdge C8000XD storage sleds, or a combination of these, plus two power sleds. This design allows the right balance of CPU-to-memory-to-disk ratio, and supports large-scale storage nodes requiring 24 or more hard drives, to run big data applications faster. The flexible PowerEdge C8000 can run Master, Slave, and Edge Hadoop nodes and multiple workloads from the same chassis or across racks, allowing for better use of IT resources, lower total cost of ownership over the lifecycle of the server, and more efficient use of space while increasing Hadoop pod compute/storage density and performance.
Figure 3: PowerEdge C8000 Chassis

PowerEdge C8000 feature summary:

- Up to eight independently serviceable PowerEdge C8220 compute sleds, four PowerEdge C8220X compute sleds, or four PowerEdge C8000XD storage sleds in a 4U rack chassis
- Cold aisle service
- Intel E series processors with up to eight cores and support for up to 130W TDP
- Up to 256GB of memory with 16 DDR3 slots at 1600MHz per node (512GB RTS+)
- PowerEdge C8220 Single Width Compute (SWC)
  - Up to 2 x 2.5-inch non-hot-plug hard drives per PowerEdge C8220 compute sled
- PowerEdge C8220X Double Width Compute (DWC)
  - Up to 12 x 2.5-inch or 4 x 3.5-inch hot-plug hard drives per PowerEdge C8220X compute sled
  - Up to 2 x 2.5-inch non-hot-plug hard drives per PowerEdge C8220X compute sled
  - Up to 2 x 2.5-inch hot-plug hard drives per PowerEdge C8220X compute sled
- PowerEdge C8000XD Double Width Storage (DWS)
  - Up to 12 x 3.5-inch or 12 x 2.5-inch hot-plug hard drives or 24 x 2.5-inch SSDs per PowerEdge C8000XD storage sled
Cluster Sizing

The minimum configuration supported is eight nodes:

- Crowbar Administration Node
- Master Name Node
- Secondary Name Node
- High Availability (HA) Node
- Edge (or Gateway) Node
- Three Data nodes

A minimum configuration can be implemented in three PowerEdge C8000 chassis, if one of the data nodes is installed in the HA chassis.

When using NFS-based HA, the HA node provides a shared NFS mount. In quorum-based HA mode, this node is used as one of the three required quorum nodes, so the node counts remain the same.

Table 3 shows the minimum and maximum numbers of nodes for each rack, pod, and cluster within this reference architecture.

Table 3: Cluster Sizes PowerEdge C8000 Series

Machine Function | Min per Rack | Max per Rack | Min per Pod | Max per Pod | Min per Cluster | Max per Cluster
Admin Node
Master Name Node
Secondary Name Node
HA Node
Edge Node
Data node: To be determined based on sizing criteria; To be determined based on sizing criteria; To be determined based on sizing criteria
Heavy Data node: 3; 8 (12 in racks 4 and up); To be determined based on sizing criteria
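The eight-node minimum listed above can be expressed as a simple check against a planned node inventory. This is a minimal sketch: the role keys (`admin`, `master_name`, and so on) and the function `missing_roles` are hypothetical names chosen for illustration, and only the role counts come from the text.

```python
from collections import Counter
from typing import Dict, Iterable

# Minimum node counts per role, taken from the cluster sizing list above.
# The role keys themselves are hypothetical labels.
MINIMUM_ROLES: Dict[str, int] = {
    "admin": 1,           # Crowbar Administration Node
    "master_name": 1,     # Master Name Node
    "secondary_name": 1,  # Secondary Name Node
    "ha": 1,              # High Availability (HA) Node
    "edge": 1,            # Edge (or Gateway) Node
    "data": 3,            # Three Data nodes
}


def missing_roles(nodes: Iterable[str]) -> Dict[str, int]:
    """Return how many nodes of each role are still needed.

    An empty result means the inventory meets the minimum configuration.
    """
    counts = Counter(nodes)
    return {
        role: needed - counts[role]
        for role, needed in MINIMUM_ROLES.items()
        if counts[role] < needed
    }


if __name__ == "__main__":
    plan = ["admin", "master_name", "secondary_name", "ha", "edge", "data", "data"]
    print(missing_roles(plan))
```

The same check applies whether the HA node serves the shared NFS mount or acts as the third quorum node, because the text notes that the node counts are identical in both HA modes.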
Hardware Configurations

Table 4: Hardware Configurations PowerEdge C8000 Compute Sleds

Machine Function | Active and Secondary Name Node, Admin Node, HA Node, Edge Node | Data Node | Heavy Data Node
Sled 1: PowerEdge C8220X
Processor: 2 x E (8-core)
RAM (Minimum) | 128 GB | 64 GB | 64 GB
LOM / Network Controller / DISK (onboard): 2 x Intel X520 10GbE NIC, Dual Port, SFP+, Low Profile; 2 x 1GbE; None; Intel X520 10GbE NIC, Dual Port, SFP+, Low Profile
DISK (hotswap) | N/A | 2 x 2.5-inch 1 TB | 2 x 2.5-inch 1 TB
DISK (side) | 6 x 1 TB 2.5-inch SATA | 4 x 3 TB 3.5-inch NL-SAS | 4 x 3 TB 3.5-inch NL-SAS
DISK (expansion) | None | 1 x C8220XD 36 TB | 2 x C8220XD 72 TB
Storage Controller: LSI 2008 (Mezzanine)
Storage Controller 2: None; LSI 9202 (PCI)
RAID | RAID 1 | JBOD | JBOD

Table 5: Hardware Configurations PowerEdge C8000 Storage Sleds

Machine Function | Active and Secondary Name Node, Admin Node, HA Node, Edge Node | Data Node
Sled 2 | N/A | PowerEdge C8220XD
DISK | N/A | 12 x 3 TB 3.5-inch Nearline SAS (NL-SAS)
Sled 3 | N/A | PowerEdge C8220XD
DISK | N/A | 12 x 3 TB 3.5-inch Nearline SAS (NL-SAS)
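The expansion storage figures in Tables 4 and 5 follow from simple arithmetic: each storage sled holds 12 drives of 3 TB. The sketch below reproduces the 36 TB and 72 TB values and estimates the capacity left after HDFS replication. The replication factor of 3 is the HDFS default rather than a value stated in these tables, the function names are hypothetical, and the estimate ignores the compute sled's own drives, file system overhead, and MapReduce scratch space.

```python
# Figures from Table 5: each storage sled holds 12 x 3 TB NL-SAS drives.
DRIVES_PER_STORAGE_SLED = 12
DRIVE_CAPACITY_TB = 3

# Assumption: HDFS default replication factor, not a value from the tables.
DEFAULT_REPLICATION = 3


def expansion_raw_tb(storage_sleds: int) -> int:
    """Raw expansion capacity in TB for a node with the given sled count."""
    return storage_sleds * DRIVES_PER_STORAGE_SLED * DRIVE_CAPACITY_TB


def replicated_capacity_tb(raw_tb: float, replication: int = DEFAULT_REPLICATION) -> float:
    """Rough capacity for unique data once every block is stored `replication` times."""
    return raw_tb / replication


if __name__ == "__main__":
    for label, sleds in (("data node", 1), ("heavy data node", 2)):
        raw = expansion_raw_tb(sleds)
        print(label, raw, "TB raw,", replicated_capacity_tb(raw), "TB after replication")
```

A data node with one storage sled therefore offers 36 TB of raw expansion capacity, and a heavy data node with two sleds offers 72 TB, matching the DISK (expansion) row.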
Table 6: Chassis Configuration PowerEdge C8000 Master Chassis

Slot layout: C8220X DWC (Master) | C8220X DWC (Admin) | Power | Power | C8220X DWC (Edge) | Empty | Empty

Refer to Table 30 in Appendix A for the bill of materials for this chassis.

Table 7: Chassis Configuration PowerEdge C8000 High Availability Chassis

Slot layout: C8220X DWC (Secondary) | C8220X DWC (HA) | Power | Power | C8220X DWC | C8220XD DWS

Refer to Table 31 in Appendix A for the bill of materials for this chassis.

Table 8: Chassis Configuration PowerEdge C8000 Data Nodes

Slot layout: C8220X DWC | C8220XD DWS | Power | Power | C8220X DWC | C8220XD DWS

Refer to Table 32 in Appendix A for the bill of materials for this chassis.
Table 9: Chassis Configuration PowerEdge C8000 Heavy Data Nodes

Chassis slot layouts:
C8220XD DWS | C8220X DWC | Power | Power | C8220XD DWS | C8220XD DWS
C8220XD DWS | C8220X DWC | Power | Power | C8220XD DWS | C8220X DWC
C8220XD DWS | C8220X DWC | Power | Power | C8220XD DWS | C8220XD DWS

Refer to Table 32 and Table 33 in Appendix A for the bill of materials for these chassis.

The Heavy Data node configuration is ordered in groups of three chassis: two heavy data node chassis and one data node chassis.
Table 10: Rack Configuration PowerEdge C8000

RU | RACK1 | RACK2 | RACK3
42 | R1- Switch 2: 10Gb S4810 | R2- Switch 2: 10Gb S4810 | R3- Switch 2: 10Gb S4810
41 | R1- Switch 1: 10Gb S4810 | R2- Switch 1: 10Gb S4810 | R3- Switch 1: 10Gb S4810
40 | Cable Management | Cable Management | Cable Management
39 | Cable Management | Cable Management | Cable Management
38-37 | Master Chassis | HA Chassis | R3 - Switch 1: Force10 S4810 (1 RU) OR Force10 Z9000 (2 RU)
36-35 | Master Chassis (continued) | HA Chassis (continued) | R3 - Switch 1: Force10 S4810 (1 RU) OR Force10 Z9000 (2 RU)
34 | Cable Management | Cable Management | Cable Management
33 | Cable Management | Cable Management | Cable Management
R1- Chassis06: Data node x 2 | R2- Chassis06: Data node x 2 | R3- Chassis06: Data node x 2
Empty | Empty | S55 iDRAC Mgmt switch
R1- Chassis05: Data node x 2 | R2- Chassis05: Data node x 2 | R3- Chassis05: Data node x 2
R1- Chassis04: Data node x 2 | R2- Chassis04: Data node x 2 | R3- Chassis04: Data node x 2
R1- Chassis03: Data node x 2 | R2- Chassis03: Data node x 2 | R3- Chassis03: Data node x 2
R1- Chassis02: Data node x 2 | R2- Chassis02: Data node x 2 | R3- Chassis02: Data node x 2
R1- Chassis01: Data node x 2 | R2- Chassis01: Data node x 2 | R3- Chassis01: Data node x 2
Table 11: Rack Configuration PowerEdge C8000 Heavy nodes

RU | RACK1 | RACK2 | RACK3
42 | R1- Switch 2: 10Gb S4810 | R2- Switch 2: 10Gb S4810 | R3- Switch 2: 10Gb S4810
41 | R1- Switch 1: 10Gb S4810 | R2- Switch 1: 10Gb S4810 | R3- Switch 1: 10Gb S4810
40 | Cable Management | Cable Management | Cable Management
39 | Cable Management | Cable Management | Cable Management
38-37 | Master Chassis | HA Chassis | R3 - Switch 1: Force10 S4810 (1 RU) OR Force10 Z9000 (2 RU)
36-35 | Master Chassis (continued) | HA Chassis (continued) | R3 - Switch 2: Force10 S4810 (1 RU) OR Force10 Z9000 (2 RU)
34 | Cable Management | Cable Management | Cable Management
33 | Cable Management | Cable Management | Cable Management
Empty | Empty | S55 iDRAC Mgmt switch
R1- Chassis06: Data node x 4 (chassis 1 of 3) | R2- Chassis06: Data node x 4 (chassis 1 of 3) | R3- Chassis06: Data node x 4 (chassis 1 of 3)
R1- Chassis05: Data node x 4 (chassis 2 of 3) | R2- Chassis05: Data node x 4 (chassis 2 of 3) | R3- Chassis05: Data node x 4 (chassis 2 of 3)
R1- Chassis04: Data node x 4 (chassis 3 of 3) | R2- Chassis04: Data node x 4 (chassis 3 of 3) | R3- Chassis04: Data node x 4 (chassis 3 of 3)
R1- Chassis03: Data node x 4 (chassis 1 of 3) | R2- Chassis03: Data node x 4 (chassis 1 of 3) | R3- Chassis03: Data node x 4 (chassis 1 of 3)
R1- Chassis02: Data node x 4 (chassis 2 of 3) | R2- Chassis02: Data node x 4 (chassis 2 of 3) | R3- Chassis02: Data node x 4 (chassis 2 of 3)
R1- Chassis01: Data node x 4 (chassis 3 of 3) | R2- Chassis01: Data node x 4 (chassis 3 of 3) | R3- Chassis01: Data node x 4 (chassis 3 of 3)

NOTE: Four Heavy data nodes require 12U of rack space.
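The note above (four heavy data nodes per 12U) follows from the ordering rule for heavy data nodes: they come in groups of three 4U chassis. A minimal sketch of that arithmetic is shown below; the function name `heavy_node_rack_units` is hypothetical, and the calculation assumes that a partially filled group still occupies all three chassis.

```python
import math

# Figures from the text: the C8000 is a 4U chassis, heavy data nodes are
# ordered in groups of three chassis, and each group holds four heavy nodes.
CHASSIS_RU = 4
CHASSIS_PER_GROUP = 3
HEAVY_NODES_PER_GROUP = 4


def heavy_node_rack_units(heavy_nodes: int) -> int:
    """Rack units needed for the given number of heavy data nodes.

    Assumes whole three-chassis groups are installed even when the last
    group is only partially populated.
    """
    if heavy_nodes < 0:
        raise ValueError("node count cannot be negative")
    groups = math.ceil(heavy_nodes / HEAVY_NODES_PER_GROUP)
    return groups * CHASSIS_PER_GROUP * CHASSIS_RU


if __name__ == "__main__":
    for n in (4, 5, 8):
        print(n, "heavy nodes ->", heavy_node_rack_units(n), "RU")
```

With six chassis per rack as in Table 11, two full groups (eight heavy data nodes) occupy 24U of a rack.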
Configuration Notes

- Appendix A contains the complete bill of materials (BOM) listing for the C8000 server configurations.
- Support nodes (Name Node, Crowbar Admin Node, HA Node, Edge Node) are configured with the LSI 2008 controller connected to the front hot-swap drives in the PowerEdge C8220X compute sled.
- The two rear motherboard drives in the PowerEdge C8220X compute sled are not required for any nodes.
- Data nodes require one PowerEdge C8220XD sled. Data nodes can alternatively be configured with two PowerEdge C8220XD sleds; these are referred to as heavy data nodes.
- Data nodes use an LSI 9202 PCI HBA to connect to one or two PowerEdge C8220XD storage sleds. The connection requires one SAS extender cable per external sled.
- The reference BOMs in the appendices are organized by chassis to simplify ordering.
- Some configurations may require sled blanks for empty slots; the reference BOMs in the appendices account for this.
- The PowerEdge C8000 series is designed for cold-aisle service, with cabling in front of the chassis. Verify that rack configurations are compatible with this configuration.
- Be sure to consult your Dell account representative before changing the recommended disk sizes.
PowerEdge R720(xd) Server

The PowerEdge R720 and R720xd servers are Dell's 12G PowerEdge mainstream 2S 2U rack servers. They are designed to deliver the most competitive feature set, best performance, and best value. In this generation, Dell offers a large storage footprint, best-in-class I/O capabilities, and more advanced management features. The PowerEdge R720 and R720xd are technically similar, except that the R720xd has a backplane that can accommodate more drives (up to 24).

Figure 4: PowerEdge R720xd Server

PowerEdge R720xd feature summary:

- Intel Romley platform and Intel Xeon E processors
- 1600MHz DDR3
- Network daughter cards for customer choice of LOM speed, fabric, and brand at point of sale
- PCIe SSD in a front-accessible, hot-plug format
- Internal GPGPU support
- Intel Node Manager power management technology
- Software RAID
- Platinum efficiency power supplies, common across 600 and 700 series platforms

Cluster Sizing

The minimum configuration supported is eight nodes:

- Crowbar Administration Node
- Master Name Node
- Secondary Name Node
- High Availability (HA) Node
- Edge (or Gateway) Node
- Three Data nodes

When using NFS-based HA, the HA node provides a shared NFS mount. In quorum-based HA mode, this node is used as one of the three required quorum nodes, so the node counts remain the same.

Table 12 shows the minimum and maximum numbers of nodes in each rack, pod, and cluster within this reference architecture.
Table 12: Cluster Sizes PowerEdge R720xd

Machine Function | Min per Rack | Max per Rack | Min per Pod | Max per Pod | Min per Cluster | Max per Cluster
Admin Node
Master Name Node
Secondary Name Node
HA Node
Edge Node
Data node: To be determined based on sizing criteria; To be determined based on sizing criteria; To be determined based on sizing criteria

Hardware Configurations

Table 13: Hardware Configurations PowerEdge R720xd

Machine Function | Active and Secondary Name Node, Admin Node, HA Node, Edge Node | Data node
Platform: PowerEdge R720xd
CPU: 2 x E (6-core)
RAM (Minimum) | 96 GB | 48 GB
LOM: 4 x 1GbE
DISK | 6 x 600-GB 10K SAS 2.5-inch | x 1-TB SATA 7.2K 2.5-inch
Storage Controller: PERC H
RAID | RAID 10 | Single Drive RAID 0

Notes: Be sure to consult your Dell account representative before changing the recommended disk sizes.
Table 14: Hardware Configurations PowerEdge R720/R720xd

Machine Function | Active and Secondary Name Node, Admin Node, HA Node, Edge Node | Data node
Platform | PowerEdge R720 | PowerEdge R720xd
CPU: 2 x E (6-core)
RAM (Minimum) | 96 GB | 48 GB
LOM: 4 x 1GbE
DISK | 6 x 600-GB 10K SAS 3.5-inch | x 1-TB SATA 7.2K 2.5-inch
Storage Controller: PERC H
RAID | RAID 10 | Single Drive RAID 0

Notes: Be sure to consult your Dell account representative before changing the recommended disk sizes. Refer to the JBOD versus single disk RAID 0 Configuration section for more information.
Table 15: Rack Configuration PowerEdge R720xd (or R720/R720xd)

RU | RACK1 | RACK2 | RACK3
42 | R1- Switch 2: Force10 S60 | R2- Switch 2: Force10 S60 | R3- Switch 2: Force10 S60
41 | R1- Switch 1: Force10 S60 | R2- Switch 1: Force10 S60 | R3- Switch 1: Force10 S60
40 | Cable Management | Cable Management | Cable Management
39 | Cable Management | Cable Management | Cable Management
38 | Master Name Node: R720xd or R720 | Edge01: R720xd or R720 | R3 - Switch 1: Force10 S4810
37 | Master Name Node (continued) | Edge01 (continued) | R3 - Switch 2: Force10 S4810
36 | Cable Management | Cable Management | Cable Management
35 | Cable Management | Cable Management | Cable Management
Admin Node: R720xd or R720 | Secondary Name Node: R720xd or R720 | HA Node: R720xd or R720
Empty | Empty | R3 - S55 iDRAC Mgmt switch
31 | Empty | Empty | Empty
R1- Chassis10: R720xd | R2- Chassis10: R720xd | R3- Chassis10: R720xd
R1- Chassis09: R720xd | R2- Chassis09: R720xd | R3- Chassis09: R720xd
R1- Chassis08: R720xd | R2- Chassis08: R720xd | R3- Chassis08: R720xd
R1- Chassis07: R720xd | R2- Chassis07: R720xd | R3- Chassis07: R720xd
R1- Chassis06: R720xd | R2- Chassis06: R720xd | R3- Chassis06: R720xd
R1- Chassis05: R720xd | R2- Chassis05: R720xd | R3- Chassis05: R720xd
R1- Chassis04: R720xd | R2- Chassis04: R720xd | R3- Chassis04: R720xd
R1- Chassis03: R720xd | R2- Chassis03: R720xd | R3- Chassis03: R720xd
R1- Chassis02: R720xd | R2- Chassis02: R720xd | R3- Chassis02: R720xd
R1- Chassis01: R720xd | R2- Chassis01: R720xd | R3- Chassis01: R720xd
Configuration Notes

- Appendix B, Appendix C, and Appendix D contain the full bill of materials (BOM) listing for the PowerEdge R720 and R720xd server configurations.
- The R720 and R720xd configurations can be used with 10GbE networking. To use 10GbE networking, an additional network card is required in each node; refer to the BOM for details on the supported card.

JBOD versus single disk RAID 0 Configuration

The Hadoop community's strong advocacy for the non-RAID drive configuration known as Just a Bunch of Disks (JBOD) has caused some confusion for readers of this reference architecture. We fully endorse this approach, but clarification is needed because there are multiple valid ways to achieve this configuration.

Normally, the optimum disk configuration for Hadoop data nodes is considered to be JBOD mode rather than RAID. HDFS provides its own data replication, eliminating the need for the redundancy provided by RAID levels 1-6. HDFS also implements efficient round-robin parallel I/O across multiple drives, eliminating the need for the parallelism provided by the striping capabilities of RAID 0.

Some drive controllers support only RAID mode, and so cannot be used in a plain host bus adapter (HBA) mode for JBOD. In these situations, configuring each drive as its own single-drive RAID 0 array allows HDFS to treat each array as an individual drive. In this configuration, the controller effectively operates like a standard HBA in JBOD mode, and the RAID 0 and JBOD performance characteristics are comparable. While a RAID controller adds minor latency, this is offset by adaptive read-ahead caching.

PowerEdge C6105 Server

The PowerEdge C6105 is Dell's hyperscale-inspired building block for high-performance cluster computing (HPCC), Web 2.0 environments, and cloud builders where performance and power consumption are key.
It is purpose-built for scale-out rack deployments and large homogeneous cloud/cluster application environments where density is required and the software stack provides platform availability and resiliency.

Figure 5: PowerEdge C6105

PowerEdge C6105 feature summary
- Up to 4 server nodes in 2U
- 2 x Opteron
- DDR3 RDIMM
- 24 x 2.5-inch or 12 x 3.5-inch HDD
- 2 x 1GbE Intel
Cluster Sizing

The minimum configuration supported is 8 nodes:
- Crowbar Administration Node
- Master Name Node
- Secondary Name Node
- High Availability (HA) Node
- Edge (or Gateway) Node
- Three Data nodes

Configurations based on the C6105 use R720 servers for the infrastructure nodes, and C6105 for data nodes.

When using NFS-based HA, the HA node provides a shared NFS mount. In quorum-based HA mode, this node is used as one of the three required quorum nodes, so the node counts remain the same.

Table 16 shows the minimum and maximum numbers of nodes in each rack, pod, and cluster within this reference architecture.

Table 16: Cluster Sizes PowerEdge C6105 with PowerEdge R720

Machine Function Min Per Rack Max Per Rack Min per pod Max per pod Min per cluster Max Per Cluster Admin Node Master Name Node Secondary Name Node HA Node Edge Node To be determined based on sizing criteria To be determined based on sizing criteria Data node 2 PowerEdge C6105 chassis (3 nodes) 10 PowerEdge C6105 chassis (20 nodes) 2 PowerEdge C6105 chassis (3 nodes) 30 chassis (60 nodes) 30 chassis (60 nodes) To be determined based on sizing criteria
Hardware Configurations

Table 17: Hardware Configurations PowerEdge C6105 with PowerEdge R720

| Machine Function | Edge Node, Active and Secondary Name Node, Admin Node, HA Node | Data node |
|---|---|---|
| Platform | PowerEdge R720 | PowerEdge C6105 (2-node) |
| CPU | 2 x E (6-core) | 2 x Opteron 4180 (6-core) |
| RAM (Minimum) | 96 GB | 48 GB |
| LOM | 4 x 1GbE | 2 x 1GbE |
| Add-in NIC | None | None |
| DISK | 6 x 600-GB 10K SAS 2.5-inch | 12 x 1-TB SATA 7.2K 2.5-inch |
| Storage Controller | PERC H710 | LSI2008 |
| RAID | RAID 10 | JBOD |

Notes: Be sure to consult your Dell account representative before changing the recommended disk sizes.

Configuration Notes

Appendix E contains the full bill of materials (BOM) for the PowerEdge C6105 data node configurations. Appendix C contains the BOM for the PowerEdge R720 infrastructure nodes that should be used with this configuration.
Table 18: Rack Configuration PowerEdge C6100

| RU | RACK1 | RACK2 | RACK3 |
|---|---|---|---|
| 42 | R1 - Switch 2: Force10 S60 | R2 - Switch 2: Force10 S60 | R3 - Switch 2: Force10 S60 |
| 41 | R1 - Switch 1: Force10 S60 | R2 - Switch 1: Force10 S60 | R3 - Switch 1: Force10 S60 |
| 40 | Cable Management | Cable Management | Cable Management |
| 39 | Cable Management | Cable Management | Cable Management |
| 38 | R3 - Switch 1: Force10 S4810 | Master Name Node: R720 | Edge01: R720 |
| 37 | R3 - Switch 2: Force10 S4810 | | |
| 36 | Cable Management | Cable Management | Cable Management |
| 35 | Cable Management | Cable Management | Cable Management |
| 34-33 | Admin: R720 | Secondary Name Node: R720 | HA Node: R720 |
| 32 | Empty | Empty | Cable Management |
| 31 | Empty | Empty | Cable Management |
| | Empty | Empty | Empty |
| | R1 - Chassis10: C6105 (2-Node) | R2 - Chassis10: C6105 (2-Node) | R3 - Chassis10: C6105 (2-Node) |
| | R1 - Chassis09: C6105 (2-Node) | R2 - Chassis09: C6105 (2-Node) | R3 - Chassis09: C6105 (2-Node) |
| | R1 - Chassis08: C6105 (2-Node) | R2 - Chassis08: C6105 (2-Node) | R3 - Chassis08: C6105 (2-Node) |
| | R1 - Chassis07: C6105 (2-Node) | R2 - Chassis07: C6105 (2-Node) | R3 - Chassis07: C6105 (2-Node) |
| | R1 - Chassis06: C6105 (2-Node) | R2 - Chassis06: C6105 (2-Node) | R3 - Chassis06: C6105 (2-Node) |
| | R1 - Chassis05: C6105 (2-Node) | R2 - Chassis05: C6105 (2-Node) | R3 - Chassis05: C6105 (2-Node) |
| | R1 - Chassis04: C6105 (2-Node) | R2 - Chassis04: C6105 (2-Node) | R3 - Chassis04: C6105 (2-Node) |
| | R1 - Chassis03: C6105 (2-Node) | R2 - Chassis03: C6105 (2-Node) | R3 - Chassis03: C6105 (2-Node) |
| | R1 - Chassis02: C6105 (2-Node) | R2 - Chassis02: C6105 (2-Node) | R3 - Chassis02: C6105 (2-Node) |
| | R1 - Chassis01: C6105 (2-Node) | R2 - Chassis01: C6105 (2-Node) | R3 - Chassis01: C6105 (2-Node) |
Dell Cloudera Solution Network Architecture

The Dell Cloudera Solution uses Dell Force10 S60 or Dell PowerConnect 6248 (optional) Gigabit Ethernet switches as the top-of-rack connectivity to all Hadoop-related nodes. This reference architecture supports consistent, rapid deployments by keeping differences in the network configuration to a minimum.

This reference architecture implements at a minimum three distinct, separate VLANs:
- Hadoop Cluster Data LAN: connects the compute node NICs into the fabric used for sharing data and distributing work tasks among compute nodes.
- Hadoop Cluster Management LAN: connects all the iDRAC/BMCs in the cluster nodes.
- Hadoop Cluster Edge LAN: connects the cluster to the outside world.

The network consists of three major network infrastructure layouts:
- Data network infrastructure: the data network consists of the server NICs, the top-of-rack (ToR) switches, and the aggregation switches.
- Management network infrastructure: the BMC management network, consisting of iDRAC ports and the out-of-band management ports of the switches, is aggregated into a 1-RU S55 switch in one of the three racks in the POD. This 1-RU switch in turn can connect to one of the aggregation or core switches to create a separate network with a separate VLAN.
- Core network infrastructure: the connectivity of aggregation switches to the core for external connectivity.

Network Components

The data network is primarily composed of the ToR and the aggregation switches. Configurations for 1GbE and 10GbE are included in this reference architecture. The following component blocks make up this network:

Server Nodes

Server connections to the network switches can use one of four possible configurations:
- A: Active-Active LAG in load-balance bond formation
- B: Active-Backup in failover/failback formation
- C: Active-Active round robin based on gratuitous ARP
- D: Single port

In the first case (A), the connectivity on the switch side must be in a LAG (or port-channel).
In cases B and C, the switch ports do not need to be configured as a LAG, but they should still be part of the same layer-2 domain. In some cases all members of the LAG connect to a single ToR switch. In others the LAG is split across two ToR switches. Splitting is optional because Hadoop has redundancy built into the application, and high availability is not compromised by connecting into a single switch. The teaming configuration that Dell recommends is balance-alb (mode=6). This configuration setting is explained in greater detail in the Dell Cloudera Solution Deployment Guide. Please contact your sales representative for a copy of the deployment guide. The Dell Crowbar deployment software automatically configures this setting for Hadoop environments.

Access Switch or Top of Rack (ToR)

The servers connect to ToR switches. Typically there are two in each rack. The switches recommended by Dell are the Force10 S60 for 1GbE connectivity and the S4810 for 10GbE servers. The 1GbE option is for PowerEdge C2100 and PowerEdge C6100 servers, while the PowerEdge C8000 requires the 10GbE option. PowerEdge R720 configurations can use 1GbE or 10GbE.

For 1GbE, the Force10 S60 ToR switches in the same rack stack together. This allows the two switches to be managed as a single unit and lets the servers connect to two different switches for redundancy. The ToR switches each have two expansion slots that can accept a two-port 10G module or a two-port stacking module. This architecture recommends one of each type in the two slots. The 10GbE module is used to connect into the pod-interconnect switches, one port to each switch, forming a LAG. The stacking module connects the switches together as a single unit. The uplinks to the aggregation pair form a single LAG of four 10GbE ports, two from each switch. Each rack connects to the pod-interconnect independently, which makes scaling easier.

For the 10GbE configuration, the ToR switches are Force10 S4810, and we recommend that this pair of switches run a high availability feature called Virtual Link Trunking (VLT). This feature allows the servers to terminate their LAG interfaces on two different switches instead of one. It provides HA as well as active-active bandwidth utilization, and gives redundancy within the rack if one switch fails or needs maintenance. The uplink from each ToR switch to the aggregation pair is 80Gb, achieved using two 40G interfaces in a LAG. Therefore, each rack has a collective uplink bandwidth of 160G. Each rack is managed as a separate entity from a switching perspective, and ToR switches connect only to the aggregation switches.

Aggregation Switches

For a medium-scale deployment of one to three PODs of 1G servers (12 racks max) the Dell Force10 S4810 is the recommended aggregation switch. It is both 10GbE and 40GbE capable. Each 40GbE interface on the S4810 can be converted into four 10GbE interfaces, turning the switch into 64 10GbE-capable ports. This potentially scales Hadoop deployments into tens of nodes.

Hadoop ToR switches connect to the aggregation switches via 10GbE uplinks from the ToR Force10 S60 to the Force10 S4810. The recommended architecture uses Virtual Link Trunking (VLT) between the two Force10 S4810 switches in aggregation. This feature enables a multi-chassis LAG from the stacked ToR switches in each rack. The stack in each rack divides its links between this pair of switches to achieve active-active forwarding at full bandwidth, without any requirement for spanning tree.
The aggregation switches can also run layer-3 from the ToR, as layer-2 alone is not a requirement from Hadoop's perspective. Therefore, for scaling to large deployments, layer-3 routing is a good option. Running 40GbE Ethernet switches like the Dell Force10 Z9000 in aggregation can achieve a scale of up to hundreds of 1G deployed nodes.

For the 10G server deployment, depending on the scale at which the PODs are planned and on how much future scale is required, we recommend the Force10 S4810 for aggregation at smaller scale and the Force10 Z9000 for larger deployments. The Force10 Z9000 is a 32-port, 40G high-capacity switch. It can aggregate up to 15 racks of high-density PowerEdge C8000 servers. The rack-to-rack bandwidth needed in Hadoop is most suitably handled by a 40G-capable, non-blocking switch. The Force10 Z9000 can provide a cumulative bandwidth of 1.5TB of throughput at line-rate traffic from every port.

Core

The aggregation layer can itself be the network core in many cases, but where it is not, it connects to a larger core, which is represented by the cloud in Figure 7. Details on this topic are beyond the scope of this document.

Layer-2 and Layer-3

The layer-2 and layer-3 boundaries are separated at either the ToR or the aggregation layer. Both options are equally viable. The colors blue and red in Figure 7 represent the layer-2 and layer-3 boundaries. This document uses layer-2 as the reference up to the aggregation layer. That is why VLT is used on the aggregation switches.

Single Rack Configuration

Figure 6 shows the single rack equipment. Dell recommends using Force10 S60 ToR switches in the rack. Each rack could have a maximum of 20 servers in some configurations, while a dense packing of sleds in the C-series server chassis can hold even more. Each rack has two ToR Force10 S60 switches that are stacked, and this stack connects to the two Force10 S4810 aggregation switches.
The Force10 S60 stack offers a single switch view to the servers. Each Data node can have up to four 1G NIC ports. Each node forms a LAG of two ports, with one port on each switch in the stack. This 2GbE LAG provides switch redundancy within the rack and enables high availability.
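The bandwidth figures quoted for the ToR uplinks can be checked with simple arithmetic. The following is a minimal sketch, not part of the reference architecture itself: the function names are hypothetical, and it assumes a fully populated rack of 20 servers, a two-port 1GbE LAG per server, and uplinks running at line rate.

```python
def oversubscription(servers, ports_per_server, port_gbps, uplink_ports, uplink_gbps):
    """Return server-facing bandwidth divided by uplink bandwidth for one rack."""
    downlink = servers * ports_per_server * port_gbps
    uplink = uplink_ports * uplink_gbps
    return downlink / uplink


def rack_uplink_gbps(tor_switches, uplinks_per_switch, uplink_gbps):
    """Return the total uplink bandwidth leaving one rack, in Gb/s."""
    return tor_switches * uplinks_per_switch * uplink_gbps


# 1GbE rack: 20 servers (assumed full rack), each with a 2 x 1GbE LAG,
# and four 10GbE uplinks (two from each stacked S60).
ratio_1g = oversubscription(20, 2, 1, 4, 10)

# 10GbE rack: two S4810 ToR switches, each with two 40GbE uplinks.
uplink_10g = rack_uplink_gbps(2, 2, 40)

print(f"1GbE rack over-subscription: {ratio_1g:.0f}:1")
print(f"10GbE rack uplink: {uplink_10g} Gb/s")
```

Under these assumptions the 1GbE rack works out to 40 Gb/s of server bandwidth against 40 Gb/s of uplink, and the 10GbE rack to the 160G collective uplink described earlier.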
Table 19: Single Rack Network Equipment

Total Racks: 1 (6-20 nodes)
Top-of-rack switch: 2 Force10 S60 (2 per rack)
Aggregation switch: Not needed for a single rack
Server: 2RU PowerEdge R720/R720xd/C2100/C6100
Over-subscription at ToR: 1:1
Modules in each ToR: 1x 12-2port Stacking, 1x 10G 2-port uplink

Figure 6: Single Rack Networking Equipment

Figure 7 shows the Force10 S4810 switch aggregating the pods to enable inter-rack traffic and the management network. There are two separate VLANs for data and management. All port-channels on the Force10 S4810 and ToR are tagged in these two VLANs.
Figure 7: Multi-rack networking equipment

The following table shows the network inventory details in a cluster of three racks.
Table 20: Multi Rack Network Equipment

Total racks: 3 (15-20 nodes per rack)
Top-of-rack switch: 6 Force10 S60 (2 per rack)
Pod-interconnect switch: 2 Force10 S4810
Server: 2RU PowerEdge R720/R720xd/C2100
Over-subscription at ToR: 1:1
Modules in each ToR: 1x 12-2port Stacking, 1x 10G 2-port uplink

Multi-rack configuration for 10GbE with Force10 S4810

In this reference architecture we define a 10G solution with the PowerEdge C8000 and R720xd servers. Hadoop applications are increasingly being deployed on 10GbE servers for the scale and price advantages they bring, which yields a large economy of scale in hardware usage. That in turn requires 10GbE switches in the racks. This can be achieved using the Force10 S4810 as a top-of-rack switch, with either the Force10 S4810 or the Z9000, the 10G/40G high-density switch, in the aggregation. The scale achieved by that configuration can grow into thousands of nodes using a CLOS architecture, which was used in the 600-node 1GbE solution above. Running 40GbE switches like the Force10 Z9000 in aggregation can achieve a scale of hundreds of nodes using high-density data center class switches.

Figure 8: Multi-Rack View for 10G Servers Using Force10 S4810 Switches

In Figure 8 we see that each rack with a pair of switches aggregates into the pair of Force10 S4810 switches. This connection can be based on layer-2, if the aggregation runs the VLT feature. Alternatively, it can run on layer-3 point-to-point routing using a routing protocol such as OSPF. In both cases, load balancing uses the full bandwidth of all links. The first scenario creates a complete layer-2 domain between all racks in the cluster.

The Force10 S4810-based aggregation design is preferred for lower cost and medium scalability. This design can handle up to six racks or two PODs.
In this design, the ToR Force10 S4810 uplinks using its 40G interfaces in quad mode, where each 40G interface runs as 4x 10G. The figure below shows the cables needed for this design. With a passive copper break-out cable there is no need for any QSFP+ or SFP+ optics other than the cable itself, as the connectors are built into the copper twin-ax cable. This makes the arrangement cost less than the fiber option, which needs a QSFP+ optic and four SFP+ optics plus the break-out fiber. The benefit of the fiber option is its longer reach compared to twin-ax, which is limited to 5 meters.
Multi-rack Configuration for 10GbE Using Force10 Z9000 Switches

For a scale-out deployment that needs to expand its Hadoop environment into a larger setup, or that needs the Hadoop cluster co-located with other applications in different racks, the recommended option is the Force10 Z9000 core switch. The Force10 Z9000 does not need to connect into any other higher-tier core switches, as its capacity is enough for a data center with hundreds of servers.

Figure 9: Multi-Rack View for 10G Servers Using Force10 Z9000 (based on Layer-3)

Figure 9 shows a routed version where each S4810 in the rack has a layer-3 connection to the core switch. LAG is not required for this option, as each connection can be a point-to-point routed link. Optionally, each switch can form a layer-2 LAG as shown in Figure 10. This assumes that the Z9000 pair in the aggregation forms a VLT pair for HA. This gives two tiers of VLT, one at the ToR for the servers and another at the aggregation for the top-of-rack switches. Each option has its own merits, and both are equally recommended; the choice depends on how each fits into the larger data center network.
Figure 10: Multi-Rack View Using Force10 Z9000 Switches (Based on Layer-2)

In Figure 10 we see an example of a CLOS fabric that grows horizontally. This technique of network fabric deployment has been used by some of the largest Web 2.0 companies, whose businesses range from social media to public cloud, in their data centers. Some of the largest recent Hadoop deployments also use this new approach to networking.

Dell Data Center Solutions has hands-on experience in building Hadoop and Big Data analytics farms, while Dell Force10 is a trusted vendor in the field of networking. Dell can help an enterprise solve its Big Data needs with a scalable end-to-end solution.

Management Network

The management network of all the servers and switches is aggregated into a Dell Force10 S55 switch that is located in rack 3 of the POD. It uplinks on a 10G link to the aggregation switches, or directly to the core wherever the split for out-of-band is required.

Dell Open Switch Solution

In addition to the Dell switch-based reference architecture, Dell provides an open standard that allows you to choose other brands and configurations of switches for your Hadoop environment.
The following list of requirements will enable other brands of switches to properly operate with the tools and configurations in the Dell Hadoop Reference Architecture:
- Support for IEEE 802.1Q VLAN traffic and port tagging
- Ability to provide a minimum of 170 Gigabit Ethernet ports in a non-blocking configuration within VLAN 100
  o Configuration can be a single switch or a combination of stacked switches to meet the additional requirements
- The ability to create link aggregation groups (LAGs) with a minimum of two physical links in each LAG
- If multiple switches are stacked:
  o The ability to create a LAG across stacked switches
  o Full-bisection bandwidth
  o Support for VLANs to be available across all switches in the stack
- The ability to provide a minimum of 65 10/100 Ethernet ports on the untagged VLAN
- 250,000 packets-per-second capability per switch
- The ability to provide 12 10Gb ports for redundant uplinks contained in VLAN 10
- A managed switch that supports SSH and serial line configuration
- SNMP v3 support
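When evaluating a third-party switch, the numeric thresholds in the list above lend themselves to a simple checklist. The sketch below is only an illustration: the `SwitchSpec` class, its field names, and the example values are hypothetical and do not correspond to any vendor data sheet or Dell tool. The stacking-specific requirements are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class SwitchSpec:
    """Hypothetical description of a candidate switch (or switch stack)."""
    dot1q_vlan_tagging: bool
    nonblocking_gbe_ports: int
    max_links_per_lag: int
    fast_ethernet_ports: int       # 10/100 ports on the untagged VLAN
    packets_per_second: int
    ten_gb_uplink_ports: int
    ssh_and_serial_config: bool
    snmp_v3: bool


def unmet_requirements(spec):
    """Return the labels of every listed requirement the candidate fails."""
    checks = [
        (spec.dot1q_vlan_tagging, "IEEE 802.1Q VLAN traffic and port tagging"),
        (spec.nonblocking_gbe_ports >= 170, "170 non-blocking Gigabit Ethernet ports"),
        (spec.max_links_per_lag >= 2, "LAGs with at least two physical links"),
        (spec.fast_ethernet_ports >= 65, "65 10/100 Ethernet ports"),
        (spec.packets_per_second >= 250_000, "250,000 packets per second"),
        (spec.ten_gb_uplink_ports >= 12, "12 10Gb uplink ports"),
        (spec.ssh_and_serial_config, "SSH and serial line configuration"),
        (spec.snmp_v3, "SNMP v3 support"),
    ]
    return [label for ok, label in checks if not ok]


# Example values are made up for illustration.
candidate = SwitchSpec(True, 96, 8, 96, 1_000_000, 16, True, True)
print(unmet_requirements(candidate))
```

Returning the list of failed items, rather than a single pass/fail flag, makes it easier to see which requirement rules a candidate out.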
IPv6 Capabilities

At this time, the Dell Cloudera Solution does not support or allow for the use of IPv6 for network connectivity. All deployments are configured by Crowbar based on IPv4, with IPv6 explicitly disabled on all nodes within the Hadoop environment.

Network Connectivity

The network interconnects between various hardware components of the Hadoop solution are depicted in Figure 11 and Figure 12. For more information, please see the deployment guide.

Figure 11: Dell Cloudera Compute PowerEdge R720xd Node Network Interconnects (labels: BMC, NIC1-4)
Figure 12: Network Connections
Dell Cloudera Solution Software Architecture

Table 21: Dell Cloudera Solution Support Matrix

| RA Version | OS Version | Hadoop Version | Available Support | Supported JVM |
|---|---|---|---|---|
| 2.1 | Red Hat Enterprise Linux 6.2 | CDH 4.1, Cloudera Manager 4.1 | Dell Hardware support, Cloudera Hadoop support, Red Hat Linux support | Sun Oracle JVM |
| 2.1 | CentOS 6.2 | CDH 4.1, Cloudera Manager 4.1 | Dell Hardware support | Sun Oracle JVM |

Linux File System Configuration Definition

Dell and Cloudera recommend and support the use of ext4 for all HDFS disks.

Disk Partitioning Recommendation for the Name Node

The Dell Cloudera Solution Deployment Guide documents all disk configuration parameters and includes Linux Kickstart scripts for proper configuration at the time of operating system installation.

Configuration Parameters: Recommended Values

Table 22: HDFS parameters

| Property | Description | Value |
|---|---|---|
| dfs.block.size | Lower value offers parallelism | (128 MB) |
| dfs.name.dir | Comma-separated list of folders (no space) where a SlaveNode stores its blocks | /mnt/hdfs/hdfs01/meta1 |
| dfs.datanode.handler.count | Number of handlers dedicated to serve data block requests in Hadoop SlaveNodes | 16 (Start 2 x CORE_COUNT in each SlaveNode) |
| dfs.namenode.handler.count | More Master Node server threads to handle RPCs from large number of SlaveNodes | Start with 10, increase for large clusters (higher count will drive higher CPU, RAM, and network utilization) |
| dfs.datanode.du.reserved | The amount of space on each storage volume which HDFS should not use | 10M |
| dfs.replication | Data replication factor; default is 3 | 3 (default) |
| fs.trash.interval | Time interval between HDFS space reclaiming | 1440 (minutes) |
| dfs.permissions | | true (default) |
| dfs.datanode.handler.count | | 8 |
| dfs.data.dir | Hadoop Data Node Location | /mnt/hdfs/hdfs01/data1/hdfs comma-separated through /mnt/hdfs/hdfs01/datan/hdfs |
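Hadoop reads these properties from XML files made up of `<property>` elements, each holding a `<name>` and a `<value>`. As a rough illustration of how a few rows of Table 22 map onto that format, the sketch below builds such a document with the Python standard library. It is not how Crowbar or Cloudera Manager generate configuration; the helper name is hypothetical, and writing `dfs.block.size` in bytes is an assumption about the unit the property expects.

```python
import xml.etree.ElementTree as ET

# A subset of Table 22. The block size is expressed in bytes (assumed unit).
HDFS_SETTINGS = {
    "dfs.block.size": str(128 * 1024 * 1024),
    "dfs.name.dir": "/mnt/hdfs/hdfs01/meta1",
    "dfs.namenode.handler.count": "10",
    "dfs.replication": "3",
    "dfs.permissions": "true",
}


def build_configuration(settings):
    """Build a <configuration> element with one <property> per setting."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


xml_text = ET.tostring(build_configuration(HDFS_SETTINGS), encoding="unicode")
print(xml_text)
```

In a deployed cluster these values are managed through Cloudera Manager, so a script like this is mainly useful for reviewing or comparing intended settings.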
Table 23: mapred parameters

| Property | Description | Value |
|---|---|---|
| mapred.child.java.opts | Larger heap-size for child JVMs of maps/reduces | -Xmx1024M |
| mapred.job.tracker | Hostname or IP address and port of the JobTracker | namenode:8021 |
| mapred.job.tracker.handler.count | More JobTracker server threads to handle RPCs from large number of TaskTrackers | Start with 32, increase for large clusters (higher count will drive higher CPU, RAM, and network utilization) |
| mapred.reduce.tasks | The number of Reduce tasks per job | Set to a prime close to the number of available hosts |
| mapred.local.dir | Comma-separated list of folders (no space) where a TaskTracker stores runtime information | /mnt/hdfs/hdfs01/data1/mapred comma-separated through /mnt/hdfs/hdfs01/datan/mapred |
| mapred.tasktracker.map.tasks.maximum | Maximum number of map tasks to run on the node | 2 + (2/3) * number of cores per node |
| mapred.tasktracker.reduce.tasks.maximum | Maximum number of reduce tasks to run per node | 2 + (1/3) * number of cores per node |
| mapred.child.ulimit | | |
| mapred.map.tasks.speculative.execution | | FALSE |
| mapred.reduce.tasks.speculative.execution | | FALSE |
| mapred.job.reuse.jvm.num.tasks | | 1 |

Table 24: default environment

| Property | Description | Value |
|---|---|---|
| SCAN_IPC_CACHE_LIMIT | Number of rows cached in search engine for each scanner next call over the wire; it reduces the network round trip by 300 times caching 300 rows in each trip | |
| LOCAL_JOB_HANDLER_COUNT | Number of parallel queries executed at one go; query requests above this limit are queued up | |

Table 25: hadoop-env.sh

| Property | Description | Value |
|---|---|---|
| java.net.preferIPv4Stack | | true |
| HADOOP_*_OPTS | | -Xmx2048m |

Table 26: /etc/fstab

| Property | Description | Value |
|---|---|---|
| File system mount options | | data=writeback,nodiratime,noatime |

Table 27: hdfs (core-site)

| Property | Description | Value |
|---|---|---|
| io.file.buffer.size | The size of buffer for use in sequence files; the size of this buffer should probably be a multiple of hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations | (64 KB) |
| fs.default.name | The name of the default file system; a URI whose scheme and authority determine the file system implementation | hdfs://namenode:8020 |
| fs.checkpoint.dir | Comma-separated list of directories on the local file system of the Secondary Master Node where its checkpoint images are stored | TBD |
| io.sort.factor | | 80 |
| io.sort.mb | | 512 |

Table 28: /etc/security/limits.conf

| Property | Description | Value |
|---|---|---|
| mapred nofile | | |
| hdfs nofile | | |
| hbase nofile | | |
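The per-node task limits in Table 23 are given as formulas rather than fixed numbers. The sketch below evaluates them for a few core counts. It reads the formulas literally as 2 + (2/3) x cores and 2 + (1/3) x cores; rounding down to a whole number of tasks is an assumption, since the table does not say how fractions are handled, and the function name is hypothetical.

```python
def task_slots(cores_per_node):
    """Return (map_slots, reduce_slots) for one node, rounding down."""
    map_slots = 2 + (2 * cores_per_node) // 3
    reduce_slots = 2 + cores_per_node // 3
    return map_slots, reduce_slots


for cores in (8, 12, 16):
    maps, reduces = task_slots(cores)
    print(f"{cores} cores: map.tasks.maximum={maps}, reduce.tasks.maximum={reduces}")
```

For a 12-core node, for example, this gives 10 map tasks and 6 reduce tasks. Treat the results as starting points to be tuned against the actual workload.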
Dell Cloudera Solution Cloudera Enterprise Software

The Dell Cloudera Solution is based on Cloudera Enterprise, which includes Cloudera's Distribution for Hadoop (CDH) 4.1 and Cloudera Manager.

Deployment Integration

The Dell Cloudera Solution v2.1 uses Crowbar to deploy the infrastructure: BIOS and RAID configurations, and the base operating system. Cloudera Manager is used to deploy the rest of the Hadoop software stack. Cloudera Manager installs the core Hadoop components (HDFS and MapReduce) and some of the ecosystem components. The Pig, Hive, and Sqoop ecosystem components have to be installed via Crowbar barclamps. Please refer to the Crowbar Administration User Guide for details on how to deploy Cloudera Manager and the ecosystem services supported by the solution.

Cloudera Manager

Cloudera Manager deploys and centrally operates a complete Hadoop stack. The application automates the installation process, thereby reducing deployment time from weeks to minutes.

These are the functional characteristics of Cloudera Manager:
- Provides a cluster-wide, real-time view of the services running and the status of their hosts
- Provides a single, central place to enact configuration changes across the cluster
- Incorporates a full range of reporting and diagnostic tools to help optimize cluster performance and utilization
- Provides full lifecycle management for Hadoop deployments
- Enables the configuration of server roles and services across the cluster
- Provides the interface to gracefully start, stop, and restart services as needed

Cloudera Manager Editions

There are two editions of Cloudera Manager: the Free Edition and the Enterprise Edition. A license is not needed to use the Free Edition, but the number of hosts supported is limited to 50.
The Enterprise Edition supports an unlimited number of hosts, requires a license, and provides service monitoring and additional management features that are not included in the Free Edition.

Table 29: Differences between Cloudera Manager Free Edition and Enterprise Edition

Cloudera Manager Editions: Free, Enterprise
Maximum Number of Nodes Supported: 50 (Free), Unlimited (Enterprise)
- Automated Deployment & Hadoop Readiness Checks
- Comprehensive API
- Service & Configuration Management
- Deploy & Configure HDFS, MapReduce, Flume, HBase, Hue, Impala*, Oozie & Zookeeper Services
- Configure High Availability & Federation
- Automated Configuration
- Client Configuration Management
- Audit Trail
- Add/Delete/Stop/(Re)Start/Decommission Role Instances
- Configuration Versioning & History
- Service Monitoring & Management
- Monitor HDFS, MapReduce, Hue, Flume, Oozie & Zookeeper (Cloudera Enterprise Core)
- Monitor HBase (Cloudera Enterprise RTD)
- Monitor Impala* (Cloudera Enterprise RTQ)
- Proactive Health Checks
- Status & Health Summary
- Heatmaps/Performance Monitoring
- Host Monitoring
- Security
- LDAP Authentication
- Kerberos Configuration
- Multi-Cluster Management
- Intelligent Log Management
- Events Management & Alerts
- Activity Monitoring
- Operational Reporting
- File Browser & Quota Management
- Global Time Control
- Support Integration
- Maintenance Mode

Cloudera Support

As the use of Hadoop grows and an increasing number of groups and applications move into production, your Hadoop users will expect greater levels of performance and consistency. Cloudera's proactive production-level support gives your administrators the expertise and responsiveness they need.

Cloudera Support includes:
- Flexible Support Windows: Choose 8x5 or 24x7 to meet SLA requirements.
- Configuration Checks: Verify that your Hadoop cluster is fine-tuned for your environment.
- Escalation and Issue Resolution: Resolve support cases with maximum efficiency.
- Comprehensive Knowledge Base: Expand your Hadoop knowledge with hundreds of articles and tech notes.
- Support for Certified Integration: Connect your Hadoop cluster to your existing data analysis tools.
- Proactive Notification: Stay up-to-speed on new developments and events.

With Cloudera Enterprise, you can leverage your existing team's experience and Cloudera's expertise to put your Hadoop system into effective operation. Built-in predictive capabilities anticipate shifts in the Hadoop infrastructure to support reliable function.

Cloudera Enterprise makes it easy to run open source Hadoop in production, by:
- Simplifying and accelerating Hadoop deployment
- Reducing the costs and risks of adopting Hadoop in production
- Reliably operating Hadoop in production with repeatable success
- Applying SLAs to Hadoop
- Increasing control over Hadoop cluster provisioning and management

HDFS Highly Available Name Nodes

The Dell Cloudera Solution v2.1 includes CDH 4.1 and Cloudera Manager 4.1, which feature Highly Available Name Nodes for HDFS. There are two options for name node high availability:
- Active/Passive using Shared Storage
- Quorum-based

Only one of these can be used in a single cluster. Quorum-based high availability is recommended for new installations. Additional information about the high availability features of CDH4 can be found in the CDH4 High Availability Guide.

Active/Passive HA using Shared Storage

This configuration sets up name nodes in an Active/Passive configuration. The Active Name Node (formerly Master Name Node) is responsible for all client operations in the cluster, while the Standby (formerly Secondary Name Node) simply acts as a slave, maintaining enough shared state via the filer to provide a fast failover if necessary.

In order for the Standby node to keep its state synchronized with the Active node, the active/passive implementation requires that both nodes have access to a directory on a shared storage device, in this case an NFS mount from the HA Node.

Quorum-based HA

In Quorum-based HA, the Standby node keeps its state synchronized with the Active node by communicating with a group of separate daemons called JournalNodes. When any namespace modification is performed by the Active node, it durably logs a record of the modification to a majority of these JournalNodes. The Standby node is capable of reading the edits from the JournalNodes, and is constantly watching them for changes to the edit log. As the Standby node sees the edits, it applies them to its own namespace.
In the event of a failover, the Standby will ensure that it has read all of the edits from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.

There should be an odd number of JournalNode daemons, and at least three, since edit log modifications must be written to a majority of JournalNodes. The JournalNode daemons run on the Master, Secondary, and HA nodes in this reference architecture.
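The reason for an odd number of JournalNodes follows from the majority rule. The short sketch below works through the arithmetic; the function names are illustrative and are not part of any Hadoop API.

```python
def write_quorum(journal_nodes):
    """Smallest number of JournalNodes that forms a majority."""
    return journal_nodes // 2 + 1


def tolerated_failures(journal_nodes):
    """JournalNodes that can be lost while a majority is still reachable."""
    return journal_nodes - write_quorum(journal_nodes)


for n in (3, 4, 5):
    print(f"{n} JournalNodes: quorum={write_quorum(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

With three JournalNodes, as in this reference architecture, edits must reach two of them and one can fail. Adding a fourth raises the quorum to three without tolerating any additional failure, which is why even counts offer no benefit.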
Figure 13: HDFS with Highly Available Name Node

Dell Cloudera Solution Deployment Methodology

Site Preparation Needed for the Deployment

The heating, ventilation, air conditioning (HVAC) and power requirements for deployment can be estimated using the Dell Energy Smart Solution Advisor. Using this tool, you can plan the needs for your solution, order the correct PDUs, and have the proper HVAC ready for the installation.

Detailed deployment instructions are documented in the Dell Cloudera Solution Deployment Guide.

Dell Cloudera Solution Hardware Monitoring and Alerting

To automate alerting and response to unexpected events and failures within the Dell Cloudera Solution, the software stack includes Nagios and Ganglia.

The Dell Cloudera Solution includes capabilities for three primary components of the monitoring environment:
- Monitoring of cluster activities: The Dell Cloudera Solution utilizes Nagios to monitor the cluster, including hardware, software, and users. The Nagios deployment that is part of the Dell Cloudera Solution keeps historical information regarding system availability, maintenance, and failure events.
- Alerts on unexpected events: The Dell Cloudera Solution utilizes Nagios to alert system operations staff to events that deviate from normal operation, if the administrator has designated them for notification.
- Debugging of cluster runtime operations: The Dell Cloudera Solution utilizes Cloudera Enterprise to provide the users and administrators of the Hadoop environment with the necessary tools for tracking, debugging, and monitoring job performance and characteristics.

The Dell Cloudera Solution is designed to include the necessary components to monitor and respond to events in your Hadoop environment. It is flexible enough to allow integration with existing operations management frameworks in your environment.
The monitoring components of the Dell Cloudera Solution Reference Architecture are designed to be proactive in nature; they alert the IT operations team when a failure in the environment occurs, and they do so before the failure causes an outage that affects production workloads and users.

Nagios

Nagios is an open source solution for enterprise monitoring. Its pluggable architecture allows for consistent event handling. It supports a wide variety of sensors, plug-ins, applications, servers, and hardware platforms. The Dell Cloudera Solution includes Nagios as part of all default installations. The Dell Cloudera Solution automatically installs the Nagios console and the Nagios plug-ins needed to monitor the Hadoop cluster, including processes, operating systems, and physical servers.

Ganglia

Ganglia is a scalable distributed monitoring system for high-performance computing systems, such as clusters and grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies and uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust. It has been ported to an extensive set of operating systems and processor architectures, and it has been used to link clusters across university campuses and around the world. It can scale to handle clusters with 2,000 nodes. The Dell Cloudera Solution automates the installation and configuration of Ganglia within the Hadoop cluster, giving IT operations staff detailed reporting on the status and utilization of all Hadoop nodes.

Dell Cloudera Solution Security Design

What is Available in CDH4?

Cloudera's CDH4 release offers the following security features:

Two levels of authentication:
  o Cluster: SlaveNode to NameNode, TaskTracker to JobTracker
  o User: Unix-style file permissions
  NOTE: Access to the cluster can be restricted to Kerberos-authenticated users.
Sqoop and Pig support for security with no configuration required
The capability to integrate with a Kerberos server

Implementing Secure Hadoop
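As a rough illustration of what integrating with a Kerberos server involves on the Hadoop side, the Python sketch below renders the two `core-site.xml` properties that switch Hadoop from simple to Kerberos authentication and turn on service-level authorization: `hadoop.security.authentication` and `hadoop.security.authorization`. The helper name `render_core_site` is hypothetical, and the sketch is not taken from this guide. A complete secure deployment also needs principals, keytabs, and per-daemon settings, which are outside the scope of this example.

```python
import xml.etree.ElementTree as ET


def render_core_site(props):
    """Render a dict of Hadoop properties as a <configuration> document.

    Hypothetical helper for illustration; it is not a Hadoop or Cloudera tool.
    """
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Core properties that enable Kerberos authentication and authorization.
SECURE_PROPS = {
    "hadoop.security.authentication": "kerberos",
    "hadoop.security.authorization": "true",
}

xml_text = render_core_site(SECURE_PROPS)
```

The resulting fragment would be merged into the cluster's `core-site.xml`; in a managed cluster, the same settings are normally applied through Cloudera Manager rather than by hand.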
Figure 14: Kerberos Authentication in Hadoop
Appendix A: Bill of Materials - PowerEdge C8000 Series

For the PowerEdge C8000 series, the bill of materials is organized by chassis rather than node, to simplify ordering.

Table 30: Master Chassis - PowerEdge C8000

The master chassis includes the Administration Node, a Master Name Node, and an Edge Node.

Component

Group: 1 Quantity:
  PE C8000 Enclosure, Dual Power Supply
  PowerEdge C8000 Shipping
  SHIP,C8000,DAO
  No Factory Installed Operating System
  PowerEdge C8000 Sled Blank, Single Width - Quantity
  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 1 - Quantity
  PowerEdge C8000 Static Rails, Toolless
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
  Dell Hardware Limited Warranty Plus On Site Service Initial Year
  Dell Hardware Limited Warranty Plus On Site Service Extended Year
  Dell ProSupport. For tech support, visit or call
  On-Site Installation Declined

Group: 2 Quantity:
  PowerEdge C8220X, Double Width Compute Sled
  Performance Optimized
  System ordered as part of Multipack order
  No Factory Installed Operating System, v
  Intel X520 DA 10GB, Dual Port SFP+,PCIe-8 NIC
  Intel DA 10GbE NIC, Dual Port, SFP+,Low Profile
  C2 LSI 2008 Mezz Card supporting up to 8 Hard Drives
  Cable for 2.5in Rear Hard Drives, PE-C8220X
  LSI 2008 SAS Controller Card, 6G, PE C8XXX
  Dual Processor Option
  Intel Xeon E GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W, Max Mem 1600MHz
  Intel Xeon E GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W
  Thermal Heatsink
  Thermal Heatsink
  GB Memory (16x8GB),1600Mhz, Dual Ranked RDIMMs for 2 Processors
  Info, Memory for Dual Processor selection
  in HDD Enclosure, PE-C8220X
  in HDD Blank, PE-C8220X - Quantity
  Hard Drive Carrier 2.5 C Quantity
  TB 7.2K RPM SATA 3Gbps 2.5in Hard Drive - Quantity
  Hot Plug Hard Drive Carrier, PE-C8220X
  Dell Hardware Limited Warranty Plus On Site Service Extended Year
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
  Dell ProSupport. For tech support, visit or call
  Dell Hardware Limited Warranty Plus On Site Service Initial Year
  On-Site Installation Declined
Table 31: HA Chassis - PowerEdge C8000

The HA chassis includes a secondary name node, the HA node, and one data node.

Component

Group: 1 Quantity:
  PE C8000 Enclosure, Dual Power Supply
  PowerEdge C8000 Shipping
  SHIP,C8000,DAO
  No Factory Installed Operating System
  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 1 - Quantity
  PowerEdge C8000 Static Rails, Toolless
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
  Dell Hardware Limited Warranty Plus On Site Service Initial Year
  Dell Hardware Limited Warranty Plus On Site Service Extended Year
  Dell ProSupport. For tech support, visit or call
  On-Site Installation Declined

Group: 2 Quantity:
  PowerEdge C8220X, Double Width Compute Sled
  Performance Optimized
  System ordered as part of Multipack order
  No Factory Installed Operating System, v
  Intel X520 DA 10GB, Dual Port SFP+,PCIe-8 NIC
  Intel DA 10GbE NIC, Dual Port, SFP+,Low Profile
  C2 LSI 2008 Mezz Card supporting up to 8 Hard Drives
  Cable for 2.5in Rear Hard Drives, PE-C8220X
  LSI 2008 SAS Controller Card, 6G, PE C8XXX
  Dual Processor Option
  Intel Xeon E GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W, Max Mem 1600MHz
  Intel Xeon E GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W
  Thermal Heatsink
  Thermal Heatsink
  GB Memory (16x8GB),1600Mhz, Dual Ranked RDIMMs for 2 Processors
  Info, Memory for Dual Processor selection
  in HDD Enclosure, PE-C8220X
  in HDD Blank, PE-C8220X - Quantity
  Hard Drive Carrier 2.5 C Quantity
  TB 7.2K RPM SATA 3Gbps 2.5in Hard Drive - Quantity
  Hot Plug Hard Drive Carrier, PE-C8220X
  Dell Hardware Limited Warranty Plus On Site Service Extended Year
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
  Dell ProSupport. For tech support, visit or call
  Dell Hardware Limited Warranty Plus On Site Service Initial Year
  On-Site Installation Declined

Group: 3 Quantity:
  PowerEdge C8220X, Double Width Compute Sled
  Performance Optimized
  System ordered as part of Multipack order
  No Factory Installed Operating System, v
  LSI 9202 SAS Controller Cable
  Intel DA 10GbE NIC, Dual Port, SFP+,Low Profile
  LSI E, LP, Controller, CE
  C2B LSI 2008 Mezz Card plus Onboard Controller supporting up to 12 Hard Drives
  SAS Controller Cable, PE-C8220X
  LSI 2008 SAS Controller Card, 6G, PE C8XXX
  Dual Processor Option
  Intel Xeon E GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W, Max Mem 1600MHz
  Intel Xeon E GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W
  Thermal Heatsink
  Thermal Heatsink
  Memory Filler Blank Dimm Quantity
  GB Memory (8x8GB), 1600Mhz, Dual Ranked RDIMMs
  Info, Memory for Dual Processor selection
  Hard Drive Carrier 3.5 C Quantity
  TB,7.2K RPM,Near Line SAS,6Gps,3.5in, Hard Drive - Quantity
  in HDD Enclosure, PE-C8220X
  Hard Drive,2.5 Rear Carrier,C
  Hard Drive,2.5 Rear Carrier,C
  TB,7.2K RPM,SATA,3Gbps,2.5in, Hard Drive
  TB,7.2K RPM,SATA,3Gbps,2.5in, Hard Drive
  Hot Plug Hard Drive Carrier, PE-C8220X
  Dell Hardware Limited Warranty Plus On Site Service Extended Year
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
  Dell ProSupport. For tech support, visit or call
  Dell Hardware Limited Warranty Plus On Site Service Initial Year
  On-Site Installation Declined

Group: 4 Quantity:
  PowerEdge C8220XD Storage Sled, Single, 12 Hard Drives
  System ordered as part of Multipack order
  No Factory Installed Operating System
  TB, Near Line SAS 6Gps, 7.2K RPM, 3.5 in Hard Drive - Quantity
  Hard Drive Carrier,3.5,Expanded,Double Wide Storage,C Quantity
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
  Dell Hardware Limited Warranty Plus On Site Service Initial Year
  Dell Hardware Limited Warranty Plus On Site Service Extended Year
  Dell ProSupport. For tech support, visit or call
  On-Site Installation Declined
Table 32: Data Node Chassis - PowerEdge C8000

The Data node chassis includes two data nodes.

Component

Group: 1 Quantity:
  PE C8000 Enclosure, Dual Power Supply
  PowerEdge C8000 Shipping
  SHIP,C8000,DAO
  No Factory Installed Operating System
  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 1 - Quantity
  PowerEdge C8000 Static Rails, Toolless
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
  Dell Hardware Limited Warranty Plus On Site Service Initial Year
  Dell Hardware Limited Warranty Plus On Site Service Extended Year
  Dell ProSupport. For tech support, visit or call
  On-Site Installation Declined

Group: 2 Quantity:
  PowerEdge C8220X, Double Width Compute Sled
  Performance Optimized
  System ordered as part of Multipack order
  No Factory Installed Operating System, v
  LSI 9202 SAS Controller Cable
  Intel X520 DA 10GB, Dual Port SFP+,PCIe-8 NIC
  LSI E, LP, Controller, CE
  C2B LSI 2008 Mezz Card plus Onboard Controller supporting up to 12 Hard Drives
  SAS Controller Cable, PE-C8220X
  LSI 2008 SAS Controller Card, 6G, PE C8XXX
  Dual Processor Option
  Intel Xeon E GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W, Max Mem 1600MHz
  Intel Xeon E GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W
  Thermal Heatsink
  Thermal Heatsink
  Memory Filler Blank Dimm Quantity
  GB Memory (8x8GB), 1600Mhz, Dual Ranked RDIMMs
  Info, Memory for Dual Processor selection
  Hard Drive Carrier 3.5 C Quantity
  TB,7.2K RPM,Near Line SAS,6Gps,3.5in, Hard Drive - Quantity
  in HDD Enclosure, PE-C8220X
  Hard Drive,2.5 Rear Carrier,C
  Hard Drive,2.5 Rear Carrier,C
  TB,7.2K RPM,SATA,3Gbps,2.5in, Hard Drive
  TB,7.2K RPM,SATA,3Gbps,2.5in, Hard Drive
  Hot Plug Hard Drive Carrier, PE-C8220X
  Dell Hardware Limited Warranty Plus On Site Service Extended Year
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
  Dell ProSupport. For tech support, visit or call
  Dell Hardware Limited Warranty Plus On Site Service Initial Year
  On-Site Installation Declined

Group: 3 Quantity:
  PowerEdge C8220XD Storage Sled, Single, 12 Hard Drives
  System ordered as part of Multipack order
  No Factory Installed Operating System
  TB, Near Line SAS 6Gps, 7.2K RPM, 3.5 in Hard Drive - Quantity
  Hard Drive Carrier,3.5,Expanded,Double Wide Storage,C Quantity
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
  Dell Hardware Limited Warranty Plus On Site Service Initial Year
  Dell Hardware Limited Warranty Plus On Site Service Extended Year
  Dell ProSupport. For tech support, visit or call
  On-Site Installation Declined
Table 33: Heavy Data Node Chassis - PowerEdge C8000

The Heavy Data node chassis is used to configure four heavy data nodes in three chassis. Order two heavy data node chassis and one data node chassis for this configuration.

Component

Group: 1 Quantity:
  PE C8000 Enclosure, Dual Power Supply
  PowerEdge C8000 Shipping
  SHIP,C8000,DAO
  No Factory Installed Operating System
  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 1 - Quantity
  PowerEdge C8000 Static Rails, Toolless
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
  Dell Hardware Limited Warranty Plus On Site Service Initial Year
  Dell Hardware Limited Warranty Plus On Site Service Extended Year
  Dell ProSupport. For tech support, visit or call
  On-Site Installation Declined

Group: 2 Quantity:
  PowerEdge C8220X, Double Width Compute Sled
  Performance Optimized
  System ordered as part of Multipack order
  No Factory Installed Operating System, v
  LSI 9202 SAS Controller Cable
  Intel X520 DA 10GB, Dual Port SFP+,PCIe-8 NIC
  LSI E, LP, Controller, CE
  C2B LSI 2008 Mezz Card plus Onboard Controller supporting up to 12 Hard Drives
  SAS Controller Cable, PE-C8220X
  LSI 2008 SAS Controller Card, 6G, PE C8XXX
  Dual Processor Option
  Intel Xeon E GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W, Max Mem 1600MHz
  Intel Xeon E GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 115W
  Thermal Heatsink
  Thermal Heatsink
  Memory Filler Blank Dimm Quantity
  GB Memory (8x8GB), 1600Mhz, Dual Ranked RDIMMs
  Info, Memory for Dual Processor selection
  Hard Drive Carrier 3.5 C Quantity
  TB,7.2K RPM,Near Line SAS,6Gps,3.5in, Hard Drive - Quantity
  in HDD Enclosure, PE-C8220X
  Hard Drive,2.5 Rear Carrier,C
  Hard Drive,2.5 Rear Carrier,C
  TB,7.2K RPM,SATA,3Gbps,2.5in, Hard Drive
  TB,7.2K RPM,SATA,3Gbps,2.5in, Hard Drive
  Hot Plug Hard Drive Carrier, PE-C8220X
  Dell Hardware Limited Warranty Plus On Site Service Extended Year
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
  Dell ProSupport. For tech support, visit or call
  Dell Hardware Limited Warranty Plus On Site Service Initial Year
  On-Site Installation Declined

Group: 3 Quantity:
  PowerEdge C8220XD Storage Sled, Single, 12 Hard Drives
  System ordered as part of Multipack order
  No Factory Installed Operating System
  TB, Near Line SAS 6Gps, 7.2K RPM, 3.5 in Hard Drive - Quantity
  Hard Drive Carrier,3.5,Expanded,Double Wide Storage,C Quantity
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
  Dell Hardware Limited Warranty Plus On Site Service Initial Year
  Dell Hardware Limited Warranty Plus On Site Service Extended Year
  Dell ProSupport. For tech support, visit or call
  On-Site Installation Declined
Appendix B: Bill of Materials - PowerEdge R720xd Nodes

Table 34: Active and Standby Name, Admin, Edge and HA Nodes - PowerEdge R720xd

Component
  PowerEdge R720xd  Quantity:
  PowerEdge R720 Shipping  Quantity:
  iDRAC7 Enterprise  Quantity:
  Broadcom 5720 QP 1Gb Network Daughter Card  Quantity:
  Chassis with up to 24, 2.5" Hard Drives  Quantity:
  Bezel  Quantity:
  Power Saving Dell Active Power Controller  Quantity:
  Unconfigured RAID for H710P/H710/H310 (1-24 HDDs)  Quantity:
  PERC H710 Integrated RAID Controller, 512MB NV Cache  Quantity:
  Intel Xeon E GHz, 15M Cache, 7.2GT/s QPI, Turbo, 6C, 95W  Quantity:
  Heat Sink for PowerEdge R720 and R720xd  Quantity:
  DIMM Blanks for Systems with 2 Processors  Quantity:
  Intel Xeon E GHz, 15M Cache, 7.2GT/s QPI, Turbo, 6C, 95W  Quantity:
  Heat Sink for PowerEdge R720 and R720xd  Quantity:
  GB RDIMM, 1600 MHz, Standard Volt, Dual Rank, x12quantity:
  MHz RDIMMS  Quantity:
  Performance Optimized  Quantity:
  HDD, 1TB 7.2K SATA,3G,2.5,HP  Quantity:
  Electronic System Documentation and OpenManage DVD Kit for R720 and R720xd  Quantity:
  ReadyRails Sliding Rails With Cable Management Arm  Quantity:
  Dual, Hot-plug, Redundant Power Supply  Quantity:
  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 1  Quantity:
  No Operating System  Quantity:
  No Media Required  Quantity:
  Dell Hardware Limited Warranty Plus On Site Service Initial Year  Quantity:
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended  Quantity:
  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year  Quantity:
  Dell Hardware Limited Warranty Plus On Site Service Extended Year  Quantity:
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year  Quantity:
  Thank you choosing Dell ProSupport  Quantity:
  On-Site Installation Declined  Quantity:
  Proactive Maintenance Service Declined  Quantity: 1

Add for 10GbE networking support:
  Intel X520 DP 10Gb DA/SFP+ Server Adapter
Appendix C: Bill of Materials - PowerEdge R720 Nodes

Table 35: Active and Standby Name, Admin, Edge and HA Nodes - PowerEdge R720

Component
  PowerEdge R
  PowerEdge R720 Shipping
  Risers with up to 4, x8 PCIe Slots + 2, x16 PCIe Slot
  iDRAC7 Enterprise
  Broadcom 5720 QP 1Gb Network Daughter Card
  " Chassis with up to 8 Hard Drives
  Bezel
  Power Saving Dell Active Power Controller
  No RAID for H310 (1-16 HDDs)
  PERC H710 Integrated RAID Controller, 512MB NV Cache  Quantity:
  Intel Xeon E GHz, 15M Cache, 7.2GT/s QPI, Turbo, 6C, 95W  Quantity:
  Heat Sink for PowerEdge R720 and R720xd  Quantity:
  DIMM Blanks for Systems with 2 Processors  Quantity:
  Intel Xeon E GHz, 15M Cache, 7.2GT/s QPI, Turbo, 6C, 95W  Quantity:
  Heat Sink for PowerEdge R720 and R720xd  Quantity:
  GB RDIMM, 1600 MHz, Standard Volt, Dual Rank, x4 - Quantity
  MHz RDIMMS
  Performance Optimized
  TB 7.2K RPM Near-Line SAS 6Gbps 3.5in Hot-plug Hard Drive - Quantity
  No System Documentation, No OpenManage DVD Kit
  DVD+/-RW, SATA, INTERNAL
  ReadyRails Sliding Rails Without Cable Management Arm
  Dual, Hot-plug, Redundant Power Supply (1+1), 1100W
  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 foot, Qty 1 - Quantity
  No Operating System
  No Media Required
  Basic Hardware Services: Business Hours (5X10) Next Business Day On Site Hardware Warranty Repair 2 Year Exten
  Dell Hardware Limited Warranty Plus On Site Service Extended Year
  Dell Hardware Limited Warranty Plus On Site Service Initial Year
  Basic Hardware Services: Business Hours (5X10) Next Business Day On Site Hardware Warranty Repair Initial Year
  Basic support covers SATA Hard Drive for 1 year only regardless of support duration on the system
  DECLINED CRITICAL BUSINESS SERVER OR STORAGE SOFTWARE SUPPORT PACKAGE-CALL YOUR DELL SALES REP IF UPGRADE NEED
  On-Site Installation Declined
  Proactive Maintenance Service Declined

Add for 10GbE networking support:
  Intel X520 DP 10Gb DA/SFP+ Server Adapter
Appendix D: Bill of Materials - PowerEdge R720xd Data node

Table 36: Data node - PowerEdge R720xd

Component
  PowerEdge R720xd  Quantity:
  PowerEdge R720 Shipping  Quantity:
  iDRAC7 Enterprise  Quantity:
  Broadcom 5720 QP 1Gb Network Daughter Card  Quantity:
  Chassis with up to 24, 2.5" Hard Drives  Quantity:
  Bezel  Quantity:
  Power Saving Dell Active Power Controller  Quantity:
  Unconfigured RAID for H710P/H710/H310 (1-24 HDDs)  Quantity:
  PERC H710 Integrated RAID Controller, 512MB NV Cache  Quantity:
  Intel Xeon E GHz, 15M Cache, 7.2GT/s QPI, Turbo, 6C, 95W  Quantity:
  Heat Sink for PowerEdge R720 and R720xd  Quantity:
  DIMM Blanks for Systems with 2 Processors  Quantity:
  Intel Xeon E GHz, 15M Cache, 7.2GT/s QPI, Turbo, 6C, 95W  Quantity:
  Heat Sink for PowerEdge R720 and R720xd  Quantity:
  GB RDIMM, 1600 MHz, Standard Volt, Dual Rank, x4  Quantity:
  GB RDIMM,1600MHz,SV,DR,x8  Quantity:
  MHz RDIMMS  Quantity:
  Performance Optimized  Quantity:
  HDD, 1TB 7.2K SATA,3G,2.5,HP  Quantity:
  Electronic System Documentation and OpenManage DVD Kit for R720 and R720xd
  ReadyRails Sliding Rails With Cable Management Arm  Quantity:
  Dual, Hot-plug, Redundant Power Supply  Quantity:
  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 1  Quantity:
  No Operating System  Quantity:
  No Media Required  Quantity:
  Dell Hardware Limited Warranty Plus On Site Service Initial Year  Quantity:
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended  Quantity:
  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year  Quantity:
  Dell Hardware Limited Warranty Plus On Site Service Extended Year  Quantity:
  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year  Quantity:
  Thank you choosing Dell ProSupport  Quantity:
  On-Site Installation Declined  Quantity:
  Proactive Maintenance Service Declined  Quantity: 1

Add for 10GbE networking support:
  Intel X520 DP 10Gb DA/SFP+ Server Adapter
Appendix E: Bill of Materials - PowerEdge C6105

Table 37: Data node - PowerEdge C6105

Component                                  Description
PowerEdge C6105                            2MB PowerEdge C6105 Chassis w/ 2 System Boards and support for 2.5-inch Hard Drives
Operating System                           No Factory Installed Operating System
Processor                                  2x AMD Opteron 4180, 6C 2.6GHz, 3M L2/6M L3, 1333Mhz Max Mem
Memory                                     48GB Memory(4x4GB/4x8GB),1333 MHz, Dual Ranked RDIMMs for 2 Processors, Low Volt
Documentation/Disks                        C6105 Documentation
Hard Drive Controller and Configuration    Add-in LSI 2008 SAS/SATA Mezz Card supporting up to 12, 2.5-inch HDs SAS/SATA - No RAID
Rails                                      C6100/C6105 Static Rails, Tool-less
Hardware Support Services                  3 Year ProSupport and NBD On-site Service
Power Supply
Hard Drive                                 1TB 7.2K RPM SATA 3Gbps 2.5in Hot Plug Hard Drive, Quantity 24
Appendix F: Bill of Materials - Force10 Network Equipment

Table 38: Network Equipment - 1GbE Dell Force10

Description
  Force10, Z9000, 2U, 32 x 40Gbe QSFP+ Ports, 1 AC Pwr Supply, Fan w/IO Panel to PSU (Normal) Airflow (Non-Redundant Pwr)
  Force10, Power Cord, 125V, 15A, 10 Feet, NEMA 5-15/C13, S-Series
  Force10, Z9000, AC Power Supply for Chassis with IO Panel to PSU (Normal) Airflow
  Force10, Transceiver, 40GE QSFP+ Short Reach Optics, 850nm Wavelength, m Reach on OM3/OM
  Force10, Z9000 Cable Management Kit
  Force10, S4810P, 48 x 10GbE SFP+, 4 x QSFP 40GbE, 1 x AC PSU, 2 x Fans, IO Panel to PSU Airflow
  Force10, S4810P, 48 x 10GbE SFP+, 4 x QSFP 40GbE, 1 x AC PSU, 2 x Fans, PSU to IO Panel Airflow
  Force10, S4810, AC Power Supply, IO Panel to PSU Airflow
  Force10, S4810, AC Power Supply, PSU to IO Panel Airflow
  Force10, Cable, SFP+ to SFP+, 10GbE, Copper Twinax Direct Attach Cable, 2 Meters
  Force10, Power Cord, 125V, 15A, 10 Feet, NEMA 5-15/C13, S-Series
  Force10, Software, L3 Latest Version, S
  Force10, Transceiver, 40GE QSFP+ Short Reach Optics, 850nm Wavelength, m Reach on OM3/OM
  Force10, Transceiver, SFP+, 10GbE, SR, 850nm Wavelength, 300m Reach
  Force10, Transceiver, 40GE QSFP+ Short Reach Optics, 850nm Wavelength, m Reach on OM3/OM
  Force10, Rear Rack Mounting Bracket, 4 Post, S
  Force10, S60, 44 x 10/100/1000 BASE-T, 4 x SFP, 2 Expansion Slots, 1 x AC PSU, 2 x Fans, PSU to IO Panel Airflow
  Force10, SFP+ Expansion Module, 2 x 10 GbE Ports, S60 Series (SFP+ optics required)
  Force10, Power Cord, 125V, 15A, 10 Feet, NEMA 5-15/C13, S-Series
  Force10, S60, AC Power Supply, PSU to IO Panel Airflow
  Force10, Rear Rack Mounting Bracket, Metal, 4 Post, S60
  Force10 S60 2 port, 12G, Stacking module
  Force10 S60 12 Gig 60cms stacking cable
Table 39: Network Equipment - 10GbE Dell Force10

Description

Cluster Network
  Dell Networking, Transceiver, SFP+, 10GbE, SR, 850nm Wavelength, 300m Reach
  SFP+, Short Range, Optical Transceiver, LC Connector, 10Gb and 1Gb compatible (Intel 10G SFP+)
  Force10, S4810P, 48 x 10GbE SFP+, 4 x QSFP 40GbE, 1 x AC PSU, 2 x Fans, IO Panel to PSU Airflow
  Force10, Power Cord, 125V, 15A, 10 Feet, NEMA 5-15/C13, S-Series
  Dell Networking, Transceiver, SFP, 1000BASE-LX, 1310nm Wavelength, 10km Reach
  Force10, Rear Rack Mounting Bracket, 4 Post, S
  Force10, User Documentation for S4810, DAO/BCC
  SW Support,Force10 Software,3 Years
  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Years
  ProSupport: 4-Hour 7x24 Parts Only After Problem Diagnosis, Initial Year
  Dell ProSupport. For tech support, visit or call
  Dell Hardware Limited Warranty Extended Year(s)
  ProSupport: 4-Hour 7x24 Parts Only After Problem Diagnosis, 2 Year Extended
  Dell Hardware Limited Warranty Initial Year
  On-Site Installation Declined
  ProSupport for, Force10,Layer 3 Enablement, 1 Year
  Force10, Software, iSCSI-optimized Configuration, S
  Customer Kit, Dell Networking, Cable, QSFP+, 40GbE SFP+ Passive Copper Direct Attach Cable, 1 Meter

Administration Network
  Force10, S55, 44 x 10/100/1000 BASE-T, 4 x SFP, 2 Expansion Slots, 1 x AC PSU, 2 x Fans, IO Panel to PSU Airfl
  Force10 SFP+ Expansion Module 2x10 Gbe Ports
  Force10, S55, AC Power Supply, IO Panel to PSU Airflow
  Force10, Power Cord, 125V, 15A, 10 Feet, NEMA 5-15/C13, S-Series
  Force10, Rear Rack Mounting Bracket, 4 Post, S55
  No Returns Allowed on Dell Force10 Switches
  Force10, User Documentation for S55/S60, DAO/BCC
  Dell Hardware Limited Warranty Initial Year
  Dell Hardware Limited Warranty Extended Year(s)
  Dell ProSupport. For tech support, visit or call
  ProSupport: Next Business Day Parts Delivery, 2 Year Extended
  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Years
  SW Support,Force10 Software,5 Years
  ProSupport: Next Business Day Parts Delivery, Initial Year
  Force10, 5 Year Return To Depot Service, Base Warranty
  On-Site Installation Declined

Network Equipment Notes

The SKUs listed above include switches with specific airflow options: both I/O-panel-to-PSU SKUs and PSU-to-I/O-panel (reverse airflow) SKUs are available. Redundant fans (beyond the minimum supplied with the chassis) should use the same airflow direction as the base switch. Airflow cannot be reversed in the field at this time. The list above shows AC power supplies only; all switch models are also available with DC power supplies.
Appendix G: Bill of Materials - Dell 6248 Network Equipment

Table 40: Network Equipment - Dell 6248 (Optional)

Component                           Description
PowerConnect 6248P                  PowerConnect 6248, 48 GbE Ports, Managed Switch, 10GbE and Stacking Capable
Front-end SFP Fiber Transceivers    None
Modular Upgrade Bay 1: Modules      Stacking Module, 48Gbps, Includes 1m Stacking Cable
Modular Upgrade Bay 1: Optics       None
Modular Upgrade Bay 2: Modules      None
Modular Upgrade Bay 2: Optics       None
Cables                              Stacking Cable, 3m
External Redundant Power Supply     None
Hardware Support Services           3 Year ProSupport and NBD On-site Service
Installation Services               No Installation Services Selected
Asset Recovery Services             None
Cables (optional)                   Stacking Cable, 3m
Appendix H: Bill of Materials - Software and Support

Software, training, and support SKUs change regularly and are specific to global regions. Please refer to the Hadoop Solution SKUs document on Dell SalesEdge (Dell internal link) or contact your Dell account representative for the latest information. The sample bill of materials appendices include service and support SKUs for the United States. These SKUs need to be changed for other regions.
Appendix I: Dell Cloudera Solution Components Decoder Ring

1. Hadoop
2. Hadoop Distributed File System (HDFS): a distributed file system that provides high-throughput access to application data.
3. MapReduce: a software framework for distributed processing of large data sets on compute clusters.
4. Avro: a data serialization system.
5. HBase: a scalable, distributed database that supports structured data storage for large tables.
6. Hive: a data warehouse infrastructure that provides data summarization and ad-hoc querying.
7. ZooKeeper: a high-performance coordination service for distributed applications.
8. Pig: a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
9. Sqoop: a tool designed to import data from relational databases into Hadoop; Sqoop uses JDBC to connect to a database.
10. Flume: a distributed service for collecting, aggregating, and moving large amounts of log data; its architecture is based on streaming data flows.
11. Oozie: an open-source workflow engine and coordination service to manage data processing jobs within Hadoop.
12. Hue: a browser-based interface for interacting with Hadoop clusters.
13. Crowbar: a toolset provided, supported, and maintained by Dell for system deployment and configuration automation; Crowbar supports the bare-metal bring-up of new hardware and configuration management of existing hardware.
Appendix J: External References

Nagios:
Ganglia: ganglia.sourceforge.net/
Cloudera:
Update History

Changes in Version 2.1

The following changes have been made to this guide since the 2.0 release.

Added support for the PowerEdge C8000 series
Added support for 10Gb networking on the PowerEdge C8000 and R720 series
Updated for Cloudera CDH 4.1 and Cloudera Manager CM 4.1
Removed support for the PowerEdge C2100 series (hardware end of life)
Removed support for the PowerEdge C6100 series (hardware end of life)

To Learn More

For more information on the Dell Cloudera Solution, visit:

Dell Inc. All rights reserved. Trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Specifications are correct at date of publication but are subject to availability or change without notice at any time. Dell and its affiliates cannot be responsible for errors or omissions in typography or photography. Dell's Terms and Conditions of Sales and Service apply and are available on request. Dell service offerings do not affect consumer's statutory rights. Dell, the DELL logo, and the DELL badge, PowerConnect, and PowerVault are trademarks of Dell Inc.
