HP Reference Architecture for Cloudera Enterprise on ProLiant DL Servers
Technical white paper

HP Converged Infrastructure with Cloudera Enterprise for Apache Hadoop

Table of contents
- Executive summary
- Introduction
- Cloudera Enterprise overview
  - Management services
  - Worker services
- High-availability considerations
- Pre-deployment considerations
  - Operating system
  - Computation
  - Memory
  - Storage
  - Network
  - Switches
  - HP Insight Cluster Management Utility
- Server selection
  - Management nodes
  - Worker nodes
- Reference Architectures
  - Single Rack Reference Architecture
  - Multi-Rack Reference Architecture
- Vertica and Hadoop
- Summary
- For more information
Executive summary

HP and Apache Hadoop allow you to derive new business insights from Big Data by providing a platform to store, manage and process data at scale. However, Apache Hadoop is complex to deploy, configure, manage and monitor. This white paper provides several performance optimized configurations for deploying Cloudera Enterprise clusters of varying sizes on HP infrastructure that provide a significant reduction in complexity and an increase in value and performance.

The configurations are based on Cloudera's Distribution including Apache Hadoop (CDH), specifically CDH4.3, Cloudera Manager 4.6 and the HP ProLiant DL Gen8 server platform. The configurations reflected in this document have been jointly designed and developed by HP and Cloudera to provide optimum computational performance for Hadoop and are also compatible with other CDH4.x releases. HP Big Data solutions provide best-in-class performance and availability, with integrated software, services, infrastructure, and management all delivered as one proven solution, as described at hp.com/go/hadoop.

In addition to the benefits described above, the solution in this white paper also includes the following features that are unique to HP:

- For the analytics database, the HP Vertica connectors for Hadoop allow seamless integration of both structured and unstructured data, providing end-to-end analytics, simplifying bi-directional data movement for Hadoop and reducing customer integration costs. Vertica is a leading real-time, scalable, analytical platform for structured data.
- For networking, the HP 5830AF-48G 1GbE Top of Rack switch and the HP 5900AF-48XG 10GbE Aggregation switch provide IRF bonding and sFlow, which simplifies the management, monitoring and resiliency of the customer's Hadoop network. In addition, their respective 1GB and 9MB packet buffers increase Hadoop network performance by seamlessly handling burst scenarios such as Shuffle, Sort and Block Replication, which are common in a Hadoop network.
- For servers, the HP ProLiant Gen8 DL360p and DL380e include:
  - The HP Smart Array P420/P420i controller, which provides increased I/O throughput performance[1], resulting in a significant performance increase for I/O bound Hadoop workloads (a common use case) and the flexibility for the customer to choose the desired amount of resilience in the Hadoop cluster with either JBOD or various RAID configurations.
  - Two sockets with the fastest 6-core processors and the Intel C600 Series Chipset, providing the performance required for fastest time to completion for CPU bound Hadoop workloads.
  - 14 Large Form Factor disks on the DL380e server, which increase the storage capacity of the Hadoop Distributed File System while at the same time improving I/O performance by reducing MapReduce slot contention for the same disk.
  - The HP iLO Management Engine on the servers contains HP Integrated Lights-Out 4 (iLO 4), which features a complete set of embedded management features for HP Power/Cooling, Agentless Management, Active Health System, and Intelligent Provisioning, reducing node and cluster level administration costs for Hadoop.
- For management, HP Insight Cluster Management Utility (CMU) provides push-button scale out and provisioning with industry leading provisioning performance (deployment of 800 nodes in 30 minutes), reducing deployments from days to hours.
In addition, CMU provides real-time and historical infrastructure and Hadoop monitoring with 3D visualizations, allowing customers to easily characterize Hadoop workloads and cluster performance, reducing complexity and improving system optimization, leading to improved performance and reduced cost. HP Insight Management and HP Service Pack for ProLiant allow for easy management of firmware and the server.

All of these features reflect HP's balanced building blocks of servers, storage and networking, along with integrated management software and bundled support.

Introduction

This white paper has been created to assist in the rapid design and deployment of Cloudera Enterprise software on HP infrastructure for clusters of various sizes. It is also intended to concretely identify the software and hardware components required in a solution in order to simplify the procurement process. The recommended HP Software, HP ProLiant servers, and HP Networking switches and their respective configurations have been carefully tested with a variety of I/O, CPU, network, and memory bound workloads. The configurations included provide the best value for optimum MapReduce and HBase computational performance, resulting in a significant performance increase[2].

Target audience: This document is intended for decision makers, system and solution architects, system administrators and experienced users who are interested in reducing the time to design or purchase an HP and Cloudera solution. An intermediate knowledge of Apache Hadoop and scale out infrastructure is recommended. Those already possessing expert knowledge about these topics may proceed directly to the High-availability considerations section.

[1] Compared to the previous generation of Smart Array controllers
[2] hp.com/hpinfo/newsroom/press_kits/2012/hpdiscover2012/hadoop_appliance_fact_sheet.pdf
Cloudera Enterprise overview

Apache Hadoop is an open-source project administered by the Apache Software Foundation. Hadoop's contributors work for some of the world's biggest technology companies. That diverse, motivated community has produced a genuinely innovative platform for consolidating, combining and understanding large-scale data in order to better comprehend the data deluge.

Enterprises today collect and generate more data than ever before. Relational and data warehouse products excel at OLAP and OLTP workloads over structured data. Hadoop, however, was designed to solve a different problem: the fast, reliable analysis of both structured data and complex data. As a result, many enterprises deploy Hadoop alongside their legacy IT systems, which allows them to combine old data and new data sets in powerful new ways.

Technically, Hadoop consists of two key services: reliable data storage using the Hadoop Distributed File System (HDFS) and high-performance parallel data processing using a technique called MapReduce. Hadoop runs on a collection of commodity, shared-nothing servers. You can add or remove servers in a Hadoop cluster at will; the system detects and compensates for hardware or system problems on any server. Hadoop, in other words, is self-healing. It can deliver data and can run large-scale, high-performance processing jobs in spite of system changes or failures.

Originally developed and employed by dominant web companies like Yahoo and Facebook, Hadoop is now widely used in finance, technology, telecom, media and entertainment, government, research institutions and other markets with significant data. With Hadoop, enterprises can easily explore complex data using custom analyses tailored to their information and questions.

Cloudera is an active contributor to the Hadoop project and provides an enterprise-ready, 100% open source distribution that includes Hadoop and related projects. Cloudera's distribution bundles the innovative work of a global open-source community; this includes critical bug fixes and important new features from the public development repository, applied to a stable version of the source code. In short, Cloudera integrates the most popular projects related to Hadoop into a single package, which is run through a suite of rigorous tests to ensure reliability during production.

In addition, Cloudera Enterprise is a subscription offering which enables data-driven enterprises to run Apache Hadoop environments in production cost effectively with repeatable success. Comprised of Cloudera Support and Cloudera Manager, a software layer that delivers deep visibility into and across Hadoop clusters, Cloudera Enterprise gives Hadoop operators an efficient way to precisely provision and manage cluster resources. It also allows IT shops to apply familiar business metrics, such as measurable SLAs and chargebacks, to Hadoop environments so they can run at optimal utilization. Built-in predictive capabilities anticipate shifts in the Hadoop infrastructure, ensuring reliable operation.

Cloudera Enterprise makes it easy to run open source Hadoop in production:
- Simplify and accelerate Hadoop deployment
- Reduce the costs and risks of adopting Hadoop in production
- Reliably operate Hadoop in production with repeatable success
- Apply SLAs to Hadoop
- Increase control over Hadoop cluster provisioning and management

For detailed information on Cloudera Enterprise, please see cloudera.com/products-services/enterprise/

Figure 1. Cloudera Enterprise
Typically, Hadoop clusters are either used for batch MapReduce analysis of data or they are used to run HBase, which is an online distributed store for reading and writing structured data. It is up to the user to choose which services to install and configure. We recommend that you run either MapReduce or HBase on your worker nodes, as running both the master (JobTracker and HBase Master) and worker services (TaskTracker and HBase RegionServer) will result in both services competing for the same resources, thereby resulting in degraded performance.

The platform functions within Cloudera Enterprise are provided by two key groups of services, namely the Management and Worker services. Management services manage the cluster and coordinate the jobs, whereas Worker services are responsible for the actual execution of work on the individual scale out nodes. The two tables below specify which services are management services and which are worker services. Each table contains two columns: the first column names the service and the second column specifies the number of nodes the service can be distributed to. The Reference Architectures (RAs) we provide in this document will map the Management and Worker services onto HP infrastructure for clusters of varying sizes. The RAs factor in the scalability requirements for each service, so this is not something you will need to manage.

Management services

Table 1. Cloudera Management Services

  Service            Maximum Distribution across Nodes
  Cloudera Manager   1
  Hue Server         1
  JobTracker         1
  HBase Master       Varies
  NameNode           2
  Oozie              1
  ZooKeeper          Varies
  Flume              Varies

Worker services

Table 2. Cloudera Enterprise Worker Services

  Service             Maximum Distribution across Nodes
  DataNode            Most or all nodes
  TaskTracker         Most or all nodes
  HBase RegionServer  Varies

High-availability considerations

The following are some of the high-availability features considered in this reference architecture configuration:

- Hadoop NameNode HA: The configurations in this white paper utilize the quorum-based journaling high-availability feature in Cloudera CDH4. For this feature, servers should have similar I/O subsystems and server profiles so that each NameNode server could potentially take the role of another. Another reason to have similar configurations is to ensure that ZooKeeper's quorum algorithm is not affected by a machine in the quorum that cannot make a decision as fast as its quorum peers.
- OS availability and reliability: For the reliability of the server, the OS disk is configured in a RAID 1+0 configuration, thus preventing failure of the system from OS hard disk failures.
- Network reliability: The reference architecture configuration uses two HP 5830AF-48G switches for redundancy, resiliency and scalability using Intelligent Resilient Framework (IRF) bonding. We recommend using redundant power supplies. For aggregation switches we used two HP 5900AF-48XG switches, IRF bonded.
- Power supply: To ensure the servers and racks have adequate power redundancy, we recommend that each server have a backup power supply and each rack have at least two Power Distribution Units (PDUs).

Pre-deployment considerations

There are a number of key factors you should consider prior to designing and deploying a Hadoop cluster. The following subsections articulate the design decisions in creating the baseline configurations for the reference architectures. The rationale provided includes the necessary information for you to take the configurations and modify them to suit a particular custom scenario.

Table 3. Pre-deployment considerations

  Functional Component   Value
  Operating System       Improves Availability and Reliability
  Computation            Ability to balance Price with Performance
  Memory                 Ability to balance Price with Capacity and Performance
  Storage                Ability to balance Price with Capacity and Performance
  Network                Ability to balance Price with Performance

Operating system

Cloudera Manager 4.6 supports only the following 64-bit operating systems:
- For Red Hat systems, Cloudera provides 64-bit packages for Red Hat Enterprise Linux (RHEL) 5 and Red Hat Enterprise Linux 6. Compatible versions in RHEL 6 are RHEL 6.2 and RHEL 6.4. Cloudera recommends using update 7 or later for Red Hat Enterprise Linux 5.
- For SUSE systems, Cloudera provides 64-bit packages for SUSE Linux Enterprise Server 11 (SLES 11). Service Pack 1 or later is required.

CDH 4.3 supports the following 32-bit and 64-bit operating systems:
- For Ubuntu systems, Cloudera provides 64-bit packages for Lucid (10.04) LTS and Precise (12.04) LTS.
- For Debian systems, Cloudera provides 64-bit packages for Squeeze (6.0.2).
- For Red Hat systems, Cloudera provides 32-bit packages for Red Hat Enterprise Linux 6.2 and CentOS 6.2, and 64-bit packages for Red Hat Enterprise Linux 5.7, 6.2, 6.4 and CentOS 5.7, 6.2, 6.4. Cloudera recommends using update 7 or later of Red Hat Enterprise Linux 5.
- For SUSE systems, Cloudera provides 64-bit packages for SUSE Linux Enterprise Server 11 (SLES 11). Service Pack 1 or later is required.

Recommendation: HP recommends using a 64-bit operating system to avoid constraining the amount of memory that can be used on worker nodes. 64-bit Red Hat Enterprise Linux 6.2 or greater is recommended due to better ecosystem support, more comprehensive functionality for components such as RAID controllers, and compatibility with HP Insight CMU. The Reference Architectures listed in this document were tested with 64-bit Red Hat Enterprise Linux 6.4.

Computation

MapReduce slots are configured on a per server basis and are decided upon via an examination of the resources available on the server and how they can cater to the requirements of the tasks involved in a Hadoop job. The processing or computational capacity of a Hadoop cluster is determined by the aggregate number of MapReduce slots available across all the worker nodes. Employing Hyper-Threading increases your effective core count, potentially allowing you to configure more MapReduce slots. Refer to the Storage section below to see how I/O performance issues arise from sub-optimal disk to core ratios (too many slots and too few disks).
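To make the slot arithmetic concrete, the sketch below shows how per-node slot counts are expressed for MRv1 (the JobTracker/TaskTracker model used here). It is illustrative only: the property names are standard MRv1 settings, but the values are assumptions for a 12-core worker and should be tuned to the workload; in a Cloudera Manager deployment the equivalent values are set in the MapReduce service configuration rather than by editing files by hand.

```bash
# Illustrative MRv1 slot settings for a worker node; the values are assumptions
# (roughly 1-1.5 tasks per physical core on a 12-core server), not HP/Cloudera
# recommendations. With Cloudera Manager, change the equivalent settings in the
# MapReduce service configuration instead of editing mapred-site.xml directly.
cat <<'EOF' > mapred-site-slots-example.xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>12</value>   <!-- map slots per worker node -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>6</value>    <!-- reduce slots per worker node -->
</property>
EOF
```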
Recommendation: For customers where computation performance is key, HP recommends higher CPU powered DL380p servers for worker nodes with 128 GB RAM.

Memory

Use of Error Correcting Memory (ECC) is a practical requirement for Apache Hadoop and is standard on all HP ProLiant servers. Memory requirements differ between the management nodes and the worker nodes. The management nodes typically run one or more memory intensive management processes and therefore have higher memory requirements. Worker nodes need sufficient memory to manage the TaskTracker and DataNode processes in addition to the sum of all the memory assigned to each of the MapReduce slots. If you have a memory bound MapReduce job, we recommend that you increase the amount of memory on all the worker nodes. In addition, the cluster can also be used for HBase, which is very memory intensive.

Best Practice: It is important to saturate all the memory channels available to ensure optimal use of the memory bandwidth. For example, on a two-socket server with three memory channels per processor (six channels per server), one would typically fully populate the channels with 8GB DIMMs, resulting in a configuration of 48GB, or 96GB when using 16GB DIMMs, of memory per server.

Storage

Fundamentally, Hadoop is designed to achieve performance and scalability by moving the compute activity to the data. It does this by distributing the Hadoop job to worker nodes close to their data, ideally running the tasks against data on local disks.

Best Practice: Given the architecture of Hadoop, the data storage requirements for the worker nodes are best met by direct attached storage (DAS) in a Just a Bunch of Disks (JBOD) configuration and not as DAS with RAID or Network Attached Storage (NAS).

There are several factors to consider and balance when determining the number of disks a Hadoop worker node requires.

Storage capacity: The number of disks and their corresponding storage capacity determines the total amount of HDFS storage capacity for your cluster.

Redundancy: Hadoop ensures that a certain number of block copies are consistently available. This number is configurable in the block replication factor setting, which is typically set to three. If a Hadoop worker node goes down, Hadoop will replicate the blocks that had been on that server onto other servers in the cluster to maintain the consistency of the number of block copies. For example, if the NIC (Network Interface Card) on a server with 16 TB of block data fails, 16 TB of block data will be replicated between other servers in the cluster to ensure the appropriate number of replicas exist. Furthermore, the failure of a non-redundant TOR (Top of Rack) switch will generate even more replication traffic. One needs to ensure that the performance of the network is sufficient to adequately handle MapReduce shuffle and sort phases occurring at the same time as block replication.

I/O performance: Each worker node has a certain number of MapReduce slots available for processing Hadoop tasks. Each slot operates on one block of data at a time. The more disks you have, the less likely it is that you will have multiple tasks accessing a given disk at the same time. This avoids queued I/O requests and incurring the resulting I/O performance degradation.

Disk configuration: The management nodes are configured differently from the worker nodes because the management processes are generally not as redundant or as scalable as the worker processes.
For management nodes, storage reliability is therefore important and SAS drives are recommended. For worker nodes, one has the choice of SAS or SATA and, as with any component, there is a cost/performance tradeoff. If performance and reliability are important, we recommend SAS MDL disks; otherwise we recommend SATA MDL disks. Specific details around disk and RAID configurations are provided in the Server selection section.

Network

Configuring a single Top of Rack (TOR) switch per rack introduces a single point of failure for each rack. In a multi-rack system such a failure will result in a flood of network traffic as Hadoop rebalances storage, and in a single-rack system such a failure brings down the whole cluster. Consequently, configuring two TOR switches per rack is recommended for all
production configurations, as it provides an additional measure of redundancy. This can be further improved by configuring link aggregation between the switches. The most desirable way to configure link aggregation is by bonding multiple physical NICs from a server to one TOR switch along with multiple physical NICs from the same server to the other TOR switch in its rack. When done properly, this allows the bandwidth of both links to be used. If either of the switches fails, the servers will still have full network functionality, but with the performance of only a single link. Not all switches have the ability to do link aggregation from individual servers to multiple switches; however, the HP 5830AF-48G switch supports this through HP's Intelligent Resilient Framework (IRF) technology. In addition, switch failures can be further mitigated by incorporating dual power supplies for the switches.

Best Practice: Hadoop is rack-aware and tries to limit the amount of network traffic between racks. The bandwidth and latency provided by a 4x bonded 1 Gigabit Ethernet (GbE) connection from worker nodes to the TOR switch is adequate for most Hadoop configurations. Multi-rack Hadoop clusters that are not using IRF bonding for inter-rack traffic will benefit from having TOR switches connected by 10 GbE uplinks to core aggregation switches.

Large Hadoop clusters introduce multiple issues that are not typically present in small to medium sized clusters. To understand the reasons for this, it is helpful to review the network activity associated with running Hadoop jobs and with exception events such as server failure. During the map phase of Hadoop jobs that utilize the HDFS, the majority of tasks reference data on the server that executes the task (node-local). For those tasks that must access data remotely, the data is usually on other servers in the same rack (rack-local). Only a small percentage of tasks need to access data from remote racks. Although the amount of remote-rack access increases for larger clusters, it is expected to put a relatively small load on the TOR and core switches.

Best Practice: During the shuffle phase, the intermediate data has to be pulled by the reduce tasks from mapper output files across the cluster. While network load can be reduced if partitioners and combiners are used, it is possible that the shuffle phase will place the core and TOR switches under a large traffic load. Consequently, large clusters will benefit from having TOR switches with packet buffering, connected by 10 GbE uplinks to core aggregation switches, in order to accommodate this load. Each reduce task can concurrently request data from a default of five mapper output files. Thus, there is the possibility that servers will deliver more data than their network connections can handle, which will result in dropped packets and can lead to a collapse in traffic throughput. This is why we recommend TOR switches with deep packet buffering.

Switches

Hadoop clusters contain two types of switches, namely Aggregation switches and Top of Rack switches. Top of Rack switches route the traffic between the nodes in each rack and Aggregation switches route the traffic between the racks.

Aggregation switches

The 5900AF-48XG-4QSFP+ 10GbE switch is an ideal aggregation switch as it is well suited to handle large volumes of inter-rack traffic and scenarios such as block replication occurring at the same time as a MapReduce shuffle and sort phase.
The switch offers better connectivity than HP 5800 series switches, with 48x 10GbE SFP+ ports and 4x 40GbE QSFP+ ports, along with aggregation switch redundancy and high availability (HA) support with IRF bonding ports. For more information on the HP 5900AF-48XG-4QSFP+, please see hp.com/go/networking. The configuration for the HP 5900AF-48XG switch is provided below.

Figure 2. HP 5900AF-48XG-4QSFP+ Aggregation switch
Table 4. HP 5900AF-48XG-4QSFP+ Switch (JC772A) Single Aggregation Switch options

  Qty    Part Number  Description
  2      JC772A       HP 5900AF-48XG-4QSFP+ Switch
  2      JC680A       HP 58x0AF 650W AC Power Supply
  2      JC683A       HP 58x0AF Front (port-side) to Back (power-side) Airflow Fan Tray
  24/48  JD092B       HP X130 10G SFP+ LC SR Transceiver
  24/48  QK735A       HP 15m Premier Flex LC/LC Optical Cable

Top of Rack (TOR) switches

The HP 5830AF-48G is an ideal TOR switch and has a 1 GB packet buffer depth for very deep buffering, resiliency and high availability, scalability support, forty-eight 1GbE ports, two 10GbE uplinks, and the option for adding two more 10GbE ports. A dedicated management switch for iLO traffic is not required as the ProLiant DL360p Gen8 and DL380e Gen8 are able to share iLO traffic over NIC 1. The volume of iLO traffic is minimal and does not degrade performance over that port. For more information on the HP 5830AF-48G switch, please see hp.com/networking/5800. The configuration for the HP 5830AF-48G switch is provided below.

Figure 3. HP 5830AF-48G Top of Rack (TOR) switch

Table 5. HP 5830AF-48G Single Switch options

  Qty  Part Number  Description
  2    JC691A       HP 5830AF-48G Switch with 1 Interface Slot
  2    JC680A       HP 58x0AF 650W AC Power Supply
  2    JD368B       HP 5500/5120 2-port 10GbE SFP+ Module
  2    JC692A       HP 5830AF-48G Back (power-side) to Front (port-side) Airflow Fan Tray
  1    JD095C       HP X240 10G SFP+ SFP+ 0.65m DAC Cable

HP Insight Cluster Management Utility

HP Insight Cluster Management Utility (CMU) is an efficient and robust hyper-scale cluster lifecycle management framework and suite of tools for large Linux clusters such as those found in High Performance Computing (HPC) and Big Data environments. A simple graphical interface enables an at-a-glance real-time or 3D historical view of the entire cluster for both infrastructure and application (including Hadoop) metrics, provides frictionless scalable remote management and analysis, and allows rapid provisioning of software to all nodes of the system. Insight CMU makes the management of a cluster more user friendly, efficient, and error free than if it were being managed by scripts, or on a node-by-node basis. Insight CMU offers full support for iLO 2, iLO 3, iLO 4 and LO100i adapters on all ProLiant servers in the cluster.
Best Practice: HP Insight CMU allows one to easily correlate Hadoop metrics with cluster infrastructure metrics, such as CPU Utilization, Network Transmit/Receive, Memory Utilization and I/O Read/Write. This allows characterization of Hadoop workloads and optimization of the system, thereby improving the performance of the Hadoop cluster. CMU Time View Metric Visualizations will help you understand, based on your workloads, whether your cluster needs more memory, a faster network or processors with faster clock speeds.

In addition, Insight CMU also greatly simplifies the deployment of Hadoop, with its ability to create a Golden Image from a node and then deploy that image to up to 4000 nodes. Insight CMU is able to deploy 800 nodes in 30 minutes.

Insight CMU is highly flexible and customizable, offers both GUI and CLI interfaces, and is being used to deploy a range of software environments, from simple compute farms to highly customized, application-specific configurations. Insight CMU is available for HP ProLiant and HP BladeSystem servers running Linux operating systems, including Red Hat Enterprise Linux, SUSE Linux Enterprise, CentOS, and Ubuntu. Insight CMU also includes options for monitoring graphical processing units (GPUs) and for installing GPU drivers and software. For more information, please see hp.com/go/cmu.

Table 6. HP Insight CMU options

  Qty  Part Number  Description
  1    QL803B       HP Insight CMU 1yr 24x7 Flex Lic
  1    QL803BAE     HP Insight CMU 1yr 24x7 Flex E-LTU
  1    BD476A       HP Insight CMU 3yr 24x7 Flex Lic
  1    BD476AAE     HP Insight CMU 3yr 24x7 Flex E-LTU
  1    BD477A       HP Insight CMU Media

Figure 4. HP Insight CMU Interface real-time view
Figure 5. HP Insight CMU Interface real-time view (BIOS settings)

Figure 6. HP Insight CMU Interface Time View

Server selection

Depending on the size of the cluster, a Hadoop deployment consists of one or more nodes running management services and a quantity of worker nodes. We have designed this reference architecture so that regardless of the size of the cluster, the server used for the management nodes and the server used for the worker nodes remains consistent. This section
specifies which server to use and the rationale behind it. The Reference Architectures section will provide topologies for the deployment of management and worker nodes for single and multi-rack clusters.

Management nodes

Management services are not distributed across as many nodes as the services that run on the worker nodes and therefore benefit from a server that contains redundant fans and power supplies, as well as an array controller supporting a variety of RAID schemes and SAS direct attached storage. In addition, the management services are memory and CPU intensive; therefore, a server capable of supporting a large amount of memory is also required. Management nodes do not participate in storing data for the HDFS and have much lower storage capacity requirements than worker nodes, and thus a 2U server with a large number of disks is not required.

Best Practice: The configurations reflected in this white paper are also cognizant of the high availability feature in Cloudera CDH4. For this feature, servers should have similar I/O subsystems and server profiles so that each management server could potentially take the role of another. Another reason to have similar configurations is to ensure that ZooKeeper's quorum algorithm is not affected by a machine in the quorum that cannot make a decision as fast as its quorum peers.

This section contains 4 subsections:
- Server platform
- Management node
- JobTracker server
- NameNode server

Server platform: HP ProLiant DL360p Gen8

The HP ProLiant DL360p Gen8 (1U) is an excellent choice as the server platform for the management nodes.

Figure 6. HP ProLiant DL360p Gen8 Server

Processor configuration

The configuration features two sockets with 6 core processors and the Intel C600 Series chipset, which provide 12 physical cores and 24 Hyper-Threaded cores per server. We recommend that Hyper-Threading be turned on. The reference architecture was tested using processors with 6 cores for the management servers with the JobTracker, NameNode and Cloudera Manager services. Furthermore, the configurations for these servers are designed to be able to handle an increasing load as your Hadoop cluster grows, so one needs to ensure the right processing capacity is available to begin with.

Drive configuration

Apache Hadoop does not provide software redundancy for the management servers of a Hadoop cluster the same way it does for the workers, and thus RAID is appropriate. The P420i Smart Array Controller is specified to drive eight 900GB 2.5in SAS disks on the Management node and four 900GB 2.5in SAS disks on the JobTracker and NameNode servers. The Management node has more disks than the JobTracker and NameNode servers because the Management node needs extra storage capacity for RAID mirroring, the Cloudera Manager databases, and logs, as well as to act as a multi-homed staging server for data import and export out of the HDFS. Hot pluggable drives are specified so that drives can be replaced without restarting the server. Due to this design, one should configure the Gen8 P420i controller to apply the following RAID schemes (a configuration sketch follows the list below):
- Management node: 4 disks with RAID 1+0 for OS and PostgreSQL database, 4 disks with RAID 5 for data staging
- JobTracker and NameNode servers: 4 disks with RAID 1+0 for OS
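As an illustration of the RAID schemes above, the following sketch uses the HP Array Configuration Utility CLI (hpacucli). It is an assumption-laden example rather than a prescribed procedure: the controller slot number and physical drive identifiers vary by chassis and must be confirmed against the actual server before creating any logical drives.

```bash
# Sketch only: verify the controller slot and drive IDs on the server first.
# hpacucli ctrl all show config

# Management node: first four SAS disks as RAID 1+0 (OS + PostgreSQL),
# remaining four as RAID 5 (data staging).
hpacucli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2,1I:1:3,1I:1:4 raid=1+0
hpacucli ctrl slot=0 create type=ld drives=1I:1:5,1I:1:6,1I:1:7,1I:1:8 raid=5

# JobTracker and NameNode servers: four disks as a single RAID 1+0 volume.
hpacucli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2,1I:1:3,1I:1:4 raid=1+0
```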
Best Practice: The Gen8 P420i controller provides two port connectors per controller, with each containing 4 SAS links. Each drive cage for the DL360p contains 8 disks and thus each disk has a dedicated SAS link, which ensures the server provides the maximum throughput that each drive can give you. For a performance oriented solution, we recommend SAS drives as they offer a significant read and write throughput performance enhancement over SATA disks.

Memory configuration

Servers running management services such as the HBase Master, JobTracker, NameNode and Cloudera Manager should have sufficient memory as they can be memory intensive. When configuring memory, one should always attempt to populate all the memory channels available to ensure optimum performance. The dual Intel Xeon E GHz processors in the HP ProLiant DL360p Gen8 have 4 memory channels per processor, which equates to 8 channels per server. The configurations for the management servers were tested with 64GB of RAM, which equated to eight 8GB DIMMs.

Network configuration

The HP ProLiant DL360p Gen8 is designed for network connectivity to be provided via a FlexibleLOM. The FlexibleLOM can be ordered in a 4 x 1GbE NIC configuration or a 2 x 10GbE NIC configuration. This Reference Architecture was tested using the 4 x 1GbE NIC configuration (as specified in the server configuration below).

Best Practice: For each management server we recommend bonding and cabling only two of the 1GbE NICs to create a single bonded pair, which will provide 2GbE of throughput as well as a measure of NIC redundancy. In the reference architecture configurations later in the document you will notice that we use two IRF bonded switches. In order to ensure the best level of redundancy we recommend cabling NIC 1 to Switch 1 and NIC 2 to Switch 2.

Note: Part numbers are at time of publication and subject to change. The bill of materials does not include complete support options or other rack and power requirements. If you have questions regarding ordering, please consult with your HP Reseller or HP Sales Representative for more details. hp.com/large/contact/enterprise/index.html

Table 7. The HP ProLiant DL360p Gen8 Server Configuration

  Part Number  Description
  B21          HP DL360p Gen8 8-SFF CTO Chassis
  L21          HP DL360p Gen8 E FIO Kit
  B21          HP DL360p Gen8 E Kit
  B21          HP 8GB 1Rx4 PC R-11 Kit
  B21          HP 900GB 6G SAS 10K 2.5in SC ENT HDD (Note: Management node needs 8)
  B21          HP Ethernet 1GbE 4P 331FLR FIO Adapter
  B21          HP 512MB FBWC for P-Series Smart Array
  B21          HP 460W CS Gold Hot Plug Power Supply Kit
  B21          HP 1U SFF BB Gen8 Rail Kit
  1 NA         ProLiant DL36x(p) HW Support
Management node

The Management node hosts the applications that submit jobs to the Hadoop Cluster. We recommend that you install it with the following software components:

Table 8. Management node Software

  Software                                 Description
  RHEL 6.4                                 Recommended Operating System
  HP Insight CMU 7.1                       Infrastructure Deployment, Management, and Monitoring
  Oracle JDK 1.6.0_31                      Java Development Kit
  PostgreSQL                               Database Server for Cloudera Manager
  Cloudera Manager 4.6                     Cloudera Hadoop Cluster Management Software
  Cloudera Hue Server                      Web Interface for Cloudera Applications
  Flume                                    Flume
  NameNode HA                              NameNode HA (Journal Node)
  Apache Pig and/or Apache Hive from CDH4  Analytical interfaces to the Hadoop Cluster
  Beeswax                                  HUE application to run queries on Hive
  ZooKeeper                                Cluster coordination service

Please see the following link for the Cloudera Manager and PostgreSQL Installation guide: cloudera.com/content/support/en/documentation/manager-enterprise/cloudera-manager-enterprise-v4-latest.html.

The Management node contains the following base configuration:
- Dual Six-Core Intel E GHz Processors
- P420i Smart Array Controller
- Eight 900GB SFF SAS 10K RPM disks
- 64 GB DDR3 Memory
- 4 x 1GbE FlexibleLOM NICs

JobTracker server

The JobTracker server contains the following software. Please see the following link for more information on installing and configuring the JobTracker and NameNode HA: cloudera.com/content/support/en/documentation/manager-enterprise/cloudera-manager-enterprise-v4-latest.html

Table 9. JobTracker Server Software

  Software             Description
  RHEL 6.4             Recommended Operating System
  Oracle JDK 1.6.0_31  Java Development Kit
  JobTracker           The JobTracker for the Hadoop Cluster
  NameNode HA          NameNode HA (Failover Controller, Journal Node, NameNode Standby)
  Oozie                Oozie Workflow scheduler service
  HBase Master         The HBase Master for the Hadoop Cluster (Only if running HBase)
  ZooKeeper            Cluster coordination service
  Flume                Flume

The JobTracker server contains the following base configuration:
- Dual Six-Core Intel E GHz Processors
- Four 900GB SFF SAS 10K RPM disks
- 64 GB DDR3 Memory
- 4 x 1GbE FlexibleLOM NICs
- 1 x P420i Smart Array Controller

NameNode server

The NameNode server contains the following software. Please see the following link for more information on installing and configuring the NameNode: cloudera.com/content/support/en/documentation/manager-enterprise/cloudera-manager-enterprise-v4-latest.html

Table 10. NameNode Server Software

  Software             Description
  RHEL 6.4             Recommended Operating System
  Oracle JDK 1.6.0_31  Java Development Kit
  NameNode HA          The NameNode for the Hadoop Cluster (NameNode Active, Journal Node, Failover Controller)
  Flume                Flume agent
  HBase Master         HBase Master (Only if running HBase)
  ZooKeeper            Cluster coordination service

The NameNode server contains the following base configuration:
- Dual Six-Core Intel E GHz Processors
- Four 900GB SFF SAS 10K RPM disks
- 64 GB DDR3 Memory
- 4 x 1GbE FlexibleLOM NICs
- 1 x P420i Smart Array Controller

Worker nodes

The worker nodes run the TaskTracker (or HBaseRegionServer) and DataNode processes, and thus storage capacity and performance are important factors.

Server platform: HP ProLiant DL380e Gen8

The HP ProLiant DL380e Gen8 (2U) is an excellent choice as the server platform for the worker nodes. For ease of management we recommend a homogeneous server infrastructure for your worker nodes.

Figure 7. HP ProLiant DL380e Gen8 Server
Processor configuration

The configuration features two sockets with 6 core processors and the Intel C600 Series chipset, which provide 12 physical or 24 Hyper-Threaded cores per server. Hadoop manages the amount of work each server is able to undertake via the number of MapReduce slots configured for that server. The more cores available to the server, the more MapReduce slots can be configured for the server (see the Computation section for more detail). We recommend that Hyper-Threading be turned on.

Drive configuration

Redundancy is built into the Apache Hadoop architecture, and thus there is no need for RAID schemes to improve redundancy on the worker nodes as it is all coordinated and managed by Hadoop. Drives should use a Just a Bunch of Disks (JBOD) configuration, which can be achieved with the HP Smart Array P420 controller by configuring each individual disk as a separate RAID 0 volume. Additionally, array acceleration features on the P420 should be turned off for the RAID 0 data volumes. The DL380e has the option to add a rear disk cage with 2 LFF drives, which allows the placement of the OS drives in RAID 1, leaving the front 12 LFF drives specifically for Hadoop data.

Best Practice: For OS drives, it is recommended to use two 500GB SATA MDL LFF disks and use the HP Smart Array P420 Controller to configure those with a RAID 1 mirrored OS and Hadoop runtime. Other disks will be dedicated to Hadoop data. This provides additional measures of redundancy on the worker nodes. We do not recommend sharing drives that contain the OS and Hadoop runtimes with drives that contain the temporary MapReduce data and the HDFS block data, as it results in degraded I/O performance.

Performance

The HP Smart Array P420 controller provides two port connectors per controller, with each containing 4 SAS links. The front drive cage for the DL380e contains 12 disks; thus, a total of 8 SAS links ensures the server provides the maximum throughput that each drive can give. For a performance oriented solution, we recommend SAS drives as they offer a significant read and write throughput performance enhancement over SATA disks.

Core to disk ratio

The more drives a server contains, the more efficiently it can service I/O requests, because it reduces the likelihood of multiple threads contending for the same drive, which can result in interleaved I/O and degraded performance.

DataNode settings

By default, the failure of a single dfs.data.dir or dfs.datanode.data.dir will cause the HDFS DataNode process to shut down, which results in the NameNode scheduling additional replicas for each block that is present on the DataNode. This causes needless replication of blocks that reside on disks that have not failed. To prevent this, you can configure DataNodes to tolerate the failure of dfs.data.dir or dfs.datanode.data.dir directories; use the dfs.datanode.failed.volumes.tolerated parameter in hdfs-site.xml (a configuration sketch follows below). For example, if the value for this parameter is 3, the DataNode will only shut down after four or more data directories have failed. This value is respected on DataNode startup; in this example the DataNode will start up as long as no more than three directories have failed.

Memory configuration

Servers running the worker node processes should have sufficient memory for either HBase or for the number of MapReduce slots configured on the server. The Intel Xeon E has 3 memory channels per processor.
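Returning to the DataNode volume-failure tolerance described above, the fragment below is a minimal illustration of the hdfs-site.xml property with the example value of 3 used in the text. In a Cloudera Manager deployment the same setting is changed in the HDFS service configuration rather than by editing the file by hand.

```bash
# Minimal illustration of the DataNode volume-failure tolerance discussed above.
cat <<'EOF' > hdfs-site-failed-volumes-example.xml
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <!-- With 12 data disks per worker, the DataNode keeps running until a
       fourth data directory fails (value of 3, as in the example above). -->
  <value>3</value>
</property>
EOF
```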
When configuring memory, one should always attempt to populate all the memory channels available to ensure optimum performance.

Best Practice: To ensure optimal memory performance and bandwidth, we recommend using 8GB DIMMs to populate each of the 3 memory channels on each processor, which will provide an aggregate of 48GB of RAM. We used 8GB DIMMs (which gave us an aggregate of 48GB of RAM per server) in our testing. See Figure 8 for the memory slots to populate when using 8GB memory DIMMs.
Figure 8. DL380e 8GB DIMM memory (part number example B21) configuration recommendation

Network configuration

For 1GbE networks, we recommend that the four 1GbE NICs be bonded to improve throughput to 4 Gb/s and thereby improve performance (a bonding sketch is shown after the note below). In addition, in the reference architecture configurations later in the document you will notice that we use two IRF bonded switches. In order to ensure the best level of redundancy, we recommend cabling NICs 1 and 3 to Switch 1 and NICs 2 and 4 to Switch 2.

The worker node contains the following software. Please see the following link for more information on installing and configuring the TaskTracker (or HBaseRegionServer) and DataNode: cloudera.com/content/support/en/documentation/manager-enterprise/cloudera-manager-enterprise-v4-latest.html

Table 11. Worker Node Software

  Software             Description
  RHEL 6.4             Recommended Operating System
  Oracle JDK 1.6.0_31  Java Development Kit
  TaskTracker          The TaskTracker process for MapReduce (Only if running MapReduce)
  DataNode             The DataNode process for HDFS
  HBaseRegionServer    The HBaseRegionServer for HBase (Only if running HBase)

The ProLiant DL380e Gen8 (2U) configured as a worker node should have the following base configuration:
- Dual Six-Core Intel Xeon E GHz Processors with Hyper-Threading
- Twelve 2TB 7.2K LFF SATA MDL disks (24 TB for data)
- Two 500 GB SATA 7.2K LFF MDL disks (mirrored OS and runtime)
- 48 GB DDR3 Memory (6 x HP 8GB DIMMs, 3 channels per socket)
- 4 x 1GbE FlexibleLOM NICs (bonded)
- 1 x P420 Smart Array Controller

Note: Customers also have the option of purchasing a second power supply for additional power redundancy. This is especially appropriate for single rack clusters where the loss of a node represents a noticeable percentage of the cluster.
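The bonding sketch referenced above is shown here for RHEL 6. Everything in it is an assumption made for illustration: the interface names (eth0-eth3), the bond name, the addressing and the bonding mode all need to match the actual servers and whatever the IRF-bonded switch pair is configured to accept (802.3ad/LACP is one common choice when the links terminate on an IRF pair).

```bash
# Illustrative RHEL 6 bonding setup for a worker node's four 1GbE NICs.
# Interface names, addresses and bonding mode are assumptions; align them
# with the switch-side link aggregation configuration before use.
cat <<'EOF' > /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
# Example address on the private Hadoop cluster subnet:
IPADDR=10.10.1.101
NETMASK=255.255.255.0
BONDING_OPTS="mode=802.3ad miimon=100 xmit_hash_policy=layer3+4"
EOF

for nic in eth0 eth1 eth2 eth3; do
  cat <<EOF > /etc/sysconfig/network-scripts/ifcfg-${nic}
DEVICE=${nic}
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
EOF
done
```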
Table 12. The HP ProLiant DL380e Gen8 Server Configuration

  Part Number  Description
  B21          HP ProLiant DL380e Gen8 12LFF CTO Base Server
  B21          HP 2U LFF BB Rail Gen8 Kit
  L21          Intel Xeon E (2.4GHz/6-core/15MB/95W) FIO Kit - CPU
  B21          Intel Xeon E (2.4GHz/6-core/15MB/95W) Additional CPU
  B21          HP DL380e Gen8 CPU1 Riser Kit
  B21          HP 8GB 2Rx4 PC3L-10600R-9 (1333MHz) - Memory
  B21          HP 2U Gen8 Rear 2LFF Kit
  B21          HP 500GB 6G SATA 7.2k 3.5in SC MDL HDD
  B21          HP Ethernet 1Gb 4-port 331T Adapter
  B21          HP Smart Array P420/1GB FBWC Controller
  B21          HP 750W CS Gold Ht Plg Pwr Supply Kit
  B21          HP DL380e Gen8 HP Fan Kit
  B21          HP 2TB 6G SATA 7.2k 3.5in SC MDL HDD

High Performance Worker nodes: Customers do have the option of purchasing DL380p worker nodes for high performance. This is especially appropriate for use cases where higher powered CPU is very important. For workloads that use HBase (RTD), Impala (RTQ) or Search (RTS), 128 GB RAM is recommended. The ProLiant DL380p Gen8 (2U) configured as a worker node should have the following base configuration:
- HP DL380p Gen8 (12 cores each) (2U)
- 2x E CPU
- 64 GB or 128 GB RAM depending on workload
- 16 x 1 TB 6G SAS 7.2k HDD
- 1 x HP Smart Array P420/1GB controller
- 1 x HP Smart Array P420i/512MB controller
- 1 x HP Ethernet 4P 1GbE

Reference Architectures

The following illustrates a reference progression of Hadoop clusters from a single rack to a multi-rack configuration. Best practices for each of the components within the configurations specified have been articulated earlier in this document.

Single Rack Reference Architecture

The Single Rack Cloudera Enterprise Reference Architecture (RA) is designed to perform well as a single rack cluster design but also to form the basis for a much larger multi-rack design. When moving from the single rack to the multi-rack design, one can simply add racks to the cluster without having to change any components within the single rack. The Reference Architecture reflects the following:

Single Rack Network

As previously described in the Network section, two IRF bonded HP 5830AF-48G TOR switches are specified for performance and redundancy. The HP 5830AF-48G includes up to four 10GbE uplinks which can be used to connect the switches in the rack into the desired network or the 10GbE HP 5900AF-48XG Aggregation switch. Keep in mind that if IRF
bonding is used, it requires up to two 10GbE ports per switch, which would leave two to three 10GbE ports on each switch for uplinks.

Cluster isolation and access configuration

It is important to isolate the Hadoop Cluster on the network so that external network traffic does not affect the performance of the cluster. In addition, this also allows the Hadoop cluster to be managed independently from that of its users, which ensures that the cluster administrator is the only one capable of making changes to the cluster configurations. To achieve this, we recommend isolating the JobTracker, NameNode and Worker nodes on their own private Hadoop Cluster subnet.

Key Point: Once a Hadoop cluster is isolated, the users of the cluster will still need a way to access the cluster and submit jobs to it. To achieve this we recommend multi-homing the Management node so that it participates in both the Hadoop Cluster subnet and a subnet belonging to the users of the cluster. Cloudera Manager is a web application that runs on the Management node and allows users to manage and configure the Hadoop cluster (including seeing the status of jobs) without being on the same subnet, provided the Management node is multi-homed. Furthermore, this allows users to shell into the Management node and run the Apache Pig or Apache Hive command line interfaces and submit jobs to the cluster that way.

Staging data

In addition, once the Hadoop Cluster is on its own private network, one needs to think about how to reach the HDFS in order to move data onto it. The HDFS client needs to potentially be able to reach every Hadoop DataNode in the cluster in order to stream blocks onto it to move data onto the HDFS. The Reference Architecture provides two ways to do this. The first option is to use the already multi-homed Management node. This server has been configured with twice the amount of disk capacity (an additional 3.6 TB) compared to the other management servers in order to provide an immediate solution to move data into the Hadoop Cluster from another subnet. The other option is to make use of the open ports that have been left available in the switch. This Reference Architecture has been designed such that if all 4 NICs are used on each worker node and 2 NICs are used on each management node, it leaves 16 ports still available across both the switches in the rack. These 16 1GbE ports or the remaining 10GbE ports on the switches can be used by other multi-homed systems outside of the Hadoop cluster to move data into the Hadoop Cluster. Lastly, one can leverage WebHDFS, which provides an HTTP proxy to securely read and write data to and from the Hadoop Distributed File System (an example is sketched below). For more information on WebHDFS, please see cloudera.com/content/cloudera-content/cloudera-docs/cdh4/4.2.0/cdh4-security-guide/cdh4sg_topic_3_9.html

Rack enclosure

The rack contains eighteen HP ProLiant DL380e servers, three HP ProLiant DL360p servers and two HP 5830AF-48G switches within a 42U rack. This leaves 1U open for a 1U KVM switch.

Network

As previously described in the Switches section, two HP 5830AF-48G switches are specified for performance and redundancy. The HP 5830AF-48G includes up to four 10GbE uplinks which can be used to connect the switches in the rack.
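As an illustration of the WebHDFS staging option mentioned above, the commands below use the standard WebHDFS REST API against the active NameNode (CDH4's default NameNode HTTP port is 50070). Host names and paths are placeholders.

```bash
# List a directory over WebHDFS (host name and path are illustrative).
curl -i "http://namenode.example.com:50070/webhdfs/v1/user/hadoop?op=LISTSTATUS"

# Write a local file into HDFS: the first PUT returns a redirect whose
# Location header points at a DataNode; the second PUT streams the data there.
LOCATION=$(curl -s -i -X PUT \
  "http://namenode.example.com:50070/webhdfs/v1/user/hadoop/data.csv?op=CREATE" \
  | awk '/^Location:/ {print $2}' | tr -d '\r')
curl -i -X PUT -T data.csv "$LOCATION"
```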
Management nodes

Three ProLiant DL360p Gen8 management nodes are specified:
- The Management Node
- The JobTracker Node
- The NameNode

Detailed information on the hardware and software configurations is available in the Server selection section of this document.

Worker nodes

As specified in this design, eighteen ProLiant DL380e Gen8 worker nodes will fully populate a rack.
Best Practice: One can have as few as a single worker node; however, starting with at least three worker nodes is recommended to provide the redundancy that comes with the default replication factor of 3. Performance improves with additional worker nodes as the JobTracker can leverage idle nodes to land jobs on servers that have the appropriate blocks, leveraging data locality rather than pulling data across the network.

These servers are homogeneous and run the DataNode and the TaskTracker (or HBaseRegionServer) processes.

Power and cooling

In planning for large clusters, it is important to properly manage power redundancy and distribution. To ensure the servers and racks have adequate power redundancy, we recommend that each server have a backup power supply and each rack have at least two Power Distribution Units (PDUs). There is an additional cost associated with procuring redundant power supplies. This is less important for larger clusters as the inherent failover redundancy within the Cloudera Distribution of Hadoop will ensure there is less impact.

Best Practice: For each server, we recommend that each power supply is connected to a different PDU than the other power supply on the same server. Furthermore, the PDUs in the rack can each be connected to a separate data center power line to protect the infrastructure from a data center power line failure. Additionally, distributing the server power supply connections evenly to the in-rack PDUs, as well as distributing the PDU connections evenly to the data center power lines, ensures an even power distribution in the data center and avoids overloading any single data center power line. When designing a cluster, check the maximum power and cooling that the data center can supply to each rack and ensure that the rack does not require more power and cooling than is available.

Open rack space

The design leaves 1U open in the rack, allowing for a KVM switch when using a standard 42U rack.
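Putting the single-rack numbers above together, the following back-of-the-envelope calculation estimates usable HDFS capacity for eighteen worker nodes with twelve 2TB data disks each. The 3x replication factor comes from the defaults discussed earlier; the ~25% allowance for MapReduce intermediate data, logs and headroom is an assumption, not a figure from this paper.

```bash
# Rough HDFS capacity estimate for the single-rack design (assumptions noted above).
workers=18
disks_per_worker=12
tb_per_disk=2
replication=3

raw=$((workers * disks_per_worker * tb_per_disk))            # 432 TB of raw data disk
echo "Raw data disk capacity:        ${raw} TB"
echo "After 3x replication:          $((raw / replication)) TB"
echo "With ~25% non-HDFS overhead:   ~$((raw * 75 / 100 / replication)) TB usable"
```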
Figure 9. Single Rack Reference Architecture Rack Level View

Figure 10. Single Rack Reference Architecture Software Distribution

Multi-Rack Reference Architecture

The Multi-Rack design assumes the Single Rack RA cluster design is already in place and extends its scalability. The Single Rack configuration ensures the required management services are in place for large scale out. For multi-rack clusters, one simply adds more racks of the configuration provided below to the Single Rack configuration. This section reflects the design of those racks.

Rack enclosure

The rack contains eighteen HP ProLiant DL380e Gen8 servers and two HP 5830AF-48G switches within a 42U rack. This leaves 4U open for an additional two DL380e servers (2U) or a KVM switch (1U); alternatively, two HP 5900AF-48XG (1U) aggregation switches can be installed in the first expansion rack.
Multi-Rack Network

As previously described in the Network section, two HP 5830AF-48G TOR switches are specified per rack for performance and redundancy. The HP 5830AF-48G includes up to four 10GbE uplinks which can be used to connect the switches in the rack into the desired network, typically via HP 5900AF-48XG aggregation switches. (A sketch of a rack-topology mapping is shown after Figure 11 below.)

Software

The ProLiant DL380e servers in the rack are all configured as Worker nodes in the cluster, as all required management processes are already configured in the Single Rack RA. Aside from the OS, the following worker processes are typically present:
- DataNode
- TaskTracker (or HBaseRegionServer if you are using HBase)

Open rack space

The design leaves 4U open in the rack, allowing for two additional Worker nodes, a KVM switch or aggregation switches.

Figure 11. Multi-Rack Reference Architecture Rack Level View
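Once the cluster spans more than one rack, Hadoop's rack awareness (discussed in the Network section) needs to know which rack each node lives in. Cloudera Manager normally handles rack assignments from its console; purely for reference, the sketch below shows the kind of standalone topology script Hadoop can also use, with made-up subnets mapped to made-up rack names.

```bash
#!/bin/bash
# Illustrative rack-topology script: Hadoop passes one or more host names or
# IP addresses as arguments and expects one rack path per argument on stdout.
# The subnet-to-rack mapping below is an assumption for illustration only.
for node in "$@"; do
  case "$node" in
    10.10.1.*) echo "/rack1" ;;
    10.10.2.*) echo "/rack2" ;;
    *)         echo "/default-rack" ;;
  esac
done
```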
Figure 12. Multi-Rack Reference Architecture (extension of the single rack reference architecture)

Vertica and Hadoop

Relational database management systems such as HP Vertica excel at analytic processing for big volumes of structured data, including call detail records, financial tick streams and parsed weblog data. HP Vertica is designed for high speed load and query when the database schema and relationships are well defined. Cloudera's Distribution for Hadoop, built on the popular open source Apache Software Foundation project, addresses the need for large-scale batch processing of unstructured or semi-structured data. When the schema or relationships are not well defined, Hadoop can be used to employ massive MapReduce style processing to derive structure out of data. The Cloudera Distribution simplifies installation, configuration, deployment and management of the powerful Hadoop framework for enterprise users.

Each can be used standalone: HP Vertica for high-speed loads and ad-hoc queries over relational data, Cloudera's Distribution for general-purpose batch processing, for example from log files. Combining Hadoop and Vertica creates a nearly infinitely scalable platform for tackling the challenges of big data.

Note: Vertica was the first analytic database company to deliver a bi-directional Hadoop Connector enabling seamless integration and job scheduling between the two distributed environments. With Vertica's Hadoop and Pig Connectors, users have unprecedented flexibility and speed in loading data from Hadoop to Vertica and querying data from Vertica in Hadoop, as part of MapReduce jobs for example. The Vertica Hadoop and Pig Connectors are supported by Vertica and available for download. For more information, please see vertica.com/the-analytics-platform/native-bi-etl-and-hadoop-mapreduce-integration/
Summary

HP and Cloudera allow you to derive new business insights from Big Data by providing a platform to store, manage and process data at scale. However, designing and ordering Hadoop clusters can be both complex and time consuming. This white paper provided several reference architecture configurations for deploying clusters of varying sizes with Cloudera Enterprise on HP infrastructure and management software. These configurations leverage HP's balanced building blocks of servers, storage and networking, along with integrated management software and bundled support. In addition, this white paper is intended to assist in the rapid design and deployment of Cloudera Enterprise software on HP infrastructure for clusters of various sizes.
For more information

- Cloudera, cloudera.com
- Hadoop on HP, hp.com/go/hadoop
- Hadoop and Vertica, vertica.com/the-analytics-platform/native-bi-etl-and-hadoop-mapreduce-integration
- HP Insight Cluster Management Utility (CMU), hp.com/go/cmu
- HP 5830 Switch Series, hp.com/networking/5800
- HP ProLiant servers, hp.com/go/proliant
- HP Enterprise Software, hp.com/go/software
- HP Networking, hp.com/go/networking
- HP Integrated Lights-Out (iLO), hp.com/servers/ilo
- HP Product Bulletin (QuickSpecs), hp.com/go/quickspecs
- HP Services, hp.com/go/services
- HP Support and Drivers, hp.com/go/support
- HP Systems Insight Manager (HP SIM), hp.com/go/hpsim

To help us improve our documents, please provide feedback at hp.com/solutions/feedback.

Sign up for updates: hp.com/go/getupdated

© Copyright 2012, 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

Intel and Xeon are trademarks of Intel Corporation in the U.S. and other countries. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Red Hat is a registered trademark of Red Hat, Inc. in the United States and other countries.

4AA4-2533ENW, January 2014, Rev. 1
