HP Verified Reference Architecture BDRA and Hortonworks Data Platform implementation
HP Reference Architectures

New storage developments on BDRA with Apollo 4510

Table of contents
Executive summary
Introduction
Solution overview
Benefits of the HP BDRA solution
Hortonworks Data Platform overview
Solution components
Minimum configuration
Networking
Capacity and sizing
Expanding the base configuration
Guidelines for calculating storage needs
Configuration guidance
Optimizing UDA settings
Setting up HDP
Configuring compression
Bill of materials
Summary
Advantages of HP BDRA
Implementing a proof-of-concept
Appendix A: Alternate parts for storage nodes
Appendix B: HP value added services and support
For more information
Executive summary

HP Big Data Reference Architecture (BDRA) is a modern architecture for deploying big data solutions. It is designed to improve access to big data, to speed the deployment of big data solutions, and to provide the flexibility needed to optimize the infrastructure in response to ever-changing requirements in a Hadoop ecosystem.

HP BDRA challenges the conventional wisdom of current big data solutions, in which compute resources are typically co-located with storage resources on a single server node. Instead, HP BDRA takes a much more flexible approach: an asymmetric cluster featuring de-coupled tiers of workload-optimized compute and storage nodes, connected by modern networking. The result is a big data platform that is easier to deploy, use, and scale for data management applications.

As customers start to retain the dropped calls, mouse-clicks, and other information that was previously discarded, they are adopting single data repositories (also known as data lakes) to capture and store big data for later analysis. Thus, there is a growing need for a scalable, modern architecture for the consolidation, storage, access, and processing of big data. HP BDRA meets this need by offering an extremely flexible platform for deploying data lakes, while also providing access to a broad range of applications. Moreover, HP BDRA's tiered architecture allows organizations to grow specific elements of the platform without the chore of redistributing data.

Clearly, big data solutions are evolving beyond the simple model in which each application was deployed on a dedicated cluster of identical nodes. By integrating the significant changes that have occurred in fabrics, storage, container-based resource management, and workload-optimized servers, HP has created a next-generation, data center-friendly big data cluster.
This white paper describes the components and capabilities of the reference architecture, highlights recognizable benefits, and provides guidance on selecting the appropriate configuration for building a Hadoop cluster based on Hortonworks Data Platform (HDP) to meet particular business needs. In addition to simplifying the procurement process, the paper provides guidelines for setting up HDP once the system has been deployed.

Target audience: This paper is intended for decision makers, system and solution architects, system administrators, and experienced users who are interested in reducing design time for, or simplifying the purchase of, a big data architecture based on solution components provided by HP and Hortonworks. An intermediate knowledge of Apache Hadoop and scale-out infrastructure is recommended.

Document purpose: The purpose of this document is to describe the optimal way to configure Hortonworks HDP on HP BDRA. This Reference Architecture describes the solution testing performed in May.

Introduction

HP BDRA is a reference architecture based on hundreds of man-hours of research and testing by HP engineers. To help customers deploy solutions based on this architecture, HP offers detailed Bills of Materials (BOMs) based on a proof-of-concept. The recommended solution components (HP software, HP Moonshot System components, HP ProLiant servers, and HP Networking switches) and their respective configurations have been carefully tested with a variety of I/O-, CPU-, network-, and memory-bound workloads. The resulting architecture provides optimal price/performance for Hortonworks Data Platform (HDP).

To facilitate installation, HP has developed a broad range of Intellectual Property (IP) that allows solutions to be implemented by HP or jointly with partners and customers. Additional assistance is available from HP Technical Services. For more information, refer to Appendix B: HP value added services and support.
Solution overview

As companies grow their big data implementations, they often find themselves deploying multiple clusters to support their needs: to support different big data environments (Hadoop, NoSQL databases, MPP DBMSs, etc.) with optimal hardware, to support rigid workload partitioning for departmental requirements, or simply as a byproduct of multi-generational hardware. This often leads to data duplication and the movement of large amounts of data between systems to accomplish an organization's business requirements. Some customers are searching for a way to recapture the traditional benefits of a converged infrastructure, such as the ability to share data more easily between different applications running on different platforms, to scale compute and storage separately, and to rapidly provision new servers without repartitioning data to achieve optimal performance.

To address these needs, HP engineers challenged the conventional wisdom that compute should always be co-located with data. While that approach works very well, we believe the power of taking the work to the data in Hadoop is that the work is parallelized into a pure shared-nothing model, where tasks run against specific slices of the data without the need
for coordination, such as the distributed lock management or distributed cache management found in older parallel database designs. Actually collocating the work with the data within the sheet metal of a computer chassis is irrelevant next to the power of the function-shipping paradigm that Hadoop offers. In fact, a modern server often has more network bandwidth available to ship data off the server than it has bandwidth to disk. Given this, and the dramatic increase in Ethernet networking bandwidth, we decided to deploy the HP Big Data Reference Architecture as an asymmetric cluster, with some nodes dedicated to compute and others dedicated to the Hadoop Distributed File System (HDFS). Hadoop still works the same way: tasks still have complete ownership of their portion of the data (functions are still being shipped to the data), but the compute is executed on a node optimized for compute work, and the file system runs on a node optimized for file serving. Of particular interest, we have found that this approach can actually perform better than a traditional Hadoop cluster, for several reasons. For more information on this architecture and the benefits that it provides, see the overview master document.

It is very important to understand that the HP Big Data Reference Architecture is a highly optimized configuration built using unique servers offered by HP: the HP ProLiant SL4540 for the high-density storage layer, and HP Moonshot for the high-density computational layer. It is also the result of a great deal of testing and optimization by HP engineers, which yielded the right set of software, drivers, firmware, and hardware for extremely high density and performance. Simply deploying Hadoop onto a collection of traditional servers in an asymmetric fashion will not yield the kind of benefits that we see with our reference architecture.
To simplify the build for customers, HP provides the exact bill of materials in this document to allow a customer to purchase their cluster. Then, our Technical Services Consulting organization provides a service that installs our prebuilt operating system images, verifies that all firmware and software versions are correctly installed, and then runs a suite of tests to verify that the configuration is performing optimally. Once this has been done, the customer is free to do a fairly standard Hortonworks installation on the cluster using the recommended guidelines in this document.

Figure 1. HP BDRA, changing the economics of work distribution in big data

The HP BDRA design is anchored by the following HP technologies:

Storage nodes – HP ProLiant SL4540 Gen8 2-Node Servers (SL4540) make up the HDFS storage layer, providing a single repository for big data. The HP Apollo 4510 Server provides optional ultra-dense storage capacity and performance. Note that the HP Apollo 4510 is recommended as an optional server, not a replacement for the SL4540, to provide backup and archival storage for big data.

Compute nodes – HP Moonshot System cartridges deliver a scalable, high-density layer for compute tasks and provide a framework for workload optimization.

High-speed networking separates compute nodes and storage nodes, creating an asymmetric architecture that allows each tier to be scaled individually; there is no commitment to a particular CPU/storage ratio. Since compute is no longer co-located with storage, Hadoop can no longer achieve node locality. However, rack locality works in exactly the same way as in a traditional converged infrastructure; that is, as long as you scale within a rack, overall scalability is not affected.
With compute and storage de-coupled, you can again enjoy many of the advantages of a traditional converged system. For example, you can scale compute and storage independently, simply by adding compute nodes or storage nodes. Testing carried out by HP indicates that most workloads respond almost linearly to additional compute resources.

Hadoop YARN

YARN is a key feature of the latest generation of Hadoop and of HP BDRA. It decouples MapReduce's resource management and scheduling capabilities from the data processing components, allowing Hadoop to support more varied processing approaches and a broader array of applications. New to YARN is the concept of labels for groupings of compute nodes. Jobs submitted through YARN can now be flagged to perform their work on a particular set of nodes by including the appropriate label name with the job. Thus, you can create groups of compute resources that are designed, built, or optimized for particular types of computational work, allowing jobs to be directed to the hardware best suited to them. In addition, labels allow you to isolate compute resources that may be required for a high-priority job, for example, ensuring that sufficient resources are available at any given time.

Benefits of the HP BDRA solution

While the most obvious benefits of the HP BDRA solution center on density and price/performance, other benefits include:

Elasticity – HP BDRA has been designed for flexibility. Compute nodes can be allocated very flexibly without redistributing data; for example, based on time-of-day or even a single job. You are no longer committed to yesterday's CPU/storage ratios; now, you have much more flexibility in design and cost. You only need to grow your system in the areas where growth is needed.
YARN-compliant workloads access big data directly via HDFS; other workloads can access the same data via appropriate connectors.

Consolidation – HP BDRA is based on HDFS, which has enough performance, and can scale to large enough capacities, to be the single source for big data within any organization. You can consolidate the various pools of data currently being used in your big data projects into a single, central repository.

Workload optimization – There is no single go-to software product for big data; instead, there is a federation of data management tools. After selecting the appropriate tool for your requirements, you can run your job on the compute node best optimized for the workload, such as a low-power Moonshot cartridge or a compute-intensive cartridge.

Enhanced capacity management – Compute nodes can be provisioned on the fly. Storage nodes are a smaller subset of the cluster and, as such, are less costly to overprovision. Managing a single data repository rather than multiple different clusters reduces overall management costs.

Faster time-to-solution – Processing big data typically requires the use of multiple data management tools. If these tools are deployed on their own Hadoop clusters, each with its own, often fragmented, copy of the data, time-to-solution can be lengthy. With HP BDRA, data is unfragmented and consolidated in a single data lake, and tools access the same data via YARN or a connector. Thus, more time is spent on analysis and less on shipping data; time-to-solution is typically faster.
Hortonworks Data Platform overview

Apache Hadoop is an open-source project administered by the Apache Software Foundation. Hadoop's contributors work for some of the world's biggest technology companies, creating a diverse, motivated community that has produced an innovative platform for consolidating, combining, and understanding big data.

While today's enterprises collect and generate more data than ever before, the focus of conventional relational and data warehouse products is on OLAP and OLTP, which utilize structured data. Hadoop, however, is designed to solve a different business problem: the fast, reliable analysis of both structured and complex data. As a result, many organizations are now deploying Hadoop alongside existing IT systems, combining semi-structured data and new data sets in powerful new ways.

HDP is a platform for multi-workload data processing; it supports a range of processing methods, from batch through interactive and real-time, all backed by solutions for governance, integration, security, and operations. HDP integrates with and augments solutions like HP BDRA, allowing you to maximize the value of big data, and it lets you deploy Hadoop wherever you want, from the cloud to on-premises, across both Linux and Microsoft Windows. Figure 2 provides an overview of HDP.

Figure 2. HDP provides a blueprint for Enterprise Hadoop

HDP enables Enterprise Hadoop, a full suite of essential Hadoop capabilities in the following functional areas: data management, data access, data governance and integration, security, and operations.
Key highlights of HDP 2.2 include the following:
- Batch and interactive SQL queries via Apache Hive and Apache Tez, along with a cost-based optimizer powered by Apache Calcite
- High-performance ETL via Pig and Tez
- Stream processing via Apache Storm and Apache Kafka
- YARN labels
- Search via Apache Solr
- Streamlined cluster operations via Apache Ambari
- Data lifecycle management via Apache Falcon
- Perimeter security via Apache Knox
- Centralized security administration for HDFS, Hive, HBase, Storm, and Knox via Apache Ranger

For more information on HDP, refer to hortonworks.com/hdp.
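The YARN labels highlight above can be exercised from the command line on a live cluster. The sketch below is illustrative only: the label name, node address, and port are assumptions, not values from the tested configuration, and the exact `rmadmin` argument format varies between Hadoop releases.

```
# Define a label and attach it to a compute node (illustrative names)
yarn rmadmin -addToClusterNodeLabels "highmem"
yarn rmadmin -replaceLabelsOnNode "compute01:45454,highmem"

# Submit a job pinned to nodes carrying that label
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort \
  -Dmapreduce.job.node-label-expression="highmem" \
  /moonshot/terasort/input /moonshot/terasort/output
```

Jobs submitted without a label expression run on unlabeled nodes, so a labeled partition effectively reserves those nodes for the work that names it.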
Solution components

Figure 3 provides a basic conceptual diagram of HP BDRA.

Figure 3. HP BDRA concept

For full BOM listings of products selected for the proof of concept, refer to the Bill of materials section of this white paper.
Minimum configuration

Figure 4 shows a minimum HP BDRA configuration, with 45 worker nodes and 4 storage nodes housed in a single 42U rack.

Figure 4. Base HP BDRA configuration, with detailed lists of components

The following nodes are used in the base HP BDRA configuration:

Compute nodes – The base HP BDRA configuration features one HP Moonshot 1500 Chassis containing a total of 45 m710 cartridge servers.

Storage nodes – There are two SL4500 Chassis, each with two SL4540 servers, for a total of four storage nodes. Each node is configured with 25 disks and typically runs DataNode. The HP Apollo 4510 Server provides optional ultra-dense storage capacity and performance. Note that the HP Apollo 4510 is recommended as an optional server, not a replacement for the SL4540, to provide backup and archival storage for big data. Each Apollo 4510 is configured with 68 disks.

Management/head nodes – Three HP ProLiant DL360 Gen9 servers are configured as management/head nodes with the following functionality:
- Management node, with Ambari, HP Insight Cluster Management Utility (Insight CMU), ZooKeeper, and JournalNode
- Head node, with active ResourceManager/standby NameNode, ZooKeeper, and JournalNode
- Head node, with active NameNode/standby ResourceManager, ZooKeeper, and JournalNode

Best Practice
HP recommends starting with a minimum of one full Moonshot 1500 chassis, with 45 compute nodes. To provide high availability, start with a minimum of two SL4500 chassis, with a total of four storage nodes, in order to provide the redundancy that comes with the default replication factor of three.
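As a rough check on what this minimum configuration yields, usable HDFS capacity can be estimated from raw disk space, the replication factor, and a reserve for intermediate files. A sketch, in which the 4TB drive size and the 25% reserve are assumptions (the document does not state drive sizes), so substitute your own values:

```shell
# Usable-capacity estimate for the 4-storage-node minimum configuration.
# DRIVE_TB and RESERVE are assumed values; adjust to the drives actually ordered.
NODES=4; DISKS_PER_NODE=25; DRIVE_TB=4
REPLICATION=3      # HDFS default replication factor
RESERVE=0.25       # headroom for intermediate files and filesystem overhead
raw=$((NODES * DISKS_PER_NODE * DRIVE_TB))
usable=$(awk -v r="$raw" -v rep="$REPLICATION" -v res="$RESERVE" \
  'BEGIN { printf "%.0f", r / rep * (1 - res) }')
echo "raw: ${raw} TB, usable: ~${usable} TB"
# -> raw: 400 TB, usable: ~100 TB
```

Dividing by the replication factor first, then subtracting the reserve, keeps the estimate conservative; a fuller sizing exercise appears under Guidelines for calculating storage needs.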
Power and cooling

When planning large clusters, it is important to properly manage power redundancy and distribution. To ensure servers and racks have adequate power redundancy, HP recommends that each chassis (Moonshot 1500 or SL4500) and each HP ProLiant DL360 Gen9 server have a backup power supply, and that each rack have at least two Power Distribution Units (PDUs). There is additional cost associated with procuring redundant power supplies; however, redundancy is less critical in larger clusters, where the inherent redundancy within HDP means a failure has less impact.

Best Practice
For each chassis and HP ProLiant DL360 Gen9 server, HP recommends connecting each of the device's power supplies to a different PDU. Furthermore, the PDUs should each be connected to a separate data center power line to protect the infrastructure from a line failure. Distributing server power supply connections evenly across the PDUs, while also distributing PDU connections evenly across data center power lines, ensures even power distribution in the data center and avoids overloading any single power line. When designing a cluster, check the maximum power and cooling that the data center can supply to each rack, and ensure that the rack does not require more power and cooling than is available.

Networking

Two IRF-bonded HP QSFP+ switches are specified in each rack for high performance and redundancy. Each provides six 40GbE uplinks that can be used to connect to the desired network or, in a multi-rack configuration, to another pair of HP QSFP+ switches used for aggregation. The other 22 ports are available for Hadoop nodes.

Note
IRF bonding requires four 40GbE ports per switch, leaving six 40GbE ports on each switch for uplinks. The three NICs in each of the two switch modules in a Moonshot 1500 chassis are LACP-bonded.
iLO network

A single HP 5900 switch is used exclusively to provide connectivity to HP Integrated Lights-Out (HP iLO) management ports, which run at or below 1GbE. The HP iLO network is used for system provisioning and maintenance.

Cluster isolation and access configuration

It is important to isolate the Hadoop cluster so that external network traffic does not affect its performance. In addition, isolation allows the Hadoop cluster to be managed independently from its users, ensuring that the cluster administrator is the only person able to make changes to the cluster configuration. Thus, HP recommends deploying the ResourceManager, NameNode, and worker nodes on their own private Hadoop cluster subnet.

Key Point
Once a Hadoop cluster is isolated, users still need a way to access it and submit jobs. To achieve this, HP recommends multi-homing the management node so that it can participate in both the Hadoop cluster subnet and a subnet belonging to users. Ambari is a web application that runs on the management node, allowing users to manage and configure the Hadoop cluster and view job status without being on the same subnet, provided that the management node is multi-homed. Furthermore, this approach allows users to shell into the management node, run the Apache Pig or Apache Hive command-line interfaces, and submit jobs to the cluster that way.
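On RHEL 6.5, the operating system used in this solution, multi-homing the management node amounts to configuring one interface per subnet. A minimal sketch with purely illustrative device names and addresses:

```
# /etc/sysconfig/network-scripts/ifcfg-eth0 -- private Hadoop cluster subnet
DEVICE=eth0
BOOTPROTO=static
IPADDR=10.10.0.10
NETMASK=255.255.255.0
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth1 -- user-facing subnet
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.20.10
NETMASK=255.255.255.0
ONBOOT=yes
```

With both interfaces up, Ambari and shell access arrive on eth1 while all Hadoop traffic stays on eth0.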
Staging data

After the Hadoop cluster has been isolated on its own private network, you must determine how to access HDFS in order to ingest data. The HDFS client must be able to reach every Hadoop DataNode in the cluster in order to stream blocks of data onto the HDFS. HP BDRA provides the following options for staging data:

Using the management node – You can utilize the already multi-homed management server to stage data. HP recommends configuring this server with eight disks to provide sufficient disk capacity for a staging area when ingesting data into the Hadoop cluster from another subnet.

Using the ToR switches – You can make use of open ports in the ToR switches. HP BDRA is designed such that if both NICs on each storage node are used, six NICs are used on each Moonshot 1500 Chassis (three ports from Switch A and three from Switch B), and two NICs are used on each management/head node, the remaining 40GbE ports on the ToR switches can be used by multi-homed systems outside the Hadoop cluster to move data into the cluster.

Note
The benefit of using dual-homed management node(s) to isolate in-cluster Hadoop traffic from the ETL traffic flowing to the cluster may be debated. This enhances security; however, it may result in ETL performance or connectivity issues, since relatively few nodes are capable of ingesting data. For example, you may wish to initiate Sqoop tasks on the compute nodes to ingest data from an external RDBMS in order to maximize the ingest rate. However, this approach requires worker nodes to be exposed to the external network to parallelize data ingestion, which is less secure. You should weigh your options before committing to a network design for your particular environment. HP recommends dual-homing the entire cluster, allowing every node to ingest data.

Using WebHDFS – You can leverage WebHDFS, which provides HTTP access to securely read and write data to and from HDFS.
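As a sketch of the WebHDFS option: a file create is a two-step exchange in which the client asks the NameNode where to write, receives an HTTP 307 redirect naming a DataNode, and then streams the data there. The hostname and path below are illustrative, and 50070 is the default NameNode HTTP port in HDP 2.x:

```shell
# Illustrative values; substitute your NameNode host and target path.
NAMENODE="mgmt01.example.com"
DEST="/user/hadoop/staging/events.csv"
URL="http://${NAMENODE}:50070/webhdfs/v1${DEST}?op=CREATE&overwrite=false"
echo "$URL"
# Step 1 (requires a live cluster): the NameNode answers with a
# 307 redirect whose Location header names a DataNode:
#   curl -i -X PUT "$URL"
# Step 2: stream the file to that Location URL:
#   curl -i -X PUT -T events.csv "<Location-header-URL>"
```

Note that step 2 implies the client can reach the DataNode directly, which is why WebHDFS clients outside the cluster still need a route into the Hadoop subnet (for example, via the ToR switch ports described above).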
For more information on WebHDFS, refer to docs.hortonworks.com/hdpdocuments/hdp2/hdp-2.1.5/bk_systemadmin-guide/content/sysadminguides_webhdfs.html.

Capacity and sizing

Expanding the base configuration

As needed, compute and/or storage nodes can be added to a base HP BDRA configuration. The minimum HP BDRA configuration, with one Moonshot 1500 chassis and two SL4500 chassis, can expand to a total of eight mixed chassis in a single rack. There are a number of options for configuring a single-rack HP BDRA solution, ranging from hot (with a large number of compute nodes and minimal storage) to cold (with a large number of storage nodes and minimal compute). As dictated by the customer's requirements, the base configuration can also be expanded by adding the optional HP Apollo 4510 server for archival and backup.

Multi-rack configuration

The single-rack HP BDRA configuration is designed to perform well as a standalone solution, but also to form the basis of a much larger multi-rack solution, as shown in Figure 5. When moving from a single rack to a multi-rack solution, you simply add racks without having to change any components within the base rack. A multi-rack solution assumes the base rack is already in place and extends its scalability; for example, the base rack already provides sufficient management services for a large scale-out system.
Figure 5. Multi-rack HP BDRA, extending the capabilities of a single rack

Note
While much of the architecture for the multi-rack solution was borrowed from the single-rack design, the architecture suggested here for multi-rack solutions is based on previous iterations of Hadoop testing on the HP ProLiant DL380e platform rather than on cartridge servers. It is provided as a general guideline for designing multi-rack Hadoop clusters.

Extending networking in a multi-rack configuration

For performance and redundancy, two HP QSFP+ ToR switches are specified per expansion rack. The HP QSFP+ switch includes up to six 40GbE uplinks that can be used to connect these switches to the desired network via a pair of HP QSFP+ aggregation switches.

Guidelines for calculating storage needs

Hadoop cluster storage sizing requires careful planning, based on the identification of current and future storage and compute needs. The following are general considerations for a data inventory:
- Sources of data
- Frequency of data
- Raw storage
- Processed HDFS storage
- Replication factor
- Default compression turned on
- Space for intermediate files

To calculate your storage needs, identify the number of TB of data needed per day, week, month, and year, then add the ingestion rates of all data sources. It makes sense to identify storage requirements for the short, medium, and long term. Another important consideration is data retention, both size and duration: Which data must you keep? For how long? In addition, consider the maximum fill-rate and file system format space requirements on hard drives when estimating storage size.

Configuration guidance

The instructions provided here assume you have already created your Hadoop cluster on an HP BDRA solution. They are intended to help you:
1. Optimize settings for the Mellanox Unstructured Data Accelerator for Hadoop (UDA).
2. Set up HDP.

Optimizing UDA settings

UDA is a software plug-in designed to accelerate Hadoop networks and improve the scaling of Hadoop clusters that execute intensive applications such as data analytics. UDA facilitates efficient data shuffles and merges over Mellanox InfiniBand and 10GbE/40GbE/56GbE RDMA over Converged Ethernet (RoCE) adapter cards. UDA is integrated into Apache Hadoop 2.0.x and 3.0.x; patches are available for all binary-compatible distributions. UDA installation is a simple RPM addition to the execution library. UDA software is installed in /usr/lib64/uda and should be installed on every node, including the management node.

UDA documentation is available from the following sources:
- Mellanox Unstructured Data Acceleration (UDA) Release Notes Rev _v3_4_0.pdf?dl=1
- Mellanox Unstructured Data Acceleration (UDA) Quick Start Guide Rev Guide_v3_4_0.pdf?dl=1

This section provides general assistance for configuring Mellanox Ethernet cards and related software to work with Hadoop.

CompletedMaps
This parameter gives all MapReduce resources (RAM/CPU) to mappers during the map phase and all MapReduce resources to reducers during the reduce phase.
UDA's levitated merge causes reducers to wait until all mappers complete; thus, there is no need to give reducers resources while mappers are running, and vice versa.

mapreduce.job.reduce.slowstart.completedmaps=1.00

Invoking the UDA shuffle

The UDA plug-in for shuffles is not configured as the default shuffle plug-in. To invoke the UDA shuffle, specify the plug-in when starting the job, as in the following example:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort \
  -Dmapreduce.job.reduce.shuffle.consumer.plugin.class=com.mellanox.hadoop.mapred.UdaShuffleConsumerPlugin \
  -Dmapred.reduce.child.log.level=WARN \
  -Dyarn.app.mapreduce.am.job.cbdmode.enable=false \
  -Dyarn.app.mapreduce.am.job.map.pushdown=false \
  -Dmapreduce.job.maps=538 \
  -Dmapreduce.job.reduces=178 \
  /moonshot/terasort/input /moonshot/terasort/output
Adding a symbolic link

Unless you add a symbolic link to the UDA jar file, NodeManagers do not start. The following example must be updated whenever the Hadoop version changes, unless you create a link that is common across releases pointing to the latest version of the jar files:

ln -s /usr/lib64/uda/uda-hadoop-2.x.jar /usr/lib/hadoop-mapreduce/uda-hadoop-2.x.jar

Setting up HDP

Compute node components
Compute nodes contain the software shown in Table 1.

Table 1. Compute node base software components

Software                                    | Description
Red Hat Enterprise Linux (RHEL) 6.5         | Recommended operating system
Oracle Java Development Kit (JDK) 1.7.0_45  | JDK
NodeManager                                 | NodeManager process for MR2/YARN

Visit docs.hortonworks.com/hdpdocuments/hdp2/hdp-2.1.5/bk_system-admin-guide/content/admin_add-nodes-2.html for more information on the following topics:
- Manually installing and configuring NodeManager (or HBase RegionServer) and DataNode
- Adding nodes

Storage node components
Storage nodes contain the software shown in Table 2.

Table 2. Storage node base software components

Software             | Description
RHEL 6.5             | Recommended operating system
Oracle JDK 1.7.0_45  | JDK
DataNode             | DataNode process for HDFS

Visit docs.hortonworks.com/hdpdocuments/hdp2/hdp-2.1.5/bk_system-admin-guide/content/admin_add-nodes-2.html for more information on the following topics:
- Manually installing and configuring DataNode
- Adding nodes

Configuration changes
Whenever possible, make the changes described here using Ambari.

Hosts
Memory Overcommit Validation Threshold: 0.75 (default 0.80)

HDFS
Make the following changes to the HDFS configuration:

Increase the dfs.blocksize value to allow more data to be processed by each map task, thus reducing the total number of mappers and NameNode memory consumption:
  dfs.ha.fencing.method: shell (true)
  dfs.block.size, dfs.blocksize: 256 (default 128)

Increase the dfs.namenode.handler.count value to better manage multiple HDFS operations from multiple clients:
  dfs.namenode.handler.count: 120 (default 30)
Increase the dfs.namenode.service.handler.count value to better manage multiple HDFS service requests from multiple clients:
  dfs.namenode.service.handler.count: 120 (default 30)

Increase the Java heap size of the NameNode to provide more memory for NameNode metadata, and ensure Maximum Process File Descriptors is set to 65,536 (this setting can be made in many services):
  Maximum Process File Descriptors: 65,536
  Java Heap Size of NameNode in Bytes: 2138 MiB (default 1 GiB)
  Java Heap Size of Secondary NameNode in Bytes: 2138 MiB (default 1 GiB)

Increase certain timeout values and retry counts for network-intensive environments. Make the following changes in the HDFS Service Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml:

<property>
  <name>dfs.socket.timeout</name>
  <value> </value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value> </value>
</property>
<property>
  <name>dfs.client.block.write.locatefollowingblock.retries</name>
  <value>30</value>
</property>

YARN
Both yarn-site.xml and mapreduce-site.xml appear in several groups. The following values, which are used in all locations, configure the UDA shuffle plug-in. Make the following changes in the YARN Service Advanced Configuration Snippet (Safety Valve) for yarn-site.xml:

Note
The uda_shuffle service can only be configured after UDA installation and configuration are complete. NodeManagers may fail to start if it is not properly set up.
Replace the default MapReduce shuffle handler with the Mellanox UDA shuffle handler:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,uda_shuffle</value>
</property>

Specify the Java class file for the Mellanox UDA shuffle handler:

<property>
  <name>yarn.nodemanager.aux-services.uda_shuffle.class</name>
  <value>com.mellanox.hadoop.mapred.UdaShuffleHandler</value>
</property>

Set the block size for Snappy compression, which is used in HP BDRA to help reduce the amount of I/O:

<property>
  <name>io.compression.codec.snappy.buffersize</name>
  <value>32768</value>
</property>

Mellanox UDA shuffle handler
Make the following changes in the YARN Service MapReduce Advanced Configuration Snippet (Safety Valve). Refer to Optimizing UDA settings for more information.
Specify the Java class files for the Mellanox shuffle provider:

<property>
  <name>mapreduce.shuffle.provider.plugin.classes</name>
  <value>com.mellanox.hadoop.mapred.UdaShuffleProviderPlugin,org.apache.hadoop.mapred.TaskTracker$DefaultShuffleProvider</value>
</property>

Replace the default MapReduce shuffle handler with the Mellanox shuffle handler:

<property>
  <name>mapreduce.job.shuffle.provider.services</name>
  <value>uda_shuffle</value>
</property>

Specify the total amount of memory per reducer allocated for UDA's RDMA shuffle phase and levitated merge algorithm; more memory is preferable:

<property>
  <name>mapred.rdma.mem.use.contig.pages</name>
  <value>1</value>
</property>
<property>
  <name>mapred.rdma.shuffle.total.size</name>
  <value>4g</value>
</property>

Specify the replication factor for the MapReduce job jar file. Whenever a client submits a job, the jar file is copied to many nodes (the default is 10) so that it is available to TaskTrackers and NodeManagers. However, since there are only four storage nodes, HP sets the replication factor to three to avoid under-replication errors:

<property>
  <name>mapred.submit.replication</name>
  <value>3</value>
</property>

Optimizing compute nodes

The changes described here apply to compute nodes, which have less memory than storage nodes.

Specify the amount of memory for the Reduce task. Note that UDA needs a lot of memory because it works with RAM, not disk; as a result, Mellanox recommends configuring more memory per reducer but fewer reducers. Thus, HP increased the amount of memory from the default 1GB to 6GB:
  yarn.app.mapreduce.am.resource.mb: 2048 MiB (default 1 GiB)
  mapreduce.reduce.memory.mb: 6144 MiB (default 1 GiB)

Specify the amount of memory for all YARN containers on a node.
Since compute nodes each have 32GB of memory, HP recommends allocating 24GB for YARN:

yarn.nodemanager.resource.memory-mb    24 GiB (default 8 GiB)

Specify the number of virtual CPU cores for all YARN containers on a node; HP recommends using discretion:

yarn.nodemanager.resource.cpu-vcores   14 (default 8)

Increase the minimum memory allocation by the ResourceManager for each container from the default 1GB to 1,536MB to match the map task memory setting:

yarn.scheduler.minimum-allocation-mb   1536 MiB

Reduce the maximum memory allocation by the ResourceManager for each container from the default 64GB to 24GB:

yarn.scheduler.maximum-allocation-mb   24 GiB (default 64 GiB)

The following three settings apply to the Fair Scheduler and are not valid for the Capacity Scheduler:

yarn.scheduler.fair.assignmultiple           true (default false)
yarn.scheduler.fair.locality-delay-node-ms   0
yarn.scheduler.fair.locality-delay-rack-ms   0

Increase the maximum number of job counters to accommodate workloads such as Hive metrics:

mapreduce.job.counters.max   1000 (default 120)
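For reference, the container settings above can be consolidated into a single yarn-site.xml fragment. This is a sketch derived from the values listed in this section, not a verbatim excerpt from the tested cluster; note that yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb take values in MB, so 24GB is expressed as 24576.

```xml
<!-- Sketch of the YARN container settings described above (memory values in MB). -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>24576</value> <!-- 24GB of the 32GB on each compute node -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>14</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1536</value> <!-- matches the map task memory setting -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>24576</value>
</property>
```

Keeping the minimum allocation equal to the map task memory size avoids wasted headroom, since every map container is rounded up to a whole multiple of the minimum allocation.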
Give all MapReduce resources (RAM/CPU) to mappers during the map phase and to reducers during the reduce phase. Since UDA's levitated merge causes reducers to wait until all mappers complete, there is no need to give reducers resources while mappers are running, and vice versa.

ResourceManager Java heap size   4 GiB

Compute node configuration
The compute node configuration is as follows:
System memory: 32GB
Logical cores: 8
Disks: 1
Reserved memory: 12GB
Available memory: 20GB
Minimum container size: 2GB
Number of containers: 10
RAM per container: 2GB

Based on this configuration, the following settings are recommended in mapred-site.xml:

yarn.app.mapreduce.am.command-opts   "-Xmx3277m"
mapreduce.map.memory.mb              1536
mapreduce.reduce.memory.mb           6144
mapreduce.map.java.opts              -Xmx1228m
mapreduce.reduce.java.opts           -Xmx4915m

Configuring compression
HP recommends using compression since it reduces file size on disk and speeds up disk and network I/O. Set the following MapReduce parameter to compress job output files:

mapreduce.output.fileoutputformat.compress=true

Bill of materials
The BOMs outlined in Tables 3-10 below are based on the tested configuration for a Single-Rack Reference Architecture with one management node, two head nodes, 18 worker nodes, and two ToR switches. Quantities specified in each table are on a per-server basis.
The following BOMs contain electronic license to use (E-LTU) parts. Electronic software license delivery is now available in most countries. HP recommends purchasing electronic products over physical products (when available) for faster delivery and for the convenience of not having to track and manage confidential paper licenses. For more information, please contact your reseller or an HP representative.
Red Hat Entitlements
With the Red Hat Enterprise Linux unlimited VM entitlement (BC323AAE RHEL Srv, 2 skt, Unltd Guest 24x7 3yr Lic), users have the right to run an unlimited number of desktop or server VMs on each entitled Red Hat Enterprise Virtualization Hypervisor node. As each environment is unique and may not run Red Hat Enterprise Linux exclusively, this entitlement is not included in the parts list for the reference architecture. Licensing of the virtual machines is the responsibility of the customer.

Note
Part numbers used are based on the part numbers that were used for testing and are subject to change. The bill of materials does not include complete support options or other rack and power requirements. If you have questions regarding ordering, please consult with your HP Reseller or HP Sales Representative for more details.
Management node and head nodes

Important
Table 3 provides a BOM for one HP ProLiant DL360 Gen9 server. The tested solution featured one management server and two head nodes, requiring a total of three HP ProLiant DL360 Gen9 servers.

Table 3. BOM for a single HP ProLiant DL360 Gen9 server
Quantity  Part number  Description
B21  HP DL360 Gen9 8SFF CTO Server
L21  HP DL360 Gen9 E5-2650v3 FIO Kit
B21  HP DL360 Gen9 E5-2650v3 Kit
B21  HP 16GB 2Rx4 PC4-2133P-R Kit
B21  HP 600GB 6G SAS 10K 2.5in SC ENT HDD
B21  HP H240ar FIO Smart HBA
B21  HP DL360 Gen9 SFF Embed SATA Cable
B21  HP IB FDR/EN 10/40Gb 2P 544QSFP Adptr
B21  HP DL360 Gen9 Loc Disc Svc Ear Kit
B21  HP 1U SFF Easy Install Rail Kit
B21  HP QSFP/SFP+ Adaptor Kit
B21  HP 500W FS Plat Ht Plg Pwr Supply Kit
1  C6N36A  HP Insight Control ML/DL/BL FIO Bndl Lic
1  G3J28AAE  RHEL Svr 2 Sckt/2 Gst 1yr 24x7 E-LTU
2  SG508A  HP C13 - C14 WW 250V 10Amp IPD 1.37m 1pc Jumper Cord
B23  HP 3M 4X DDR/QDR QSFP IB Cu Cable
Compute nodes

Important
Table 4 provides a BOM for one HP Moonshot 1500 chassis with 45 m710 server cartridges. The tested solution featured one Moonshot chassis with a total of 45 server cartridges.

Table 4. BOM for a single HP Moonshot chassis with 45 server cartridges
Quantity  Part number  Description
B21  HP Moonshot 1500 Chassis
B21  HP 1500W Ht Plg Pwr Supply Kit
B21  HP Moonshot-45XGc Switch Kit
B21  HP Moonshot 4QSFP Uplink Kit
B21  HP ProLiant m710 Server Cartridge
B21  HP Moonshot 480GB M FIO Kit
B21  HP 4.3U Rail Kit
45  C6N36AAE  HP Insight Control ML/DL/BL Bundle E-LTU
C6N36A  HP Insight Control ML/DL/BL FIO Bndl Lic (optional if E-LTU is not available)
45  G3J28AAE  RHEL Svr 2 Sckt/2 Gst 1yr 24x7 E-LTU
Storage nodes

Important
Table 5 provides a BOM for one HP SL4500 chassis with two SL4540 servers. The tested solution featured two chassis and a total of four servers.

Table 5. BOM for a single HP SL4500 chassis with two nodes
Quantity  Part number  Description
B22  HP 2xSL4500 Chassis
B21  HP 750W CS Gold Ht Plg Pwr Supply Kit
B21  HP 4.3U Rail Kit
B21  HP 0.66U Spacer Blank Kit
B22  HP 2xSL4540 Gen8 Tray Node Svr
L21  HP SL4540 Gen8 E5-2450v2 FIO Kit
B21  HP SL4540 Gen8 E5-2450v2 Kit
B21  HP 16GB 2Rx4 PC3L-12800R-11 Kit
B21  HP 2GB FBWC for P-Series Smart Array
B21  HP 500GB 6G SATA 7.2k 2.5in SC MDL HDD
B21  HP 3TB 6G SAS 7.2K 3.5in SC MDL HDD
B21  HP Smart Array P420i Mezz Ctrllr FIO Kit
B21  HP SL4500 Storage Mezz to PCIe Opt Kit
B21  HP 12in Super Cap for Smart Array
2  G3J28AAE  RHEL Svr 2 Sckt/2 Gst 1yr 24x7 E-LTU
2  C6N27A  HP Insight Control Lic
B21  HP IB FDR/EN 10/40Gb 2P 544QSFP Adptr
B23  HP 3M 4X DDR/QDR QSFP IB Cu Cable
Networking
Table 6 provides a BOM for two ToR switches and one HP iLO switch, as featured in the tested configuration.

Table 6. BOM for two HP 5930 (ToR) switches and one HP 5900 (HP iLO) switch
Quantity  Part number  Description
2  JG726A  HP FF QSFP+ Switch
4  JG553A  HP X712 Bck(pwr)-Frt(prt) HV Fan Tray
4  JC680A  HP A58x0AF 650W AC Power Supply
4  JC680A#B2B  Power PDU Cable
1  JG510A  HP 5900AF-48G-4XG-2QSFP+ Switch
2  JC680A  HP A58x0AF 650W AC Power Supply
2  JC682A  HP 58x0AF Bck(pwr)-Frt(ports) Fan Tray
4  JG330A  QSFP+ to 4SFP+ 3m DAC cable
20  JG327A  HP X240 40G QSFP+ QSFP+ 3m DAC Cable
10  JG326A  HP X240 40G QSFP+ QSFP+ 1m DAC Cable
12  AF595A  HP 3.0M, Blue, CAT6 STP, Cable Data

Other hardware

Important
Quantities listed in Table 7 are based on a full rack with three switches, 45 compute nodes, and four storage nodes.

Table 7. BOM for a single rack with four PDUs
Quantity  Part number  Description
1  BW908A  HP mm Shock Intelligent Rack
1  BW908A 001  HP Factory Express Base Racking Service
1  BW946A  HP 42U Location Discovery Kit
1  BW930A  HP Air Flow Optimization Kit
1  BW930A B01  Include with complete system
1  BW909A  HP 42U 1200mm Side Panel Kit
1  BW891A  HP Rack Grounding Kit
4  AF520A  HP Intelligent Mod PDU 24a Na/Jpn Core
6  AF547A  HP 5xC13 Intlgnt PDU Ext Bars G2 Kit
HP Insight Cluster Management Utility options

Important
Options listed in Table 8 are based on a single node.

Table 8. BOM for HP Insight Cluster Management Utility (Insight CMU) options, per node
Quantity  Part number  Description
1  QL803B  HP Insight CMU 1yr 24x7 Flex Lic
1  QL803BAE  HP Insight CMU 1yr 24x7 Flex E-LTU
1  BD476A  HP Insight CMU 3yr 24x7 Flex Lic
1  BD476AAE  HP Insight CMU 3yr 24x7 Flex E-LTU
1  BD477A  HP Insight CMU Media

Hortonworks software

Important
Options listed in Table 9 are based on a single node. While HP is a certified reseller of Hortonworks software subscriptions, all application support (level-one through level-three) is provided by Hortonworks.

Table 9. BOM for Hortonworks software
Quantity  Part number  Description
5  F5Z52A  Hortonworks Data Platform Enterprise, 4 Nodes or 50TB Raw Storage, 1 year 24x7 Support LTU
Archival node

Note
Table 10 lists the parts for the HP Apollo 4510, the recommended optional server for providing ultra-dense backup and archival storage as required by the big data needs of the organization.

Table 10. BOM for an HP Apollo 4510 server
Quantity  Part number  Description
B21  HP Apollo 4510 Gen9 CTO Chassis
B21  HP ProLiant XL450 Gen9 Configure-to-order Server Node
L21  HP Apollo 4510 Gen8 E5-2630v3 FIO Processor
B21  HP Apollo 4510 Gen8 E5-2630v3 Kit
B21  HP 16GB 2Rx4 DDR Kit
B21  HP H244br FIO Smart HBA
B21  HP H240 Smart HBA
B21  HP 500GB 6G SATA 7.2k 2.5in SC MDL HDD
B21  HP 4TB 6G SATA 7.2K rpm LFF (3.5-inch) Low Profile Midline 1yr Warranty Hard Drive
B21  HP IB FDR/EN 10/40Gb 2P 544FLR-QSFP Adptr
B21  HP 800W FS Universal Hot Plug Power Supply Kit
B21  HP XL4510 8HDD Cage Kit
B21  HP A4500 MiniSAS X4-X4 1A 1B Cable Kit
B21  HP 4.3U Rail Kit
1  G3J28AAE  RHEL Svr 2 Sckt/2 Gst 1yr 24x7 E-LTU
1  C6N27A  HP Insight Control Lic
B23  HP 3M 4X DDR/QDR QSFP IB Cu Cable

Summary
HP BDRA is a modern, flexible architecture for big data solutions that improves overall access to information and time-to-solution, while providing the inherent flexibility to support the rapidly changing requirements of big data applications. HP BDRA leverages HP's innovative big data building blocks of servers, storage, and networking, along with integrated management software and bundled support.

HP BDRA and HDP allow you to derive new business insights from big data by providing a platform to store, manage, and process data at scale. However, the design, procurement, and deployment of a Hadoop cluster can be both complex and time consuming. Thus, this white paper outlines reference architectures for heterogeneous clusters of varying sizes with HDP 2.2 on HP infrastructure and management software. Guidelines for optimizing HDP software settings are also provided.
To address modern, ever-increasing demands for data backup and archiving, the ultra-dense HP Apollo 4510 was introduced as an optional backup server. As a result, the cost per TB of storage has never been better.
Advantages of HP BDRA

Extremely elastic
Because HP BDRA is built around a two-tier system that de-couples compute and storage resources, the earlier commitment to fixed compute/storage ratios has been discarded, and tiers can now scale independently. You only need to grow your cluster where needed, thus maximizing solution density while minimizing Total Cost of Ownership (TCO). Moreover, nodes can be allocated by time-of-day or even for a single job without redistributing data.

Consolidation
HP BDRA provides a distributed file system that has enough performance, and can scale to large enough capacities, to become the single source for big data within your organization. You no longer need to maintain multiple pools of data for various big data projects; now, you can consolidate that data into a single solution that can be shared by a broad range of data management tools. This consolidation can reduce your operational and management costs, while ensuring that the data does not become fragmented.

Optimization and unification of big data projects
HP BDRA provides an ideal platform to deliver support for different types of workloads. Compute cores can be allocated as needed to provide better capacity management; and, since storage nodes are a smaller subset of the cluster, they are less costly to overprovision.

Implementing a proof-of-concept
As a matter of best practice for all deployments, HP recommends implementing a proof-of-concept using a test environment that matches the planned production environment as closely as possible. In this way, appropriate performance and scalability characterizations can be obtained. For help with a proof-of-concept, contact an HP Services representative or your HP partner.
Appendix A: Alternate parts for storage nodes
This appendix provides BOMs for alternate processors, memory, and disk drives for the SL4540 servers used as storage nodes.

Table A-1. Alternate processors
Quantity per node  Part number  Description
L21  HP SL4540 Gen8 E5-2470v2 FIO Kit (10 cores at 2.4GHz)
B21  HP SL4540 Gen8 E5-2470v2 Kit
L21  HP SL4540 Gen8 E5-2450v2 FIO Kit (8 cores at 2.5GHz)
B21  HP SL4540 Gen8 E5-2450v2 Kit
L21  HP SL4540 Gen8 E5-2440v2 FIO Kit (8 cores at 1.9GHz)
B21  HP SL4540 Gen8 E5-2440v2 Kit

Table A-2. Alternate memory SL4540
Quantity per node  Part number  Description
B21  HP 16GB 2Rx4 PC3L-12800R-11 Kit for 96GB of memory
B21  HP 16GB 2Rx4 PC3L-12800R-11 Kit for 192GB of memory

Table A-3. Alternate disk drives
Quantity per node  Part number  Description
B21  HP 2TB 6G SAS 7.2K 3.5in SC MDL HDD
B21  HP 3TB 6G SAS 7.2K 3.5in SC MDL HDD
B21  HP 4TB 6G SAS 7.2K 3.5in SC MDL HDD

Appendix B: HP value added services and support
In order to help you jump-start your Hadoop solution development, HP offers a range of big data services, which are outlined in this appendix.

Factory Express Services
Factory-integration services are available for customers seeking a streamlined deployment experience. With the purchase of Factory Express services, your Hadoop cluster will arrive racked and cabled, with software installed and configured per an agreed-upon custom statement of work, for the easiest deployment possible. You should contact HP Technical Services for more information and for assistance with a quote.

Technical Services Consulting: Reference Architecture Implementation Service for Hadoop (Hortonworks)
With HP Reference Architecture Implementation Service for Hadoop, HP can install, configure, deploy, and test a Hadoop cluster that is based on HP BDRA. Experienced consultants implement all the details of the original Hadoop design: naming, hardware, networking, software, administration, backup, disaster recovery, and operating procedures.
Where options exist, or the best choice is not clear, HP works with you to configure the environment to meet your goals and needs. HP also conducts an acceptance test to validate that the system is operating to your satisfaction.
Technical Services Consulting: Big Data Services
HP Big Data Services can help you reshape your IT infrastructure to corral increasing volumes of data from emails, social media, and website downloads, and convert them into beneficial information. These services encompass strategy, design, implementation, protection, and compliance. Delivery is in the following three steps:

1. Architecture strategy: HP defines the functionalities and capabilities needed to align your IT with your big data initiatives. Through transformation workshops and roadmap services, you'll learn to capture, consolidate, manage, and protect business-aligned information, including structured, semi-structured, and unstructured data.
2. System infrastructure: HP designs and implements a high-performance, integrated platform to support a strategic architecture for big data. Choose from design and implementation services, reference architecture implementations, and integration services. Your flexible, scalable infrastructure will support big data variety, consolidation, analysis, sharing, and search on HP platforms.
3. Data protection: Ensure the availability, security, and compliance of your big data systems. HP can help you safeguard your data and achieve regulatory compliance and lifecycle protection across your big data landscape, while also enhancing your approach to backup and business continuity.

For additional information, visit hp.com/services/bigdata.

HP Support options
HP offers a variety of support levels to meet your needs. More information is provided below.

HP Support Plus 24
HP can provide integrated onsite hardware/software support services, available 24x7x365, including access to HP technical resources, four-hour response onsite hardware support, and software updates.

HP Proactive Care
HP Proactive Care provides all of the benefits of proactive monitoring and reporting, along with rapid reactive support through HP's expert reactive support specialists.
You can customize your reactive support level by selecting either six-hour call-to-repair or 24x7 with four-hour onsite response. HP Proactive Care helps prevent problems, resolve problems faster, and improve productivity. Through analysis, reports, and update recommendations, you are able to identify and address IT problems before they can cause performance issues or outages.

HP Proactive Care with the HP Personalized Support Option
Adding the Personalized Support Option for HP Proactive Care is highly recommended. This option builds on the benefits of HP Proactive Care Service, providing you with an assigned Account Support Manager who knows your environment and can deliver support planning, regular reviews, and technical and operational advice specific to your environment.

HP Proactive Select
To address your ongoing/changing needs, HP recommends adding Proactive Select credits to provide tailored support options from a wide menu of services that can help you optimize the capacity, performance, and management of your environment. These credits may also be used for assistance in implementing solution updates. As your needs change over time, you have the flexibility to choose the services best suited to address your current challenges.

HP Datacenter Care
HP Datacenter Care provides a more personalized, customized approach for large, complex environments, providing a single solution for reactive, proactive, and multi-vendor support needs. You may also choose the Defective Media Retention (DMR) option.

Other offerings
HP highly recommends HP Education Services (customer training and education) and additional Technical Services, as well as in-depth installation or implementation services when needed.

More information
For additional information, visit:
HP Education Services:
HP Technology Consulting Services: hp.com/services/bigdata
HP Deployment Services: hp.com/services/deployment
For more information
Hortonworks: hortonworks.com
Hortonworks partner site: hortonworks.com/partner/hp/
HP Solutions for Apache Hadoop: hp.com/go/hadoop
HP Insight Cluster Management Utility (Insight CMU): hp.com/go/cmu
HP 5930 Switch Series: hp.com/networking/5930
HP ProLiant servers: hp.com/go/proliant
HP Moonshot system: hp.com/go/moonshot
HP Enterprise Software: hp.com/go/software
HP Networking: hp.com/go/networking
HP Integrated Lights-Out (HP iLO): hp.com/servers/ilo
HP Product Bulletin (QuickSpecs): hp.com/go/quickspecs
HP Services: hp.com/go/services
HP Support and Drivers: hp.com/go/support
HP Systems Insight Manager (HP SIM): hp.com/go/hpsim
HP Trafodion:

To help us improve our documents, please provide feedback at hp.com/solutions/feedback.

About Hortonworks
Hortonworks develops, distributes and supports the only 100% open source Apache Hadoop data platform. Our team comprises the largest contingent of builders and architects within the Hadoop ecosystem who represent and lead the broader enterprise requirements within these communities. The Hortonworks Data Platform provides an open platform that deeply integrates with existing IT investments and upon which enterprises can build and deploy Hadoop-based applications. Hortonworks has deep relationships with the key strategic data center partners that enable our customers to unlock the broadest opportunities from Hadoop. For more information, visit hortonworks.com.

Sign up for updates: hp.com/go/getupdated

Copyright Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein. Microsoft and Windows are trademarks of the Microsoft group of companies.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Red Hat is a registered trademark of Red Hat, Inc. in the United States and other countries. Linux is the registered trademark of Linus Torvalds in the U.S. and other countries. 4AA5-6136ENW, August 2015, Rev. 1
Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop
Lenovo ThinkServer and Cloudera Solution for Apache Hadoop
Lenovo ThinkServer and Cloudera Solution for Apache Hadoop For next-generation Lenovo ThinkServer systems Lenovo Enterprise Product Group Version 1.0 December 2014 2014 Lenovo. All rights reserved. LENOVO
Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA
WHITE PAPER April 2014 Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA Executive Summary...1 Background...2 File Systems Architecture...2 Network Architecture...3 IBM BigInsights...5
Cisco Data Preparation
Data Sheet Cisco Data Preparation Unleash your business analysts to develop the insights that drive better business outcomes, sooner, from all your data. As self-service business intelligence (BI) and
Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp
Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp Agenda Hadoop and storage Alternative storage architecture for Hadoop Use cases and customer examples
HP recommended configuration for Microsoft Exchange Server 2010: HP LeftHand P4000 SAN
HP recommended configuration for Microsoft Exchange Server 2010: HP LeftHand P4000 SAN Table of contents Executive summary... 2 Introduction... 2 Solution criteria... 3 Hyper-V guest machine configurations...
Hortonworks Data Platform Reference Architecture
Hortonworks Data Platform Reference Architecture A PSSC Labs Reference Architecture Guide December 2014 Introduction PSSC Labs continues to bring innovative compute server and cluster platforms to market.
How To Run Apa Hadoop 1.0 On Vsphere Tmt On A Hyperconverged Network On A Virtualized Cluster On A Vspplace Tmter (Vmware) Vspheon Tm (
Apache Hadoop 1.0 High Availability Solution on VMware vsphere TM Reference Architecture TECHNICAL WHITE PAPER v 1.0 June 2012 Table of Contents Executive Summary... 3 Introduction... 3 Terminology...
HP ConvergedSystem 900 for SAP HANA Scale-up solution architecture
Technical white paper HP ConvergedSystem 900 for SAP HANA Scale-up solution architecture Table of contents Executive summary... 2 Solution overview... 3 Solution components... 4 Storage... 5 Compute...
Accelerating and Simplifying Apache
Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly
Hadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012
Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 1 Market Trends Big Data Growing technology deployments are creating an exponential increase in the volume
ORACLE OPS CENTER: PROVISIONING AND PATCH AUTOMATION PACK
ORACLE OPS CENTER: PROVISIONING AND PATCH AUTOMATION PACK KEY FEATURES PROVISION FROM BARE- METAL TO PRODUCTION QUICKLY AND EFFICIENTLY Controlled discovery with active control of your hardware Automatically
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and
Copyright 2013, Oracle and/or its affiliates. All rights reserved.
1 Oracle SPARC Server for Enterprise Computing Dr. Heiner Bauch Senior Account Architect 19. April 2013 2 The following is intended to outline our general product direction. It is intended for information
QuickSpecs. HP SATA Hard Drives. Overview
QuickSpecs Overview HP SATA drives are designed for the reliability and larger capacities demanded by today's entry server and external storage environments. The SATA portfolio is categorized into two
Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks
WHITE PAPER July 2014 Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks Contents Executive Summary...2 Background...3 InfiniteGraph...3 High Performance
HP Client Virtualization SMB Reference Architecture for Windows Server 2012
Technical white paper HP Client Virtualization SMB Reference Architecture for Windows Server 212 Affordable, high-performing desktop virtualization on HP ProLiant DL38p Gen8 and Windows Server 212 Table
Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database
Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database Built up on Cisco s big data common platform architecture (CPA), a
SAP HANA - an inflection point
SAP HANA forms the future technology foundation for new, innovative applications based on in-memory technology. It enables better performing business strategies, including planning, forecasting, operational
<Insert Picture Here> Refreshing Your Data Protection Environment with Next-Generation Architectures
1 Refreshing Your Data Protection Environment with Next-Generation Architectures Dale Rhine, Principal Sales Consultant Kelly Boeckman, Product Marketing Analyst Program Agenda Storage
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
Dell In-Memory Appliance for Cloudera Enterprise
Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert [email protected]/
Modernizing Your Data Warehouse for Hadoop
Modernizing Your Data Warehouse for Hadoop Big data. Small data. All data. Audie Wright, DW & Big Data Specialist [email protected] O 425-538-0044, C 303-324-2860 Unlock Insights on Any Data Taking
Intel RAID SSD Cache Controller RCS25ZB040
SOLUTION Brief Intel RAID SSD Cache Controller RCS25ZB040 When Faster Matters Cost-Effective Intelligent RAID with Embedded High Performance Flash Intel RAID SSD Cache Controller RCS25ZB040 When Faster
Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray VMware
Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray ware 2 Agenda The Hadoop Journey Why Virtualize Hadoop? Elasticity and Scalability Performance Tests Storage Reference
Microsoft s Open CloudServer
Microsoft s Open CloudServer Page 1 Microsoft s Open CloudServer How is our cloud infrastructure server design different from traditional IT servers? It begins with scale. From the number of customers
Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC,
Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC, Bellevue, WA Legal disclaimer The information in this
Microsoft Private Cloud Fast Track Reference Architecture
Microsoft Private Cloud Fast Track Reference Architecture Microsoft Private Cloud Fast Track is a reference architecture designed to help build private clouds by combining Microsoft software with NEC s
Michael Kagan. [email protected]
Virtualization in Data Center The Network Perspective Michael Kagan CTO, Mellanox Technologies [email protected] Outline Data Center Transition Servers S as a Service Network as a Service IO as a Service
Deploy Apache Hadoop with Emulex OneConnect OCe14000 Ethernet Network Adapters
CONNECT - Lab Guide Deploy Apache Hadoop with Emulex OneConnect OCe14000 Ethernet Network Adapters Hardware, software and configuration steps needed to deploy Apache Hadoop 2.4.1 with the Emulex family
Oracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
QuickSpecs. What's New HP 3TB 6G SAS 7.2K 3.5-inch Midline Hard Drive. HP SAS Enterprise and SAS Midline Hard Drives. Overview
Overview Serial Attached SCSI (SAS) provides a superior storage solution. With some storage requirements escalating and others becoming more complex, factors such as flexibility, performance, increased
A very short Intro to Hadoop
4 Overview A very short Intro to Hadoop photo by: exfordy, flickr 5 How to Crunch a Petabyte? Lots of disks, spinning all the time Redundancy, since disks die Lots of CPU cores, working all the time Retry,
HADOOP. Revised 10/19/2015
HADOOP Revised 10/19/2015 This Page Intentionally Left Blank Table of Contents Hortonworks HDP Developer: Java... 1 Hortonworks HDP Developer: Apache Pig and Hive... 2 Hortonworks HDP Developer: Windows...
SUN ORACLE EXADATA STORAGE SERVER
SUN ORACLE EXADATA STORAGE SERVER KEY FEATURES AND BENEFITS FEATURES 12 x 3.5 inch SAS or SATA disks 384 GB of Exadata Smart Flash Cache 2 Intel 2.53 Ghz quad-core processors 24 GB memory Dual InfiniBand
Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya
Oracle Database - Engineered for Innovation Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya Oracle Database 11g Release 2 Shipping since September 2009 11.2.0.3 Patch Set now
WHAT S NEW IN SAS 9.4
WHAT S NEW IN SAS 9.4 PLATFORM, HPA & SAS GRID COMPUTING MICHAEL GODDARD CHIEF ARCHITECT SAS INSTITUTE, NEW ZEALAND SAS 9.4 WHAT S NEW IN THE PLATFORM Platform update SAS Grid Computing update Hadoop support
HP Moonshot System. Table of contents. A new style of IT accelerating innovation at scale. Technical white paper
Technical white paper HP Moonshot System A new style of IT accelerating innovation at scale Table of contents Abstract... 2 Meeting the new style of IT requirements... 2 What makes the HP Moonshot System
HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads
HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads Gen9 Servers give more performance per dollar for your investment. Executive Summary Information Technology (IT) organizations face increasing
Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks
Hadoop Introduction Olivier Renault Solution Engineer - Hortonworks Hortonworks A Brief History of Apache Hadoop Apache Project Established Yahoo! begins to Operate at scale Hortonworks Data Platform 2013
International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763
International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 A Discussion on Testing Hadoop Applications Sevuga Perumal Chidambaram ABSTRACT The purpose of analysing
HP StorageWorks P2000 G3 and MSA2000 G2 Arrays
HP StorageWorks P2000 G3 and MSA2000 G2 Arrays Family Data sheet How can the flexibility of the HP StorageWorks P2000 G3 MSA Array Systems help remedy growing storage needs and small budgets? By offering
Big Fast Data Hadoop acceleration with Flash. June 2013
Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional
Copyright 2012, Oracle and/or its affiliates. All rights reserved.
1 Oracle Big Data Appliance Releases 2.5 and 3.0 Ralf Lange Global ISV & OEM Sales Agenda Quick Overview on BDA and its Positioning Product Details and Updates Security and Encryption New Hadoop Versions
Deploying Cloudera CDH (Cloudera Distribution Including Apache Hadoop) with Emulex OneConnect OCe14000 Network Adapters
Deploying Cloudera CDH (Cloudera Distribution Including Apache Hadoop) with Emulex OneConnect OCe14000 Network Adapters Table of Contents Introduction... Hardware requirements... Recommended Hadoop cluster
I/O Considerations in Big Data Analytics
Library of Congress I/O Considerations in Big Data Analytics 26 September 2011 Marshall Presser Federal Field CTO EMC, Data Computing Division 1 Paradigms in Big Data Structured (relational) data Very
