Intel Distribution for Apache Hadoop on Dell PowerEdge Servers

Intel Distribution for Apache Hadoop on Dell PowerEdge Servers

A Dell Technical White Paper

Armando Acosta, Hadoop Product Manager, Dell Revolutionary Cloud and Big Data Group
Kris Applegate, Solution Architect, Dell Solution Centers
Dave Jaffe, Ph.D., Solution Architect, Dell Solution Centers
Rob Wilbert, Solution Architect, Dell Solution Centers

Executive Summary

This document details the deployment of Intel Distribution for Apache Hadoop* software on the PowerEdge R720XD. The intended audiences for this document are customers and system architects looking for information on implementing Apache Hadoop clusters within their information technology environment for Big Data analytics. The reference configuration introduces all the high-level components, hardware, and software that are included in the stack. Each high-level component is then described individually. Dell developed this document to help streamline deployment, provide best practices, and improve the overall customer experience.

THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.

Dell Inc. All rights reserved. Reproduction of this material in any manner whatsoever without the express written permission of Dell Inc. is strictly forbidden. For more information, contact Dell. Dell, the DELL logo, and the DELL badge are trademarks of Dell Inc. Intel and Xeon are registered trademarks of Intel Corp. Red Hat is a registered trademark of Red Hat Inc. Linux is a registered trademark of Linus Torvalds. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell Inc. disclaims any proprietary interest in trademarks and trade names other than its own.

July Dell Intel Distribution for Apache Hadoop

Table of Contents

1 Introduction
2 Dell Solution Centers
3 Dell's Point of View on Big Data
4 Intel Distribution for Apache Hadoop
    Hadoop Use-Cases
    Intel's Contributions to Open Source
5 Intel Hadoop Solution Software Components
    Server Roles
6 Best Practices for Running Intel Distribution of Apache Hadoop on Dell
    Node Count Recommendations
    Hardware Recommendations
    Monitoring
    Resiliency
    Performance
    Software Considerations
    Installation Environment Assumptions
    High Availability
    Installation Considerations
7 Testing
    HiBench
    Teragen / Terasort
    Tested Configuration
    Tuning and Optimization of Workloads
8 Conclusions
9 Resources
    Links
    Additional Whitepapers

Tables

Table 1. Recommended Cluster Sizes
Table 2. Software Revisions
Table 3. PowerEdge R720 Infrastructure Node As Tested Configuration
Table 4. PowerEdge R720XD Datanode As Tested Configuration
Table 5. Key Hadoop Configuration Parameters

Figures

Figure 1. Dell Solution Centers Locations
Figure 2. Big Data Demands
Figure 3. Intel Foundational Technologies for Hadoop Performance
Figure 4. Dell Big Data Cluster Logical Diagram
Figure 5. Ganglia Performance Monitor Tool (Included with IDH)
Figure 6. Cluster Network Diagram
Figure 7. Dell's OpenManage Power Center
Figure 8. Dell R720XD models with 2.5 and 3.5 inch drives
Figure 9. The Role Assignment dropdown for HDFS roles
Figure 10. Mount Points configured for the dfs.data.dir directories
Figure 11. Intel Active Tuning Technology

1 Introduction

Hadoop is an Apache open source project being built and used by a global community of contributors, using the Java programming language. Hadoop's architecture is based on the ability to scale in a nearly linear fashion. By harnessing the power of this tool, many customers who previously would have had difficulty sorting through their complex data can now deliver value faster, provide deeper insight, and even develop new business models based on the speed and flexibility these analytics provide.

However, installing, configuring and running Hadoop is not trivial. There are different roles and configurations that need to be deployed on various host computers. Designing, deploying and optimizing the network layer to match Hadoop's scalability requires consideration of the type of workloads that will be running on the Hadoop cluster. These issues are complicated by both the fast-moving pace of the core Hadoop project and the challenges of managing a system designed to scale to thousands of nodes in a cluster.

Dell's customer-centered approach is to create rapidly deployable and highly optimized end-to-end Hadoop solutions running on highly scalable hardware. Dell listened to its customers and partnered with Intel to design a Hadoop solution that is unique in the marketplace, combining optimized hardware, software, and services to streamline deployment and improve the customer experience. Intel has created a high quality, controlled distribution of Hadoop and offers commercial management software, updates, support and consulting services.

The Intel Distribution for Apache Hadoop (IDH) software includes:

- The Intel Manager for Apache Hadoop software to install, configure, monitor and administer the Apache Hadoop cluster
- Enhancements to HBase and Hive for improved query performance and end user experience
- Resource monitoring capability using Nagios and Ganglia in the Intel Manager
- Superior security and performance through tightly integrated encryption and compression, authentication and access control
- A packaged Apache Hadoop ecosystem that includes HBase, Hive, and Apache Pig, among other tools

This solution provides a foundational platform for Intel to offer additional solutions as the Apache Hadoop ecosystem evolves and expands. Aside from the Apache Hadoop core technology (HDFS, MapReduce, etc.), Intel has designed additional capabilities to address specific customer needs for Big Data applications, such as:

- Optimal installation and configuration of the Apache Hadoop cluster
- Monitoring, reporting, and alerting of the hardware and software components
- Providing job-level metrics for analyzing specific workloads deployed in the cluster
- Infrastructure configuration automation

In recent tests in the Dell Solution Center, the Intel Distribution for Apache Hadoop Release was installed and tested on a cluster of Dell PowerEdge R720 servers, resulting in a set of best practices for installing IDH on Dell clusters. The next sections describe the role of the Dell Solution Centers and Dell's point of view on Big Data, followed by details of the IDH solution and IDH software components. Finally, the best practices developed by the Solution Center and the results of the IDH on Dell tests are described.

2 Dell Solution Centers

The Dell Solution Centers (DSC) are a global network of connected labs that allow Dell to help customers architect, validate and build solutions across Dell's entire enterprise portfolio. The Dell Intel Cloud Acceleration Program (DICAP), a team within the Dell Solution Centers, has the mission of providing customer engagements on the topics of Cloud and Big Data. With centers in every region, the DSC engages customers through informal briefings, longer half-day architectural design sessions, and one to two-week proof-of-concept tests that enable customers to kick the tires of Dell solutions prior to purchase. Interested customers should engage with their Dell account team to access the services of the DSC.

Figure 1. Dell Solution Centers Locations (Sao Paulo and Dubai coming in the second half of the year)

3 Dell's Point of View on Big Data

Big Data is a term often hyped in the IT press, and there are many different interpretations of what exactly it means. In Dell's point of view, the methods and principles of Big Data aren't new to the computer industry: Dell has been providing solutions such as High Performance Clustered Computing (HPCC), data warehouses, and traditional databases for years. What has changed is the scale at which such tools need to operate. Every new device in use in today's society gathers more and more data, and the need to store, report on and analyze it is paramount. The term "big" can apply on a variety of different scales (see Figure 2):

- Volume: no longer in the realm of gigabytes, but rather terabytes or petabytes.
- Velocity: devices now can generate more data in a short time than can be ingested using traditional means.
- Variety: with the data types and schemas of the various datasets differing so much, being able to use a common datastore and to query across them provides tremendous value.

Figure 2. Big Data Demands

4 Intel Distribution for Apache Hadoop

Dell continues to hear from customers about their Big Data challenges, specifically a need for solutions that allow flexibility and choice while enabling key insights from their data. Based on customer conversations and Dell's experience in providing Hadoop solutions, one size does not fit all. Each Hadoop distribution offers unique features and benefits. For this very reason, Dell is introducing the partnership with Intel for the Intel Distribution for Apache Hadoop* software on the PowerEdge R720XD.

The Dell and Intel partnership is good for all customers that want value from their data. Both companies share a common goal to help build a robust Apache Hadoop ecosystem that is enterprise ready, allowing all customers to take advantage of this disruptive technology. The partnership provides stability to the Apache Hadoop open source project; both companies have long term strategies that will help drive the right capabilities and features, bringing the most value to customers.

Intel brings a unique value proposition for customers: the ability to enable an optimized solution from the CPU silicon all the way to the Hadoop distribution. Intel is the only vendor that can marry CPU technologies, SSD technology and 10Gb Ethernet to benefit Hadoop performance. The Intel Distribution for Apache Hadoop software focuses on performance and security. The Dell and Intel strategy is to reinforce the Hadoop distribution by making it more enterprise ready and provide a viable platform for big data workloads in all IT environments. The Intel Distribution for Apache Hadoop software is especially suited for use cases where security, performance and ease of data management are key needs.

Figure 3. Intel Foundational Technologies for Hadoop Performance

Hadoop Use-Cases

The Intel Distribution for Apache Hadoop has been deployed in many different customer scenarios. A few use cases that stand out are in healthcare, telecommunications and smart-grid technology:

Healthcare: Customers use the massive database capabilities of IDH to store and process the human genome, evaluate pharmaceutical results and make patient care decisions. In genomic research, the fact that each human genome consists of 3.2 billion base-pairs with upwards of 4 million variants drives the need for a cost-effective, high performance, scalable data processing engine. At the same time, the deep security enhancements IDH provides are of major importance given the healthcare industry's strict compliance regulations.

Telecommunications: More and more mobile devices are getting into the hands of people all over the world. The billing systems for mobile providers need to be able to track call lengths and durations, text messages and data usage. More importantly, they need to be able to report on this in near real-time. Hadoop is used instead of traditional massively parallel processing (MPP) and data-warehouse (DW) technologies due to its lower total cost of ownership (TCO) and inherent fault-tolerance.

Energy Smart-Grid: Mobile devices aren't the only things generating new data streams. Smart power meters generate large streams of sensor data that can be used by energy and utility companies to optimize service delivery. The ability to efficiently store this data is allowing these companies to increase the rate of collection and provide additional, more granular detail. Traditional databases are proving to be incapable of handling the ingestion rate of this data at an affordable cost.

Intel's Contributions to Hadoop

As with many other open source projects, Hadoop's power owes itself to the community that developed it. Contribution to open source projects, either directly or by enhancing the ecosystem, drives further adoption and deepens utilization. Intel has a long history of both contributing to core open source projects (the Linux kernel, Hadoop and KVM) as well as creating complementary projects. Two key programs to note in the context of Hadoop are:

Project Rhino: This Intel-driven project enhances the data protection capabilities of Hadoop to address the security and compliance challenges around emerging use-cases. More details can be found at

Project Panthera: This project's goal is to provide full SQL support to help companies integrate Hadoop more deeply with their existing data analytics processes. More details can be found at

5 Intel Hadoop Solution Software Components

Hadoop Distributed File System (HDFS): This is the clustered file system at the core of the Hadoop software stack. When data is stored on this file system, it is automatically distributed for both resiliency and redundancy. In the default configuration, every file is stored 3 times on 3 different nodes. With Intel Hadoop, tunable parameters can be set to increase or decrease the file replication level as the file access frequency increases or decreases.

MapReduce: This is the distributed, batch-oriented parallel processing framework that enables data analysis at a large scale. This framework is accessed by writing Java-based MapReduce jobs that are executed against datasets in HDFS.

Hive: Hive makes accessing the power of MapReduce more familiar to existing database customers. It exposes the data that resides on HDFS as a SQL-like database. Standard SQL queries run against this data are translated into MapReduce by Hive and executed behind the scenes. With Intel Hadoop, Hive queries can run faster on data sets in HBase.

HBase: Some use-cases dictate the need for faster response times than a batch-based job through Hive or MapReduce. For these use cases, HBase provides a non-relational, column-based, distributed database that resides directly on top of HDFS. This allows users to leverage HDFS's massive scalability to provide service to emerging non-traditional databases. The HBase distribution in IDH is tuned to perform ad hoc queries faster via Hive for large datasets.

Server Roles

Name Node/JobTracker(s): These nodes serve as control nodes for the HDFS, MapReduce, and HBase processes. For HDFS, they own the block map and directory tree for all the data on the cluster. For MapReduce, they own the JobTracker daemon that handles job execution and monitoring. Lastly, for HBase, these servers are responsible for running the monitoring processes as well as owning any metadata operations. Production environments should have a primary and at least one standby Name Node.

Data Node(s): These are the nodes that hold the data as well as execute the MapReduce jobs. They are generally filled with large amounts of local disk, enabling the parallel processing and distributed storage features of Hadoop. The number of Data Nodes is dictated by use case. Adding additional Data Nodes increases both performance and capacity simultaneously.

Edge Node(s): These servers lie on the perimeter of the dedicated Hadoop network. They are where external users and business processes interact with the cluster. Often they will have a number of Network Interface Cards (NICs) attached to the Hadoop network as well as separate NICs attached to the enterprise's production IT network. More Edge Nodes can be added as external access requirements increase.

Intel Manager Node: This node is where the Intel Manager software resides. It runs the configuration management processes, web server software, and performance monitoring software. In production installations, a dedicated server should fulfill this task. In smaller installations, such as the one employed by Dell in these tests, this role was shared with the Edge Node.
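The map/shuffle/reduce flow that HDFS, MapReduce and Hive build on can be sketched in miniature. The following is a pure-Python, single-process illustration of the model (a real job would be written against the Hadoop Java API or Hadoop Streaming and run across Data Nodes):

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as Hadoop does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate (here, sum) the values for each key."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data on dell", "intel hadoop on dell"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["dell"])  # -> 2 (each input line mentions "dell" once)
```

On a cluster, the map and reduce phases run in parallel on the Data Nodes holding the relevant HDFS blocks; only the grouping step requires data movement over the network, which is why Hadoop workloads stress the network layer discussed later.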

Figure 4. Dell Big Data Cluster Logical Diagram

Figure 5. Ganglia Performance Monitor Tool (Included with IDH)

6 Best Practices for Running Intel Distribution of Apache Hadoop on Dell

Node Count Recommendations

Dell recognizes that use-cases for Hadoop range from small development clusters all the way through large multi-petabyte production installations. Dell has a Professional Services team that sizes Hadoop clusters for a customer's particular use. As a starting point, three cluster configurations can be defined for typical use:

Minimum Development Cluster: This is targeted at functional testing and may even be built from existing equipment. However, the performance of these types of clusters can be significantly lower, as they don't benefit from the highly distributed nature of HDFS.

Recommended Small Cluster: This is a good starting point for customers taking the initial steps into running IDH in production. It provides some of the layers of resiliency that are expected in today's production IT world.

Recommended Production Cluster: This configuration provides all the available options for resiliency at both the hardware layer and software layer. In addition, it allows for an adequate number of data nodes to demonstrate the performance benefits of distributed storage and parallel computing.

Table 1. Recommended Cluster Sizes

                     Minimum Development Cluster   Recommended Small Cluster   Recommended Production Cluster
Name Node(s)
Edge Node(s)
Data Node(s)
Intel Manager Node
1 GbE Switches
10 GbE Switches
Rack Units           9U                            20U                         42U

1 In this case a single node serves as the Name, JobTracker, Edge and Intel Manager Node.
2 In some cases a single server can serve as both the Name Node and JobTracker.

Figure 6. Cluster Network Diagram

Hardware Recommendations

Dell's complete portfolio really shines when building comprehensive solutions. From the servers to the switches and even down to the racks and monitoring tools, the value of deploying on Dell is readily apparent.

Monitoring

Using the Dell Remote Access Controllers (DRACs) in the servers, Dell customers can identify increases in power consumption and temperature as they exercise the disks and CPUs. One great tool to aid with this is Dell's OpenManage Power Center. This tool uses the Intel Node Manager technology built into the Dell Remote Access Controller (DRAC) to provide metrics and trigger alert events based on customer criteria.

Figure 7. Dell's OpenManage Power Center

Resiliency

In production clusters it's imperative to keep an eye towards mitigating as many points of failure as possible. However, it is important to keep in mind that Hadoop (both through HDFS and MapReduce) is meant to be natively tolerant of failures and will take care of much of the needed underlying work. That said, when investing in building a robust and resilient configuration, here are key areas to focus on:

Switches: Multiple stacked Force10 switches should be used for high availability. Force10 S60 1GbE switches utilize stacking modules which provide for easier switch management and faster inter-switch communication. On the Force10 S4810s there is the option of either stacking via the 10 or 40 GbE ports (FW ) or implementing Virtual Link Trunking if you plan to scale beyond the stacking limitations (see switch documentation for configuration maximums).

NICs: Either two single-port NIC cards or two dual-port cards are recommended in the administration servers to guard against PCI-E slot failures. This is not as crucial on data nodes due to data node redundancy.

Disks: RAID is only recommended in the administration servers such as the Name Node. In the Data Nodes it's strongly recommended to use as many separate disks as possible (no RAID). The flexibility of the PowerEdge R720XD really shines here, since it can hold either (12) 3.5" drives or (24) 2.5" drives.

Figure 8. Dell R720XD models with 2.5 and 3.5 inch drives

Performance

Performance optimization is a matter that varies greatly from customer to customer. There are a few principles that should be considered in order to optimize cluster performance.

Network: While 10 GbE isn't required, multiple bonded NICs of the fastest speed possible are strongly recommended for the data network. Workloads vary on whether or not they can truly benefit from a fast network, but with the prevalence of 10 GbE, it would be wise to invest ahead of the curve. You'll also want enterprise-grade switches with deep per-port packet buffers in order to handle the volume and density of traffic Hadoop can generate. For 1 GbE, Dell Force10 S60 switches work well, and at 10 GbE, Dell Force10 S4810s are optimal.

Disks: A key principle of performance tuning is to eliminate input/output (IO) starvation at the CPU layer and contention at the disk level. From this comes the initial recommendation of a 1:1 ratio of disk spindles to physical processor cores (with hyperthreading counting as half of one physical core for this purpose). The correct choice of disks and processors depends entirely on the workload, which can vary from the heavily storage-centric, with massive disks and few processors, to the heavily processor-centric, with many cores and PCI-E SSDs. The Dell Professional Services team can provide consultation and assessment to help customers achieve the proper balance. The Dell PowerEdge R720XD provides excellent flexibility with regards to drive and socket configurations.

Memory: Few Hadoop use-cases will be memory constrained, but administration servers should have sufficient memory for index caching (128GB for a robust configuration). For the data nodes, while there are emerging use-cases that call for high amounts of memory, it's been determined through Hadoop customer engagements in the Dell Solution Centers that 64GB is a good initial target.

CPUs: As mentioned above, the use-case will determine the correct balance of CPU, memory, and disk speed. In performance use-cases, the most cores (balancing out spindle count if not SSD) and the highest possible frequency CPUs are recommended. However, if you are more interested in storage capacity, you could look at some of the Intel Xeon E5-2600L series processors that are more energy efficient.
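HDFS's default 3x replication interacts with the disk recommendations above: raw drive capacity is not usable capacity. The sketch below shows the arithmetic; the 25% reserve for MapReduce intermediate output and OS overhead is our assumption (a common planning figure, not a Dell-specified value), and the 8-node count is purely illustrative:

```python
def usable_hdfs_tb(nodes, disks_per_node, disk_tb, replication=3, reserve=0.25):
    """Rough usable HDFS capacity in TB: raw space, minus a fraction
    reserved for MapReduce intermediate output and OS overhead,
    divided by the HDFS replication factor (3 copies per block by default)."""
    raw = nodes * disks_per_node * disk_tb
    return raw * (1 - reserve) / replication

# Hypothetical 8 data nodes configured like the tested R720XD: 24 x 500 GB drives.
print(round(usable_hdfs_tb(nodes=8, disks_per_node=24, disk_tb=0.5), 1))  # -> 24.0
```

So 96 TB of raw disk yields roughly 24 TB of usable HDFS space under these assumptions, which is why capacity planning should always start from replicated, not raw, figures.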

Software Considerations

Installation Environment Assumptions

Updated Operating System: The selected OS should have appropriate updates applied prior to IDH installation. The IDH documentation lists supported OS versions as well as required updates.

Package Management: As part of the installation, an existing OS package repository needs to be referenced. Additionally, a new repo for the IDH software needs to be created. In some cases (Red Hat Enterprise Linux) this may mean registering the OS with the proper credentials.

DNS: Forward and reverse name resolution are required for installation. Host-to-host communication is handled by hostname, so this becomes imperative. This can be accomplished via /etc/hosts or a DNS server.

NIC Bonding: In order to get as much bandwidth and resiliency as possible, Dell recommends implementing bonding on the NICs. In these tests, mode 6 (balance-alb) was used.

Production Network Connectivity: The Edge Node needs to be connected to the user's existing network in order to facilitate access to the cluster. The speed of this link should meet the needs of the inbound data ingestion plans (both in number of users/processes and in volume of data).

High Availability

Production Hadoop workloads require a high degree of resiliency to achieve desired uptime goals. In IDH, High Availability (HA) is handled in an Active/Passive manner using a number of components:

- Distributed Replicated Block Device (DRBD): allows a logical device to be mirrored between two disparate systems.
- Pacemaker: a Cluster Resource Management (CRM) framework that starts, stops, monitors and migrates resources automatically.
- Corosync: a messaging framework, which Pacemaker uses for internode communication.

These tools, when used together, provide layers of redundancy for both the HDFS NameNode service and the MapReduce JobTracker. In order to enable HA, additional hardware may be required in the Name Nodes, including extra NICs, more memory, and additional disks.

While failover of both the NameNode HA service and the JobTracker HA service is completely automatic, once the failover completes, in-flight jobs will need to be resubmitted. High availability will require some additional network configuration as well. Virtual hostnames and IP addresses for both the NameNode and the JobTracker HA functions must be identified and recorded in all /etc/hosts files, or DNS tables. It is worth noting that the IDH 2.4 release is based on the 1.x Hadoop open source project, which had no inherent HA option; Intel's distribution adds this capability.
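The forward/reverse DNS requirement from the installation assumptions is easy to verify before running the installer. The sketch below (standard library only; the function name is ours) checks one host; a real pre-install check would loop over every cluster hostname and could additionally compare the reverse-resolved name against the expected FQDN:

```python
import socket

def dns_round_trip_ok(hostname):
    """Verify that forward resolution (name -> IP) and reverse
    resolution (IP -> name) both succeed, as IDH installation requires.
    Either /etc/hosts entries or DNS records can satisfy this."""
    try:
        ip = socket.gethostbyname(hostname)               # forward lookup
        name, _aliases, _ips = socket.gethostbyaddr(ip)   # reverse lookup
        return True
    except OSError:
        return False

print(dns_round_trip_ok("localhost"))
```

Running this on each node before installation catches the most common cause of failed IDH deployments: hostnames that resolve in one direction but not the other.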

Installation Considerations

Role Assignments: During the installation, the setup wizard prompts for specific role assignments for the cluster servers. It's a good idea to use the Edit Roles button on the last page of the wizard to double-check that each of the parameters was set correctly, as shown in Figure 9.

Figure 9. The Role Assignment dropdown for HDFS roles

Mount Points: Mount points are key to properly configuring an optimized cluster. It is best practice to follow the installation guide and, prior to starting HDFS or any of the services, make sure that the values set for dfs.data.dir (Figure 10) and mapred.local.dir point to the appropriate mount points. In the case below, there is one mount point allocated per physical spindle.

Figure 10. Mount Points configured for the dfs.data.dir directories
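Typing out a 24-entry dfs.data.dir value by hand is error-prone. A small generator like the following can build the comma-separated list; the /data/diskN naming is our assumption for illustration and should match however the drives were actually mounted:

```python
def data_dir_value(num_disks, base="/data/disk"):
    """Build the comma-separated dfs.data.dir list Hadoop expects:
    one directory per physical spindle (no RAID on data nodes)."""
    return ",".join(f"{base}{i}/hdfs" for i in range(num_disks))

# 24 x 2.5" drives in an R720XD data node as tested:
value = data_dir_value(24)
print(value.split(",")[0])   # -> /data/disk0/hdfs
print(len(value.split(",")))  # -> 24
```

The same approach works for mapred.local.dir, which should also be spread across the spindles so that map output spills do not contend with HDFS reads on a single disk.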

7 Testing

HiBench

HiBench is a Hadoop benchmark framework that consists of 9 typical workloads representing common Hadoop uses: micro benchmarks, HDFS benchmarks, web search benchmarks, machine learning benchmarks, and data analytics benchmarks. For this paper the most well-known subset of the HiBench suite, the Teragen / Terasort benchmark, was employed to test system IO.

Teragen / Terasort

These two HDFS / MapReduce benchmarks are used in conjunction with each other to stress Hadoop systems and provide valuable metrics with regards to network, disk and CPU utilization. By starting with these as a baseline, Hadoop administrators can tune Hadoop's wide variety of parameters to get the desired performance. Teragen starts by generating flat text files that contain pseudo-random data, which Terasort then sorts. This type of sort / shuffle exercise is similar to what is done over and over by customers as they manipulate data through MapReduce jobs.

Tested Configuration

In these tests a small Hadoop cluster was employed, as recommended in Table 1. The specific software revisions used in the test are shown in Table 2. The PowerEdge R720 and R720XD hardware configurations are shown in Table 3 and Table 4. The hardware listed should be used as initial guidance only. Additional configurations are very possible and will likely be required, as each customer's environment and use-case is unique.

Table 2. Software Revisions

Component                               Revision
Red Hat Enterprise Linux                6.4
Intel Distribution for Apache Hadoop    (Build 16962)
Apache Hadoop (IDH is based on)
HBase
Hive
Zookeeper
HiBench                                 2.2

Table 3. PowerEdge R720 Infrastructure Node As Tested Configuration

Component          Detail
Height             2 Rack Units (3.5 in.)
Processor          2x Intel Xeon E GHz 8-core procs
Memory             128 GB
Disk               6x 600 GB 15K SAS Drives

Network            4x 1GbE LOMs, 2x 10GbE NICs
RAID Controller    PowerEdge RAID Controller H710 (PERC)
Management Card    Integrated Dell Remote Access Controller (iDRAC)

Table 4. PowerEdge R720XD Datanode As Tested Configuration

Component          Detail
Height             2 Rack Units (3.5 in.)
Processor          2x Intel Xeon E GHz 6-core procs
Memory             64 GB
Disk               24x 500GB 7200 RPM Nearline SAS drives
Network            4x 1GbE LOMs, 2x 10GbE NICs
RAID Controller    PowerEdge RAID Controller H710 (PERC)
Management Card    Integrated Dell Remote Access Controller (iDRAC)

Tuning and Optimization of Workloads

The cluster configuration variables used in these tests (Table 5) are simply a starting point. Parameters like dfs.block.size are highly contingent on the type of data being stored and the use-case for it. A Dell Professional Services engagement is recommended to achieve configurations optimized for the user's workload.

Table 5. Key Hadoop Configuration Parameters

Name                                         Value
dfs.block.size
ipc.server.tcpnodelay                        FALSE
ipc.client.tcpnodelay                        FALSE
io.sort.factor                               100
io.sort.mb                                   400
io.sort.spill.percent                        0.8
io.sort.record.percent                       0.05
mapred.child.java.opts                       1024m
mapreduce.tasktracker.outofband.heartbeat    TRUE
mapred.job.reuse.jvm.num.tasks               1
mapred.min.split.size
mapred.reduce.parallel.copies                20
mapred.reduce.tasks.speculative.execution    TRUE
mapred.reduce.tasks                          30 * # of Task Trackers
mapred.map.tasks                             20 * # of Task Trackers
mapred.compress.map.output                   TRUE
tasktracker.http.threads
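Conceptually, the Teragen / Terasort pair described in the Testing section is "generate pseudo-random keyed records, then sort them by key." The miniature below mimics that flow in a single process; the real benchmarks are the teragen/terasort example jobs shipped with Hadoop and run as MapReduce jobs over HDFS, and the record format here is simplified for illustration:

```python
import random
import string

def teragen_toy(num_records, seed=42):
    """Generate records of a 10-character pseudo-random key plus a payload,
    loosely mimicking what Teragen writes into HDFS."""
    rng = random.Random(seed)
    records = []
    for _ in range(num_records):
        key = "".join(rng.choice(string.ascii_letters) for _ in range(10))
        records.append(key + "|payload")
    return records

def terasort_toy(records):
    """Sort records by their key, as Terasort does cluster-wide."""
    return sorted(records, key=lambda r: r[:10])

data = teragen_toy(1000)
sorted_data = terasort_toy(data)
# After sorting, every record's key is <= the next record's key.
print(all(a[:10] <= b[:10] for a, b in zip(sorted_data, sorted_data[1:])))  # -> True
```

On a cluster, the sort forces a full shuffle of the generated data across the network, which is exactly why this benchmark pair exposes network, disk and CPU bottlenecks at once.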


More information

How To Run Apa Hadoop 1.0 On Vsphere Tmt On A Hyperconverged Network On A Virtualized Cluster On A Vspplace Tmter (Vmware) Vspheon Tm (

How To Run Apa Hadoop 1.0 On Vsphere Tmt On A Hyperconverged Network On A Virtualized Cluster On A Vspplace Tmter (Vmware) Vspheon Tm ( Apache Hadoop 1.0 High Availability Solution on VMware vsphere TM Reference Architecture TECHNICAL WHITE PAPER v 1.0 June 2012 Table of Contents Executive Summary... 3 Introduction... 3 Terminology...

More information

EMC Unified Storage for Microsoft SQL Server 2008

EMC Unified Storage for Microsoft SQL Server 2008 EMC Unified Storage for Microsoft SQL Server 2008 Enabled by EMC CLARiiON and EMC FAST Cache Reference Copyright 2010 EMC Corporation. All rights reserved. Published October, 2010 EMC believes the information

More information

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created

More information

Dell In-Memory Appliance for Cloudera Enterprise

Dell In-Memory Appliance for Cloudera Enterprise Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert Armando_Acosta@Dell.com/

More information

Enabling High performance Big Data platform with RDMA

Enabling High performance Big Data platform with RDMA Enabling High performance Big Data platform with RDMA Tong Liu HPC Advisory Council Oct 7 th, 2014 Shortcomings of Hadoop Administration tooling Performance Reliability SQL support Backup and recovery

More information

Apache Hadoop: The Big Data Refinery

Apache Hadoop: The Big Data Refinery Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

More information

Fast, Low-Overhead Encryption for Apache Hadoop*

Fast, Low-Overhead Encryption for Apache Hadoop* Fast, Low-Overhead Encryption for Apache Hadoop* Solution Brief Intel Xeon Processors Intel Advanced Encryption Standard New Instructions (Intel AES-NI) The Intel Distribution for Apache Hadoop* software

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Increasing Hadoop Performance with SanDisk Solid State Drives (SSDs)

Increasing Hadoop Performance with SanDisk Solid State Drives (SSDs) WHITE PAPER Increasing Hadoop Performance with SanDisk Solid State Drives (SSDs) July 2014 951 SanDisk Drive, Milpitas, CA 95035 2014 SanDIsk Corporation. All rights reserved www.sandisk.com Table of Contents

More information

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE Hadoop Storage-as-a-Service ABSTRACT This White Paper illustrates how EMC Elastic Cloud Storage (ECS ) can be used to streamline the Hadoop data analytics

More information

Private cloud computing advances

Private cloud computing advances Building robust private cloud services infrastructures By Brian Gautreau and Gong Wang Private clouds optimize utilization and management of IT resources to heighten availability. Microsoft Private Cloud

More information

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks WHITE PAPER July 2014 Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks Contents Executive Summary...2 Background...3 InfiniteGraph...3 High Performance

More information

The Greenplum Analytics Workbench

The Greenplum Analytics Workbench The Greenplum Analytics Workbench External Overview 1 The Greenplum Analytics Workbench Definition Is a 1000-node Hadoop Cluster. Pre-configured with publicly available data sets. Contains the entire Hadoop

More information

Dell Compellent Storage Center SAN & VMware View 1,000 Desktop Reference Architecture. Dell Compellent Product Specialist Team

Dell Compellent Storage Center SAN & VMware View 1,000 Desktop Reference Architecture. Dell Compellent Product Specialist Team Dell Compellent Storage Center SAN & VMware View 1,000 Desktop Reference Architecture Dell Compellent Product Specialist Team THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL

More information

Hortonworks Data Platform Reference Architecture

Hortonworks Data Platform Reference Architecture Hortonworks Data Platform Reference Architecture A PSSC Labs Reference Architecture Guide December 2014 Introduction PSSC Labs continues to bring innovative compute server and cluster platforms to market.

More information

HP reference configuration for entry-level SAS Grid Manager solutions

HP reference configuration for entry-level SAS Grid Manager solutions HP reference configuration for entry-level SAS Grid Manager solutions Up to 864 simultaneous SAS jobs and more than 3 GB/s I/O throughput Technical white paper Table of contents Executive summary... 2

More information

Dell Virtual Remote Desktop Reference Architecture. Technical White Paper Version 1.0

Dell Virtual Remote Desktop Reference Architecture. Technical White Paper Version 1.0 Dell Virtual Remote Desktop Reference Architecture Technical White Paper Version 1.0 July 2010 THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES.

More information

Maximum performance, minimal risk for data warehousing

Maximum performance, minimal risk for data warehousing SYSTEM X SERVERS SOLUTION BRIEF Maximum performance, minimal risk for data warehousing Microsoft Data Warehouse Fast Track for SQL Server 2014 on System x3850 X6 (95TB) The rapid growth of technology has

More information

Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA

Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA WHITE PAPER April 2014 Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA Executive Summary...1 Background...2 File Systems Architecture...2 Network Architecture...3 IBM BigInsights...5

More information

Real-Time Big Data Analytics for the Enterprise

Real-Time Big Data Analytics for the Enterprise White Paper Intel Distribution for Apache Hadoop* Big Data Real-Time Big Data Analytics for the Enterprise SAP HANA* and the Intel Distribution for Apache Hadoop* Software Executive Summary Companies are

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct

More information

Benchmarking Hadoop & HBase on Violin

Benchmarking Hadoop & HBase on Violin Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages

More information

Dell Virtualization Solution for Microsoft SQL Server 2012 using PowerEdge R820

Dell Virtualization Solution for Microsoft SQL Server 2012 using PowerEdge R820 Dell Virtualization Solution for Microsoft SQL Server 2012 using PowerEdge R820 This white paper discusses the SQL server workload consolidation capabilities of Dell PowerEdge R820 using Virtualization.

More information

Milestone Solution Partner IT Infrastructure MTP Certification Report Scality RING Software-Defined Storage 11-16-2015

Milestone Solution Partner IT Infrastructure MTP Certification Report Scality RING Software-Defined Storage 11-16-2015 Milestone Solution Partner IT Infrastructure MTP Certification Report Scality RING Software-Defined Storage 11-16-2015 Table of Contents Introduction... 4 Certified Products... 4 Key Findings... 5 Solution

More information

Dell Desktop Virtualization Solutions Simplified. All-in-one VDI appliance creates a new level of simplicity for desktop virtualization

Dell Desktop Virtualization Solutions Simplified. All-in-one VDI appliance creates a new level of simplicity for desktop virtualization Dell Desktop Virtualization Solutions Simplified All-in-one VDI appliance creates a new level of simplicity for desktop virtualization Executive summary Desktop virtualization is a proven method for delivering

More information

DELL s Oracle Database Advisor

DELL s Oracle Database Advisor DELL s Oracle Database Advisor Underlying Methodology A Dell Technical White Paper Database Solutions Engineering By Roger Lopez Phani MV Dell Product Group January 2010 THIS WHITE PAPER IS FOR INFORMATIONAL

More information

IBM System x reference architecture solutions for big data

IBM System x reference architecture solutions for big data IBM System x reference architecture solutions for big data Easy-to-implement hardware, software and services for analyzing data at rest and data in motion Highlights Accelerates time-to-value with scalable,

More information

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage

More information

A very short Intro to Hadoop

A very short Intro to Hadoop 4 Overview A very short Intro to Hadoop photo by: exfordy, flickr 5 How to Crunch a Petabyte? Lots of disks, spinning all the time Redundancy, since disks die Lots of CPU cores, working all the time Retry,

More information

HadoopTM Analytics DDN

HadoopTM Analytics DDN DDN Solution Brief Accelerate> HadoopTM Analytics with the SFA Big Data Platform Organizations that need to extract value from all data can leverage the award winning SFA platform to really accelerate

More information

Using Red Hat Network Satellite Server to Manage Dell PowerEdge Servers

Using Red Hat Network Satellite Server to Manage Dell PowerEdge Servers Using Red Hat Network Satellite Server to Manage Dell PowerEdge Servers Enterprise Product Group (EPG) Dell White Paper By Todd Muirhead and Peter Lillian July 2004 Contents Executive Summary... 3 Introduction...

More information

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social

More information

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Yan Fisher Senior Principal Product Marketing Manager, Red Hat Rohit Bakhshi Product Manager,

More information

VMware Virtual SAN Backup Using VMware vsphere Data Protection Advanced SEPTEMBER 2014

VMware Virtual SAN Backup Using VMware vsphere Data Protection Advanced SEPTEMBER 2014 VMware SAN Backup Using VMware vsphere Data Protection Advanced SEPTEMBER 2014 VMware SAN Backup Using VMware vsphere Table of Contents Introduction.... 3 vsphere Architectural Overview... 4 SAN Backup

More information

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION A DIABLO WHITE PAPER AUGUST 2014 Ricky Trigalo Director of Business Development Virtualization, Diablo Technologies

More information

Real-Time Big Data Analytics SAP HANA with the Intel Distribution for Apache Hadoop software

Real-Time Big Data Analytics SAP HANA with the Intel Distribution for Apache Hadoop software Real-Time Big Data Analytics with the Intel Distribution for Apache Hadoop software Executive Summary is already helping businesses extract value out of Big Data by enabling real-time analysis of diverse

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and Dell PowerEdge M1000e Blade Enclosure

Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and Dell PowerEdge M1000e Blade Enclosure White Paper Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and Dell PowerEdge M1000e Blade Enclosure White Paper March 2014 2014 Cisco and/or its affiliates. All rights reserved. This

More information

SQL Server 2012 Parallel Data Warehouse. Solution Brief

SQL Server 2012 Parallel Data Warehouse. Solution Brief SQL Server 2012 Parallel Data Warehouse Solution Brief Published February 22, 2013 Contents Introduction... 1 Microsoft Platform: Windows Server and SQL Server... 2 SQL Server 2012 Parallel Data Warehouse...

More information

Cisco for SAP HANA Scale-Out Solution on Cisco UCS with NetApp Storage

Cisco for SAP HANA Scale-Out Solution on Cisco UCS with NetApp Storage Cisco for SAP HANA Scale-Out Solution Solution Brief December 2014 With Intelligent Intel Xeon Processors Highlights Scale SAP HANA on Demand Scale-out capabilities, combined with high-performance NetApp

More information

Big Data - Infrastructure Considerations

Big Data - Infrastructure Considerations April 2014, HAPPIEST MINDS TECHNOLOGIES Big Data - Infrastructure Considerations Author Anand Veeramani / Deepak Shivamurthy SHARING. MINDFUL. INTEGRITY. LEARNING. EXCELLENCE. SOCIAL RESPONSIBILITY. Copyright

More information

Hadoop Architecture. Part 1

Hadoop Architecture. Part 1 Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,

More information

Accelerate Big Data Analysis with Intel Technologies

Accelerate Big Data Analysis with Intel Technologies White Paper Intel Xeon processor E7 v2 Big Data Analysis Accelerate Big Data Analysis with Intel Technologies Executive Summary It s not very often that a disruptive technology changes the way enterprises

More information

Microsoft SharePoint Server 2010

Microsoft SharePoint Server 2010 Microsoft SharePoint Server 2010 Medium Farm Solution Performance Study Dell SharePoint Solutions Ravikanth Chaganti and Quocdat Nguyen August 2010 THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY,

More information

Converged storage architecture for Oracle RAC based on NVMe SSDs and standard x86 servers

Converged storage architecture for Oracle RAC based on NVMe SSDs and standard x86 servers Converged storage architecture for Oracle RAC based on NVMe SSDs and standard x86 servers White Paper rev. 2015-11-27 2015 FlashGrid Inc. 1 www.flashgrid.io Abstract Oracle Real Application Clusters (RAC)

More information

Networking in the Hadoop Cluster

Networking in the Hadoop Cluster Hadoop and other distributed systems are increasingly the solution of choice for next generation data volumes. A high capacity, any to any, easily manageable networking layer is critical for peak Hadoop

More information

Dell Microsoft Business Intelligence and Data Warehousing Reference Configuration Performance Results Phase III

Dell Microsoft Business Intelligence and Data Warehousing Reference Configuration Performance Results Phase III White Paper Dell Microsoft Business Intelligence and Data Warehousing Reference Configuration Performance Results Phase III Performance of Microsoft SQL Server 2008 BI and D/W Solutions on Dell PowerEdge

More information

CloudSpeed SATA SSDs Support Faster Hadoop Performance and TCO Savings

CloudSpeed SATA SSDs Support Faster Hadoop Performance and TCO Savings WHITE PAPER CloudSpeed SATA SSDs Support Faster Hadoop Performance and TCO Savings August 2014 951 SanDisk Drive, Milpitas, CA 95035 2014 SanDIsk Corporation. All rights reserved www.sandisk.com Table

More information

CDH AND BUSINESS CONTINUITY:

CDH AND BUSINESS CONTINUITY: WHITE PAPER CDH AND BUSINESS CONTINUITY: An overview of the availability, data protection and disaster recovery features in Hadoop Abstract Using the sophisticated built-in capabilities of CDH for tunable

More information

OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT

OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT WHITEPAPER OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT A top-tier global bank s end-of-day risk analysis jobs didn t complete in time for the next start of trading day. To solve

More information

Microsoft Analytics Platform System. Solution Brief

Microsoft Analytics Platform System. Solution Brief Microsoft Analytics Platform System Solution Brief Contents 4 Introduction 4 Microsoft Analytics Platform System 5 Enterprise-ready Big Data 7 Next-generation performance at scale 10 Engineered for optimal

More information

Hadoop on the Gordon Data Intensive Cluster

Hadoop on the Gordon Data Intensive Cluster Hadoop on the Gordon Data Intensive Cluster Amit Majumdar, Scientific Computing Applications Mahidhar Tatineni, HPC User Services San Diego Supercomputer Center University of California San Diego Dec 18,

More information

Broadcom 10GbE High-Performance Adapters for Dell PowerEdge 12th Generation Servers

Broadcom 10GbE High-Performance Adapters for Dell PowerEdge 12th Generation Servers White Paper Broadcom 10GbE High-Performance Adapters for Dell PowerEdge 12th As the deployment of bandwidth-intensive applications such as public and private cloud computing continues to increase, IT administrators

More information

Best Practices for Deploying SSDs in a Microsoft SQL Server 2008 OLTP Environment with Dell EqualLogic PS-Series Arrays

Best Practices for Deploying SSDs in a Microsoft SQL Server 2008 OLTP Environment with Dell EqualLogic PS-Series Arrays Best Practices for Deploying SSDs in a Microsoft SQL Server 2008 OLTP Environment with Dell EqualLogic PS-Series Arrays Database Solutions Engineering By Murali Krishnan.K Dell Product Group October 2009

More information

Intel Cloud Builder Guide: Cloud Design and Deployment on Intel Platforms

Intel Cloud Builder Guide: Cloud Design and Deployment on Intel Platforms EXECUTIVE SUMMARY Intel Cloud Builder Guide Intel Xeon Processor-based Servers Red Hat* Cloud Foundations Intel Cloud Builder Guide: Cloud Design and Deployment on Intel Platforms Red Hat* Cloud Foundations

More information

Get More Scalability and Flexibility for Big Data

Get More Scalability and Flexibility for Big Data Solution Overview LexisNexis High-Performance Computing Cluster Systems Platform Get More Scalability and Flexibility for What You Will Learn Modern enterprises are challenged with the need to store and

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

Minimize cost and risk for data warehousing

Minimize cost and risk for data warehousing SYSTEM X SERVERS SOLUTION BRIEF Minimize cost and risk for data warehousing Microsoft Data Warehouse Fast Track for SQL Server 2014 on System x3850 X6 (55TB) Highlights Improve time to value for your data

More information

RSA Security Analytics Virtual Appliance Setup Guide

RSA Security Analytics Virtual Appliance Setup Guide RSA Security Analytics Virtual Appliance Setup Guide Copyright 2010-2015 RSA, the Security Division of EMC. All rights reserved. Trademarks RSA, the RSA Logo and EMC are either registered trademarks or

More information

Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp

Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp Agenda Hadoop and storage Alternative storage architecture for Hadoop Use cases and customer examples

More information

Intel Platform and Big Data: Making big data work for you.

Intel Platform and Big Data: Making big data work for you. Intel Platform and Big Data: Making big data work for you. 1 From data comes insight New technologies are enabling enterprises to transform opportunity into reality by turning big data into actionable

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

Protecting Microsoft SQL Server with an Integrated Dell / CommVault Solution. Database Solutions Engineering

Protecting Microsoft SQL Server with an Integrated Dell / CommVault Solution. Database Solutions Engineering Protecting Microsoft SQL Server with an Integrated Dell / CommVault Solution Database Solutions Engineering By Subhashini Prem and Leena Kushwaha Dell Product Group March 2009 THIS WHITE PAPER IS FOR INFORMATIONAL

More information

Integrated Grid Solutions. and Greenplum

Integrated Grid Solutions. and Greenplum EMC Perspective Integrated Grid Solutions from SAS, EMC Isilon and Greenplum Introduction Intensifying competitive pressure and vast growth in the capabilities of analytic computing platforms are driving

More information

Interactive data analytics drive insights

Interactive data analytics drive insights Big data Interactive data analytics drive insights Daniel Davis/Invodo/S&P. Screen images courtesy of Landmark Software and Services By Armando Acosta and Joey Jablonski The Apache Hadoop Big data has

More information

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,

More information

Cost-Effective Business Intelligence with Red Hat and Open Source

Cost-Effective Business Intelligence with Red Hat and Open Source Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,

More information

Apache Hadoop Cluster Configuration Guide

Apache Hadoop Cluster Configuration Guide Community Driven Apache Hadoop Apache Hadoop Cluster Configuration Guide April 2013 2013 Hortonworks Inc. http://www.hortonworks.com Introduction Sizing a Hadoop cluster is important, as the right resources

More information

Dell Microsoft SQL Server 2008 Fast Track Data Warehouse Performance Characterization

Dell Microsoft SQL Server 2008 Fast Track Data Warehouse Performance Characterization Dell Microsoft SQL Server 2008 Fast Track Data Warehouse Performance Characterization A Dell Technical White Paper Database Solutions Engineering Dell Product Group Anthony Fernandez Jisha J Executive

More information

Virtualizing SQL Server 2008 Using EMC VNX Series and Microsoft Windows Server 2008 R2 Hyper-V. Reference Architecture

Virtualizing SQL Server 2008 Using EMC VNX Series and Microsoft Windows Server 2008 R2 Hyper-V. Reference Architecture Virtualizing SQL Server 2008 Using EMC VNX Series and Microsoft Windows Server 2008 R2 Hyper-V Copyright 2011 EMC Corporation. All rights reserved. Published February, 2011 EMC believes the information

More information

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays Red Hat Performance Engineering Version 1.0 August 2013 1801 Varsity Drive Raleigh NC

More information

Dell High Availability Solutions Guide for Microsoft Hyper-V

Dell High Availability Solutions Guide for Microsoft Hyper-V Dell High Availability Solutions Guide for Microsoft Hyper-V www.dell.com support.dell.com Notes and Cautions NOTE: A NOTE indicates important information that helps you make better use of your computer.

More information

Dell* In-Memory Appliance for Cloudera* Enterprise

Dell* In-Memory Appliance for Cloudera* Enterprise Built with Intel Dell* In-Memory Appliance for Cloudera* Enterprise Find out what faster big data analytics can do for your business The need for speed in all things related to big data is an enormous

More information

Intel Distribution for Apache Hadoop* Software: Optimization and Tuning Guide

Intel Distribution for Apache Hadoop* Software: Optimization and Tuning Guide OPTIMIZATION AND TUNING GUIDE Intel Distribution for Apache Hadoop* Software Intel Distribution for Apache Hadoop* Software: Optimization and Tuning Guide Configuring and managing your Hadoop* environment

More information

How To Backup And Restore A Database With A Powervault Backup And Powervaults Backup Software On A Poweredge Powervalt Backup On A Netvault 2.5 (Powervault) Powervast Backup On An Uniden Power

How To Backup And Restore A Database With A Powervault Backup And Powervaults Backup Software On A Poweredge Powervalt Backup On A Netvault 2.5 (Powervault) Powervast Backup On An Uniden Power Database Backup and Recovery using NetVault Backup and PowerVault MD3260 A Dell Technical White Paper Database Solutions Engineering Dell Product Group Umesh Sunnapu Mayura Deshmukh Robert Pound This document

More information

cloud functionality: advantages and Disadvantages

cloud functionality: advantages and Disadvantages Whitepaper RED HAT JOINS THE OPENSTACK COMMUNITY IN DEVELOPING AN OPEN SOURCE, PRIVATE CLOUD PLATFORM Introduction: CLOUD COMPUTING AND The Private Cloud cloud functionality: advantages and Disadvantages

More information

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com Parallels Cloud Storage White Paper Performance Benchmark Results www.parallels.com Table of Contents Executive Summary... 3 Architecture Overview... 3 Key Features... 4 No Special Hardware Requirements...

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database Built up on Cisco s big data common platform architecture (CPA), a

More information

New Features in SANsymphony -V10 Storage Virtualization Software

New Features in SANsymphony -V10 Storage Virtualization Software New Features in SANsymphony -V10 Storage Virtualization Software Updated: May 28, 2014 Contents Introduction... 1 Virtual SAN Configurations (Pooling Direct-attached Storage on hosts)... 1 Scalability

More information

The Methodology Behind the Dell SQL Server Advisor Tool

The Methodology Behind the Dell SQL Server Advisor Tool The Methodology Behind the Dell SQL Server Advisor Tool Database Solutions Engineering By Phani MV Dell Product Group October 2009 Executive Summary The Dell SQL Server Advisor is intended to perform capacity

More information