Greenplum Analytics Workbench

White Paper

Abstract

This white paper details how the Greenplum Analytics Workbench was designed and built to validate Apache Hadoop code at scale and to provide a large-scale experimentation environment for mixed-mode development that includes various SQL and non-SQL execution environments. It describes the core architectural components involved and highlights the benefits that an enterprise can leverage to analyze Big Data quickly and efficiently.

Table of Contents

Introduction
Partners
- Supermicro
- Micron
- Intel
- Seagate
- Mellanox
  - Mellanox ConnectX-3 VPI Network Adapter Card
  - Mellanox SwitchX VPI Switches
  - Mellanox Cables
  - Mellanox Unstructured Data Accelerator (UDA)
- Switch
- Rubicon
Hadoop Software Overview
- Hadoop MapReduce
- Hadoop Distributed File System
- Hadoop Ecosystem
Hadoop on the Greenplum Analytics Workbench
TeraSort Example
About Greenplum

Introduction

Enterprises have been dealing with storing rapidly growing amounts of data (truly big data) drawn from sources that range from traditional systems such as ERP and CRM to social media and blogs. The initial focus for the enterprise was on efficient storage of big data, but the focus has now shifted to analytics on big data sets. Hadoop has emerged as the platform of choice for processing big data, especially unstructured data. The out-of-box experience with Hadoop still leaves a lot to be desired, as it lacks the tools needed for easy deployment and management of such an infrastructure, especially for larger deployments.

Enterprises are also quickly realizing that a mixed-mode environment is imperative to maximize analytics capabilities. A mixed-mode environment is one where enterprises can easily combine structured data sets (using traditional SQL) and unstructured data sets (using Hadoop/NoSQL) without having to rework existing processes.

Greenplum Analytics Workbench is built to provide an environment that supports mixed-mode development and validation at scale. The workbench is pre-configured with open and freely available data sets and has analysis software built in for quick turnaround and rapid productivity. It will contain the entire Hadoop stack, consisting of HDFS, M/R, Pig, Hive, HBase and Mahout, and augment it with the SQL capabilities of Greenplum Database, the industry's leading MPP database, all deployed on the same nodes. Analytics Workbench provides an experimentation platform for Greenplum's thought leadership in the Unified Analytics Platform. It also provides a learning opportunity for organizations that wish to build and operate a large Hadoop or mixed-mode cluster.

Partners

The Analytics Workbench was assembled through strong partnerships with some of the industry's leading vendors of hardware and services. The Greenplum team forged close alliances with these partners to assemble the hardware nodes and to rack, stack and cable them in a state-of-the-art datacenter. An extremely fast network backbone and switching layer was designed to give the cluster high throughput for intra-cluster communication.

Supermicro

Supermicro contributed 1,000 enterprise-ready server systems for all data-processing nodes to power the 24 petabytes of storage available on the cluster. Supermicro fully assembled and tested the data-processing nodes in its Silicon Valley production center. The assembly process included integrating processors, memory, disk drives and network cards from the other contributing partners. Supermicro's design team optimized the system configuration to address both data center space and power challenges. Supermicro servers maintain peak power efficiency with platinum-level, 94% efficient power supplies. Supermicro then maximized node count per rack, reducing power and cooling overheads while delivering high performance and the required 24 terabytes of storage per data-processing node.

The specifications of the Supermicro systems are as follows:

2U Greenplum Hadoop OEM Server, Model # PIO-626T-6RF-EM09B

Supermicro Dual Processor motherboard supporting:
- Dual Intel Xeon X5500/X5600 series processors
- Up to 192GB RAM with 12 DDR3 RDIMMs
- 5 expansion slots: 3 PCI-E 2.0 x8, 1 PCI-E 2.0 x4, 1 PCI-E x4
- Onboard LSI Gbps disk controller (IR mode)
- Dual LAN with Intel Gigabit Ethernet Controller
- Dedicated IPMI remote management port

Supermicro 2U server chassis supporting:
- 12 hot-swap 3.5-inch drive trays
- Redundant 500 Watt platinum-level power supplies
- 94+% efficiency rating
- Active backplane with 6.0Gbps SAS/SATA expander
- 7 low-profile PCI-E expansion slots

Micron

Micron contributed 6,000 DDR3 RDIMM memory modules of 8GB each. This gives a total of 48TB of memory, distributed evenly across the 1,000 nodes so that each node has 48GB of RAM.

DDR3 functionality and operations supported:
- 240-pin, registered dual in-line memory module (RDIMM)
- Fast data transfer rates: PC , PC3-8500, or PC
- GB (512 Meg x 72)
- VDD = 1.5V ±0.075V
- VDDSPD = V
- Supports ECC error detection and correction
- Nominal and dynamic on-die termination (ODT) for data, strobe, and mask signals
- Quad rank
- On-board I2C temperature sensor with integrated serial presence-detect (SPD) EEPROM
- Fixed burst chop (BC) of 4 and burst length (BL) of 8 via the mode register set (MRS)
- Selectable BC4 or BL8 on-the-fly (OTF)
- Gold edge contacts
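The memory figures quoted for the Micron contribution can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name is hypothetical, and it assumes decimal units (1TB = 1,000GB) and an even split of modules across nodes, which is what the stated totals imply.

```python
def memory_per_node(dimm_count, dimm_gb, node_count):
    """Return (DIMMs per node, GB per node) for an even distribution.

    Hypothetical helper for illustration; raises if the modules
    cannot be split evenly across the nodes.
    """
    if dimm_count % node_count != 0:
        raise ValueError("DIMMs do not divide evenly across nodes")
    dimms_per_node = dimm_count // node_count
    return dimms_per_node, dimms_per_node * dimm_gb


# Figures taken from the text: 6,000 modules of 8GB across 1,000 nodes.
dimms_per_node, gb_per_node = memory_per_node(6000, 8, 1000)
total_tb = 6000 * 8 / 1000  # decimal terabytes (assumption: 1TB = 1,000GB)

print(dimms_per_node)  # DIMMs installed in each node
print(gb_per_node)     # GB of RAM in each node
print(total_tb)        # TB of RAM in the whole cluster
```

With these inputs each node carries 6 modules, which fills half of the 12 RDIMM slots listed for the Supermicro motherboard.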

Intel

Intel contributed 2,000 Westmere processors with the following specifications:

- Processor Number: X5670
- # of Cores: 6
- # of Threads: 12
- Clock Speed: 2.93 GHz
- Max Turbo Frequency: 3.33 GHz
- Intel Smart Cache: 12 MB
- Bus/Core Ratio: 22
- Intel QPI Speed: 6.4 GT/s
- # of QPI Links: 2
- Instruction Set: 64-bit
- Instruction Set Extensions: SSE4.2
- Embedded Options Available: No
- Lithography: 32 nm
- Max TDP: 95 W
- Max Memory Size (dependent on memory type): 288 GB
- Memory Types: DDR3-800/1066/1333
- # of Memory Channels: 3
- Max Memory Bandwidth: 32 GB/s
- Physical Address Extensions: 40-bit
- ECC Memory Supported
- Intel Turbo Boost Technology
- Intel Hyper-Threading Technology
- Intel Virtualization Technology (VT-x)
- Intel Virtualization Technology for Directed I/O (VT-d)
- Intel Trusted Execution Technology
- AES New Instructions
- Intel 64
- Idle States
- Enhanced Intel SpeedStep Technology
- Intel Demand Based Switching
- Thermal Monitoring Technologies: No
- Execute Disable Bit

Table 1: (courtesy:

Seagate

Seagate contributed 12,000 2TB drives, for a total of 24TB per node and 24PB of raw storage across the cluster. The specification of the Seagate drives is as follows:

- Product Name: 2TB Constellation ES 7200RPM 3.5-inch SATA 6Gb/s 64MB Cache Hard Drive
- Product Type: Hard Drive
- Buffer: 64 MB
- Hard Drive Interface: SATA/600
- Compatible Drive Bay Width: 3.5 inches
- SATA Pin: 7-pin
- Height: 1.0 inch
- Width: 4.0 inches
- Depth: 5.8 inches
- Product Series: ES
- Form Factor: Internal
- Product Model: ST2000NM0011
- Product Line: Constellation
- Storage Capacity: 2 TB
- Rotational Speed: 7200 rpm
- Maximum External Data Transfer Rate: 600 MBps (4.7 Gbps)
- Average Latency: 4.16 ms
- Average Seek Time: 9.50 ms

Table 2: (courtesy:

Mellanox

Mellanox ConnectX-3 VPI Network Adapter Card

Mellanox ConnectX-3 VPI card Specifications

- Part Number: MCX354A-FCBT
- Supported Data Rates, InfiniBand: FDR; QDR; DDR
- Supported Data Rates, Ethernet: 40GbE; 10GbE
- PCI Express generations support: 3.0; 2.0; 1.1
- RDMA Support: InfiniBand; RoCE
- Supported Media Types: Direct Attached Copper; Active Optical Cables; Optical Modules
- Number of Ports and Types: 2 Ports, QSFP+

Mellanox SwitchX VPI Switches

Mellanox SwitchX VPI Switch Specifications

- Part Number: MSX6036F-1SFR
- Supported Data Rates, InfiniBand: FDR; QDR; DDR
- Supported Data Rates, Ethernet: 40GbE; 10GbE
- Port-to-Port Latency, InfiniBand: 170ns
- Port-to-Port Latency, Ethernet: 230ns
- Blocking Ratio: 1:1 (fully non-blocking)
- Number of Ports and Type: 36 Ports, QSFP+
- Typical Power Consumption: 126W
- Supported Media Types: Direct Attached Copper; Active Optical Cables; Optical Modules

Mellanox Cables

Mellanox FDR Passive Copper and Optical Cable

Greenplum Analytics Workbench connectivity is enabled by Mellanox's FDR cables. Both Passive Copper and Active Optical Cables are used to provide a state-of-the-art cluster cabling solution with durability and ease of installation.

Mellanox Unstructured Data Accelerator (UDA)

Mellanox UDA, a software plugin, accelerates the Hadoop network and improves the scaling of Hadoop clusters that execute data-analytics-intensive applications. A novel data-moving protocol, which uses RDMA in combination with an efficient merge-sort algorithm, enables Hadoop clusters based on Mellanox InfiniBand and 40/10GbE RoCE (RDMA over Converged Ethernet) adapter cards to move data efficiently between servers, accelerating the Hadoop framework.

The 1,000-node Hadoop cluster is connected via an FDR 56Gbps InfiniBand interconnect, using the ConnectX-3 VPI cards and SwitchX VPI switches described above. The cluster uses 3 layers of switching:

- Node-level switches that connect the 20 servers in each rack to the aggregation layer, using 4 FDR uplinks from each Top of Rack (ToR) switch.
- Aggregation-layer switches that are connected to the core layer using 18 uplinks, delivering a fully non-blocking InfiniBand network between the aggregation and core levels.
- Core-level switches.

The cluster uses the IP over IB protocol to enable a more efficient connection for the socket-based portions of the framework. UDA gives the MapReduce portion of the framework the ability to use RDMA connectivity between nodes, reducing CPU overhead and enabling lower-latency connections. The outcome of RDMA usage is a significant reduction in processing time for datasets of similar size.
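The rack-level figures given for the switching layers (20 servers per rack, 4 FDR uplinks per ToR switch, 56Gbps per FDR link) allow a rough estimate of how much bandwidth leaves a rack compared with what the servers inside it could generate. The sketch below is an illustration under stated assumptions: the function name is hypothetical, it assumes each server drives a single FDR link at the nominal 56Gbps, and it ignores protocol overhead. The paper does not state how many adapter ports are cabled per server.

```python
FDR_GBPS = 56  # nominal FDR InfiniBand link rate quoted in the text


def tor_bandwidth(servers_per_rack, uplinks, link_gbps=FDR_GBPS):
    """Return (server-facing Gbps, uplink Gbps, oversubscription ratio).

    Hypothetical helper; assumes one link per server at the nominal rate.
    """
    downlink = servers_per_rack * link_gbps
    uplink = uplinks * link_gbps
    return downlink, uplink, downlink / uplink


downlink, uplink, ratio = tor_bandwidth(servers_per_rack=20, uplinks=4)
print(downlink, uplink, ratio)
```

Under these assumptions the ToR layer is oversubscribed, while the text describes the aggregation-to-core links as fully non-blocking. Keeping most traffic inside a rack, as Hadoop's rack-aware scheduling tries to do, reduces pressure on the uplinks.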

The interconnect layout of the network is as follows:

[Figure: network interconnect layout]

Switch

Switch is the state-of-the-art datacenter in Las Vegas, NV where the Analytics Workbench is hosted. The cluster occupies almost 3 full SCIFs, consisting of 54 racks in all. Each rack holds 20 servers. A few racks are not completely filled, leaving some room for expansion. The Switch datacenter will, however, be able to accommodate future growth in other SCIFs with no apparent impact on the overall cluster.

The racks are divided into data racks, core racks and infrastructure racks. Infrastructure racks hold servers for Puppet, Nagios, Ganglia, DNS, DHCP, etc., whereas the core racks hold the servers for the name node, job tracker node, ZooKeeper, HBase, etc. The rack layout is as follows:

[Figure: rack layout]

Rubicon

The Rubicon team, which is a part of VMware, provides Tier-1 and Tier-2 support for the cluster. This includes monitoring the network and hardware as well as various system-level monitoring checks. The team uses Zabbix for systems management and has developed sophisticated plug-ins and dashboards on top of Zabbix. The Rubicon team has a local presence in Las Vegas and can provide rapid response to critical issues within the cluster.

Hadoop Software Overview

Hadoop is an industry-leading open source framework for distributed storage and processing that is designed to scale with the growing data storage and compute needs of an organization. By using the same nodes for both compute and storage, a cluster can scale in both dimensions simultaneously and avoid the traditional bottlenecks of NAS/SAN-type architectures. Below are some of the key components of Hadoop:

- Hadoop MapReduce: the parallel task processing mechanism that takes a query (job) and runs it in parallel on multiple nodes. The parallelism provides much better throughput for unstructured data sets that can be processed independently.
- Hadoop Distributed File System (HDFS): the base file system layer that stores data across all of the nodes in the cluster.

MapReduce as a computing paradigm was popularized by Google, and Hadoop, developed in large part at Yahoo and released as open source, is an implementation of that paradigm.

Hadoop MapReduce

Hadoop MapReduce is a software framework for easily writing applications that process large amounts of data in parallel on large clusters of commodity compute nodes. The diagram below depicts the basics of the MapReduce workflow.

[Figure: MapReduce workflow]

A MapReduce job (query) usually splits the input data set into independent chunks; the size of each chunk depends on a system-wide setting (typically 64MB). Each chunk is processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then used as input to the reduce tasks. Typically both the input and the output of the job are stored in HDFS. The framework takes care of scheduling tasks, monitoring them and re-executing failed tasks.
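The split, map, sort and reduce steps described above can be imitated in a few lines of plain Python. This is a single-process sketch of the paradigm using a word count, not Hadoop code: all function names are hypothetical, the chunks are split by line count instead of by a 64MB block size, and the map tasks run sequentially where Hadoop would run them in parallel on different nodes.

```python
from collections import defaultdict
from itertools import chain


def split_input(text, chunk_lines=2):
    """Split the input into independent chunks (here: groups of lines)."""
    lines = text.splitlines()
    return [lines[i:i + chunk_lines] for i in range(0, len(lines), chunk_lines)]


def map_task(chunk):
    """Emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle_and_sort(map_outputs):
    """Group values by key and sort by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in chain.from_iterable(map_outputs):
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_task(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(text):
    chunks = split_input(text)
    map_outputs = [map_task(chunk) for chunk in chunks]  # parallel in Hadoop
    return [reduce_task(key, values) for key, values in shuffle_and_sort(map_outputs)]


result = run_job("big data\nbig cluster\ndata data")
print(result)
```

Because each chunk is mapped without reference to any other chunk, the map phase can be spread over as many nodes as there are chunks. Only the grouping step between map and reduce needs to move data between nodes.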

Typically in a Hadoop cluster, the MapReduce compute nodes and the storage layer (HDFS) reside on the same set of nodes. The system is configured to be rack aware, which makes it possible for the framework to schedule tasks on the nodes where the data is already present and so minimize data movement within the cluster. This is the compute layer that derives key insight from the data that resides in the HDFS layer.

Hadoop is written entirely in Java, but MapReduce applications do not need to be. MapReduce applications can use the Hadoop Streaming interface to specify any executable as the mapper or reducer for a particular job.

The MapReduce framework consists of the following:

- JobTracker: a single master per cluster of nodes that schedules, monitors and manages jobs as well as their component tasks.
- TaskTracker: one slave TaskTracker per cluster node that executes the task components of a job as directed by the JobTracker.

In the upcoming release of Hadoop, the resource management module will undergo a drastic rework. It will maintain backwards compatibility while splitting the resource management capabilities into a standalone module.

Hadoop Distributed File System

Hadoop Distributed File System (HDFS) is a block-based file system that allows user data to be stored in files. It retains the look and feel of a Linux file system, so users and applications can create or remove files and directories as well as move or rename them. HDFS does not support hard or soft links. All HDFS communication is layered on top of the TCP/IP protocol. Below are the key components of HDFS:

- NameNode: a single master metadata server that holds in-memory maps of every file and its location, as well as all the blocks within the files and their locations within the HDFS namespace. In the upcoming release of Hadoop, a NameNode HA feature will be introduced to address limitations of the existing design (such as the single point of failure).
- DataNode: one slave DataNode per cluster node that serves read/write requests and performs block creation, deletion and replication as directed by the NameNode.

This is the storage layer where all the data resides before a MapReduce job can run on it. HDFS uses block mirroring to spread the data around the Hadoop cluster, both for protection and for data locality, so that MapReduce jobs can run on the same data on multiple compute nodes. The default block size is 64 MB and the default replication factor is 3x. The copies are written in a rack-aware manner so that all 3 copies do not reside on the same rack. The central idea behind replication is that if a rack goes down, the system will still have access to as much of the full data set as possible.
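The block size and replication settings above translate directly into block counts and usable capacity, and the rack-aware rule can be sketched as a small placement function. The code below is an illustration, not HDFS source: the function and node names are hypothetical, and the placement rule (first copy on the writer's node, second and third copies on two different nodes of one other rack) is an assumption modeled on the commonly documented HDFS default. The text itself only states that the 3 copies must not all sit on the same rack.

```python
import math
import random

BLOCK_SIZE_MB = 64  # default block size stated in the text
REPLICATION = 3     # default replication factor stated in the text


def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks needed to store a file of the given size."""
    return math.ceil(file_size_mb / block_size_mb)


def usable_capacity(raw_capacity, replication=REPLICATION):
    """Capacity left for unique data when every block is stored `replication` times."""
    return raw_capacity / replication


def place_replicas(writer, topology, rng=random):
    """Choose three nodes for one block so that at least two racks are used.

    topology maps a rack name to its list of node names (hypothetical names).
    """
    writer_rack = next(rack for rack, nodes in topology.items() if writer in nodes)
    remote_racks = [rack for rack, nodes in topology.items()
                    if rack != writer_rack and len(nodes) >= 2]
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote_rack], 2)
    return [writer, second, third]


topology = {
    "rack1": ["r1n1", "r1n2"],
    "rack2": ["r2n1", "r2n2"],
    "rack3": ["r3n1", "r3n2"],
}
replicas = place_replicas("r1n1", topology, random.Random(0))
print(replicas)
print(block_count(1024 * 1024))  # blocks in a 1TB file, taking 1TB as 2**20 MB
print(usable_capacity(24.0))     # PB of unique data that fit in 24PB raw
```

With the 24PB of raw storage quoted for the cluster, 3x replication leaves room for roughly 8PB of unique data, before accounting for temporary space and other overheads.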

Hadoop Ecosystem

The Hadoop ecosystem consists of the following main blocks:

- Hive: a SQL-like ad hoc querying interface for data stored in HDFS
- HBase: a column-oriented structured storage system for HDFS
- Pig: a high-level data flow language and execution framework for parallel computation
- Mahout: scalable machine learning algorithms using Hadoop

The above is not an exhaustive list of all Hadoop ecosystem components.

[Figure: Hadoop ecosystem stack. Analysis: Mahout. Workflow management: Spring Batch, Oozie. Languages: Hive, M/R, Pig. Execution environment: HBase. File system: HDFS.]

Hadoop on the Greenplum Analytics Workbench

For most users, a typical Hadoop cluster consists of a name node, a few other master nodes and a large number of data nodes. The diagram below shows the Hadoop data nodes and the corresponding master nodes. A few of the master roles are hosted on the same machine to begin with. This may change depending on the load on the system.

[Figure: Hadoop data nodes and master nodes]

In reality, a typical Hadoop cluster is supported by a number of additional roles, as shown in the diagram below.

[Figure: supporting server roles between the Internet and the cluster]

The table below provides a brief description of the server roles.

- Access: These nodes are used to access the cluster. There is no direct access to the data nodes from outside. Typically these nodes support ssh-based connectivity.
- Data ingestion: Data ingestion nodes are used for bulk upload of data into the cluster. These nodes can be used as a staging area for further processing prior to loading into HDFS.
- Web based management: Web-based management nodes are used for accessing the cluster via HTTP.
- Jenkins: Jenkins is an open source continuous integration framework. The server is used to build Hadoop code on demand or on a predefined trigger, and it provides a dashboard to view the results.
- Ganglia, Nagios and Zabbix: These systems are used to monitor the cluster. For the Analytics Workbench, Zabbix is currently used to monitor system-level statistics, whereas the combination of Nagios and Ganglia is used for application-level monitoring.
- Plato server: This is deployed to monitor the health of the disks. It is actively monitored by the Rubicon team.
- Kickstart, DNS, DHCP and NTP: The Kickstart server is used to load the base OS onto the nodes, whereas DNS, DHCP and NTP are used for network management.
- YUM repo, Puppet master, Kerberos: The YUM repo is used as a repository for RHEL packages. It is used by the Puppet master and the Puppet agent running on each data node to access the packages that the slaves need for deployment. Kerberos is used as an authentication mechanism (needed as part of a secure Hadoop implementation).
- UFM: This is used as the unified fabric manager for the Mellanox network.

TeraSort Example

The industry benchmark TeraSort was run on the cluster. The cluster configuration was not tuned for best possible performance. The intent of the run was simply to validate the general health of the cluster and to measure the TeraSort run characteristics. The first run was against 1TB and the second run was against 10TB of data. There are plans to run 100TB and even 1PB sorts in the near future. The results are shown below:

[Figure: TeraSort results]

ABOUT GREENPLUM

Greenplum, a division of EMC, is driving the future of Big Data analytics with breakthrough products that harness the skills of data science teams to help global organizations realize the full promise of business agility and become data-driven, predictive enterprises. The division's products include Greenplum Unified Analytics Platform, Greenplum Data Computing Appliance, Greenplum Database, Greenplum Analytics Lab, Greenplum HD and Greenplum Chorus. They embody the power of open systems, cloud computing, virtualization and social collaboration, enabling global organizations to gain greater insight and value from their data than ever before possible. Learn more at

CONTACT US

To learn more about how Greenplum products, services, and solutions can help you realize Big Data Analytics opportunities, visit us at

Greenplum, a Division of EMC
1900 South Norfolk Street
San Mateo, CA
Tel:

EMC², EMC, the EMC logo, and Greenplum are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners. Copyright 2012 EMC Corporation. All rights reserved. Published in the USA. 05/12


More information

Mellanox Academy Online Training (E-learning)

Mellanox Academy Online Training (E-learning) Mellanox Academy Online Training (E-learning) 2013-2014 30 P age Mellanox offers a variety of training methods and learning solutions for instructor-led training classes and remote online learning (e-learning),

More information

SUN ORACLE EXADATA STORAGE SERVER

SUN ORACLE EXADATA STORAGE SERVER SUN ORACLE EXADATA STORAGE SERVER KEY FEATURES AND BENEFITS FEATURES 12 x 3.5 inch SAS or SATA disks 384 GB of Exadata Smart Flash Cache 2 Intel 2.53 Ghz quad-core processors 24 GB memory Dual InfiniBand

More information

Up to 4 PCI-E SSDs Four or Two Hot-Pluggable Nodes in 2U

Up to 4 PCI-E SSDs Four or Two Hot-Pluggable Nodes in 2U Twin Servers Up to 4 PCI-E SSDs Four or Two Hot-Pluggable Nodes in 2U New Generation TwinPro Systems SAS 3.0 (12Gbps) Up to 12 HDDs/Node FDR(56Gbps)/QDR InfiniBand 1TB DDR3 up to 1866 MHz in 16 DIMMs Redundant

More information

How To Run Apa Hadoop 1.0 On Vsphere Tmt On A Hyperconverged Network On A Virtualized Cluster On A Vspplace Tmter (Vmware) Vspheon Tm (

How To Run Apa Hadoop 1.0 On Vsphere Tmt On A Hyperconverged Network On A Virtualized Cluster On A Vspplace Tmter (Vmware) Vspheon Tm ( Apache Hadoop 1.0 High Availability Solution on VMware vsphere TM Reference Architecture TECHNICAL WHITE PAPER v 1.0 June 2012 Table of Contents Executive Summary... 3 Introduction... 3 Terminology...

More information

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Yan Fisher Senior Principal Product Marketing Manager, Red Hat Rohit Bakhshi Product Manager,

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information
