Scaling Hadoop for Multi-Core and Highly Threaded Systems

1 Scaling Hadoop for Multi-Core and Highly Threaded Systems
Jangwoo Kim, Zoran Radovic
Performance Architects, Architecture Technology Group
Sun Microsystems Inc.

2 Agenda
Project Overview
Hadoop Updates
CMT Hadoop Systems
Scaling Hadoop on CMT
Virtualization Technologies
> Zones
> Logical Domains
Case Study: Discovery
Conclusions

3 Project Overview
Chip Multi-Threading (CMT) processors and Hadoop are designed for maximum throughput
Sun's JVM is optimized for CMT
> Java has been widely deployed by many customers on CMT
Hadoop is written in Java, an ideal throughput candidate
CMT seemed like a great fit for Hadoop, with the potential for a greatly reduced footprint
Related work by Ning Sun and Lee Anne Simmons:
> Blueprint: Using Logical Domains and CoolThreads Technology: Improving Scalability and System Utilization

4 700+ attendees.. 4

5 Hadoop Expands Beyond the Web...
Some examples from the Summit:
> Genetic Sequence Analysis
> Parallel Data Mining in Telco
> Natural language learning
> Business Fraud Detection
> Clinical Trials
> Retail Business Planning
> ...

6 Map/Reduce Organization
[Diagram: input splits (Split 0 to Split 3) in HDFS feed the map tasks; map output is copied to the reducers, sort/merged, and reduced into output parts (Part 0, Part 1) in HDFS.]
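
The map, sort/merge and reduce stages in the diagram can be imitated on a single machine with a shell pipeline. The following is only a minimal sketch of the data flow, not Hadoop code: the function names `map_phase`, `reduce_phase` and `wordcount` are made up for illustration, and word counting is an assumed example workload.

```shell
#!/bin/sh
# Toy, single-process imitation of the map -> sort/merge -> reduce flow.
# All function names here are hypothetical; nothing below is part of Hadoop.

# map: emit one "word<TAB>1" record per input word
map_phase() {
  tr -s '[:space:]' '\n' | awk 'NF { print $1 "\t" 1 }'
}

# reduce: input must be sorted by key; sum the counts of consecutive equal keys
reduce_phase() {
  awk -F '\t' '
    NR > 1 && $1 != key { print key "\t" sum; sum = 0 }
    { key = $1; sum += $2 }
    END { if (NR > 0) print key "\t" sum }'
}

# sort plays the role of the shuffle and sort/merge step between map and reduce
wordcount() {
  map_phase | LC_ALL=C sort | reduce_phase
}
```

For example, `printf 'b a b\n' | wordcount` produces one tab-separated line per distinct word with its count. In a real cluster the map and reduce stages run as many parallel tasks, which is where the launching overhead discussed later comes from.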

7 Next-Gen Hadoop: Low Latency Focus
Hadoop is traditionally optimized for throughput
World Record Sort source code changes ("Winning a 60 Second Dash with a Yellow Elephant")
> Reducer improvements (shuffle): memory-to-memory merge
> Fetch of multiple map outputs from the same node; reduces the number of server connections
> Improved timeout behavior
> Better data corruption detection (CRC32 improvements)
> Map output compression (45% of the original size)
> Improved and multi-threaded data partitioning
> Lower latency with faster heartbeat

8 OpenSolaris
OpenSolaris Moves Into Enterprise
> UltraSPARC T1 and T2 Support, Sun4u
> 5 Year Enterprise Support
> Datacenter-Ready Installation
New and Modern Networking Stack
> Multi-Core Optimized
> Easy Network Virtualization and Resource Control
Powerful, Built-in and Free Virtualization Technology

9 UltraSPARC T2 Processor
8 SPARC V9 cores at 1.4 GHz
> 8 vertical threads per core
> 2 execution pipelines per core
> 1 instruction/cycle per pipeline
> 1 FPU per core
> 1 SPU (crypto) per core
> 4MB 16-way 8-bank L2$
64 threads
2.5 GHz x8 PCI-Express interface
2 x 10Gb on-chip Ethernet
Crypto processor per core
Power: 84 watts (typical)
42 GB/s read, 21 GB/s write

10 CMT Hadoop Systems
T5440 4U 2P US-T2 Plus Platform & Sun Storage J4400
Blade U US-T2 Blades
T5240 2U 2P US-T2 Plus Server

11 CMT Hadoop Node and Rack Specs

12 Ideal Performance Model
[Timeline diagram: job start, all maps start, all maps finish, shuffle start, all reduces start, all reduces finish, job completion; phases: mapping, shuffling, reducing.]
All tasks start and finish simultaneously

13 Performance Model with Serialized Tasks
[Timeline diagram: job start, first map starts, last map starts, first map finishes, last map finishes, last shuffle starts, last reduce starts, last reduce finishes, job completion; phases: mapping, shuffling, reducing.]
Launching many tasks can incur significant overhead
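
The difference between the ideal and the serialized model can be illustrated with a small calculation. This is a simplified sketch under stated assumptions, not the authors' model: every task is assumed to run for the same time, the launcher is assumed to start one task per fixed gap, and there are assumed to be enough hardware threads for all tasks to run at once. The function name and all numbers are hypothetical.

```shell
#!/bin/sh
# Hypothetical model of one phase (for example, mapping).
# Usage: phase_finish_secs TASKS TASK_SECS LAUNCH_GAP_SECS
# A launch gap of 0 corresponds to the ideal model (all tasks start together).
phase_finish_secs() {
  tasks=$1
  task_secs=$2
  launch_gap_secs=$3
  # The last task starts after (tasks - 1) launch gaps, then runs to completion.
  echo $(( (tasks - 1) * launch_gap_secs + task_secs ))
}
```

With illustrative values, `phase_finish_secs 128 60 0` gives 60 seconds, while `phase_finish_secs 128 60 2` gives 314 seconds: the phase is dominated by launching rather than by work. If four independent launchers each start 32 of the tasks, `phase_finish_secs 32 60 2` gives 122 seconds.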

14 Distributed Performance Data Collection
Created a set of scripts to facilitate distributed execution for performance data collection and analysis
Based on traditional single-node system analysis tools
> mpstat, nicstat, iostat, vmstat, ...
Variable sampling frequency to monitor hardware utilization
Pinpoint which resource is a bottleneck at any point
> CPU utilization, network, disk I/O
> Periods where no resource is fully utilized may indicate a poorly tuned Hadoop configuration or other system issues
Hadoop log processing to monitor Hadoop task timeline
> Examine startup rate, Hadoop phase overlap
Scripts and details are available here:
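
The authors' scripts are not reproduced in this transcript. As a rough sketch of the general idea only, the helper below prints (it does not run) one sampler command per tool and node. The function name, the use of ssh, the host names and the log directory are all assumptions made for illustration; only the tool names come from the slide.

```shell
#!/bin/sh
# Dry run: print the commands that would start the samplers on each node.
# Usage: sampler_commands INTERVAL_SECS LOG_DIR HOST...
# ssh, the log layout and the host names are assumptions, not the authors' setup.
sampler_commands() {
  interval=$1
  logdir=$2
  shift 2
  for host in "$@"; do
    for tool in mpstat nicstat iostat vmstat; do
      # Each tool takes the sampling interval in seconds as its argument.
      printf '%s\n' "ssh $host 'nohup $tool $interval > $logdir/$tool.$host.log 2>&1 &'"
    done
  done
}
```

Calling `sampler_commands 5 /var/tmp/perf node1 node2` prints eight command lines, which can be reviewed before being piped to a shell. Changing the first argument varies the sampling frequency.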

15 Serialized Task Launching Overhead
30GB sort on a single T5240 node (128 threads, 128GB RAM, 16 disks)
[Chart: time (min) spent mapping, shuffling and reducing, by (#map, #reduce).]
<60% CPU utilization
Significant launching overhead limits scalability

16 10-Node 150G Sort Task Timeline
Detailed Look: One T2 Blade (64 threads)

17 10-Node 150G Sort Utilization Stats

18 Intra-node Virtualization: Logical Domains (LDOMs)
Hardware-assisted Virtualization
Single hypervisor
> OS-Level Isolation
> Dedicated H/W threads and memory
Logical Domain 0: Job Tracker, Name Node
Logical Domain 1: Task Tracker, Data Node
Logical Domain 2: Task Tracker, Data Node
Logical Domain N: Task Tracker, Data Node
All domains run on top of the Hypervisor

19 Example LDOMs Configuration
Single control domain
> Virtual disk server (vds)
> Virtual network switch (vsw)
> Virtual console concentrator (vcc)
Multiple logical domains
> ldm add-vcpu 8 ldom0 (cpu)
> ldm add-memory 16G ldom0 (memory)
> ldm add-vdisk vdisk0 control-vds ldom0 (disk)
> ldm add-vnet vnet0 control-vsw ldom0 (network)
> ldm bind ldom0 (bind)
> ldm start ldom0 (boot)
> OS Install as usual

20 Intra-node Virtualization: Zones (Containers)
Software (OS) Virtualization
Single operating system
> Application-Level Isolation
> No H/W threads and memory dedicated
Zone 0: Job Tracker, Name Node
Zone 1: Task Tracker, Data Node
Zone 2: Task Tracker, Data Node
Zone N: Task Tracker, Data Node

21 Example Zones Configuration
Create zones
> zonecfg -z zone0 -f zone0.config
> zone0.config: create; add net; set physical=interface; set address=ip; .. add fs; set dir=mount_path; set raw=partition; ....
Zone administration
> zoneadm -z zone0 boot (boot)
> zoneadm list (list)
> zoneadm -z zone0 halt (halt)
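
The slide elides parts of zone0.config. The sketch below generates one possible complete command file for `zonecfg -f`; it is an illustration under assumptions, not the authors' file. The function name is hypothetical, and the zone path, the `special` device, and the `ufs` file system type are values added here that the slide does not show.

```shell
#!/bin/sh
# Print a zonecfg command file for one zone.
# Usage: zone_config ZONEPATH NIC ADDRESS MOUNT_DIR BLOCK_DEVICE RAW_DEVICE
# zonepath, special and type=ufs are assumptions; the slide elides them.
zone_config() {
  zonepath=$1
  nic=$2
  address=$3
  mount=$4
  special=$5
  raw=$6
  cat <<EOF
create
set zonepath=$zonepath
add net
set physical=$nic
set address=$address
end
add fs
set dir=$mount
set special=$special
set raw=$raw
set type=ufs
end
verify
commit
EOF
}

# Possible use on a Solaris host (placeholder values, not executed here):
#   zone_config /zones/zone0 e1000g0 192.168.0.10 /data \
#       /dev/dsk/c1t0d0s0 /dev/rdsk/c1t0d0s0 > zone0.config
#   zonecfg -z zone0 -f zone0.config
```

Generating the file from a function makes it straightforward to produce one configuration per zone in a loop, each with its own address and partition.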

22 Example 4-LDOM Setup
Evenly distributing H/W resources
* LDOM/ZONE administration scripts and details available here:
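
The actual 4-LDOM layout is not recoverable from this transcript. As a hedged sketch of what an even split could look like, the helper below prints `ldm` commands that divide a node's hardware threads and memory equally. The function name is hypothetical; it uses only the `ldm` subcommands shown on the earlier configuration slide, assumes the domains already exist, ignores any resources reserved for the control domain, and leaves out the virtual disk and network setup.

```shell
#!/bin/sh
# Dry run: print ldm commands that split resources evenly across domains.
# Usage: ldom_plan TOTAL_THREADS TOTAL_MEM_GB DOMAINS
# Assumes the totals divide evenly; integer division drops any remainder.
ldom_plan() {
  threads=$1
  mem_gb=$2
  domains=$3
  vcpus=$(( threads / domains ))
  mem=$(( mem_gb / domains ))
  i=0
  while [ "$i" -lt "$domains" ]; do
    echo "ldm add-vcpu $vcpus ldom$i"
    echo "ldm add-memory ${mem}G ldom$i"
    echo "ldm bind ldom$i"
    echo "ldm start ldom$i"
    i=$(( i + 1 ))
  done
}
```

For the T5240 node described in the deck (128 threads, 128GB RAM), `ldom_plan 128 128 4` prints commands giving each of the four domains 32 virtual CPUs and 32G of memory.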

23 Scaling Hadoop with Intra-node Virtualization
30GB sort on a single T5240 node (128 threads, 128GB RAM, 16 disks)
[Chart: time (min) spent mapping, shuffling and reducing, by (#map, #reduce, #virtual nodes).]
~100% CPU utilization with 4 logical domains

24 Scaling Sorting Workload (Without Virtualization)
Large data sorting performance (Sun Blade 6000: 10 nodes, 640 threads, 64GB RAM/node, 4 disks/node)
[Chart: time (min) vs. data size.]
CMT Hadoop systems scale nicely with larger datasets

25 Discovery Overview
Preparing data for searching over a large corpus
Five phases with different MapReduce profiles
1. PipelineMapReduce: reads and parses 27GB of raw s
2. DocumentSeqFileToMapFile: prepares MapFile to retrieve data
3. PersonNormalization: groups data into unique entities
4. Consumer: creates indices
5. ThreadDetection: conversation threads detected
Output is a set of shards used in a discovery search application

26 Discovery

27 E-Discovery Results
processing performance
[Chart: time (min) for 1 node (128 threads), 1 node (256 threads), 10 nodes (640 threads), and 15 nodes (60 EC2 units).]
CMT Hadoop systems scale for throughput applications

28 Performance / 40U Rack
processing performance normalized to a 40U rack
[Chart: relative performance for 40 nodes (4 EC2 units/node), 5 nodes (256 threads/node), 40 nodes (64 threads/node), and 20 nodes (128 threads/node); surviving bar labels: 4.6X, 3.1X.]
High performance with smaller datacenter footprint
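
The per-rack configurations in this chart differ widely in hardware thread count, which a one-line calculation makes visible. This is arithmetic on the node counts and threads per node listed on the slide only; it says nothing about measured performance, and the function name is made up for illustration.

```shell
#!/bin/sh
# Hardware threads available in one rack.
# Usage: rack_threads NODES_PER_RACK THREADS_PER_NODE
rack_threads() {
  echo $(( $1 * $2 ))
}
```

Using the slide's configurations, `rack_threads 5 256` gives 1280, while `rack_threads 40 64` and `rack_threads 20 128` both give 2560 threads per 40U rack.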

29 MySQL Enterprise Solution Enterprise software, services delivered as annual subscription Database Monitoring Support Most up-to-date MySQL software Monthly rapid updates Quarterly service packs Hot-fix program Indemnification Subscription: MySQL Enterprise License (OEM): Embedded Server Support Virtual database assistant Global monitoring of all servers Web-based central console Built-in advisors, expert advice Problem query detection/analysis MySQL Cluster Carrier-Grade Training Consulting NRE Online self-help MySQL Knowledge Base 24/7 problem resolution with priority escalation Consultative help High-Availability and Scale-Out 29

30 Conclusions
Hadoop and Java scale well on CMT systems
Startup cost dominates performance on highly threaded systems (256 threads per node)
Virtualization techniques enable good scalability, high system utilization and better performance
> Parallelized startup
> Less external node-to-node Ethernet traffic
Hadoop consolidation on CMT systems reduces datacenter footprint, power and cooling costs
Next-gen Hadoop focuses on performance and latency

31 Software Stack, Pointers to Download
Sun CMT servers
> Hadoop
> JVM from Sun 1.6.0_13
> OpenSolaris for SPARC
> LDOMs 1.1

32 Learn More
Free
> Using LDom and CoolThreads Technology: Improving Scalability and Utilization
> Improving Database Scalability on T5440 Blueprint
> Deploying Web 2.0 Applications on Sun Servers and the OpenSolaris Operating Systems
> Tech Resources tab at sun.com/mysqlsystems
Try it Yourself
> Try free for 60 days: Sun Enterprise SPARC rack or blade systems and storage
> Test Hadoop on up to 128 threads
> 60 days to decide to buy
> Return and pay nothing, not even shipping, if you don't
> sun.com/tryandbuy

33 Scaling Hadoop for Multi-Core and Highly Threaded Systems
Jangwoo Kim
Zoran Radovic
Denis Sheahan
Joseph Gebis
This is an extended version of our Hadoop Summit '09 presentation, Santa Clara, CA, June


More information

Scalable Architecture on Amazon AWS Cloud

Scalable Architecture on Amazon AWS Cloud Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies [email protected] 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect

More information

USING SUN SYSTEMS TO BUILD A VIRTUAL AND DYNAMIC INFRASTRUCTURE. Jacques Bessoudo, Systems Technical Marketing. Sun BluePrints Online

USING SUN SYSTEMS TO BUILD A VIRTUAL AND DYNAMIC INFRASTRUCTURE. Jacques Bessoudo, Systems Technical Marketing. Sun BluePrints Online USING SUN SYSTEMS TO BUILD A VIRTUAL AND DYNAMIC INFRASTRUCTURE Jacques Bessoudo, Systems Technical Marketing Sun BluePrints Online Part No 820-6870-10 Revision 1.0, 12/17/08 Sun Microsystems, Inc. Table

More information

SUSE Cloud Installation: Best Practices Using a SMT, Xen and Ceph Storage Environment

SUSE Cloud Installation: Best Practices Using a SMT, Xen and Ceph Storage Environment Best Practices Guide www.suse.com SUSE Cloud Installation: Best Practices Using a SMT, Xen and Ceph Storage Environment Written by B1 Systems GmbH Table of Contents Introduction...3 Use Case Overview...3

More information

Virtualization. Michael Tsai 2015/06/08

Virtualization. Michael Tsai 2015/06/08 Virtualization Michael Tsai 2015/06/08 What is virtualization? Let s first look at a video from VMware http://bcove.me/x9zhalcl Problems? Low utilization Different needs DNS DHCP Web mail 5% 5% 15% 8%

More information

VMware ESXi 3.5 update 2

VMware ESXi 3.5 update 2 VMware ESXi 3.5 update 2 VMware ESXi 3.5 Exec Summary What is it? What does it do? What is unique? Who can use it? How do you use it? Next generation, thin hypervisor for FREE Partitions servers to create

More information

Chapter 7: Distributed Systems: Warehouse-Scale Computing. Fall 2011 Jussi Kangasharju

Chapter 7: Distributed Systems: Warehouse-Scale Computing. Fall 2011 Jussi Kangasharju Chapter 7: Distributed Systems: Warehouse-Scale Computing Fall 2011 Jussi Kangasharju Chapter Outline Warehouse-scale computing overview Workloads and software infrastructure Failures and repairs Note:

More information

Data Center Op+miza+on

Data Center Op+miza+on Data Center Op+miza+on Sept 2014 Jitender Sunke VP Applications, ITC Holdings Ajay Arora Sr. Director, Centroid Systems Justin Youngs Principal Architect, Oracle 1 Agenda! Introductions! Oracle VCA An

More information

Apache Hadoop Cluster Configuration Guide

Apache Hadoop Cluster Configuration Guide Community Driven Apache Hadoop Apache Hadoop Cluster Configuration Guide April 2013 2013 Hortonworks Inc. http://www.hortonworks.com Introduction Sizing a Hadoop cluster is important, as the right resources

More information

Virtuoso and Database Scalability

Virtuoso and Database Scalability Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of

More information

Latency Considerations for 10GBase-T PHYs

Latency Considerations for 10GBase-T PHYs Latency Considerations for PHYs Shimon Muller Sun Microsystems, Inc. March 16, 2004 Orlando, FL Outline Introduction Issues and non-issues PHY Latency in The Big Picture Observations Summary and Recommendations

More information

Hadoop Cluster Applications

Hadoop Cluster Applications Hadoop Overview Data analytics has become a key element of the business decision process over the last decade. Classic reporting on a dataset stored in a database was sufficient until recently, but yesterday

More information

Building a Scalable Big Data Infrastructure for Dynamic Workflows

Building a Scalable Big Data Infrastructure for Dynamic Workflows Building a Scalable Big Data Infrastructure for Dynamic Workflows INTRODUCTION Organizations of all types and sizes are looking to big data to help them make faster, more intelligent decisions. Many efforts

More information

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper

More information

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform Page 1 of 16 Table of Contents Table of Contents... 2 Introduction... 3 NoSQL Databases... 3 CumuLogic NoSQL Database Service...

More information

Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela

Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Hadoop Distributed File System T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Agenda Introduction Flesh and bones of HDFS Architecture Accessing data Data replication strategy Fault tolerance

More information

Best Practices for Monitoring Databases on VMware. Dean Richards Senior DBA, Confio Software

Best Practices for Monitoring Databases on VMware. Dean Richards Senior DBA, Confio Software Best Practices for Monitoring Databases on VMware Dean Richards Senior DBA, Confio Software 1 Who Am I? 20+ Years in Oracle & SQL Server DBA and Developer Worked for Oracle Consulting Specialize in Performance

More information

Data Mining in the Swamp

Data Mining in the Swamp WHITE PAPER Page 1 of 8 Data Mining in the Swamp Taming Unruly Data with Cloud Computing By John Brothers Business Intelligence is all about making better decisions from the data you have. However, all

More information

Delivering Quality in Software Performance and Scalability Testing

Delivering Quality in Software Performance and Scalability Testing Delivering Quality in Software Performance and Scalability Testing Abstract Khun Ban, Robert Scott, Kingsum Chow, and Huijun Yan Software and Services Group, Intel Corporation {khun.ban, robert.l.scott,

More information

http://support.oracle.com/

http://support.oracle.com/ Oracle Primavera Contract Management 14.0 Sizing Guide October 2012 Legal Notices Oracle Primavera Oracle Primavera Contract Management 14.0 Sizing Guide Copyright 1997, 2012, Oracle and/or its affiliates.

More information

Contents Introduction... 5 Deployment Considerations... 9 Deployment Architectures... 11

Contents Introduction... 5 Deployment Considerations... 9 Deployment Architectures... 11 Oracle Primavera Contract Management 14.1 Sizing Guide July 2014 Contents Introduction... 5 Contract Management Database Server... 5 Requirements of the Contract Management Web and Application Servers...

More information

Enabling High performance Big Data platform with RDMA

Enabling High performance Big Data platform with RDMA Enabling High performance Big Data platform with RDMA Tong Liu HPC Advisory Council Oct 7 th, 2014 Shortcomings of Hadoop Administration tooling Performance Reliability SQL support Backup and recovery

More information

An Oracle White Paper August 2011. Oracle VM 3: Server Pool Deployment Planning Considerations for Scalability and Availability

An Oracle White Paper August 2011. Oracle VM 3: Server Pool Deployment Planning Considerations for Scalability and Availability An Oracle White Paper August 2011 Oracle VM 3: Server Pool Deployment Planning Considerations for Scalability and Availability Note This whitepaper discusses a number of considerations to be made when

More information

Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers

Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers WHITE PAPER FUJITSU PRIMERGY AND PRIMEPOWER SERVERS Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers CHALLENGE Replace a Fujitsu PRIMEPOWER 2500 partition with a lower cost solution that

More information

HP Server Management Packs for Microsoft System Center Essentials User Guide

HP Server Management Packs for Microsoft System Center Essentials User Guide HP Server Management Packs for Microsoft System Center Essentials User Guide Part Number 460344-001 September 2007 (First Edition) Copyright 2007 Hewlett-Packard Development Company, L.P. The information

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,[email protected]

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,[email protected] Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,

More information