Scaling Hadoop for Multi-Core and Highly Threaded Systems
Transcription
1 Scaling Hadoop for Multi-Core and Highly Threaded Systems
Jangwoo Kim, Zoran Radovic
Performance Architects, Architecture Technology Group, Sun Microsystems Inc.
2 Agenda
Project Overview
Hadoop Updates
CMT Hadoop Systems
Scaling Hadoop on CMT
Virtualization Technologies
> Zones
> Logical Domains
Case Study: E-Discovery
Conclusions
3 Project Overview
Chip Multi-Threading (CMT) processors and Hadoop are both designed for maximum throughput
Sun's JVM is optimized for CMT
> Java has been widely deployed by many customers on CMT
Hadoop is written in Java, which makes it an ideal throughput candidate
CMT seemed like a great fit for Hadoop, with the potential for a greatly reduced footprint
Related work by Ning Sun and Lee Anne Simmons:
> Blueprint: Using Logical Domains and CoolThreads Technology: Improving Scalability and System Utilization
4 700+ attendees
5 Hadoop Expands Beyond the Web
Some examples from the Summit:
> Genetic Sequence Analysis
> Parallel Data Mining in Telco
> Natural language learning
> Business Fraud Detection
> Clinical Trials
> Retail Business Planning
> ...
6 Map/Reduce Organization
[Diagram: input splits (Split 0 to Split 3) in HDFS feed the map tasks; map output is copied to the reducers, sort/merged, and reduced into output parts (Part 0, Part 1) in HDFS]
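The data flow in the diagram can be sketched in a few lines of Python. This is only an in-memory illustration of the map, copy, sort/merge, and reduce stages using a word count; it is not Hadoop's implementation, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the CRC32-based partitioning are assumptions made for the example.

```python
import zlib
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]


def shuffle(map_outputs, num_reducers):
    """Copy and sort/merge: route each key to one reducer, then sort by key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            # A stable hash keeps the key-to-reducer mapping repeatable.
            index = zlib.crc32(key.encode("utf-8")) % num_reducers
            partitions[index][key].append(value)
    return [sorted(partition.items()) for partition in partitions]


def reduce_phase(partition):
    """Reduce: sum the values of each key to produce one output part."""
    return {key: sum(values) for key, values in partition}


def run_job(splits, num_reducers=2):
    """Run map, shuffle, and reduce in sequence; return one dict per part."""
    map_outputs = [map_phase(split) for split in splits]
    return [reduce_phase(p) for p in shuffle(map_outputs, num_reducers)]
```

Calling `run_job(["a b a", "b c"], num_reducers=2)` returns two output parts, mirroring Part 0 and Part 1 in the diagram; each key lands in exactly one part.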
7 Next-Gen Hadoop: Low Latency Focus
Hadoop is traditionally optimized for throughput
World Record Sort source code changes
> "Winning a 60 Second Dash with a Yellow Elephant"
Reducer improvements (shuffle); memory-to-memory merge
Fetch of multiple map outputs from the same node
> Reduces number of server connections
Improved timeout behavior
Better data corruption detection (CRC32 improvements)
Map output compression (45% of the original size)
Improved and multi-threaded data partitioning
Lower latency with faster heartbeat
8 OpenSolaris
OpenSolaris Moves Into Enterprise
UltraSPARC T1 and T2 Support, Sun4u
5 Year Enterprise Support
Datacenter-Ready Installation
New and Modern Networking Stack
Multi-Core Optimized
Easy Network Virtualization and Resource Control
Powerful, Built-in and Free Virtualization Technology
9 UltraSPARC T2 Processor
8 SPARC V9 cores @ 1.4GHz
> 8 vertical threads per core
> 2 execution pipelines per core
> 1 instruction/cycle per pipeline
> 1 FPU per core
> 1 SPU (crypto) per core
> 4MB 16-way 8-bank L2$
64 threads
2.5GHz x8 PCI-Express interface
2 x 10Gb on-chip Ethernet
Crypto processor per core
Power: 84 watts (typical)
42 GB/s read, 21 GB/s write
10 CMT Hadoop Systems
T5440 4U 2P US-T2 Plus Platform & Sun Storage J4400
Blade U US-T2 Blades
T5240 2U 2P US-T2 Plus Server
11 CMT Hadoop Node and Rack Specs
[Table]
12 Ideal Performance Model
[Timeline diagram: job start, all maps start, all maps finish, shuffle start, all reduces start, all reduces finish, job completion; phases: mapping, shuffling, reducing]
All tasks start and finish simultaneously
13 Performance Model with Serialized Tasks
[Timeline diagram: job start, first map starts, last map starts, first map finishes, last map finishes, last shuffle starts, last reduce starts, last reduce finishes, job completion; phases: mapping, shuffling, reducing]
Launching many tasks can incur significant overhead
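The gap between the ideal and the serialized model can be made concrete with a small calculation. The sketch below assumes a deliberately simplified model that is not taken from the slides: every task in a phase runs for the same time, there is a slot for every task, and task i starts i launch gaps after the first one. The function names and all numbers are hypothetical.

```python
def phase_time(num_tasks, task_seconds, launch_gap_seconds=0.0):
    """Wall-clock time of one phase when task i starts at i * launch_gap."""
    if num_tasks <= 0:
        return 0.0
    # The last task starts after (num_tasks - 1) gaps and then runs to the end.
    return (num_tasks - 1) * launch_gap_seconds + task_seconds


def slot_utilization(num_tasks, task_seconds, launch_gap_seconds=0.0):
    """Fraction of slot time spent doing work during the phase."""
    total = phase_time(num_tasks, task_seconds, launch_gap_seconds)
    return task_seconds / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical figures: 128 map tasks of 60 s each, 0.5 s between launches.
    print(phase_time(128, 60.0))            # ideal: all tasks start together
    print(phase_time(128, 60.0, 0.5))       # serialized launching
    print(slot_utilization(128, 60.0, 0.5))
```

With a zero launch gap the phase takes exactly one task duration, which is the ideal model; any non-zero gap adds a term that grows linearly with the number of tasks, which is why launch overhead matters most on highly threaded nodes.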
14 Distributed Performance Data Collection
Created a set of scripts to facilitate distributed execution for performance data collection and analysis
Based on traditional single-node system analysis tools
> mpstat, nicstat, iostat, vmstat, ...
Variable sampling frequency to monitor hardware utilization
Pinpoint which resource is a bottleneck at any point
> CPU utilization, network, disk I/O
> Periods where no resource is fully utilized may indicate a poorly tuned Hadoop configuration or other system issues
Hadoop log processing to monitor the Hadoop task timeline
> Examine startup rate, Hadoop phase overlap
Scripts and details are available here:
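The bottleneck analysis described above can be illustrated with a short sketch. It is not the authors' scripts: it assumes the output of tools such as mpstat, nicstat, and iostat has already been parsed into one dictionary of utilization percentages per sampling interval, and the 90% threshold and the names `bottleneck` and `summarize` are hypothetical.

```python
from collections import Counter


def bottleneck(sample, threshold=90.0):
    """Return the busiest resource if it is at or above threshold, else None."""
    resource, value = max(sample.items(), key=lambda item: item[1])
    return resource if value >= threshold else None


def summarize(samples, threshold=90.0):
    """Count the samples in which each resource was the bottleneck."""
    return Counter(bottleneck(s, threshold) or "none" for s in samples)


if __name__ == "__main__":
    # Hypothetical utilization samples (percent) for one node.
    samples = [
        {"cpu": 95.0, "network": 20.0, "disk": 40.0},
        {"cpu": 50.0, "network": 30.0, "disk": 97.0},
        {"cpu": 55.0, "network": 35.0, "disk": 60.0},
    ]
    print(summarize(samples))
```

A large "none" count corresponds to the periods where no resource is fully utilized, which the slide flags as a sign of a poorly tuned configuration.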
15 Serialized Task Launching Overhead
30GB sort on a single T5240 node (128 threads, 128GB RAM, 16 disks)
[Chart: Time (min) spent Mapping, Shuffling, and Reducing for each (#map, #reduce) setting]
<60% CPU utilization
Significant launching overhead limits scalability
16 10-Node 150G Sort Task Timeline
Detailed Look: One T2 Blade (64 threads)
17 10-Node 150G Sort Utilization Stats
18 Intra-node Virtualization: Logical Domains (LDOMs)
Hardware-assisted virtualization
Single hypervisor
> OS-level isolation
> Dedicated H/W threads and memory
[Diagram: Logical Domain 0 runs the Job Tracker and Name Node; Logical Domains 1 to N each run a Task Tracker and a Data Node; all domains sit on one hypervisor]
19 Example LDOMs Configuration
Single control domain
> Virtual disk server (vds)
> Virtual network switch (vsw)
> Virtual console concentrator (vcc)
Multiple logical domains
> ldm add-vcpu 8 ldom0 (cpu)
> ldm add-memory 16G ldom0 (memory)
> ldm add-vdisk vdisk0 control-vds ldom0 (disk)
> ldm add-vnet vnet0 control-vsw ldom0 (network)
> ldm bind ldom0 (bind)
> ldm start ldom0 (boot)
> OS install as usual
20 Intra-node Virtualization: Zones (Containers)
Software (OS) virtualization
Single operating system
> Application-level isolation
> No dedicated H/W threads or memory
[Diagram: Zone 0 runs the Job Tracker and Name Node; Zones 1 to N each run a Task Tracker and a Data Node]
21 Example Zones Configuration
Create zones
> zonecfg -z zone0 -f zone0.config
> zone0.config: create; add net; set physical=interface; set address=ip; ... add fs; set dir=mount_path; set raw=partition; ...
Zone administration
> zoneadm -z zone0 boot (boot)
> zoneadm list (list)
> zoneadm -z zone0 halt (halt)
22 Example 4-LDOM Setup
Evenly distributing H/W resources
* LDOM/ZONE administration scripts and details available here:
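An even split of hardware resources across logical domains can be sketched as a small generator of `ldm` command lines. This is an illustration, not the authors' administration scripts: the command forms are copied from the Example LDOMs Configuration slide, the function name `ldom_commands` is hypothetical, and the sketch ignores the threads and memory that the control domain itself needs. It only builds strings and does not run anything.

```python
def ldom_commands(total_threads, total_memory_gb, num_domains, prefix="ldom"):
    """Build ldm command lines that split threads and memory evenly."""
    if num_domains <= 0:
        raise ValueError("num_domains must be positive")
    vcpus = total_threads // num_domains       # H/W threads per domain
    memory_gb = total_memory_gb // num_domains # memory per domain
    commands = []
    for index in range(num_domains):
        name = f"{prefix}{index}"
        commands += [
            f"ldm add-vcpu {vcpus} {name}",
            f"ldm add-memory {memory_gb}G {name}",
            f"ldm bind {name}",
            f"ldm start {name}",
        ]
    return commands


if __name__ == "__main__":
    # A T5240 node as described in the slides: 128 threads, 128GB RAM.
    for line in ldom_commands(128, 128, 4):
        print(line)
```

Virtual disk and virtual network setup (`ldm add-vdisk`, `ldm add-vnet`) is left out because it depends on device names that are specific to each installation.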
23 Scaling Hadoop with Intra-node Virtualization
30GB sort on a single T5240 node (128 threads, 128GB RAM, 16 disks)
[Chart: Time (min) spent Mapping, Shuffling, and Reducing for each (#map, #reduce, #virtual nodes) setting]
~100% CPU utilization with 4 logical domains
24 Scaling Sorting Workload (Without Virtualization)
Large data sorting performance (Sun Blade 6000: 10 nodes, 640 threads, 64GB RAM/node, 4 disks/node)
[Chart: Time (min) versus Data Size]
CMT Hadoop systems scale nicely with larger datasets
25 E-Discovery Overview
Preparing data for searching over a large corpus
Five phases with different MapReduce profiles
1. PipelineMapReduce: reads and parses 27GB of raw s
2. DocumentSeqFileToMapFile: prepares MapFile to retrieve data
3. PersonNormalization: groups data into unique entities
4. Consumer: creates indices
5. ThreadDetection: conversation threads detected
Output is a set of shards used in an E-Discovery search application
26 E-Discovery
[Figure]
27 E-Discovery Results
[Chart: processing performance, Time (min), for four configurations: 1 node with 128 threads; 1 node with 256 threads; 10 nodes with 640 threads; 15 nodes with 60 EC2 units]
CMT Hadoop systems scale for throughput applications
28 Performance / 40U Rack
[Chart: processing performance normalized to a 40U rack, shown as relative performance (bar labels as extracted: 4.6X, X, 3.1X) for four configurations: 40 nodes with 4 EC2 units / node; 5 nodes with 256 threads / node; 40 nodes with 64 threads / node; 20 nodes with 128 threads / node]
High performance with smaller datacenter footprint
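Normalizing to a rack amounts to scaling a per-node figure by the number of nodes that fit in 40U and dividing by a baseline. The sketch below shows that arithmetic only; it assumes throughput scales linearly with node count, and the function names and the input numbers are hypothetical, not the measured results behind the chart.

```python
def rack_throughput(nodes_per_rack, throughput_per_node):
    """Throughput of a full rack, assuming linear scaling across nodes."""
    return nodes_per_rack * throughput_per_node


def relative_to_baseline(rack_values, baseline):
    """Express every rack's throughput as a multiple of the baseline rack."""
    base = rack_values[baseline]
    return {name: value / base for name, value in rack_values.items()}


if __name__ == "__main__":
    # Hypothetical per-node throughputs in jobs per hour.
    racks = {
        "baseline": rack_throughput(40, 1.0),
        "dense": rack_throughput(20, 4.0),
    }
    print(relative_to_baseline(racks, "baseline"))
```

The same node counts give the hardware threads per rack for the CMT configurations on the slide: 5 x 256 = 1280, 40 x 64 = 2560, and 20 x 128 = 2560.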
29 MySQL Enterprise Solution
Enterprise software and services delivered as an annual subscription
Database
> Most up-to-date MySQL software
> Monthly rapid updates
> Quarterly service packs
> Hot-fix program
> Indemnification
Monitoring
> Virtual database assistant
> Global monitoring of all servers
> Web-based central console
> Built-in advisors, expert advice
> Problem query detection/analysis
Support
> Online self-help
> MySQL Knowledge Base
> 24/7 problem resolution with priority escalation
> Consultative help
> High-Availability and Scale-Out
Subscription: MySQL Enterprise
License (OEM): Embedded Server Support
MySQL Cluster Carrier-Grade
Training
Consulting
NRE
30 Conclusions
Hadoop and Java scale well on CMT systems
Startup cost dominates performance on highly threaded systems (256 threads per node)
Virtualization techniques enable good scalability, high system utilization and better performance
> Parallelized startup
> Less external node-to-node Ethernet traffic
Hadoop consolidation on CMT systems reduces datacenter footprint, power and cooling costs
Next-gen Hadoop focuses on performance and latency
31 Software Stack, Pointers to Download
Sun CMT servers
Hadoop
JVM from Sun 1.6.0_13
OpenSolaris for SPARC
LDOMs 1.1
32 Learn More
Free
> Using LDom and CoolThreads Technology: Improving Scalability and Utilization
> Improving Database Scalability on T5440 Blueprint
> Deploying Web 2.0 Applications on Sun Servers and the OpenSolaris Operating Systems
> Tech Resources tab at sun.com/mysqlsystems
Try it Yourself
> Try free for 60 days: Sun Enterprise SPARC rack or blade systems and storage
> Test Hadoop on up to 128 threads
> 60 days to decide to buy
> Return and pay nothing, not even shipping, if you don't
> sun.com/tryandbuy
33 Scaling Hadoop for Multi-Core and Highly Threaded Systems
Jangwoo Kim, Zoran Radovic, Denis Sheahan, Joseph Gebis
This is an extended version of our Hadoop Summit '09 presentation, Santa Clara, CA, June
