Linux Performance Optimizations for Big Data Environments
|
|
- Gertrude Hutchinson
- 8 years ago
- Views:
Transcription
1 Linux Performance Optimizations for Big Data Environments Dominique A. Heger Ph.D. DHTechnologies (Performance, Capacity, Scalability) Data Nubes (Big Data, Hadoop, ML)
2 Performance & Capacity Studies Availability & Reliability Studies Systems Modeling Scalability & Speedup Studies Linux & UNIX Internals Design, Architecture & Feasibility Studies Systems Stress- Testing & Benchmarking Cloud Computing Research, Education & Training Machine Learning Operations Research BI, Data Analytics, Data Mining, Predictive Analytics Hadoop Ecosystem & MapReduce
3 Agenda Linux & Big Data (Hadoop Ecosystem) Performance Management Methodology Linux 3.x Task & I/O Framework Quantifying Linux & Application Performance Q&A
4 Linux Engineers Big Demand & Small Talent Pool Big Data, Hadoop Ecosystem & Cloud Computing in general is powered by Linux 91.4% of the top 500 supercomputers are Linux-based (Source TOP500, 2012) Linux Talent needed now A 2013 job report compiled by Dice showed that 93% of the contacted US companies (850 firms) are hiring Linux professionals this year (2013) The same study revealed that 90% of the firms stated that it is very difficult at the moment to even find Linux talent in the US. This number is up from 80% for the 2012 study According to Dice, the average salary increase for a Linux professional in the US is approximately 9% this year. At the same time, the average IT salary increase in the US is approximately 5%
5 Hadoop Ecosystem (Partial View) Twitter Real-Time Processing Data Handlers Data Serialization System Configuration Management Tools KAFKA Distributed Messaging System Schedulers RDBMS Database & No
6 Hadoop Linux Interaction Language Abstraction Java API MapReduce Framework(*) Hadoop Distributed Filesystem Hadoop (*) Some Hadoop Projects Bypass MapReduce Linux OS Node HW OS & Local FS HW Components
7 Hadoop MR2 Environment
8 Performance Management - Building Blocks Phase 1: Understand Goals & Objectives Phase 2: Phase 3: HW Profiles Workload Profiles Application & OS Traces Data Post-Processing Phase 4: Performance Study CSA Study Phase 5: Capacity Study Scalability Study Speedup Study
9 Performance Evaluation - Goals & Objectives Identify bottlenecks, predict future capacity shortcomings, and determine the most adequate (cost effective) way to configure, tune, and optimize computing environments to overcome performance problems and cope with increasing application workload demands. Combination of analytical, simulation, and empirical study based approaches that utilizes tracing techniques, HW profiles, actual application workload profiles, application log files, and performance data collected either in a Lab or production environment. If no empirical data is available, performance budgets are being used (PE). 9
10 Application Centric Systems Analysis System Hierarchy Application Abstraction Operating System Abstraction Hardware Abstraction Performance Hierarchy Application Vector Performance Code Path - Application to OS Interface Performance Code Path - High to Low Level OS Interface Performance Code Path - OS to HW Interface OS Vector Application Primitives High-Level OS Primitives Low-Level OS Primitives Hardware Primitives Process/Thread Monitors Application Trace Tools & Macro Benchmarks OS Performance Tools & Micro Benchmarks
11 Linux & Hadoop Tools & Techniques Linux Performance Evaluation Tools (Code Path Analysis) strace nmon, blktrace, blkparse, btt, blkiomon, iostat perf valgrind, kcachegrind Workload Generators (Macro & Micro Benchmarks) DHTUX toolset (Unix, Linux 46 systems benchmarks) TeraSort (Hadoop) K-Means Clustering (Hadoop) Bayesian Classification (Hadoop)
12 Performance by the Numbers (Ballpark Figures) L1 cache reference TLB miss Branch misprediction L2 cache reference Mutex lock/unlock Main memory reference Compress 1Kbytes with Zippy Send 2Kbytes over 1Gbps network Read 1MB sequentially from memory Round trip within same datacenter Disk seek (HD) Read 1MB sequentially from disk (HD) Send packet CA->UK->CA 1ns 4ns 5ns 7ns 25ns 100ns 3,000ns 20,000ns 250,000ns 500,000ns 10,000,000ns 20,000,000ns 150,000,000ns Execute Micro & Macro Benchmarks to Baseline the HW and the OS Hadoop MapReduce: With large-scale projects, the performance focus is on disk and interconnect/network performance rather than on the CPU and the DRAM subsystems
13 Micro & Macro Benchmarks Benchmarking & Stress-Testing the HW & the OS prior to deploying the Cluster Nodes Establish a Sound Performance Baseline
14 Application User Space Linux I/O Requests File System Layer Linux bio Layer Linux dequeue Function I/O Task Queue Linux enqueue Function Device Driver I/O Scheduler Disk/RAID/SAN Subsystem
15 Linux 3.x IO Schedulers (3.x) CFQ (default) synchronous verses asynchronous requests, IO priority, read favored over write requests, time-out value noop unordered FIFO queue, only merging, good for environments where IO is optimized at a lower level Deadline 5 IO queues, reorder requests, deadline value, read favored over write requests
16 Application Layer - strace
17 Kernel Layer - blktrace/blkparse
18 Kernel Layer - blktrace - Summary
19 Kernel Layer - btt
20 Kernel Layer - btt (time-line)
21 perf Linux Performance Tool
22 valgrin memcheck (Memory Leaks)
23 valgrin kcachegrin (Call Profiler)
24 Q & A
LARGE, DISTRIBUTED COMPUTING INFRASTRUCTURES OPPORTUNITIES & CHALLENGES. Dominique A. Heger Ph.D. DHTechnologies, Data Nubes Austin, TX, USA
LARGE, DISTRIBUTED COMPUTING INFRASTRUCTURES OPPORTUNITIES & CHALLENGES Dominique A. Heger Ph.D. DHTechnologies, Data Nubes Austin, TX, USA Performance & Capacity Studies Availability & Reliability Studies
More informationZingMe Practice For Building Scalable PHP Website. By Chau Nguyen Nhat Thanh ZingMe Technical Manager Web Technical - VNG
ZingMe Practice For Building Scalable PHP Website By Chau Nguyen Nhat Thanh ZingMe Technical Manager Web Technical - VNG Agenda About ZingMe Scaling PHP application Scalability definition Scaling up vs
More informationArchitecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7
Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Yan Fisher Senior Principal Product Marketing Manager, Red Hat Rohit Bakhshi Product Manager,
More informationEfficient Network Marketing - Fabien Hermenier A.M.a.a.a.C.
the road to cloud native applications Fabien Hermenier 1 cloud ready applications single-tiered monolithic hardware specific cloud native applications leverage cloud services scalable reliable 2 Agenda
More informationDatacenter Operating Systems
Datacenter Operating Systems CSE451 Simon Peter With thanks to Timothy Roscoe (ETH Zurich) Autumn 2015 This Lecture What s a datacenter Why datacenters Types of datacenters Hyperscale datacenters Major
More informationHiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group
HiBench Introduction Carson Wang (carson.wang@intel.com) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is
More informationPERFORMANCE TUNING ORACLE RAC ON LINUX
PERFORMANCE TUNING ORACLE RAC ON LINUX By: Edward Whalen Performance Tuning Corporation INTRODUCTION Performance tuning is an integral part of the maintenance and administration of the Oracle database
More informationA REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information
More informationPerformance Analysis of Mixed Distributed Filesystem Workloads
Performance Analysis of Mixed Distributed Filesystem Workloads Esteban Molina-Estolano, Maya Gokhale, Carlos Maltzahn, John May, John Bent, Scott Brandt Motivation Hadoop-tailored filesystems (e.g. CloudStore)
More informationAgenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.
Agenda Enterprise Performance Factors Overall Enterprise Performance Factors Best Practice for generic Enterprise Best Practice for 3-tiers Enterprise Hardware Load Balancer Basic Unix Tuning Performance
More informationPerformance and scalability of a large OLTP workload
Performance and scalability of a large OLTP workload ii Performance and scalability of a large OLTP workload Contents Performance and scalability of a large OLTP workload with DB2 9 for System z on Linux..............
More informationAccelerating and Simplifying Apache
Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly
More informationNext Generation Operating Systems
Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015 The end of CPU scaling Future computing challenges Power efficiency Performance == parallelism Cisco Confidential 2 Paradox of the
More informationAudit & Tune Deliverables
Audit & Tune Deliverables The Initial Audit is a way for CMD to become familiar with a Client's environment. It provides a thorough overview of the environment and documents best practices for the PostgreSQL
More informationPerformance Tuning and Optimizing SQL Databases 2016
Performance Tuning and Optimizing SQL Databases 2016 http://www.homnick.com marketing@homnick.com +1.561.988.0567 Boca Raton, Fl USA About this course This four-day instructor-led course provides students
More informationLinux Block I/O Scheduling. Aaron Carroll aaronc@gelato.unsw.edu.au December 22, 2007
Linux Block I/O Scheduling Aaron Carroll aaronc@gelato.unsw.edu.au December 22, 2007 As of version 2.6.24, the mainline Linux tree provides four block I/O schedulers: Noop, Deadline, Anticipatory (AS)
More informationHadoop Architecture. Part 1
Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,
More information2 Purpose. 3 Hardware enablement 4 System tools 5 General features. www.redhat.com
A Technical Introduction to Red Hat Enterprise Linux 5.4 The Enterprise LINUX Team 2 Purpose 3 Systems Enablement 3 Hardware enablement 4 System tools 5 General features 6 Virtualization 7 Conclusion www.redhat.com
More informationReference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray VMware
Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray ware 2 Agenda The Hadoop Journey Why Virtualize Hadoop? Elasticity and Scalability Performance Tests Storage Reference
More informationKVM PERFORMANCE IMPROVEMENTS AND OPTIMIZATIONS. Mark Wagner Principal SW Engineer, Red Hat August 14, 2011
KVM PERFORMANCE IMPROVEMENTS AND OPTIMIZATIONS Mark Wagner Principal SW Engineer, Red Hat August 14, 2011 1 Overview Discuss a range of topics about KVM performance How to improve out of the box experience
More informationMixing Hadoop and HPC Workloads on Parallel Filesystems
Mixing Hadoop and HPC Workloads on Parallel Filesystems Esteban Molina-Estolano *, Maya Gokhale, Carlos Maltzahn *, John May, John Bent, Scott Brandt * * UC Santa Cruz, ISSDM, PDSI Lawrence Livermore National
More informationA Framework for Performance Analysis and Tuning in Hadoop Based Clusters
A Framework for Performance Analysis and Tuning in Hadoop Based Clusters Garvit Bansal Anshul Gupta Utkarsh Pyne LNMIIT, Jaipur, India Email: [garvit.bansal anshul.gupta utkarsh.pyne] @lnmiit.ac.in Manish
More informationCSE-E5430 Scalable Cloud Computing Lecture 2
CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing
More informationMambo Running Analytics on Enterprise Storage
Mambo Running Analytics on Enterprise Storage Jingxin Feng, Xing Lin 1, Gokul Soundararajan Advanced Technology Group 1 University of Utah Motivation No easy way to analyze data stored in enterprise storage
More informationStorage Architectures for Big Data in the Cloud
Storage Architectures for Big Data in the Cloud Sam Fineberg HP Storage CT Office/ May 2013 Overview Introduction What is big data? Big Data I/O Hadoop/HDFS SAN Distributed FS Cloud Summary Research Areas
More informationBig Data Performance Growth on the Rise
Impact of Big Data growth On Transparent Computing Michael A. Greene Intel Vice President, Software and Services Group, General Manager, System Technologies and Optimization 1 Transparent Computing (TC)
More informationSTeP-IN SUMMIT 2014. June 2014 at Bangalore, Hyderabad, Pune - INDIA. Performance testing Hadoop based big data analytics solutions
11 th International Conference on Software Testing June 2014 at Bangalore, Hyderabad, Pune - INDIA Performance testing Hadoop based big data analytics solutions by Mustufa Batterywala, Performance Architect,
More informationRed Hat Linux Internals
Red Hat Linux Internals Learn how the Linux kernel functions and start developing modules. Red Hat Linux internals teaches you all the fundamental requirements necessary to understand and start developing
More informationDistributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms
Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes
More informationExar. Optimizing Hadoop Is Bigger Better?? March 2013. sales@exar.com. Exar Corporation 48720 Kato Road Fremont, CA 510-668-7000. www.exar.
Exar Optimizing Hadoop Is Bigger Better?? sales@exar.com Exar Corporation 48720 Kato Road Fremont, CA 510-668-7000 March 2013 www.exar.com Section I: Exar Introduction Exar Corporate Overview Section II:
More informationPerformance Testing at Scale
Performance Testing at Scale An overview of performance testing at NetApp. Shaun Dunning shaun.dunning@netapp.com 1 Outline Performance Engineering responsibilities How we protect performance Overview
More informationMaximizing Hadoop Performance and Storage Capacity with AltraHD TM
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created
More informationBlock I/O Layer Tracing: blktrace
Block I/O Layer Tracing: blktrace Gelato Cupertino, CA April 2006 Alan D. Brunelle Hewlett Packard Company Open Source and Linux Organization Scalability & Performance Group Alan.Brunelle@hp.com 1 Introduction
More informationEnabling High performance Big Data platform with RDMA
Enabling High performance Big Data platform with RDMA Tong Liu HPC Advisory Council Oct 7 th, 2014 Shortcomings of Hadoop Administration tooling Performance Reliability SQL support Backup and recovery
More informationHadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
More informationSystem Models for Distributed and Cloud Computing
System Models for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Classification of Distributed Computing Systems
More informationGraySort on Apache Spark by Databricks
GraySort on Apache Spark by Databricks Reynold Xin, Parviz Deyhim, Ali Ghodsi, Xiangrui Meng, Matei Zaharia Databricks Inc. Apache Spark Sorting in Spark Overview Sorting Within a Partition Range Partitioner
More informationBig Fast Data Hadoop acceleration with Flash. June 2013
Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional
More informationNoSQL Performance Test In-Memory Performance Comparison of SequoiaDB, Cassandra, and MongoDB
bankmark UG (haftungsbeschränkt) Bahnhofstraße 1 9432 Passau Germany www.bankmark.de info@bankmark.de T +49 851 25 49 49 F +49 851 25 49 499 NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB,
More informationHadoop Hardware @Twitter: Size does matter. @joep and @eecraft Hadoop Summit 2013
Hadoop Hardware : Size does matter. @joep and @eecraft Hadoop Summit 2013 v2.3 About us Joep Rottinghuis Software Engineer @ Twitter Engineering Manager Hadoop/HBase team @ Twitter Follow me @joep Jay
More informationWeb Application s Performance Testing
Web Application s Performance Testing B. Election Reddy (07305054) Guided by N. L. Sarda April 13, 2008 1 Contents 1 Introduction 4 2 Objectives 4 3 Performance Indicators 5 4 Types of Performance Testing
More informationAccelerating Hadoop MapReduce Using an In-Memory Data Grid
Accelerating Hadoop MapReduce Using an In-Memory Data Grid By David L. Brinker and William L. Bain, ScaleOut Software, Inc. 2013 ScaleOut Software, Inc. 12/27/2012 H adoop has been widely embraced for
More informationOperating System Components and Services
Operating System Components and Services Tom Kelliher, CS 311 Feb. 6, 2012 Announcements: From last time: 1. System architecture issues. 2. I/O programming. 3. Memory hierarchy. 4. Hardware protection.
More informationHiBench Installation. Sunil Raiyani, Jayam Modi
HiBench Installation Sunil Raiyani, Jayam Modi Last Updated: May 23, 2014 CONTENTS Contents 1 Introduction 1 2 Installation 1 3 HiBench Benchmarks[3] 1 3.1 Micro Benchmarks..............................
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationFPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab
FPGA Accelerator Virtualization in an OpenPOWER cloud Fei Chen, Yonghua Lin IBM China Research Lab Trend of Acceleration Technology Acceleration in Cloud is Taking Off Used FPGA to accelerate Bing search
More informationOptimizing the Performance of Your Longview Application
Optimizing the Performance of Your Longview Application François Lalonde, Director Application Support May 15, 2013 Disclaimer This presentation is provided to you solely for information purposes, is not
More informationDSS. Diskpool and cloud storage benchmarks used in IT-DSS. Data & Storage Services. Geoffray ADDE
DSS Data & Diskpool and cloud storage benchmarks used in IT-DSS CERN IT Department CH-1211 Geneva 23 Switzerland www.cern.ch/it Geoffray ADDE DSS Outline I- A rational approach to storage systems evaluation
More informationBig Data. Value, use cases and architectures. Petar Torre Lead Architect Service Provider Group. Dubrovnik, Croatia, South East Europe 20-22 May, 2013
Dubrovnik, Croatia, South East Europe 20-22 May, 2013 Big Data Value, use cases and architectures Petar Torre Lead Architect Service Provider Group 2011 2013 Cisco and/or its affiliates. All rights reserved.
More informationLoad Testing Analysis Services Gerhard Brückl
Load Testing Analysis Services Gerhard Brückl About Me Gerhard Brückl Working with Microsoft BI since 2006 Mainly focused on Analytics and Reporting Analysis Services / Reporting Services Power BI / O365
More informationOperating System for the K computer
Operating System for the K computer Jun Moroo Masahiko Yamada Takeharu Kato For the K computer to achieve the world s highest performance, Fujitsu has worked on the following three performance improvements
More informationOpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC
OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,
More informationHDFS Under the Hood. Sanjay Radia. Sradia@yahoo-inc.com Grid Computing, Hadoop Yahoo Inc.
HDFS Under the Hood Sanjay Radia Sradia@yahoo-inc.com Grid Computing, Hadoop Yahoo Inc. 1 Outline Overview of Hadoop, an open source project Design of HDFS On going work 2 Hadoop Hadoop provides a framework
More informationDEPLOYING AND MONITORING HADOOP MAP-REDUCE ANALYTICS ON SINGLE-CHIP CLOUD COMPUTER
DEPLOYING AND MONITORING HADOOP MAP-REDUCE ANALYTICS ON SINGLE-CHIP CLOUD COMPUTER ANDREAS-LAZAROS GEORGIADIS, SOTIRIOS XYDIS, DIMITRIOS SOUDRIS MICROPROCESSOR AND MICROSYSTEMS LABORATORY ELECTRICAL AND
More informationOPTIMIZE DMA CONFIGURATION IN ENCRYPTION USE CASE. Guillène Ribière, CEO, System Architect
OPTIMIZE DMA CONFIGURATION IN ENCRYPTION USE CASE Guillène Ribière, CEO, System Architect Problem Statement Low Performances on Hardware Accelerated Encryption: Max Measured 10MBps Expectations: 90 MBps
More informationCOURSE CONTENT Big Data and Hadoop Training
COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop
More informationDistributed File Systems
Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.
More informationUsing MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com
Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com Agenda The rise of Big Data & Hadoop MySQL in the Big Data Lifecycle MySQL Solutions for Big Data Q&A
More informationRunning a Workflow on a PowerCenter Grid
Running a Workflow on a PowerCenter Grid 2010-2014 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise)
More informationSupercomputing and Big Data: Where are the Real Boundaries and Opportunities for Synergy?
HPC2012 Workshop Cetraro, Italy Supercomputing and Big Data: Where are the Real Boundaries and Opportunities for Synergy? Bill Blake CTO Cray, Inc. The Big Data Challenge Supercomputing minimizes data
More informationSQL Server Performance Tuning and Optimization
3 Riverchase Office Plaza Hoover, Alabama 35244 Phone: 205.989.4944 Fax: 855.317.2187 E-Mail: rwhitney@discoveritt.com Web: www.discoveritt.com SQL Server Performance Tuning and Optimization Course: MS10980A
More informationHadoop Cluster Applications
Hadoop Overview Data analytics has become a key element of the business decision process over the last decade. Classic reporting on a dataset stored in a database was sufficient until recently, but yesterday
More informationClash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics
Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics Juwei Shi, Yunjie Qiu, Umar Farooq Minhas, Limei Jiao, Chen Wang, Berthold Reinwald, and Fatma Özcan IBM Research China IBM Almaden
More informationBuilding All-Flash Software Defined Storages for Datacenters. Ji Hyuck Yun (dr.jhyun@sk.com) Storage Tech. Lab SK Telecom
Building All-Flash Software Defined Storages for Datacenters Ji Hyuck Yun (dr.jhyun@sk.com) Storage Tech. Lab SK Telecom Introduction R&D Motivation Synergy between SK Telecom and SK Hynix Service & Solution
More informationDell Reference Configuration for Hortonworks Data Platform
Dell Reference Configuration for Hortonworks Data Platform A Quick Reference Configuration Guide Armando Acosta Hadoop Product Manager Dell Revolutionary Cloud and Big Data Group Kris Applegate Solution
More informationDatacenter Software and Client Software 廖 世 偉
Datacenter Software and Client Software 廖 世 偉 November 2009 Announcement 12/7: Quiz to help recap MapReduce (Greg Malewicz) 12/14: Originally should be Harvard Professor HT Kung s special lecture. Now
More informationVirtualizing a Virtual Machine
Virtualizing a Virtual Machine Azeem Jiva Shrinivas Joshi AMD Java Labs TS-5227 Learn best practices for deploying Java EE applications in virtualized environment 2008 JavaOne SM Conference java.com.sun/javaone
More informationJun Liu, Senior Software Engineer Bianny Bian, Engineering Manager SSG/STO/PAC
Jun Liu, Senior Software Engineer Bianny Bian, Engineering Manager SSG/STO/PAC Agenda Quick Overview of Impala Design Challenges of an Impala Deployment Case Study: Use Simulation-Based Approach to Design
More informationHadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015
Hadoop MapReduce and Spark Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Outline Hadoop Hadoop Import data on Hadoop Spark Spark features Scala MLlib MLlib
More informationPERFORMANCE ANALYSIS OF KERNEL-BASED VIRTUAL MACHINE
PERFORMANCE ANALYSIS OF KERNEL-BASED VIRTUAL MACHINE Sudha M 1, Harish G M 2, Nandan A 3, Usha J 4 1 Department of MCA, R V College of Engineering, Bangalore : 560059, India sudha.mooki@gmail.com 2 Department
More informationZooKeeper. Table of contents
by Table of contents 1 ZooKeeper: A Distributed Coordination Service for Distributed Applications... 2 1.1 Design Goals...2 1.2 Data model and the hierarchical namespace...3 1.3 Nodes and ephemeral nodes...
More informationApache Hadoop. Alexandru Costan
1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open
More informationWinning the J2EE Performance Game Presented to: JAVA User Group-Minnesota
Winning the J2EE Performance Game Presented to: JAVA User Group-Minnesota Michelle Pregler Ball Emerging Markets Account Executive Shahrukh Niazi Sr.System Consultant Java Solutions Quest Background Agenda
More informationStingray Traffic Manager Sizing Guide
STINGRAY TRAFFIC MANAGER SIZING GUIDE 1 Stingray Traffic Manager Sizing Guide Stingray Traffic Manager version 8.0, December 2011. For internal and partner use. Introduction The performance of Stingray
More informationChukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics
More informationRemoving Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering
Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays Red Hat Performance Engineering Version 1.0 August 2013 1801 Varsity Drive Raleigh NC
More informationPerformance Architect Remote Storage (Intern)
Performance Architect Remote Storage (Intern) Samsung Semiconductor, Inc. is a world leader in Memory, System LSI and LCD technologies. We are currently looking for a Performance Architect (Intern) to
More informationIntroduction. Various user groups requiring Hadoop, each with its own diverse needs, include:
Introduction BIG DATA is a term that s been buzzing around a lot lately, and its use is a trend that s been increasing at a steady pace over the past few years. It s quite likely you ve also encountered
More informationVMware vsphere 4.1 with ESXi and vcenter
VMware vsphere 4.1 with ESXi and vcenter This powerful 5-day class is an intense introduction to virtualization using VMware s vsphere 4.1 including VMware ESX 4.1 and vcenter. Assuming no prior virtualization
More informationViolin: A Framework for Extensible Block-level Storage
Violin: A Framework for Extensible Block-level Storage Michail Flouris Dept. of Computer Science, University of Toronto, Canada flouris@cs.toronto.edu Angelos Bilas ICS-FORTH & University of Crete, Greece
More informationPERFORMANCE TESTING. New Batches Info. We are ready to serve Latest Testing Trends, Are you ready to learn.?? START DATE : TIMINGS : DURATION :
PERFORMANCE TESTING We are ready to serve Latest Testing Trends, Are you ready to learn.?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : Performance
More informationWorkload Dependent Hadoop MapReduce Application Performance Modeling
Workload Dependent Hadoop MapReduce Application Performance Modeling Introduction In any distributed computing environment, performance optimization, job runtime predictions, or capacity and scalability
More informationAutomating Big Data Benchmarking for Different Architectures with ALOJA
www.bsc.es Jan 2016 Automating Big Data Benchmarking for Different Architectures with ALOJA Nicolas Poggi, Postdoc Researcher Agenda 1. Intro on Hadoop performance 1. Current scenario and problematic 2.
More informationPerformance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems
Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems Rekha Singhal and Gabriele Pacciucci * Other names and brands may be claimed as the property of others. Lustre File
More informationAbove the clouds: A Berkeley View of Cloud Computing
Partial Review-2 On the paper Above the clouds: A Berkeley View of Cloud Computing By Nikhil Ramteke Sr. No.- 07125 6. Cloud Computing Economics Observation in Cloud Economics mainly concerns with following
More informationBig Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division
Big Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division In this talk Big data storage: Current trends Issues with current storage options Evolution of storage to support big
More informationTask Scheduling in Hadoop
Task Scheduling in Hadoop Sagar Mamdapure Munira Ginwala Neha Papat SAE,Kondhwa SAE,Kondhwa SAE,Kondhwa Abstract Hadoop is widely used for storing large datasets and processing them efficiently under distributed
More informationCloud Operating Systems for Servers
Cloud Operating Systems for Servers Mike Day Distinguished Engineer, Virtualization and Linux August 20, 2014 mdday@us.ibm.com 1 What Makes a Good Cloud Operating System?! Consumes Few Resources! Fast
More informationChapter 3 Operating-System Structures
Contents 1. Introduction 2. Computer-System Structures 3. Operating-System Structures 4. Processes 5. Threads 6. CPU Scheduling 7. Process Synchronization 8. Deadlocks 9. Memory Management 10. Virtual
More informationTomcat Tuning. Mark Thomas April 2009
Tomcat Tuning Mark Thomas April 2009 Who am I? Apache Tomcat committer Resolved 1,500+ Tomcat bugs Apache Tomcat PMC member Member of the Apache Software Foundation Member of the ASF security committee
More informationPetascale Software Challenges. Piyush Chaudhary piyushc@us.ibm.com High Performance Computing
Petascale Software Challenges Piyush Chaudhary piyushc@us.ibm.com High Performance Computing Fundamental Observations Applications are struggling to realize growth in sustained performance at scale Reasons
More informationDuke University http://www.cs.duke.edu/starfish
Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University http://www.cs.duke.edu/starfish Practitioners of Big Data Analytics Google Yahoo! Facebook ebay Physicists Biologists Economists
More informationAn Oracle White Paper July 2011. Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide
Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide An Oracle White Paper July 2011 1 Disclaimer The following is intended to outline our general product direction.
More informationArchitectures for Big Data Analytics A database perspective
Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum
More informationOn- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform
On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform Page 1 of 16 Table of Contents Table of Contents... 2 Introduction... 3 NoSQL Databases... 3 CumuLogic NoSQL Database Service...
More informationMaximizing Hadoop Performance with Hardware Compression
Maximizing Hadoop Performance with Hardware Compression Robert Reiner Director of Marketing Compression and Security Exar Corporation November 2012 1 What is Big? sets whose size is beyond the ability
More informationMicrosoft SQL Server: MS-10980 Performance Tuning and Optimization Digital
coursemonster.com/us Microsoft SQL Server: MS-10980 Performance Tuning and Optimization Digital View training dates» Overview This course is designed to give the right amount of Internals knowledge and
More informationMySQL performance in a cloud. Mark Callaghan
MySQL performance in a cloud Mark Callaghan Special thanks Eric Hammond (http://www.anvilon.com) provided documentation that made all of my work much easier. What is this thing called a cloud? Deployment
More informationCOS 318: Operating Systems. Virtual Machine Monitors
COS 318: Operating Systems Virtual Machine Monitors Kai Li and Andy Bavier Computer Science Department Princeton University http://www.cs.princeton.edu/courses/archive/fall13/cos318/ Introduction u Have
More information