Identifying Performance Bottlenecks in Hive: Use of Processor Counters

Alexander C Shulyak, Lizy K John
Presented by: Shuang Song

Problem
- Businesses and online services increasingly rely on insights derived from data analytics applications:
  - Targeted promotional advertising
  - Personalized content and experiences
  - Streamlining business operations
  - Sales and market analysis
- The amount of data being collected is increasing exponentially
- Distributed SQL Query Engines (DSQEs) are increasingly used as Decision Support Systems (DSS) to process large amounts of data at scale: Hive, Shark, Impala, etc.
- What are the performance bottlenecks of a DSQE running DSS queries?
- What are the performance trade-offs of a DSQE over a traditional Relational Database Management System?

Hive Overview
- Data warehousing and query processing framework for large databases
- Uses the Hadoop HDFS and MapReduce framework
- Hive converts SQL-like queries to a set of MapReduce jobs for Hadoop, monitors progress, and returns the result
- The YARN Resource Manager allows custom execution engines
- The Tez execution engine improves performance by modeling a query as a DAG of MapReduce jobs and optimizing execution of the entire DAG [1]
Figure: Hadoop Architecture; from [1]

[1] Bikas Saha, Hitesh Shah, Siddharth Seth, Gopal Vijayaraghavan, Arun Murthy, and Carlo Curino. Apache Tez: A unifying framework for modeling and building data processing applications. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pages 1357-1369, New York, NY, USA, 2015. ACM.

Previous Work
Panda et al.; SBAC-PAD 2015; Performance Characterization of Modern Databases on Out-of-Order CPUs
- Performance analysis of MySQL, Cassandra, MongoDB, and VoltDB
- Instruction/data caches and TLB stressed significantly
- MySQL: comparatively best throughput and latency for all workloads
Wouw et al.; ICPE 2015; An Empirical Performance Evaluation of Distributed SQL Query Engines
- Analyzed Shark, Hive with MapReduce, and Impala
- Propose a micro-benchmarking suite and an empirical method for evaluating the performance of Distributed SQL Query Engines (DSQEs)
- Hive with MapReduce is outperformed by all other database options; it experiences high network I/O and framework overhead

Previous Work (Cont.)
Floratou et al.; VLDB 2014; SQL-on-Hadoop: Full Circle Back to Shared-Nothing Database Architectures
- Study included Hive with MapReduce, Hive with Tez, and Impala, with a 21-node cluster running a 1TB TPC-H database
- Only Hive with MapReduce impacted by startup and scheduling overheads
- Impala, a shared-nothing SQL-on-Hadoop database, vastly improved performance when workloads fit in memory
- Hive with MapReduce and Hive with Tez are CPU bound, especially on scan operations

Motivation: Hive struggles to beat MySQL despite 6X available cores
- Hive still needed due to its scale-out potential
- Algorithm, code base, OS, and CPU resources could cause computational performance differences between Hive and MySQL
- Root cause analysis needed
- Time = (1/f * CPI * Total Ins) / TLP
Figure: Execution Time for Query1, Query3, Query6, Query14, Query19, and Average; series: MySQL 10GB, Hive Tez 10GB; y-axis 0 to 250. Data labels in extraction order: 221, 148, 171, 113, 104.4, 88.2, 81, 71, 38, 49, 48, 23.
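The timing model on this slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: f is read as clock frequency in Hz, TLP as thread-level parallelism (taken here as the average number of threads running in parallel), and the function name, parameter names, and sample inputs are hypothetical, not values measured in the study.

```python
def execution_time(freq_hz: float, cpi: float, total_ins: float, tlp: float) -> float:
    """Return Time = (1/f * CPI * Total Ins) / TLP, in seconds."""
    if freq_hz <= 0 or tlp <= 0:
        raise ValueError("freq_hz and tlp must be positive")
    return (cpi * total_ins / freq_hz) / tlp


# Hypothetical inputs chosen for easy arithmetic; not measurements from the study.
single_threaded = execution_time(freq_hz=2.5e9, cpi=0.5, total_ins=1.0e12, tlp=1)

# A larger instruction count and a higher CPI can cancel much of a 6x gain in TLP.
six_threads = execution_time(freq_hz=2.5e9, cpi=0.7, total_ins=2.7e12, tlp=6)

print(f"1 thread:  {single_threaded:.1f} s")
print(f"6 threads: {six_threads:.1f} s")
```

The model makes the root-cause question concrete: with f fixed by the hardware, any difference in execution time has to come from CPI, total instruction count, or TLP.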

Experimental Setup
Database Details
- Hive 1.2.1
- Hadoop 2.7.0; Tez 0.7.0
- MySQL 14.14
Benchmark Details
- TPC-H 10GB database, queries 1, 3, 6, 14, 19
Server Setup
- Single-node 6 (12) core Intel Xeon E5-2430v2 @ 2.5 GHz
- 32KB private L1i and L1d, 256KB combined L2, 15MB shared LLC
- 64GB DDR3 @ 1600 MHz; 750GB HDD @ 7200 rpm
- OpenJDK 1.8.0

Methodology
Processor Performance Counters: Perf 3.19
- IPC, CPI, MPKI, CS PKI
- Huge number of counters, hundreds of factors: need for an empirical method to identify the bottleneck
Top-Down SW Performance Analysis
1. Total Instruction Count and Average Thread Count
2. Query Plan Analysis
3. OS Events and Statistics
4. Instruction-Stream Statistics (Ins Mix, Ins/Data Footprint, Ins/Data access patterns)
Top-Down HW Performance Analysis
1. System-level utilization statistics (CPU, Mem, Network I/O, Disk I/O)
2. Instructions per Cycle (IPC)
3. Top-Down Microarchitectural Analysis Method (TMAM)
Notes
- Aggregate statistics may hide phase behavior responsible for performance loss
- Counter statistics collected at 1-second intervals
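The four derived metrics named above follow directly from raw counter totals: IPC is instructions per cycle, CPI is its inverse, MPKI is misses per thousand instructions, and CS PKI is context switches per thousand instructions. The sketch below shows that arithmetic in Python. The `CounterSample` type, its field names, and the sample values are hypothetical; the slides do not list the exact perf events that were recorded.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw counter totals for one measurement interval (hypothetical field names)."""
    instructions: int
    cycles: int
    llc_misses: int
    context_switches: int


def derive_metrics(sample: CounterSample) -> dict:
    """Convert raw counts into IPC, CPI, LLC MPKI, and CS PKI."""
    if sample.instructions <= 0 or sample.cycles <= 0:
        raise ValueError("instructions and cycles must be positive")
    kilo_instructions = sample.instructions / 1000
    return {
        "ipc": sample.instructions / sample.cycles,
        "cpi": sample.cycles / sample.instructions,
        "llc_mpki": sample.llc_misses / kilo_instructions,
        "cs_pki": sample.context_switches / kilo_instructions,
    }


# Made-up counts for a single 1-second interval; not data from the study.
metrics = derive_metrics(
    CounterSample(
        instructions=4_000_000_000,
        cycles=2_500_000_000,
        llc_misses=2_000_000,
        context_switches=400,
    )
)
for name, value in metrics.items():
    print(f"{name}: {value:.6g}")
```

Normalizing misses and context switches by instruction count, rather than by time, keeps the metrics comparable between systems that execute very different numbers of instructions for the same query.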

Insights
1. MySQL: relatively efficient DSS query execution engine
2. Hive shows difficulty converting SQL queries into a set of MapReduce jobs
3. Hive's framework layers add code bloat and slow database traversal
4. Hive with Tez amortizes JVM startup cost effectively across MapReduce tasks; the initial start-up period for a Hive query is costly for queries under 300 seconds
5. Hive's highly parallel execution increases context switch rates, which stresses the memory hierarchy

Instruction Count
- Hive must execute 1.7X more instructions on average
- Hadoop, MapReduce, distributed execution: large overhead
- Abstract query from flow of execution
- Overhead slightly lower due to vector execution
- Variation in overhead across queries due to query plan inefficiencies
Figure: Total Instructions: Normalized to MySQL; y-axis: Normalized Instruction Count (0 to 8); categories: Query1, Query3, Query6, Query14, Query19, Average. MySQL is 1 for every category. Hive data labels in extraction order: 2.9, 7.5, 1.9, 2.1, 1.4, 2.7.

Query 1 SQL and Query 19 SQL
- Query 1 reports the amount of business that was billed, shipped, and returned
- Query 19 reports the gross discounted revenue attributed to the sale of selected parts handled in a particular manner

Startup Overhead
Start-up Period
- Static period of time observed in all Hive queries, marked by poor microarchitectural performance
- 2 parts: initialization period, warm-up period
Initialization Period
- Hive query plan is generated and sent to Hadoop; Hadoop is initialized
- Low instruction count, low IPC
Warm-up Period
- Hadoop Tez containers are initialized, but JVM processes and CPU must warm up to reach peak IPC
- High instruction count, low IPC
Effect
- Because the period is static, its effect on execution time and average IPC depends on total execution time
- Query 3's IPC increases by 9% with the startup period ignored
- Query 6's IPC increases by 45% with the startup period ignored
- Improvement over Hive with built-in Hadoop MapReduce
- Use of Tez's online or batch query processing mode can further amortize the start-up period costs
Figure: Query 3; IPC (0 to 3) and Ins Count (0 to 5E+10) per sample, samples 1 to 111; series: Ave. IPC, IPC, Ave. IPC post-startup, Total Instructions.
Figure: Query 6; IPC (0 to 3) and Ins Count (0 to 4.5E+10) per sample, samples 1 to 37; series: Ave. IPC, Ave. IPC post-startup, IPC, Total Instructions.
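The dilution effect described on this slide can be illustrated with a small Python sketch that averages IPC over per-second samples, with and without a fixed start-up window. The traces are synthetic and the helper names are hypothetical; the sketch assumes each sample carries the instruction and cycle counts for one second, and it is not a reconstruction of the measured Query 3 or Query 6 data.

```python
def average_ipc(samples, skip=0):
    """Average IPC over (instructions, cycles) samples, ignoring the first `skip`."""
    kept = samples[skip:]
    total_ins = sum(ins for ins, _ in kept)
    total_cycles = sum(cycles for _, cycles in kept)
    if total_cycles == 0:
        raise ValueError("no cycles in the selected samples")
    return total_ins / total_cycles


def make_trace(startup_seconds, steady_seconds):
    """Synthetic trace: low-IPC start-up samples followed by steady-state samples."""
    startup = [(1_000_000_000, 2_000_000_000)] * startup_seconds  # IPC 0.5
    steady = [(4_000_000_000, 2_000_000_000)] * steady_seconds    # IPC 2.0
    return startup + steady


STARTUP_SECONDS = 10  # assumed fixed start-up length, identical for both queries

short_query = make_trace(STARTUP_SECONDS, steady_seconds=30)
long_query = make_trace(STARTUP_SECONDS, steady_seconds=100)

for label, trace in (("short", short_query), ("long", long_query)):
    overall = average_ipc(trace)
    post_startup = average_ipc(trace, skip=STARTUP_SECONDS)
    gain = post_startup / overall - 1
    print(f"{label}: overall={overall:.3f} post-startup={post_startup:.3f} gain={gain:.1%}")
```

Because the start-up window has the same length in both traces, the shorter query shows the larger relative IPC gain once that window is excluded, which is the pattern the slide reports for Query 6 versus Query 3.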

IPC
- Hive's instruction execution rate (IPC) improves as execution time grows
- When the start-up period is ignored, Query 6 and Query 14 outperform their MySQL counterparts
- Relative IPC across queries does not correlate between MySQL and Hive
- Each database has different inherent microarchitectural bottlenecks
Figure: IPC for Query1, Query3, Query6, Query14, Query19, and Average; y-axis: IPC (0.00 to 2.50); series: MySQL, Hive 6Thread, Hive 6Thread Post-Startup. Data labels in extraction order: 2.05, 1.87, 1.80, 1.84, 1.67, 1.70, 1.74, 1.63, 1.55, 1.55, 1.46, 1.43, 1.46, 1.45, 1.50, 1.39, 1.42, 1.29.

LLC Misses: MySQL Performance Bottleneck
- MySQL microarchitectural performance is correlated to LLC MPKI
- Data-driven performance
- Fewer instructions between unique entries in the database
- Hive has little variation in LLC MPKI
- Data traversal indicative of underlying frameworks
- Little difference in LLC MPKI rates between 1- and 6-threaded Hive setups (despite differences in CPI); therefore, context switches affect first-level caches far more than the LLC
Figure: LLC MPKI vs. CPI; x-axis: LLC MPKI (0 to 2.5); y-axis: CPI (0.35 to 0.8); series: MySQL, Hive 1 Thread, Hive 6 Threads, Linear (MySQL) trend line.
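One way to quantify "correlated to LLC MPKI" is a Pearson correlation coefficient over per-query (LLC MPKI, CPI) points. The Python sketch below implements that coefficient by hand. The slide only shows a linear trend line for MySQL, so using Pearson's r is an assumption here, and both data sets are invented to mimic the two patterns described; they are not the measured values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented points: CPI rises with LLC MPKI (the pattern described for MySQL).
tracking_mpki = [0.5, 1.0, 1.5, 2.0, 2.5]
tracking_cpi = [0.40, 0.48, 0.55, 0.66, 0.72]

# Invented points: LLC MPKI barely moves while CPI varies (the pattern described for Hive).
flat_mpki = [1.00, 1.02, 0.98, 1.01, 0.99]
flat_cpi = [0.50, 0.55, 0.60, 0.65, 0.70]

r_tracking = pearson_r(tracking_mpki, tracking_cpi)
r_flat = pearson_r(flat_mpki, flat_cpi)
print(f"CPI tracks LLC MPKI: r = {r_tracking:.3f}")
print(f"LLC MPKI nearly flat: r = {r_flat:.3f}")
```

A coefficient close to 1 supports reading LLC misses as the dominant factor in CPI, while a weak coefficient with nearly constant MPKI suggests the CPI variation comes from elsewhere, such as the first-level caches.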

Context Switches: Hive Performance Bottleneck
- CPI increases as the number of threads, and subsequently the number of context switches, increases; this indicates a system bottleneck
- MySQL has far fewer context switches than Hive, even when Hive runs with 1 thread
- MySQL is single-threaded, and there is no apparent correlation between its context switches and CPI
Figure: CS vs. CPI with different numbers of threads; x-axis: CS PKI (0.00E+00 to 2.50E-04); y-axis: CPI (0.4 to 0.75); series: Hive Query 1, Hive Query 3, Hive Query 6, Hive Query 14, Hive Query 19, MySQL Queries.

Conclusion
MySQL executes queries as efficiently as possible
- Low instruction count
- Microarchitectural performance differences depend on how the data is traversed and how the memory hierarchy is stressed by that algorithm
Hive's large code base and generic execution framework are the primary performance bottleneck
- Query plan inefficiency and increased instruction count
- Hive's startup period hurts the performance of short-running queries
- Hive's higher context switch rates directly impact microarchitectural performance
Improvements
- Amortize startup costs over more queries with batch or online execution
- Decrease parallelization per node to improve microarchitectural performance
- Resort to a Distributed SQL Query Engine only if the database size is too large for 1 node

QUESTIONS?

Overview
- Objective: root cause analysis of performance discrepancies between MySQL and Hive
- MySQL: traditional Relational Database Management System (RDBMS), compared to a scale-out approach like Hive
- Hive: Hadoop-based DSQE
- Use a Decision Support Benchmark (DSB), TPC-H, as the database benchmark
- Identify computational overheads associated with distributed set-ups

Previous Work (Cont.)
Floratou et al.; VLDB 2014; SQL-on-Hadoop: Full Circle Back to Shared-Nothing Database Architectures
- Hive without Tez impacted by startup and scheduling overheads
- Impala, a shared-nothing SQL-on-Hadoop database, vastly improved performance when workloads fit in memory
- Hive on MapReduce and Hive on Tez are CPU bound, especially on scan operations
Jia et al.; IISWC 2013; Characterizing Data Analysis Workloads in Data Centers
- Data analysis workloads have higher IPC than data serving workloads, but lower IPC than computation-intensive HPCC workloads
- Both data analysis workloads and data serving workloads suffer from noticeable front-end stalls, which the authors attribute to a larger code footprint causing inefficient L1I cache and iTLB performance

TPC-H Database

MySQL Query Plans

Query     Table      Type    Clauses
Query 1   lineitem   ALL     Using where; Using temporary; Using filesort
Query 3   orders     ALL     Using where; Using temporary; Using filesort
          customers  eq_ref  Using where
          lineitem   ref     Using where
Query 6   lineitem   ALL     Using where
Query 14  lineitem   ALL     Using where
          part       eq_ref
Query 19  lineitem   ALL     Using where
          part       eq_ref  Using where

Hive Query Plans

Query     Vertex    Source           Type                                  Partitions
Query 1   Map 1     lineitem         Filter, Select, Group By              12
          Reduce 2  Map 1            Group By                              11
          Reduce 3  Reduce 2         Select                                1
Query 3   Map 1     customers        Filter                                1
          Map 6     orders           Filter                                4
          Map 7     lineitem         Filter                                15
          Reduce 2  Map 1, Map 6     Merge/Join                            1
          Reduce 3  Map 7, Reduce 2  Merge/Join, Select, Group By          2
          Reduce 4  Reduce 3         Group By, Select                      2
          Reduce 5  Reduce 4         Select                                1
Query 6   Map 1     lineitem         Filter, Select, Group By              12
          Reduce 2  Map 1            Group By                              1
Query 14  Map 1     part             Filter                                1
          Map 4     lineitem         Filter                                15
          Reduce 2  Map 1, Map 4     Merge/Join, Group By                  1
          Reduce 3  Reduce 2         Group By, Select                      1
Query 19  Map 1     lineitem         Filter                                15
          Map 4     parts            Filter                                1
          Reduce 2  Map 1, Map 4     Merge/Join, Filter, Select, Group By  4
          Reduce 3  Reduce 2         Group By                              1