
Understanding Big Data Workloads on Modern Processors using BigDataBench. Jianfeng Zhan, Professor, Institute of Computing Technology (ICT), Chinese Academy of Sciences and University of Chinese Academy of Sciences. http://prof.ict.ac.cn/bigdatabench HPBDC 2015, Ohio, USA.

Outline: BigDataBench overview; workload characterization; multi-tenancy version; processors evaluation.

What is BigDataBench? An open-source big data benchmarking project: http://prof.ict.ac.cn/bigdatabench (or search Google for "BigDataBench").

BigDataBench in Detail. Methodology: five application domains, with benchmark specifications proposed for each domain. Implementation: 14 real-world data sets & 3 kinds of big data generators; 33 big data workloads with diverse implementations. Specific-purpose version: the BigDataBench subset version.

Five Application Domains. Internet services, taking up 80% of the Alexa Top 20 websites: search engine, social network, e-commerce. Multimedia: new videos on YouTube, photos on Flickr, music streaming on Pandora, voice calls on Skype, and video feeds from surveillance cameras arrive every minute. Bioinformatics: growth of the DDBJ/EMBL/GenBank database, measured in nucleotides (billions) and entries (millions). [Pie chart of data sources and growth chart of the DDBJ/EMBL/GenBank database omitted.] Sources: http://www.oldcolony.us/wp-content/uploads/2014/11/whatisbigdata-DKB-v2.pdf; http://www.alexa.com/topsites/global; http://www.ddbj.nig.ac.jp/breakdown_stats/dbgrowth-e.html#dbgrowth-graph

Benchmark Specification: guidelines for the BigDataBench implementation. Data model: describe the data model. Workloads: model typical application scenarios and extract the important workloads.

BigDataBench in Detail. Methodology: five application domains, with a benchmark specification for each domain. Implementation: 14 real-world data sets & 3 kinds of big data generators; 33 big data workloads with diverse implementations. Specific-purpose version: the BigDataBench subset version.

BigDataBench Summary. BDGS (Big Data Generator Suite) for scalable data. 14 real-world data sets: Wikipedia entries, Amazon movie reviews, Google web graph, Facebook social network, e-commerce transactions, ImageNet, English broadcast audio, ProfSearch resumes, DVD input streams, image scenes, SoGou data, genome sequence data, assembly of the human genome, MNIST. Five domains: search engine, social network, e-commerce, multimedia, bioinformatics. 33 workloads. Software stacks: NoSQL, Impala, Shark, Hadoop, RDMA, MPI, DataMPI.

Big Data Generator Tool: 3 kinds of big data generators (text, graph, and table), each preserving the original characteristics of real data.
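As a much simpler illustration of the "preserve the source's characteristics" idea (BDGS's own text generator is more sophisticated), the hypothetical sketch below learns the empirical word-frequency distribution of a small seed corpus and samples synthetic text of any size from it; all names and the seed string are invented for illustration:

```python
import random
from collections import Counter

def learn_distribution(seed_text):
    """Learn the empirical word-frequency distribution of a seed corpus."""
    counts = Counter(seed_text.split())
    words = list(counts)
    weights = [counts[w] for w in words]
    return words, weights

def generate(words, weights, n_words, seed=0):
    """Sample synthetic text whose word frequencies match the seed corpus."""
    rng = random.Random(seed)
    return " ".join(rng.choices(words, weights=weights, k=n_words))

seed_corpus = "big data workloads stress big memory and big storage systems"
words, weights = learn_distribution(seed_corpus)
print(generate(words, weights, 20))  # scale n_words up to any target volume
```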

BigDataBench in Detail. Methodology: five application domains, with a benchmark specification for each domain. Implementation: 14 real-world data sets & 3 kinds of big data generators; 33 big data workloads with diverse implementations. Specific-purpose version: the BigDataBench subset version.

BigDataBench Subset. Motivation: it is expensive to run all the benchmarks for systems and architecture research, multiplied by the different implementations; BigDataBench 3.0 provides about 77 workloads. Approach: identify workload characteristics from a specific perspective, eliminate correlated data via dimension reduction (PCA), then cluster (K-means) to select the subset; a sketch of this pipeline follows.
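A minimal sketch of such a PCA-plus-K-means subsetting pipeline, assuming a per-workload metric matrix and scikit-learn; the workload count, metric count, component count, and cluster count here are placeholders, not the project's actual values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Hypothetical matrix: one row per workload, one column per
# microarchitectural metric (IPC, cache MPKIs, branch miss rate, ...).
rng = np.random.default_rng(0)
X = rng.random((77, 45))

X_std = StandardScaler().fit_transform(X)         # normalize metrics
X_pca = PCA(n_components=5).fit_transform(X_std)  # drop correlated dimensions
km = KMeans(n_clusters=17, n_init=10).fit(X_pca)  # group similar workloads

# Keep the workload closest to each centroid as the cluster representative.
subset = []
for c in range(km.n_clusters):
    members = np.flatnonzero(km.labels_ == c)
    dists = np.linalg.norm(X_pca[members] - km.cluster_centers_[c], axis=1)
    subset.append(int(members[dists.argmin()]))
print(sorted(subset))  # indices of the representative workloads
```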

Why BigDataBench?

Benchmark       | Specification | Application domains | Workload types | Workloads | Scalable data sets (from real data) | Multiple implementations | Subsets | Multi-tenancy version | Simulator version
BigDataBench    | Y | Five | Four [1] | 33 | 8   | Y | Y | Y | Y
BigBench        | Y | One  | Three    | 10 | 3   | N | N | N | N
CloudSuite      | N | N/A  | Two      | 8  | 3   | N | N | N | Y
HiBench         | N | N/A  | Two      | 10 | 3   | N | N | N | N
CALDA           | Y | N/A  | One      | 5  | N/A | Y | N | N | N
YCSB            | Y | N/A  | One      | 6  | N/A | Y | N | N | N
LinkBench       | Y | N/A  | One      | 10 | N/A | Y | N | N | N
AMP Benchmarks  | Y | N/A  | One      | 4  | N/A | Y | N | N | N

[1] The four workload types are Offline Analytics, Cloud OLTP, Interactive Analytics, and Online Service.

BigDataBench Users (http://prof.ict.ac.cn/bigdatabench/users/). Industry users: Accenture, Broadcom, Samsung, Huawei, IBM. China's first industry-standard big data benchmark suite (http://prof.ict.ac.cn/bigdatabench/industry-standard-benchmarks/). About 20 academic groups have published papers using BigDataBench.

BigDataBench Publications. BigDataBench: a Big Data Benchmark Suite from Internet Services. 20th IEEE International Symposium on High Performance Computer Architecture (HPCA 2014). Characterizing data analysis workloads in data centers. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013, Best Paper Award). BigOP: generating comprehensive big data workloads as a benchmarking framework. 19th International Conference on Database Systems for Advanced Applications (DASFAA 2014). BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. The Fourth Workshop on Big Data Benchmarking (WBDB 2014). Identifying Dwarfs Workloads in Big Data Analytics. arXiv preprint arXiv:1505.06872. BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads. arXiv preprint arXiv:1504.02205.

Outline: BigDataBench overview; workload characterization; multi-tenancy version; processors evaluation.

System Behaviors. Diversified system-level behaviors fall into three patterns: high CPU utilization & little I/O time; relatively low CPU utilization & lots of I/O time; medium CPU utilization & I/O time. [Charts: per-workload CPU utilization and I/O wait ratio, plus the average weighted disk I/O time ratio.]

Workload Classification. From the perspective of system behaviors, which vary across different workloads, the workloads are divided into 3 categories:

Type          | Workloads
CPU-intensive | H-Grep, S-Kmeans, S-PageRank, H-WordCount, H-Bayes, M-Bayes, M-Kmeans, M-PageRank
I/O-intensive | H-Read, H-Difference, I-SelectQuery, S-WordCount, S-Project, S-OrderBy, M-Grep, S-Grep
Hybrid        | H-TPC-DS-query3, I-OrderBy, S-TPC-DS-query10, S-TPC-DS-query8, S-Sort, M-WordCount, M-Sort

Off-Chip Bandwidth. Most CPU-intensive workloads have higher off-chip bandwidth (about 3 GB/s; the maximum is 6.2 GB/s); the other workloads have lower off-chip bandwidth (about 0.6 GB/s). MPI-based workloads need little memory bandwidth.

IPC of BigDataBench vs. Other Benchmarks. [Chart: per-workload IPC for the big data workloads alongside TPC-C and the CloudSuite, HPCC, PARSEC, SPECfp, and SPECint averages.] The average IPC of the big data workloads is larger than that of CloudSuite, SPECfp, and SPECint, similar to PARSEC, and slightly lower than HPCC. The average IPC of BigDataBench is 1.3 times that of CloudSuite. Some workloads have high IPC (M-Kmeans, S-TPC-DS-query8).

Instruction Mix of BigDataBench vs. other benchmarks. Big data workloads are data-movement-dominated computing with more branch operations: 92% of the instruction mix (load + store + branch + data movements of INT).

Pipeline Stalls. The service workloads have more RAT (Register Allocation Table) stalls; the data analysis workloads have more RS (Reservation Station) and ROB (ReOrder Buffer) full stalls. Notable front-end stalls (i.e., instruction fetch stalls)!

Cache Behaviors of BigDataBench. L1I MPKI: larger than traditional benchmarks, but lower than CloudSuite's (12 vs. 31). It differs among big data workloads: CPU-intensive (8), I/O-intensive (22), and hybrid (9). There is one order of magnitude difference among diverse implementations: M-WordCount is 2, while H-WordCount is 17.
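MPKI (misses per kilo-instruction) is the normalization used throughout these cache and TLB slides; a trivial sketch with invented counter values (not measured data), chosen only to echo the order-of-magnitude gap noted above:

```python
def mpki(misses, instructions):
    """Misses per kilo-instruction: miss count normalized per 1000 instructions."""
    return misses * 1000.0 / instructions

# Hypothetical L1I counter readings for two implementations of WordCount:
print(mpki(misses=4.0e8, instructions=2.0e11))  # -> 2.0  (M-WordCount-like)
print(mpki(misses=3.4e9, instructions=2.0e11))  # -> 17.0 (H-WordCount-like)
```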

Cache Behaviors. L2 cache: the I/O-intensive workloads undergo more L2 MPKI. L3 cache: the average L3 MPKI of the big data workloads is lower than that of all the other benchmarks. The underlying software stacks impact data locality: MPI workloads have better data locality and fewer cache misses.

TLB Behaviors. ITLB: I/O-intensive workloads undergo more ITLB MPKI. DTLB: CPU-intensive workloads have more DTLB MPKI. [Chart: ITLB and DTLB MPKI for the CPU-intensive, I/O-intensive, and hybrid workloads, with TPC-C, CloudSuite, HPCC, PARSEC, SPECfp, and SPECint averages.]

Our Observations from BigDataBench. Unique characteristics: data-movement-dominated computing with more branch operations (92% of the instruction mix); notable pipeline front-end stalls. Different behaviors among big data workloads: disparity of IPCs and memory access behaviors; CloudSuite is a subclass of big data. Software stack impacts: the L1I cache miss rates differ by one order of magnitude among diverse implementations with different software stacks.

Correlation Analysis. Compute the correlation coefficients of CPI with other micro-architecture-level metrics, using Pearson's correlation coefficient: r = cov(X, Y) / (sigma_X * sigma_Y). The absolute value (from 0 to 1) shows the dependency: the bigger the absolute value, the stronger the correlation.
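A small sketch of that computation, assuming per-run samples of CPI paired with one candidate metric; the sample values are illustrative, not the paper's measurements:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their
    standard deviations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return ((x - x.mean()) * (y - y.mean())).mean() / (x.std() * y.std())

cpi     = [1.2, 0.9, 1.8, 1.5, 1.1]   # hypothetical per-run CPI samples
l2_mpki = [4.0, 2.1, 9.5, 7.2, 3.3]   # hypothetical L2 MPKI samples
print(abs(pearson_r(cpi, l2_mpki)))   # |r| close to 1 => strong dependency
```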

Top Five Coefficients. [Chart: the five largest correlation coefficients with CPI for each analytics workload: Naive Bayes, Grep, WordCount, Kmeans, FKmeans, PageRank, Sort, Hive, IBCF, HMM, and SVM.]

Insights. Front-end stall does not have a high correlation coefficient with CPI for most big data analytics workloads, so the front-end stall is not the factor that affects CPI performance most. L2 cache misses and TLB misses have high correlation coefficient values: the long-latency memory accesses (to the L3 cache or memory) affect CPI performance most and should be the optimization point with the highest priority.

Outline: BigDataBench overview; workload characterization; multi-tenancy version; processors evaluation.

Cloud Data Centers. Two classes of popular workloads: long-running services (search engines, e-commerce sites) and short-term data analytic jobs (Hadoop MapReduce, Spark jobs).

Problem. Existing benchmarks focus on specific types of workload, so their scenarios are not realistic: they do not match the typical data center scenario, which mixes different percentages of tenants and workloads sharing the same computing infrastructure.

Purpose of BigDataBench-MT. Developing realistic benchmarks to reflect such practical scenarios of mixed workloads: both service and data analytic workloads, with dynamic scaling up and down. The tool is publicly available from http://prof.ict.ac.cn/bigdatabench/multi-tenancyversion

What can you do with it? We consider two dimensions of the benchmarking scenarios: the tenants' perspective and the workloads' perspective.

You can specify the tenants. The number of tenants (scalability benchmark: how many tenants are able to run in parallel?). The priorities of tenants (fairness benchmark: how fair is the system, i.e., are the available resources equally available to all tenants, even when tenants have different priorities?). The timeline (how do the number and priorities of tenants change over time?).

You can specify the workloads. Data characteristics: data type and source; input/output data volumes and distributions. Computation semantics: source code; big data software stacks. Job arrival patterns: arrival rate and arrival sequence (see the sketch below).
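For instance, a common way to specify an arrival pattern is a Poisson process with a target rate; a minimal sketch follows (the rate and duration are placeholders, and BigDataBench-MT itself replays arrival patterns mined from real traces rather than a fixed distribution):

```python
import random

def poisson_arrivals(rate_per_sec, duration_sec, seed=0):
    """Job submission times for a Poisson arrival process: inter-arrival
    gaps are exponentially distributed with mean 1/rate."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_sec)
        if t > duration_sec:
            return times
        times.append(t)

# e.g., one benchmarking hour at an average of 0.05 job submissions/second
submissions = poisson_arrivals(rate_per_sec=0.05, duration_sec=3600)
print(len(submissions))  # roughly 180 jobs
```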

Two Major Challenges. Heterogeneity of real workloads: different workload types (e.g., CPU- or I/O-intensive workloads) and different software stacks (e.g., Hadoop, Spark, MPI). Workload dynamicity hidden in real-world traces: arrival patterns (request/job submission times and sequences) and job input sizes (e.g., ranging from KB to ZB).

Existing Big Data Benchmarks.

Benchmarks                                              | Actual workloads | Real workload traces | Mixed workloads
AMPLab benchmark, LinkBench, BigBench, YCSB, CloudSuite | Yes              | No                   | No
GridMix, SWIM                                           | No               | Yes                  | No

How to generate real workloads on the basis of real workload traces is still an open question.

System Overview. Three modules: the Benchmark User Portal (a visual interface); the Combiner of Workloads and Traces (a matcher of real workloads and traces); and the Multi-tenant Workload Generator.

Key Technique: combination of real and synthetic data analytic jobs. Goal: combine the arrival patterns extracted from real traces with real workloads. Problem: workload traces contain only anonymous jobs, whose workload types and/or input data are unknown.

Solution, the first step: derive the workload characteristics of both real and anonymous jobs. Metrics used to represent workload characteristics:

Metric         | Description
Execution time | Measured in seconds
CPU usage      | Total CPU time per second
Memory usage   | Measured in GB
CPI            | Cycles per instruction
MAI            | The number of memory accesses per instruction

Solution, the second step: match both types of jobs whose workload characteristics are sufficiently similar.

An Example: matching Hadoop workloads. Mine the Facebook/Google workload trace (extract workload characteristics information); profile Hadoop workloads from BigDataBench (collect workload characteristics information); match workloads using k-means clustering. Matching result (the basis for replaying):

Job type | Input size (GB) | Starting time (minutes)
Bayes    | 2   | 10
Sort     | 1   | 20
K-means  | 0.5 | 25
Bayes    | 5   | 30
Sort     | 1   | 40
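A hedged sketch of that matching step: cluster profiled and anonymous jobs together on the five metrics above, then map each anonymous trace job to the profiled workload sharing its cluster. All feature values are invented for illustration, and a real matcher would guard against clusters containing several profiled workloads:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Rows: [execution time (s), CPU usage, memory (GB), CPI, MAI] -- invented.
profiled = {"Bayes":   [600, 0.80, 4.0, 1.1, 0.010],
            "Sort":    [300, 0.30, 2.0, 1.8, 0.025],
            "K-means": [900, 0.90, 6.0, 0.9, 0.008]}
anonymous = [[580, 0.75, 3.8, 1.2, 0.011],   # trace job 0
             [320, 0.35, 2.2, 1.7, 0.024]]   # trace job 1

names = list(profiled)
X = StandardScaler().fit_transform(np.array(list(profiled.values()) + anonymous))
labels = KMeans(n_clusters=len(names), n_init=10).fit_predict(X)

# Replay each anonymous trace job as the profiled workload in its cluster.
rep = {labels[i]: names[i] for i in range(len(names))}
for j, lab in enumerate(labels[len(names):]):
    print(f"trace job {j} -> {rep.get(lab, 'unmatched')}")
```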

System Demonstration. Three steps to generate a mix of search-service and Hadoop MapReduce jobs. Traces: 24-hour Sogou user query logs and the Google cluster trace. Step 1: specification of the tested machines and workloads. Step 2: selection of the benchmarking period and scale. Step 3: generation of the mixed workloads.

Workloads and Traces in BigDataBench-MT. The multi-tenancy V1.0 release includes:

Workloads        | Software stack                              | Workload trace
Nutch Web Search | Apache Tomcat 6.0.26, Search Server (Nutch) | Sogou (http://www.sogou.com/labs/dl/q-e.html)
Hadoop           | Hadoop 1.0.2                                | Facebook (https://github.com/swimprojectucb/swim/wiki)
Shark            | Shark 0.8.0                                 | Google data center (https://code.google.com/p/googleclusterdata/)

Outline: BigDataBench overview; workload characterization; multi-tenancy version; processors evaluation.

Core Architecture. Multi brawny-core (Xeon E5645, 2.4 GHz): 6 out-of-order cores, dynamic multiple issue (superscalar), dynamic overclocking, simultaneous multithreading. Many wimpy-core architecture (Tile-Gx36, 1.2 GHz): 36 in-order cores, static multiple issue (VLIW).

Experiment Methodology. Use real hardware instead of simulation; measure real power consumption instead of modeling it. Saturate CPU performance by isolating the processor behavior: over-provision the disk I/O subsystem using a RAM disk, optimize the benchmarks, and tune the software stack parameters (e.g., JVM flags) for performance.

Execution Time. For Hadoop-based sort, the performance gap is about 1.08x; for the other workloads, gaps of more than 2x exist between Xeon and Tilera. [Chart: normalized execution time, Xeon vs. Tilera.] From the perspective of execution time, the Xeon processor is better than the Tilera processor all the time.

Cycle Counts. There are huge cycle count gaps between Xeon and Tilera, ranging from 5.3x to 14x: Tilera needs more cycles to complete the same amount of work. [Chart: normalized cycles, Xeon vs. Tilera.]

Pipeline Efficiency. The theoretical IPC: Xeon, 4 instructions per cycle; Tilera, 1 instruction bundle per cycle. Pipeline efficiency is the measured IPC relative to this theoretical peak. [Chart: pipeline efficiency, Tilera vs. Xeon.] Out-of-order pipelines are more efficient than in-order ones.
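Spelled out, the metric is just sustained IPC over the issue width; a one-liner with illustrative (not measured) numbers:

```python
def pipeline_efficiency(measured_ipc, peak_ipc):
    """Fraction of the theoretical issue width the workload actually sustains."""
    return measured_ipc / peak_ipc

# Illustrative values only: Xeon is 4-wide; Tile-Gx issues 1 bundle/cycle.
print(pipeline_efficiency(1.3, 4))  # Xeon example   -> 0.325
print(pipeline_efficiency(0.3, 1))  # Tilera example -> 0.3
```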

Power Consumption. Tilera is power-optimized; Xeon consumes more power. [Chart: normalized power, Tilera vs. Xeon.]

Energy Consumption. Hadoop-based sort consumes less energy on Tilera than on Xeon (Hadoop sort is an extremely I/O-intensive workload). For most big data workloads, Tilera consumes more energy than Xeon to complete the same amount of work: the longer execution time offsets the lower-power design. [Chart: normalized energy, Xeon vs. Tilera.]

Total Cost of Ownership (TCO) Model [*]. Three-year depreciation cycle; hardware costs associated with the individual components: CPU, memory, disk, board, power, cooling. [*] K. Lim et al. Understanding and designing new server architectures for emerging warehouse-computing environments. ISCA 2008.

Cost Model. The cost data originate from diverse sources: different vendors and the corresponding official websites. Power and cooling assume an activity factor of 0.75. A minimal sketch of such a TCO computation follows.
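The sketch below assumes the 3-year depreciation cycle and 0.75 activity factor from the slides; every component price, the electricity rate, the cooling overhead, and the power draw are invented placeholders:

```python
def monthly_tco(cpu, memory, disk, board, watts,
                activity=0.75, usd_per_kwh=0.07, cooling_overhead=0.8,
                depreciation_months=36):
    """Amortized hardware capex plus power-and-cooling opex per month."""
    capex = (cpu + memory + disk + board) / depreciation_months
    kwh_per_month = watts * activity * 24 * 30 / 1000.0
    opex = kwh_per_month * usd_per_kwh * (1 + cooling_overhead)
    return capex + opex

# Invented component prices (USD) and server power draw (W):
print(round(monthly_tco(cpu=1000, memory=400, disk=150, board=250, watts=300), 2))
```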

Performance per TCO. Hadoop-based sort has higher performance per TCO on Tilera; for the other workloads, Xeon (Turbo & HT enabled) outperforms Tilera. [Chart: normalized performance per TCO, Tilera vs. Xeon.]

Key Takeaways. Try the open-source big data benchmark suite from http://prof.ict.ac.cn/bigdatabench. Big data is data-movement-dominated computing with more branch operations (92% of the instruction mix). The multi-tenancy version replays mixed workloads according to publicly available workload traces. Wimpy-core processors suit only a part of the big data workloads.