BPOE Research Highlights

1 BPOE Research Highlights Jianfeng Zhan ICT, Chinese Academy of Sciences INSTITUTE OF COMPUTING TECHNOLOGY

2 What is BPOE workshop? B: Big Data Benchmarks; PO: Performance Optimization; E: Emerging Hardware

3 Motivation Big data covers many research fields. Researchers work in specific research fields with their own main conferences and have few chances to get to know each other. There is a gap between industry and academia. BPOE brings researchers and practitioners in related areas together.

4 BPOE communities Communities of architecture, systems, and data management. Discuss the mutual influences of architectures, systems, and data management. Bridge the gap in big data research and practice between industry and academia.

5 BPOE: a series of workshops
1st BPOE, in conjunction with IEEE Big Data 2013, Santa Clara, CA, USA: paper presentations + 2 invited talks
2nd BPOE, in conjunction with CCF HPC China 2013, Guilin, Guangxi, China: 8 invited talks
3rd BPOE, in conjunction with CCF Big Data Technology Conference 2013, Beijing, China: 6 invited talks

6 Organization
Steering committee: Lizy K John, University of Texas at Austin; Zhiwei Xu, ICT, Chinese Academy of Sciences; Cheng-zhong Xu, Wayne State University; Xueqi Cheng, ICT, Chinese Academy of Sciences; Jianfeng Zhan, ICT, Chinese Academy of Sciences; Dhabaleswar K Panda, Ohio State University
PC co-chairs: Jianfeng Zhan, ICT, Chinese Academy of Sciences; Weijia Xu, TACC, University of Texas at Austin

7 Paper submissions 26 papers received. Each one has at least five reviews. 30 TPC members. 16 papers accepted. Two invited talks.

8 Finalized program: three sessions
Performance optimization of big data systems: 3 from Intel, 1 from Hasso Plattner Institute, Germany
Special session, Big Data Benchmarking and Performance Optimization: 2 invited talks + 1 from Academia Sinica
Experience and evaluation with emerging hardware for big data: 1 from Japan, 3 from US

9 Two invited talks
13:20-14:05 Invited Talk, BigDataBench: Benchmarking big data systems, Yingjie Shi, Chinese Academy of Sciences
14:05-14:50 Invited Talk, Facebook: Using Emerging Hardware to Build Infrastructure at Scale, Bill Jia, PhD, Manager, Performance and Capacity Engineering, Facebook

10 BigDataBench; related work with big data benchmarking; use cases on BigDataBench

11 What is BigDataBench? A big data benchmark suite from ICT, Chinese Academy of Sciences. Presentation is available soon from http://prof.ict.ac.cn/bpoe2013/program.php. Evaluating big data (hardware) systems and architectures. Open-source project: http://prof.ict.ac.cn/bigdatabench

12 Summary of BigDataBench

13 BigDataBench Methodology
Investigate typical application domains.
Representative real data sets: data types (structured data, semi-structured data, unstructured data); data sources (text data, graph data, table data); extended.
Big data sets preserving 4V: synthetic data generation tool preserving data characteristics.
Diverse and important workloads: application types (offline analytics, realtime analytics, online services); basic and important operations and algorithms; extended; represent software stack; extended.
Big data workloads.
Result: BigDataBench.

14 Representative Datasets
Columns: Application Domain; Data Type; Data Source; Dataset
Search Engine, E-commerce: unstructured data, text data, Wikipedia Entries; semi-structured data, unstructured data; graph data, table data, text data; Google Web Graph, Profsearch Person Resume, Amazon Movie Reviews; structured data, table data, ABC Transaction Data
Social Network: unstructured data, graph data, Facebook Social Graph

15 Chosen Workloads: micro benchmarks; basic datastore operations; relational queries; application scenarios (search engine, social network, e-commerce system)

16 Chosen Workloads: Micro Benchmarks
Columns: Application Domain; Data Type; Data Source; Operations & Algorithms; Software Stack; Benchmark ID
Micro Benchmarks, unstructured, text: sort (Hadoop, Spark, MPI; ID 1-3); grep (Hadoop, Spark, MPI; ID 2-3); wordcount (Hadoop, ID 3-1; Spark, ID 3-2; MPI, ID 3-3)
Micro Benchmarks, unstructured, graph: BFS (MPI; ID 4)

17 Chosen Workloads: Basic Datastore Operations
Columns: Application Domain; Data Type; Data Source; Operations & Algorithms; Software Stack; Benchmark ID
Basic Datastore Operations, semi-structured, table: Read (HBase, Cassandra, MongoDB, MySQL); Write (HBase, Cassandra, MongoDB, MySQL); Scan (HBase, Cassandra, MongoDB, MySQL)

18 Chosen Workloads: Basic Relational Query
Columns: Application Domain; Data Type; Data Source; Operations & Algorithms; Software Stack; Benchmark ID
Basic Relational Query, structured, table: Select query (Hive, Impala); Aggregation query (Hive, Impala); Join query (Hive, Impala; ID 10-2)

19 Chosen Workloads: Service
Columns: Application Domain; Operations & Algorithms; Data Type; Data Source; Software Stack; Benchmark ID
Search Engine: Nutch server (structured, table, Hadoop); Pagerank (unstructured, graph, Hadoop); Index (unstructured, text, Hadoop; ID 13)
Social Network: Olio server (structured, table, MySQL); Kmeans (unstructured, graph, Hadoop and Spark); Connected components (unstructured, graph, Hadoop; ID 16)
E-commerce: Rubis server (structured, table, MySQL); Collaborative filtering (unstructured, text, Hadoop); Naïve Bayes (unstructured, text, Spark; ID 19)

20 BigDataBench; related work with big data benchmarking; use cases on BigDataBench

21 Case Studies based on BigDataBench
The Implications from Benchmarking Three Different Data Center Platforms. Q. Jing, Y. Shi and M. Zhao, University of Science and Technology of China, and Florida International University
AxPUE: Application Level Metrics for Power Usage Effectiveness in Data Centers. R. Zhou, Y. Shi, C. Zhu, F. Liu, National Computer network Emergency Response Technical Team Coordination Center of China, China
An Ensemble MIC-based Approach for Performance Diagnosis in Big Data Platform. P. Chen, Y. Qi, X. Li, and L. Su, Xi'an Jiaotong University, China
A Characterization of Big Data Benchmarks. W. Xiong, Z. Yu, C. Xu, SIAT, Chinese Academy of Sciences, and Wayne State University

22 New Solutions of Big Data Systems

23 A Tradeoff? Energy consumption vs. performance

24 What is the performance of different big data systems under different types of applications? What is the performance of different big data systems under different data volumes? What is the energy consumption of different big data systems?
Evaluating three respective big data systems. Comparing two of them from performance and energy cost. Analyzing the running features of different big data systems, and the underlying reasons.

25 Experiment Platforms
Xeon: mainstream processor. Atom: low power processor. Tilera: many core processor.
Basic configurations: Hadoop cluster. Comparisons: Xeon vs. Atom; Xeon vs. Tilera. Master/Slaves: 1/7; 1/7 and 1/1. Comparison basis: having the same hardware thread number; having the same core number. All comparisons are based on BigDataBench. Hadoop setting: following the Hadoop official website.
Information: Xeon E5310 / Atom D510 / TilePro36
CPU type: Intel Xeon E5310 / Intel Atom D510 / Tilera TilePro36
CPU core: 4 cores 1.6GHz / 2 cores 1.66GHz / 36 cores 500MHz
L1 I/D cache: 32KB / 24KB / 16KB/8KB
L2 cache: 4096KB / 512KB / 64KB
Further attributes listed: OoO execution, connection, buffer, FPU, TDP, mode sharing
Xeon E5310: Yes, Yes, BUS, No, 80W
Atom D510: No, Yes, BUS, No, 13W
TilePro36: Yes, No, IMESH, Yes, 16W

26 Implications from the Results
Xeon vs. Atom: Xeon is more powerful than Atom. Atom conserves more energy than Xeon when dealing with some easy applications. Atom doesn't show an energy advantage when dealing with complex applications.
Xeon vs. Tilera: Xeon is more powerful than Tilera. Tilera conserves more energy than Xeon when dealing with some easy applications. Tilera doesn't show an energy advantage when dealing with complex applications. Tilera is more suitable for processing I/O-intensive applications.
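The energy tradeoff described on this slide can be illustrated with simple arithmetic: a low-power processor only saves energy if its slowdown stays below the ratio of the two power draws. The sketch below is not the authors' measurement method. It assumes energy is approximated as processor TDP times runtime, uses the TDP figures from the platform slide (80W, 13W, 16W), and the runtimes and helper names (`energy_joules`, `break_even_slowdown`) are hypothetical.

```python
# Processor TDP in watts, as listed on the platform slide.
TDP_WATTS = {"Xeon E5310": 80.0, "Atom D510": 13.0, "TilePro36": 16.0}


def energy_joules(platform: str, runtime_seconds: float) -> float:
    """Rough per-processor energy estimate: TDP (W) times runtime (s)."""
    return TDP_WATTS[platform] * runtime_seconds


def break_even_slowdown(fast: str, slow: str) -> float:
    """Largest slowdown at which the low-power platform still uses less energy."""
    return TDP_WATTS[fast] / TDP_WATTS[slow]


# Hypothetical runtimes (seconds) for one workload; these are not measured results.
runtimes = {"Xeon E5310": 100.0, "Atom D510": 450.0}
for name, seconds in runtimes.items():
    print(f"{name}: {energy_joules(name, seconds):.0f} J")

ratio = break_even_slowdown("Xeon E5310", "Atom D510")
print(f"Atom break-even slowdown vs. Xeon: {ratio:.2f}x")
```

Under these assumptions, a workload that runs more than about six times slower on Atom would cost more energy there than on Xeon, which is one way to read the "no energy advantage for complex applications" observation. A real comparison would use measured wall power for whole nodes rather than TDP.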

28 Greening Data Center IDC says: the Digital Universe will be 35 zettabytes by 2020. Nature says: distilling the meaning from big data has never been in such urgent demand. Data centers consumed about 1.3% of all electricity use. The energy bill is the largest single item in the total cost of ownership of a data center.

29 Power Usage Effectiveness "If you can not measure it, you can not improve it." (Lord Kelvin) PUE (power usage effectiveness): a measure of how efficiently a computer data center uses its power; specifically, how much of the power is actually used by the information technology equipment.

30 AxPUE PUE

31 ApPUE ApPUE (Application Performance Power Usage Effectiveness): a metric that measures the power usage effectiveness of IT equipment, specifically, how much of the power entering IT equipment is used to improve the application performance. Computation formula:

$$\mathrm{ApPUE} = \frac{\text{Application Performance}}{\text{IT Equipment Power}}$$

Application Performance: the data processing performance of applications. IT Equipment Power: the average rate of IT equipment energy consumed.

32 AoPUE AoPUE (Application ): a metric that measures the power usage effectiveness of the overall data center system, specifically, how much of the total facility power is used to improve the application performance. Computation formulas:

$$\mathrm{AoPUE} = \frac{\text{Application Performance}}{\text{Total Facility Power}}$$

$$\mathrm{AoPUE} = \frac{\mathrm{ApPUE}}{\mathrm{PUE}}$$

Total Facility Power: the average rate of total facility energy used.
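The two AxPUE metrics and their relation to PUE can be checked numerically. The following is a minimal sketch, assuming the standard definition of PUE as total facility power divided by IT equipment power. The `Measurement` class, its field names, and all numbers are hypothetical illustrations, not data from the paper.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One benchmark run; every value used below is hypothetical."""
    app_performance: float       # e.g. MB of data processed per second
    it_power_watts: float        # average power drawn by IT equipment
    facility_power_watts: float  # average power drawn by the whole facility


def pue(m: Measurement) -> float:
    # Assumed standard PUE: total facility power over IT equipment power.
    return m.facility_power_watts / m.it_power_watts


def appue(m: Measurement) -> float:
    # ApPUE: application performance per watt of IT equipment power.
    return m.app_performance / m.it_power_watts


def aopue(m: Measurement) -> float:
    # AoPUE: application performance per watt of total facility power.
    return m.app_performance / m.facility_power_watts


run = Measurement(app_performance=120.0, it_power_watts=400.0,
                  facility_power_watts=600.0)
print("PUE  :", pue(run))
print("ApPUE:", appue(run))
print("AoPUE:", aopue(run))
# The identity AoPUE = ApPUE / PUE holds up to floating-point rounding.
print("ApPUE / PUE:", appue(run) / pue(run))
```

Unlike PUE, which only looks at power flows, both AxPUE values change when the same hardware runs a different workload, because the numerator is application performance.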

33 The Roles of BigDataBench Conducting the experiments based on BigDataBench to demonstrate the rationality of the newly proposed AxPUE from two aspects: adopting the comprehensive workloads of BigDataBench to design the application-category-sensitive experiment; adopting Sort of BigDataBench to design the algorithm-complexity-sensitive experiment.

35 Motivation & Contributions
Motivation: The properties of big data bring challenges for big data management. Performance diagnosis is of great importance to provide healthy big data systems.
Contributions: Propose a new performance anomaly detection method based on the ARIMA model for big data applications. Introduce a signature-based approach employing MIC invariants to correlate a specific kind of performance problem. Propose an ensemble approach to diagnose the real causes of performance problems in big data platforms.
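To make the idea of model-based anomaly detection concrete, here is a minimal sketch using the simplest ARIMA instance, ARIMA(1,0,0), which is a plain AR(1) model. It is not the authors' method: the least-squares fit, the k-sigma residual threshold, the function names, and the sample series are all assumptions chosen for illustration, and the MIC-invariant signature step is not shown.

```python
from statistics import mean, pstdev


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, nxt = series[:-1], series[1:]
    mp, mn = mean(prev), mean(nxt)
    var = sum((p - mp) ** 2 for p in prev)
    cov = sum((p - mp) * (n - mn) for p, n in zip(prev, nxt))
    phi = cov / var if var else 0.0
    return mn - phi * mp, phi


def detect_anomalies(train, observed, k=3.0):
    """Return indices in `observed` whose one-step prediction error exceeds k sigma."""
    c, phi = fit_ar1(train)
    residuals = [n - (c + phi * p) for p, n in zip(train[:-1], train[1:])]
    sigma = pstdev(residuals)
    flagged = []
    prev = train[-1]
    for i, value in enumerate(observed):
        predicted = c + phi * prev
        if abs(value - predicted) > k * sigma:
            flagged.append(i)
            prev = predicted  # do not feed the outlier back into the model
        else:
            prev = value
    return flagged


# Hypothetical metric samples (e.g. task latency in seconds), not real data.
train = [10, 12, 11, 13, 12, 10, 11, 12, 13, 11, 10, 12]
observed = [11, 12, 25, 12, 11]
print("anomalous indices:", detect_anomalies(train, observed))
```

A full ARIMA model would add differencing and moving-average terms; the structure of the detector (fit on normal behavior, flag large prediction residuals) stays the same.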

36 The Roles of BigDataBench Conducting the experiments based on BigDataBench to evaluate the efficiency and precision of the proposed performance anomaly detection method. Using the data generation tool of BigDataBench to generate experiment data. Chosen workloads: Sort, Wordcount, Grep and Naïve Bayesian.

38 Main Ideas Characterize 16 typical workloads from BigDataBench and HiBench by micro-architecture level metrics. Analyze the similarity among these workloads by statistical techniques such as PCA and clustering. Release two typical workloads related to trajectory data processing in a real-world application domain.
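The PCA step of such a similarity analysis can be sketched without external libraries. The example below standardizes each metric, builds the covariance matrix, and finds only the first principal component by power iteration. It is an illustration under stated assumptions: the metric values and the choice of three metrics are hypothetical, the function names are made up, and the clustering step is left out.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each metric (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def covariance(rows):
    """Covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Dominant eigenvector of `cov` via power iteration (unit length)."""
    d = len(cov)
    vec = [1.0] + [0.5] * (d - 1)  # arbitrary non-zero starting vector
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    return vec


# Hypothetical per-workload metrics: (IPC, L2 misses per kilo-instruction, branch miss %).
workloads = {
    "Sort": [0.8, 12.0, 1.5],
    "Grep": [1.1, 6.0, 2.0],
    "WordCount": [1.0, 7.0, 2.2],
    "BFS": [0.5, 20.0, 4.0],
}
standardized = zscore_columns(list(workloads.values()))
pc1 = first_component(covariance(standardized))
for name, row in zip(workloads, standardized):
    score = sum(a * b for a, b in zip(row, pc1))
    print(f"{name}: PC1 score {score:+.2f}")
```

Workloads with close PC1 scores behave similarly on the chosen metrics; a clustering algorithm would then group them, typically using several components rather than one.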

40 Contact information Jianfeng Zhan: http://prof.ict.ac.cn/jfzhan BPOE: http://prof.ict.ac.cn/bpoe2013 BigDataBench: http://prof.ict.ac.cn/bigdatabench

BigDataBench: a Big Data Benchmark Suite from Internet Services

BigDataBench: a Big Data Benchmark Suite from Internet Services BigDataBench: a Big Data Benchmark Suite from Internet Services Lei Wang 1,7, Jianfeng Zhan 1, Chunjie Luo 1, Yuqing Zhu 1, Qiang Yang 1, Yongqiang He 2, Wanling Gao 1, Zhen Jia 1, Yingjie Shi 1, Shujie

More information

CloudRank-D:A Benchmark Suite for Private Cloud Systems

CloudRank-D:A Benchmark Suite for Private Cloud Systems CloudRank-D:A Benchmark Suite for Private Cloud Systems Jing Quan Institute of Computing Technology, Chinese Academy of Sciences and University of Science and Technology of China HVC tutorial in conjunction

More information

BigDataBench. Khushbu Agarwal

BigDataBench. Khushbu Agarwal BigDataBench Khushbu Agarwal Last Updated: May 23, 2014 CONTENTS Contents 1 What is BigDataBench? [1] 1 1.1 SUMMARY.................................. 1 1.2 METHODOLOGY.............................. 1 2

More information

Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA

Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,

More information

On Big Data Benchmarking

On Big Data Benchmarking On Big Data Benchmarking 1 Rui Han and 2 Xiaoyi Lu 1 Department of Computing, Imperial College London 2 Ohio State University [email protected], [email protected] Abstract Big data systems address

More information

On Big Data Benchmarking

On Big Data Benchmarking On Big Data Benchmarking 1 Rui Han and 2 Xiaoyi Lu 1 Department of Computing, Imperial College London 2 Ohio State University [email protected], [email protected] Abstract Big data systems address

More information

Evaluating Task Scheduling in Hadoop-based Cloud Systems

Evaluating Task Scheduling in Hadoop-based Cloud Systems 2013 IEEE International Conference on Big Data Evaluating Task Scheduling in Hadoop-based Cloud Systems Shengyuan Liu, Jungang Xu College of Computer and Control Engineering University of Chinese Academy

More information

Benchmarking and Ranking Big Data Systems

Benchmarking and Ranking Big Data Systems Benchmarking and Ranking Big Data Systems Xinhui Tian ICT, Chinese Academy of Sciences and University of Chinese Academy of Sciences INSTITUTE OF COMPUTING TECHNOLOGY Outline n BigDataBench n BigDataBench

More information

Introduction. Various user groups requiring Hadoop, each with its own diverse needs, include:

Introduction. Various user groups requiring Hadoop, each with its own diverse needs, include: Introduction BIG DATA is a term that s been buzzing around a lot lately, and its use is a trend that s been increasing at a steady pace over the past few years. It s quite likely you ve also encountered

More information

HiBench Introduction. Carson Wang ([email protected]) Software & Services Group

HiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group HiBench Introduction Carson Wang ([email protected]) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is

More information

HiBench Installation. Sunil Raiyani, Jayam Modi

HiBench Installation. Sunil Raiyani, Jayam Modi HiBench Installation Sunil Raiyani, Jayam Modi Last Updated: May 23, 2014 CONTENTS Contents 1 Introduction 1 2 Installation 1 3 HiBench Benchmarks[3] 1 3.1 Micro Benchmarks..............................

More information

Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf

Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf Rong Gu,Qianhao Dong 2014/09/05 0. Introduction As we want to have a performance framework for Tachyon, we need to consider two aspects

More information

Big Data. Value, use cases and architectures. Petar Torre Lead Architect Service Provider Group. Dubrovnik, Croatia, South East Europe 20-22 May, 2013

Big Data. Value, use cases and architectures. Petar Torre Lead Architect Service Provider Group. Dubrovnik, Croatia, South East Europe 20-22 May, 2013 Dubrovnik, Croatia, South East Europe 20-22 May, 2013 Big Data Value, use cases and architectures Petar Torre Lead Architect Service Provider Group 2011 2013 Cisco and/or its affiliates. All rights reserved.

More information

INTRODUCTION TO CASSANDRA

INTRODUCTION TO CASSANDRA INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open

More information

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

More information

Big Data Storage Architecture Design in Cloud Computing

Big Data Storage Architecture Design in Cloud Computing Big Data Storage Architecture Design in Cloud Computing Xuebin Chen 1, Shi Wang 1( ), Yanyan Dong 1, and Xu Wang 2 1 College of Science, North China University of Science and Technology, Tangshan, Hebei,

More information

Unlocking the Intelligence in. Big Data. Ron Kasabian General Manager Big Data Solutions Intel Corporation

Unlocking the Intelligence in. Big Data. Ron Kasabian General Manager Big Data Solutions Intel Corporation Unlocking the Intelligence in Big Data Ron Kasabian General Manager Big Data Solutions Intel Corporation Volume & Type of Data What s Driving Big Data? 10X Data growth by 2016 90% unstructured 1 Lower

More information

LeiWang, Jianfeng Zhan, ZhenJia, RuiHan

LeiWang, Jianfeng Zhan, ZhenJia, RuiHan 2015-6 CHARACTERIZATION AND ARCHITECTURAL IMPLICATIONS OF BIG DATA WORKLOADS arxiv:1506.07943v1 [cs.dc] 26 Jun 2015 LeiWang, Jianfeng Zhan, ZhenJia, RuiHan Institute Of Computing Technology Chinese Academy

More information

Hadoop in the Enterprise

Hadoop in the Enterprise Hadoop in the Enterprise Modern Architecture with Hadoop 2 Jeff Markham Technical Director, APAC Hortonworks Hadoop Wave ONE: Web-scale Batch Apps relative % customers 2006 to 2012 Web-Scale Batch Applications

More information

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University

More information

Open Source for Cloud Infrastructure

Open Source for Cloud Infrastructure Open Source for Cloud Infrastructure June 29, 2012 Jackson He General Manager, Intel APAC R&D Ltd. Cloud is Here and Expanding More users, more devices, more data & traffic, expanding usages >3B 15B Connected

More information

Architecture Support for Big Data Analytics

Architecture Support for Big Data Analytics Architecture Support for Big Data Analytics Ahsan Javed Awan EMJD-DC (KTH-UPC) (http://uk.linkedin.com/in/ahsanjavedawan/) Supervisors: Mats Brorsson(KTH), Eduard Ayguade(UPC), Vladimir Vlassov(KTH) 1

More information

Archiving and Sharing Big Data Digital Repositories, Libraries, Cloud Storage

Archiving and Sharing Big Data Digital Repositories, Libraries, Cloud Storage Archiving and Sharing Big Data Digital Repositories, Libraries, Cloud Storage Cyrus Shahabi, Ph.D. Professor of Computer Science & Electrical Engineering Director, Integrated Media Systems Center (IMSC)

More information

Automating Big Data Benchmarking for Different Architectures with ALOJA

Automating Big Data Benchmarking for Different Architectures with ALOJA www.bsc.es Jan 2016 Automating Big Data Benchmarking for Different Architectures with ALOJA Nicolas Poggi, Postdoc Researcher Agenda 1. Intro on Hadoop performance 1. Current scenario and problematic 2.

More information

Rackspace Cloud Databases and Container-based Virtualization

Rackspace Cloud Databases and Container-based Virtualization Rackspace Cloud Databases and Container-based Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI

More information

You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.

You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required. What is this course about? This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees

More information

BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. Aayush Agrawal

BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. Aayush Agrawal BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking Aayush Agrawal Last Updated: May 21, 2014 text CONTENTS Contents 1 Philosophy : 1 2 Requirements : 1 3 Observations : 2 3.1 Text Generator

More information

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created

More information

The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform

The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform Fong-Hao Liu, Ya-Ruei Liou, Hsiang-Fu Lo, Ko-Chin Chang, and Wei-Tsong Lee Abstract Virtualization platform solutions

More information

Big Data Performance Growth on the Rise

Big Data Performance Growth on the Rise Impact of Big Data growth On Transparent Computing Michael A. Greene Intel Vice President, Software and Services Group, General Manager, System Technologies and Optimization 1 Transparent Computing (TC)

More information

Energy Efficient MapReduce

Energy Efficient MapReduce Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing

More information

Intel Distribution for Apache Hadoop* Software: Optimization and Tuning Guide

Intel Distribution for Apache Hadoop* Software: Optimization and Tuning Guide OPTIMIZATION AND TUNING GUIDE Intel Distribution for Apache Hadoop* Software Intel Distribution for Apache Hadoop* Software: Optimization and Tuning Guide Configuring and managing your Hadoop* environment

More information

Benchmarking and Analysis of NoSQL Technologies

Benchmarking and Analysis of NoSQL Technologies Benchmarking and Analysis of NoSQL Technologies Suman Kashyap 1, Shruti Zamwar 2, Tanvi Bhavsar 3, Snigdha Singh 4 1,2,3,4 Cummins College of Engineering for Women, Karvenagar, Pune 411052 Abstract The

More information

Big Data Simulator version

Big Data Simulator version Big Data Simulator version User Manual Website: http://prof.ict.ac.cn/bigdatabench/simulatorversion/ Content 1 Motivation... 3 2 Methodology... 3 3 Architecture subset... 3 3.1 Microarchitectural Metric

More information

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

I/O Characterization of Big Data Workloads in Data Centers

I/O Characterization of Big Data Workloads in Data Centers I/O Characterization of Big Data Workloads in Data Centers Fengfeng Pan 1 2 Yinliang Yue 1 Jin Xiong 1 Daxiang Hao 1 1 Research Center of Advanced Computer Syste, Institute of Computing Technology, Chinese

More information

Big Data and Industrial Internet

Big Data and Industrial Internet Big Data and Industrial Internet Keijo Heljanko Department of Computer Science and Helsinki Institute for Information Technology HIIT School of Science, Aalto University [email protected] 16.6-2015

More information

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools

More information

Open source Google-style large scale data analysis with Hadoop

Open source Google-style large scale data analysis with Hadoop Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: [email protected] Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical

More information

Virtuoso and Database Scalability

Virtuoso and Database Scalability Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of

More information

Open source large scale distributed data management with Google s MapReduce and Bigtable

Open source large scale distributed data management with Google s MapReduce and Bigtable Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: [email protected] Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory

More information

UPS battery remote monitoring system in cloud computing

UPS battery remote monitoring system in cloud computing , pp.11-15 http://dx.doi.org/10.14257/astl.2014.53.03 UPS battery remote monitoring system in cloud computing Shiwei Li, Haiying Wang, Qi Fan School of Automation, Harbin University of Science and Technology

More information

Big Data Analytics - Accelerated. stream-horizon.com

Big Data Analytics - Accelerated. stream-horizon.com Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based

More information

Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing. October 29th, 2015

Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing. October 29th, 2015 E6893 Big Data Analytics Lecture 8: Spark Streams and Graph Computing (I) Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing

More information

Optimization of Distributed Crawler under Hadoop

Optimization of Distributed Crawler under Hadoop MATEC Web of Conferences 22, 0202 9 ( 2015) DOI: 10.1051/ matecconf/ 2015220202 9 C Owned by the authors, published by EDP Sciences, 2015 Optimization of Distributed Crawler under Hadoop Xiaochen Zhang*

More information

A Case of Study on Hadoop Benchmark Behavior Modeling Using ALOJA-ML

A Case of Study on Hadoop Benchmark Behavior Modeling Using ALOJA-ML www.bsc.es A Case of Study on Hadoop Benchmark Behavior Modeling Using ALOJA-ML Josep Ll. Berral, Nicolas Poggi, David Carrera Workshop on Big Data Benchmarks Toronto, Canada 2015 1 Context ALOJA: framework

More information

Big Systems, Big Data

Big Systems, Big Data Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,

More information
