BPOE Research Highlights
- Allen Cobb
- 10 years ago
Transcription
1 BPOE Research Highlights Jianfeng Zhan ICT, Chinese Academy of Sciences INSTITUTE OF COMPUTING TECHNOLOGY
2 What is the BPOE workshop? B: Big Data Benchmarks. PO: Performance Optimization. E: Emerging Hardware.
3 Motivation. Big data covers many research fields. Researchers work in specific research fields with their own main conferences and have few chances to learn about each other. There is a gap between industry and academia. BPOE brings researchers and practitioners in related areas together.
4 BPOE communities. Communities of architecture, systems, and data management. Discuss the mutual influences of architectures, systems, and data management. Bridge the gap in big data research and practice between industry and academia.
5 BPOE: a series of workshops. 1st BPOE, in conjunction with IEEE Big Data 2013, Santa Clara, CA, USA: paper presentations + 2 invited talks. 2nd BPOE, in conjunction with CCF HPC China 2013, Guilin, Guangxi, China: 8 invited talks. 3rd BPOE, in conjunction with CCF Big Data Technology Conference 2013, Beijing, China: 6 invited talks.
6 Organization. Steering committee: Lizy K John, University of Texas at Austin; Zhiwei Xu, ICT, Chinese Academy of Sciences; Cheng-zhong Xu, Wayne State University; Xueqi Cheng, ICT, Chinese Academy of Sciences; Jianfeng Zhan, ICT, Chinese Academy of Sciences; Dhabaleswar K Panda, Ohio State University. PC co-chairs: Jianfeng Zhan, ICT, Chinese Academy of Sciences; Weijia Xu, TACC, University of Texas at Austin.
7 Paper submissions. 26 papers received; each has at least five reviews. 30 TPC members. 16 papers accepted. Two invited talks.
8 Finalized program. Three sessions. Performance optimization of big data systems: 3 from Intel, 1 from Hasso Plattner Institute, Germany. Special session, Big Data Benchmarking and Performance Optimization: 2 invited talks + 1 from Academia Sinica. Experience and evaluation with emerging hardware for big data: 1 from Japan, 3 from US.
9 Two invited talks 13:20-14:05 Invited Talk, BigDataBench: Benchmarking big data systems, Yingjie Shi, Chinese Academy of Sciences 14:05-14:50 Invited Talk, Facebook: Using Emerging Hardware to Build Infrastructure at Scale, Bill Jia, PhD. Manager, Performance and Capacity Engineering, Facebook
10 Related work with Big Data Benchmarking; BigDataBench; Use cases on BigDataBench
11 What is BigDataBench? A Big Data Benchmark Suite, ICT, Chinese Academy of Sciences. Presentation available soon (http://prof.ict.ac.cn/bpoe2013/program.php). Evaluating big data (hardware) systems and architectures. Open-source project: http://prof.ict.ac.cn/bigdatabench
12 Summary of BigDataBench
13 BigDataBench Methodology. Representative real data sets: data types (structured data, semi-structured data, unstructured data); data sources (text data, graph data, table data). Synthetic data generation tool preserving data characteristics. Extended big data sets preserving 4V. Investigate typical application domains. Diverse and important workloads: application types (offline analytics, realtime analytics, online services); basic & important operations and algorithms; extended; represent software stack. Extended big data workloads. BigDataBench.
14 Representative Datasets. Columns: Application Domain, Data Type, Data Source, Dataset. Search Engine, E-commerce: unstructured data, text data, Wikipedia Entries; semi-structured data, unstructured data; graph data, table data, text data; Google Web Graph, Profsearch Person Resume, Amazon Movie Reviews; structured data, table data, ABC Transaction Data. Social Network: unstructured data, graph data, Facebook Social Graph.
15 Chosen Workloads. Micro Benchmarks. Basic Datastore Operations. Relational Queries. Application Scenarios: search engine, social network, e-commerce system.
16 Chosen Workloads: Micro Benchmarks. Columns: Application Domain, Data Type, Data Source, Operations & Algorithms, Software Stack, Benchmark ID.
Micro Benchmarks, unstructured, text: sort (Hadoop, Spark, MPI; 1-3)
Micro Benchmarks, unstructured, text: grep (Hadoop, Spark, MPI; 2-3)
Micro Benchmarks, unstructured, text: wordcount (Hadoop 3-1, Spark 3-2, MPI 3-3)
Micro Benchmarks, graph: BFS (MPI; 4)
17 Chosen Workloads: Basic Datastore Operations. Columns: Application Domain, Data Type, Data Source, Operations & Algorithms, Software Stack, Benchmark ID.
Basic Datastore Operations, semi-structured, table: Read (HBase, Cassandra, MongoDB, MySQL)
Basic Datastore Operations, semi-structured, table: Write (HBase, Cassandra, MongoDB, MySQL)
Basic Datastore Operations, semi-structured, table: Scan (HBase, Cassandra, MongoDB, MySQL)
18 Chosen Workloads: Basic Relational Query. Columns: Application Domain, Data Type, Data Source, Operations & Algorithms, Software Stack, Benchmark ID.
Basic Relational Query, structured, table: Select query (Hive, Impala)
Basic Relational Query, structured, table: Aggregation query (Hive, Impala)
Basic Relational Query, structured, table: Join query (Hive, Impala; 10-2)
19 Chosen Workloads: Service. Columns: Application Domain, Operations & Algorithms, Data Type, Data Source, Software Stack, Benchmark ID.
Search Engine: Nutch server (structured, table, Hadoop); PageRank (unstructured, graph, Hadoop); Index (unstructured, text, Hadoop; 13)
Social Network: Olio server (structured, table, MySQL); Kmeans (unstructured, graph, Hadoop, Spark); Connected components (unstructured, graph, Hadoop; 16)
E-commerce: Rubis server (structured, table, MySQL); Collaborative filtering (unstructured, text, Hadoop); Naïve Bayes (unstructured, text, Spark; 19)
20 Related work with Big Data Benchmarking; BigDataBench; Use cases on BigDataBench
21 Case Studies based on BigDataBench.
The Implications from Benchmarking Three Different Data Center Platforms. Q. Jing, Y. Shi and M. Zhao, University of Science and Technology of China, and Florida International University.
AxPUE: Application Level Metrics for Power Usage Effectiveness in Data Centers. R. Zhou, Y. Shi, C. Zhu, F. Liu, National Computer network Emergency Response Technical Team Coordination Center of China, China.
An Ensemble MIC-based Approach for Performance Diagnosis in Big Data Platform. P. Chen, Y. Qi, X. Li, and L. Su, Xi'an Jiaotong University, China.
A Characterization of Big Data Benchmarks. W. Xiong, Z. Yu, C. Xu, SIAT, Chinese Academy of Sciences, and Wayne State University.
22 New Solutions of Big Data Systems
23 A Tradeoff? Energy consumption vs. performance
24 Questions: What is the performance of different big data systems under different types of applications? What is the performance of different big data systems under different data volumes? What is the energy consumption of different big data systems? Approach: Evaluating three respective big data systems. Comparing two of them in terms of performance and energy cost. Analyzing the running features of different big data systems, and the underlying reasons.
25 Experiment Platforms. Xeon: mainstream processor. Atom: low-power processor. Tilera: many-core processor.
Basic information (Xeon / Atom / Tilera):
CPU type: Intel Xeon E5310 / Intel Atom D510 / Tilera TilePro36
CPU cores: 4 cores at 1.6GHz / 2 cores at 1.66GHz / 36 cores at 500MHz
L1 I/D cache: 32KB / 24KB / 16KB/8KB
L2 cache: 4096KB / 512KB / 64KB
Additional properties per processor (covering OoO execution, FPU, connection, buffer sharing, and TDP), values in slide order: Xeon E5310: Yes, Yes, BUS, No, 80W; Atom D510: No, Yes, BUS, No, 13W; TilePro36: Yes, No, IMESH, Yes, 16W
Comparison configurations: Xeon vs. Atom with Hadoop cluster master/slaves of 1/7; Xeon vs. Tilera with master/slaves of 1/7 and 1/1. The comparisons are set up to have the same core number or the same hardware thread number. All comparisons are based on BigDataBench. Hadoop setting: following the Hadoop official website.
26 Implications from the Results. Xeon vs. Atom: Xeon is more powerful than Atom. Atom consumes less energy than Xeon on some simple applications. Atom doesn't show an energy advantage on complex applications. Xeon vs. Tilera: Xeon is more powerful than Tilera. Tilera consumes less energy than Xeon on some simple applications. Tilera doesn't show an energy advantage on complex applications. Tilera is more suitable for processing I/O-intensive applications.
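The tradeoff behind these implications can be sketched with simple arithmetic: energy is average power multiplied by running time, so a low-power node only saves energy while its slowdown stays below its power advantage. The Python sketch below uses the 80W and 13W TDP figures listed on the platform slide as a rough proxy for power draw; the function names and the runtimes are hypothetical and chosen only for illustration, not taken from the talk.

```python
def energy_joules(power_watts: float, runtime_seconds: float) -> float:
    """Energy = average power x running time."""
    if power_watts < 0 or runtime_seconds < 0:
        raise ValueError("power and runtime must be non-negative")
    return power_watts * runtime_seconds


def break_even_slowdown(fast_power_watts: float, slow_power_watts: float) -> float:
    """Largest slowdown at which the low-power node still uses less energy."""
    return fast_power_watts / slow_power_watts


# TDP figures from the platform slide; TDP is only a rough proxy for real draw.
XEON_TDP_W = 80.0
ATOM_TDP_W = 13.0

# Hypothetical runtimes for one job, for illustration only (not measured results).
xeon_runtime_s = 100.0
atom_runtime_s = 400.0

xeon_energy = energy_joules(XEON_TDP_W, xeon_runtime_s)
atom_energy = energy_joules(ATOM_TDP_W, atom_runtime_s)
limit = break_even_slowdown(XEON_TDP_W, ATOM_TDP_W)

print(f"Xeon: {xeon_energy:.0f} J, Atom: {atom_energy:.0f} J, break-even slowdown: {limit:.2f}x")
```

Under these assumed numbers the Atom node is four times slower yet still uses less energy; once the slowdown passes roughly 6x, as it might on a complex workload, the advantage disappears.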
28 Greening Data Center. IDC says: the Digital Universe will be 35 zettabytes by 2020. Nature says: distilling the meaning from big data has never been in such urgent demand. Data centers consumed about 1.3% of all electricity use. The energy bill is the largest single item in the total cost of ownership of a data center.
29 Power Usage Effectiveness. "If you can not measure it, you can not improve it." (Lord Kelvin) PUE (Power Usage Effectiveness): a measure of how efficiently a computer data center uses its power; specifically, how much of the power is actually used by the information technology equipment.
30 AxPUE PUE
31 ApPUE. ApPUE (Application Performance Power Usage Effectiveness): a metric that measures the power usage effectiveness of IT equipment; specifically, how much of the power entering IT equipment is used to improve application performance. Computation formula: ApPUE = Application Performance / IT Equipment Power, where Application Performance is the data processing performance of applications and IT Equipment Power is the average rate of IT equipment energy consumed.
32 AoPUE. AoPUE (Application ): a metric that measures the power usage effectiveness of the overall data center system; specifically, how much of the total facility power is used to improve application performance. Computation formulas: AoPUE = Application Performance / Total Facility Power, where Total Facility Power is the average rate of total facility energy used. Equivalently, AoPUE = ApPUE / PUE.
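The two formulas can be checked against each other with a small calculation. The sketch below assumes the conventional definition PUE = Total Facility Power / IT Equipment Power, which is consistent with the relation AoPUE = ApPUE / PUE on the slide. The function names, the performance unit, and the sample figures are hypothetical.

```python
import math


def pue(total_facility_power_w: float, it_equipment_power_w: float) -> float:
    """PUE = Total Facility Power / IT Equipment Power (conventional definition)."""
    return total_facility_power_w / it_equipment_power_w


def ap_pue(application_performance: float, it_equipment_power_w: float) -> float:
    """ApPUE = Application Performance / IT Equipment Power."""
    return application_performance / it_equipment_power_w


def ao_pue(application_performance: float, total_facility_power_w: float) -> float:
    """AoPUE = Application Performance / Total Facility Power."""
    return application_performance / total_facility_power_w


# Hypothetical figures: performance in MB/s of processed data, power in watts.
performance = 1200.0
it_power_w = 400.0
facility_power_w = 600.0

p = pue(facility_power_w, it_power_w)
ap = ap_pue(performance, it_power_w)
ao = ao_pue(performance, facility_power_w)

# The slide's identity AoPUE = ApPUE / PUE should hold for any inputs.
assert math.isclose(ao, ap / p)
print(f"PUE={p:.2f}  ApPUE={ap:.2f}  AoPUE={ao:.2f}")
```

Because ApPUE divides by IT power only, it isolates how well the IT equipment turns power into application work, while AoPUE also charges the application for cooling and other facility overhead.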
33 The Roles of BigDataBench. Conducting experiments based on BigDataBench to demonstrate the rationality of the newly proposed AxPUE from two aspects: adopting the comprehensive workloads of BigDataBench to design the application-category-sensitive experiment; adopting Sort of BigDataBench to design the algorithm-complexity-sensitive experiment.
35 Motivation & Contributions. Motivation: The properties of big data bring challenges for big data management. Performance diagnosis is of great importance for keeping big data systems healthy. Contributions: Propose a new performance anomaly detection method based on the ARIMA model for big data applications. Introduce a signature-based approach employing MIC invariants to correlate a specific kind of performance problem. Propose an ensemble approach to diagnose the real causes of performance problems in big data platforms.
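The general idea of forecast-based anomaly detection can be illustrated with a much simpler stand-in than the ARIMA model named on the slide. The sketch below fits a first-order autoregressive model, AR(1), by least squares and flags observations whose forecast error exceeds a multiple of the training residual spread. It is not the authors' method; the function names, the threshold, and the sample metric values are all hypothetical.

```python
from statistics import mean, pstdev


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, nxt = series[:-1], series[1:]
    mean_prev, mean_next = mean(prev), mean(nxt)
    spread = sum((p - mean_prev) ** 2 for p in prev)
    if spread == 0:
        return mean_next, 0.0
    phi = sum((p - mean_prev) * (n - mean_next) for p, n in zip(prev, nxt)) / spread
    return mean_next - phi * mean_prev, phi


def detect_anomalies(train, observed, threshold=3.0):
    """Return indices in `observed` whose forecast error exceeds threshold * sigma."""
    c, phi = fit_ar1(train)
    residuals = [n - (c + phi * p) for p, n in zip(train[:-1], train[1:])]
    sigma = pstdev(residuals)
    anomalies = []
    prev = train[-1]
    for i, value in enumerate(observed):
        predicted = c + phi * prev
        if abs(value - predicted) > threshold * sigma:
            anomalies.append(i)
            prev = predicted  # do not let the outlier distort the next forecast
        else:
            prev = value
    return anomalies


# Hypothetical per-interval metric (e.g. task latency in seconds), for illustration only.
normal_run = [10.0, 10.4, 9.8, 10.1, 10.3, 9.9, 10.2, 10.0, 9.7, 10.1]
new_samples = [10.2, 9.9, 14.5, 10.1]

flagged = detect_anomalies(normal_run, new_samples)
print(flagged)
```

A full ARIMA model adds differencing and moving-average terms, which matter for metrics with trends; the residual-thresholding step stays the same in spirit.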
36 The Roles of BigDataBench. Conducting experiments based on BigDataBench to evaluate the efficiency and precision of the proposed performance anomaly detection method. Using the data generation tool of BigDataBench to generate experiment data. Chosen workloads: Sort, Wordcount, Grep, and Naïve Bayesian.
38 Main Ideas. Characterize 16 typical workloads from BigDataBench and HiBench by micro-architecture-level metrics. Analyze the similarity among these workloads with statistical techniques such as PCA and clustering. Release two typical workloads related to trajectory data processing in a real-world application domain.
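A minimal sketch of the similarity analysis, under simplifying assumptions: each workload is described by a vector of micro-architecture metrics, the metrics are z-score normalized, and workloads whose vectors lie within a distance threshold are grouped by single-linkage clustering. The PCA step mentioned on the slide is omitted here to keep the example dependency-free. The metric choice (IPC, L2 misses per thousand instructions), the values, and the threshold are hypothetical and do not come from the paper.

```python
from math import dist
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each metric column so no single metric dominates the distance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def single_linkage_clusters(names, rows, threshold):
    """Group workloads connected by chains of pairwise distances <= threshold."""
    points = standardize(rows)
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())


# Hypothetical metric vectors (IPC, L2 MPKI), for illustration only.
workloads = ["Sort", "Grep", "WordCount", "PageRank"]
metrics = [[0.80, 12.0], [0.85, 11.0], [1.40, 4.0], [1.50, 3.5]]

clusters = single_linkage_clusters(workloads, metrics, threshold=1.0)
print(clusters)
```

Workloads that land in the same group stress the hardware in similar ways under this measure, so one representative per group may be enough when simulation time is limited.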
40 Contact information. Jianfeng Zhan: http://prof.ict.ac.cn/jfzhan BPOE: http://prof.ict.ac.cn/bpoe2013 BigDataBench: http://prof.ict.ac.cn/bigdatabench
BigDataBench: a Big Data Benchmark Suite from Internet Services
BigDataBench: a Big Data Benchmark Suite from Internet Services Lei Wang 1,7, Jianfeng Zhan 1, Chunjie Luo 1, Yuqing Zhu 1, Qiang Yang 1, Yongqiang He 2, Wanling Gao 1, Zhen Jia 1, Yingjie Shi 1, Shujie
CloudRank-D:A Benchmark Suite for Private Cloud Systems
CloudRank-D:A Benchmark Suite for Private Cloud Systems Jing Quan Institute of Computing Technology, Chinese Academy of Sciences and University of Science and Technology of China HVC tutorial in conjunction
BigDataBench. Khushbu Agarwal
BigDataBench Khushbu Agarwal Last Updated: May 23, 2014 CONTENTS Contents 1 What is BigDataBench? [1] 1 1.1 SUMMARY.................................. 1 1.2 METHODOLOGY.............................. 1 2
Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA
Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,
On Big Data Benchmarking
On Big Data Benchmarking 1 Rui Han and 2 Xiaoyi Lu 1 Department of Computing, Imperial College London 2 Ohio State University [email protected], [email protected] Abstract Big data systems address
On Big Data Benchmarking
On Big Data Benchmarking 1 Rui Han and 2 Xiaoyi Lu 1 Department of Computing, Imperial College London 2 Ohio State University [email protected], [email protected] Abstract Big data systems address
Evaluating Task Scheduling in Hadoop-based Cloud Systems
2013 IEEE International Conference on Big Data Evaluating Task Scheduling in Hadoop-based Cloud Systems Shengyuan Liu, Jungang Xu College of Computer and Control Engineering University of Chinese Academy
Benchmarking and Ranking Big Data Systems
Benchmarking and Ranking Big Data Systems Xinhui Tian ICT, Chinese Academy of Sciences and University of Chinese Academy of Sciences INSTITUTE OF COMPUTING TECHNOLOGY Outline n BigDataBench n BigDataBench
Introduction. Various user groups requiring Hadoop, each with its own diverse needs, include:
Introduction BIG DATA is a term that s been buzzing around a lot lately, and its use is a trend that s been increasing at a steady pace over the past few years. It s quite likely you ve also encountered
HiBench Introduction. Carson Wang ([email protected]) Software & Services Group
HiBench Introduction Carson Wang ([email protected]) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is
HiBench Installation. Sunil Raiyani, Jayam Modi
HiBench Installation Sunil Raiyani, Jayam Modi Last Updated: May 23, 2014 CONTENTS Contents 1 Introduction 1 2 Installation 1 3 HiBench Benchmarks[3] 1 3.1 Micro Benchmarks..............................
Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf
Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf Rong Gu,Qianhao Dong 2014/09/05 0. Introduction As we want to have a performance framework for Tachyon, we need to consider two aspects
Big Data. Value, use cases and architectures. Petar Torre Lead Architect Service Provider Group. Dubrovnik, Croatia, South East Europe 20-22 May, 2013
Dubrovnik, Croatia, South East Europe 20-22 May, 2013 Big Data Value, use cases and architectures Petar Torre Lead Architect Service Provider Group 2011 2013 Cisco and/or its affiliates. All rights reserved.
INTRODUCTION TO CASSANDRA
INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next
Big Data Storage Architecture Design in Cloud Computing
Big Data Storage Architecture Design in Cloud Computing Xuebin Chen 1, Shi Wang 1( ), Yanyan Dong 1, and Xu Wang 2 1 College of Science, North China University of Science and Technology, Tangshan, Hebei,
Unlocking the Intelligence in. Big Data. Ron Kasabian General Manager Big Data Solutions Intel Corporation
Unlocking the Intelligence in Big Data Ron Kasabian General Manager Big Data Solutions Intel Corporation Volume & Type of Data What s Driving Big Data? 10X Data growth by 2016 90% unstructured 1 Lower
LeiWang, Jianfeng Zhan, ZhenJia, RuiHan
2015-6 CHARACTERIZATION AND ARCHITECTURAL IMPLICATIONS OF BIG DATA WORKLOADS arxiv:1506.07943v1 [cs.dc] 26 Jun 2015 LeiWang, Jianfeng Zhan, ZhenJia, RuiHan Institute Of Computing Technology Chinese Academy
Hadoop in the Enterprise
Hadoop in the Enterprise Modern Architecture with Hadoop 2 Jeff Markham Technical Director, APAC Hortonworks Hadoop Wave ONE: Web-scale Batch Apps relative % customers 2006 to 2012 Web-Scale Batch Applications
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University
Open Source for Cloud Infrastructure
Open Source for Cloud Infrastructure June 29, 2012 Jackson He General Manager, Intel APAC R&D Ltd. Cloud is Here and Expanding More users, more devices, more data & traffic, expanding usages >3B 15B Connected
Architecture Support for Big Data Analytics
Architecture Support for Big Data Analytics Ahsan Javed Awan EMJD-DC (KTH-UPC) (http://uk.linkedin.com/in/ahsanjavedawan/) Supervisors: Mats Brorsson(KTH), Eduard Ayguade(UPC), Vladimir Vlassov(KTH) 1
Archiving and Sharing Big Data Digital Repositories, Libraries, Cloud Storage
Archiving and Sharing Big Data Digital Repositories, Libraries, Cloud Storage Cyrus Shahabi, Ph.D. Professor of Computer Science & Electrical Engineering Director, Integrated Media Systems Center (IMSC)
Automating Big Data Benchmarking for Different Architectures with ALOJA
www.bsc.es Jan 2016 Automating Big Data Benchmarking for Different Architectures with ALOJA Nicolas Poggi, Postdoc Researcher Agenda 1. Intro on Hadoop performance 1. Current scenario and problematic 2.
Rackspace Cloud Databases and Container-based Virtualization
Rackspace Cloud Databases and Container-based Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many
How To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI
You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.
What is this course about? This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees
BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. Aayush Agrawal
BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking Aayush Agrawal Last Updated: May 21, 2014 text CONTENTS Contents 1 Philosophy : 1 2 Requirements : 1 3 Observations : 2 3.1 Text Generator
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created
The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform
The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform Fong-Hao Liu, Ya-Ruei Liou, Hsiang-Fu Lo, Ko-Chin Chang, and Wei-Tsong Lee Abstract Virtualization platform solutions
Big Data Performance Growth on the Rise
Impact of Big Data growth On Transparent Computing Michael A. Greene Intel Vice President, Software and Services Group, General Manager, System Technologies and Optimization 1 Transparent Computing (TC)
Energy Efficient MapReduce
Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing
Intel Distribution for Apache Hadoop* Software: Optimization and Tuning Guide
OPTIMIZATION AND TUNING GUIDE Intel Distribution for Apache Hadoop* Software Intel Distribution for Apache Hadoop* Software: Optimization and Tuning Guide Configuring and managing your Hadoop* environment
Benchmarking and Analysis of NoSQL Technologies
Benchmarking and Analysis of NoSQL Technologies Suman Kashyap 1, Shruti Zamwar 2, Tanvi Bhavsar 3, Snigdha Singh 4 1,2,3,4 Cummins College of Engineering for Women, Karvenagar, Pune 411052 Abstract The
Big Data Simulator version
Big Data Simulator version User Manual Website: http://prof.ict.ac.cn/bigdatabench/simulatorversion/ Content 1 Motivation... 3 2 Methodology... 3 3 Architecture subset... 3 3.1 Microarchitectural Metric
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software
WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications
Workshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
I/O Characterization of Big Data Workloads in Data Centers
I/O Characterization of Big Data Workloads in Data Centers Fengfeng Pan 1 2 Yinliang Yue 1 Jin Xiong 1 Daxiang Hao 1 1 Research Center of Advanced Computer Syste, Institute of Computing Technology, Chinese
Big Data and Industrial Internet
Big Data and Industrial Internet Keijo Heljanko Department of Computer Science and Helsinki Institute for Information Technology HIIT School of Science, Aalto University [email protected] 16.6-2015
Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP
Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools
Open source Google-style large scale data analysis with Hadoop
Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: [email protected] Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical
Virtuoso and Database Scalability
Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of
Open source large scale distributed data management with Google s MapReduce and Bigtable
Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: [email protected] Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory
UPS battery remote monitoring system in cloud computing
, pp.11-15 http://dx.doi.org/10.14257/astl.2014.53.03 UPS battery remote monitoring system in cloud computing Shiwei Li, Haiying Wang, Qi Fan School of Automation, Harbin University of Science and Technology
Big Data Analytics - Accelerated. stream-horizon.com
Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based
Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing. October 29th, 2015
E6893 Big Data Analytics Lecture 8: Spark Streams and Graph Computing (I) Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing
Optimization of Distributed Crawler under Hadoop
MATEC Web of Conferences 22, 0202 9 ( 2015) DOI: 10.1051/ matecconf/ 2015220202 9 C Owned by the authors, published by EDP Sciences, 2015 Optimization of Distributed Crawler under Hadoop Xiaochen Zhang*
A Case of Study on Hadoop Benchmark Behavior Modeling Using ALOJA-ML
www.bsc.es A Case of Study on Hadoop Benchmark Behavior Modeling Using ALOJA-ML Josep Ll. Berral, Nicolas Poggi, David Carrera Workshop on Big Data Benchmarks Toronto, Canada 2015 1 Context ALOJA: framework
Big Systems, Big Data
Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,
Application and practice of parallel cloud computing in ISP. Guangzhou Institute of China Telecom Zhilan Huang 2011-10
Application and practice of parallel cloud computing in ISP Guangzhou Institute of China Telecom Zhilan Huang 2011-10 Outline Mass data management problem Applications of parallel cloud computing in ISPs
Introducing EEMBC Cloud and Big Data Server Benchmarks
Introducing EEMBC Cloud and Big Data Server Benchmarks Quick Background: Industry-Standard Benchmarks for the Embedded Industry EEMBC formed in 1997 as non-profit consortium Defining and developing application-specific
Oracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
An efficient Join-Engine to the SQL query based on Hive with Hbase Zhao zhi-cheng & Jiang Yi
International Conference on Applied Science and Engineering Innovation (ASEI 2015) An efficient Join-Engine to the SQL query based on Hive with Hbase Zhao zhi-cheng & Jiang Yi Institute of Computer Forensics,
Large scale processing using Hadoop. Ján Vaňo
Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine
Welcome to the 6 th Workshop on Big Data Benchmarking
Welcome to the 6 th Workshop on Big Data Benchmarking TILMANN RABL MIDDLEWARE SYSTEMS RESEARCH GROUP DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING UNIVERSITY OF TORONTO BANKMARK Please note! This workshop
Dell Reference Configuration for Hortonworks Data Platform
Dell Reference Configuration for Hortonworks Data Platform A Quick Reference Configuration Guide Armando Acosta Hadoop Product Manager Dell Revolutionary Cloud and Big Data Group Kris Applegate Solution
Hadoop & its Usage at Facebook
Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System [email protected] Presented at the Storage Developer Conference, Santa Clara September 15, 2009 Outline Introduction
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
HYPER-CONVERGED INFRASTRUCTURE STRATEGIES
1 HYPER-CONVERGED INFRASTRUCTURE STRATEGIES MYTH BUSTING & THE FUTURE OF WEB SCALE IT 2 ROADMAP INFORMATION DISCLAIMER EMC makes no representation and undertakes no obligations with regard to product planning
Can High-Performance Interconnects Benefit Memcached and Hadoop?
Can High-Performance Interconnects Benefit Memcached and Hadoop? D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,
Intel and Qihoo 360 Internet Portal Datacenter - Big Data Storage Optimization Case Study
Intel and Qihoo 360 Internet Portal Datacenter - Big Data Storage Optimization Case Study The adoption of cloud computing creates many challenges and opportunities in big data management and storage. To
ECLIPSE Performance Benchmarks and Profiling. January 2009
ECLIPSE Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox, Schlumberger HPC Advisory Council Cluster
Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging
Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging
An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics
An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,
Implement Hadoop jobs to extract business value from large and varied data sets
Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to
Hadoop & its Usage at Facebook
Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System [email protected] Presented at the The Israeli Association of Grid Technologies July 15, 2009 Outline Architecture
Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013
Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics
Scaling the Deployment of Multiple Hadoop Workloads on a Virtualized Infrastructure
Scaling the Deployment of Multiple Hadoop Workloads on a Virtualized Infrastructure The Intel Distribution for Apache Hadoop* software running on 808 VMs using VMware vsphere Big Data Extensions and Dell
Big Data and Apache Hadoop s MapReduce
Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23
Evaluating HDFS I/O Performance on Virtualized Systems
Evaluating HDFS I/O Performance on Virtualized Systems Xin Tang [email protected] University of Wisconsin-Madison Department of Computer Sciences Abstract Hadoop as a Service (HaaS) has received increasing
Maximizing Hadoop Performance with Hardware Compression
Maximizing Hadoop Performance with Hardware Compression Robert Reiner Director of Marketing Compression and Security Exar Corporation November 2012 1 What is Big? sets whose size is beyond the ability
Characterizing Task Usage Shapes in Google's Compute Clusters
Characterizing Task Usage Shapes in Google's Compute Clusters Qi Zhang 1, Joseph L. Hellerstein 2, Raouf Boutaba 1 1 University of Waterloo, 2 Google Inc. Introduction Cloud computing is becoming a key
Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
E6893 Big Data Analytics Lecture 9: Linked Big Data Graph Computing (I)
E6893 Big Data Analytics Lecture 9: Linked Big Data Graph Computing (I) Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and
Jun Liu, Senior Software Engineer Bianny Bian, Engineering Manager SSG/STO/PAC
Jun Liu, Senior Software Engineer Bianny Bian, Engineering Manager SSG/STO/PAC Agenda Quick Overview of Impala Design Challenges of an Impala Deployment Case Study: Use Simulation-Based Approach to Design
Sun 8Gb/s Fibre Channel HBA Performance Advantages for Oracle Database
Performance Advantages for Oracle Database At a Glance This Technical Brief illustrates that even for smaller online transaction processing (OLTP) databases, the Sun 8Gb/s Fibre Channel Host Bus Adapter
SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford
SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems
Computing Issues for Big Data Theory, Systems, and Applications
Computing Issues for Big Data Theory, Systems, and Applications Beihang University Chunming Hu ([email protected]) Big Data Summit, with CyberC 2013 October 10, 2013. Beijing, China. Bio of Myself Chunming
Very Large Enterprise Network, Deployment, 25000+ Users
Very Large Enterprise Network, Deployment, 25000+ Users Websense software can be deployed in different configurations, depending on the size and characteristics of the network, and the organization's filtering
Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7
Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Yan Fisher Senior Principal Product Marketing Manager, Red Hat Rohit Bakhshi Product Manager,
THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES
THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon [email protected] [email protected] XLDB
Hadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
Memory System Characterization of Big Data Workloads
2013 IEEE International Conference on Big Data Memory System Characterization of Big Data Workloads Martin Dimitrov*, Karthik Kumar*, Patrick Lu**, Vish Viswanathan*, Thomas Willhalm* *Software and Services
BIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1
Performance Study Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1 VMware ESX Server offers three choices for managing disk access in a virtual machine VMware Virtual Machine File System
Building Your Big Data Team
Building Your Big Data Team With all the buzz around Big Data, many companies have decided they need some sort of Big Data initiative in place to stay current with modern data management requirements.
Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch's analytics software stack
Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch's analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper
MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu
1 MapReduce on GPUs Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu 2 MapReduce MAP Shuffle Reduce 3 Hadoop Open-source MapReduce framework from Apache, written in Java Used by Yahoo!, Facebook, eBay,
Hadoop implementation of MapReduce computational model. Ján Vaňo
Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google's distributed
Building an energy dashboard. Energy measurement and visualization in current HPC systems
Building an energy dashboard Energy measurement and visualization in current HPC systems Thomas Geenen 1/58 [email protected] SURFsara The Dutch national HPC center 2H 2014 > 1PFlop GPGPU accelerators
